Rohin Shah comments on Parameter counts in Machine Learning

Rohin Shah 1 Sep 2021 13:07 UTC
3 points
That’s fair. I was thinking of that as part of “compute needed during training”, but you could also split it up into “compute needed for gradient updates” and “compute needed to create data of sufficient quality”, and then say that the stable thing is the “compute needed for gradient updates”.
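
As a rough sketch of that split (the symbols are illustrative shorthand and do not come from the post):

```latex
C_{\text{train}} = C_{\text{grad}} + C_{\text{data}}
```

Here C_grad stands for the compute needed for gradient updates and C_data for the compute needed to create data of sufficient quality. On this framing, the stability claim applies to C_grad only, with no such claim made about C_data.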