Great collection of results. I found the interactive graph particularly useful.
I’m slightly confused by the trend lines (especially for Games and Other): they don’t seem like the intuitively best fits. It looks like they place a lot of weight on the high-parameter recent models (possibly the cost for each datapoint is computed in parameter space rather than log(parameter) space?).
Thank you! I think you are right: by default the Altair library (which we used to plot the regressions) does an OLS fit of an exponential rather than fitting a linear model over the log transform. We’ll look into this and report back.
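For reference, here is what the log-space alternative looks like: fit a line to log10(parameters) against year, so large recent models do not dominate the loss. This is a minimal sketch with made-up years and parameter counts, not the actual dataset:

```python
import numpy as np

# Hypothetical data: publication year and parameter count of some models.
years = np.array([2012, 2014, 2016, 2018, 2020])
params = np.array([6e7, 1.4e8, 3e8, 1.1e9, 1.75e11])

# Fitting a line in log space weights every datapoint equally on the
# log scale, so a 175B-parameter model does not dominate the fit.
slope, intercept = np.polyfit(years, np.log10(params), deg=1)

def predict(year):
    """Predicted parameter count under the log-linear fit."""
    return 10 ** (intercept + slope * year)
```

Fitting the exponential directly with OLS instead would minimize squared error in parameter space, which is what makes the fit chase the largest models.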
If you are still interested in fiddling with this graph, here’s a variant I’d love to see:
Remove all the datapoints in each AI category that are not record-setting, such that each category just tracks the largest available models at any given time. Then compute the best fit lines for the resulting categories. (Because this is what would be useful for predicting what the biggest models will be in year X, whereas the current design is for predicting what the average model size will be in year X… right?)
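The record-setter filter described above can be sketched with a running maximum. The dataframe columns and values here are hypothetical, just to show the idea:

```python
import pandas as pd

# Hypothetical dataframe: one row per model, with year and parameter count.
df = pd.DataFrame({
    "year":   [2012, 2013, 2015, 2016, 2018, 2019],
    "params": [6e7, 3e7, 3e8, 1e8, 1.1e9, 1.75e11],
})

# Sort chronologically, then keep only rows that set (or tie) the record:
# a model survives iff its parameter count equals the running maximum.
df = df.sort_values("year")
records = df[df["params"] == df["params"].cummax()]
```

The best-fit lines would then be computed on `records` rather than on the full dataset.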
Good suggestion! Understanding the trend among record-setting models would indeed be interesting, since it avoids the pesky influence of below-trend systems like CURL in the games domain.
The problem with the naive setup of just regressing on record-setters is that it is quite sensitive to noise: one early outlier in the trend can completely alter the result.
I explore a similar problem in my paper Forecasting timelines of quantum computing, where we try to extrapolate progress on some key metrics like qubit count and gate error rate. The method we use in the paper to address this issue is to bootstrap the input and predict a range of possible growth rates—that way outliers do not completely dominate the result.
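The bootstrap idea can be sketched as follows. The data here is illustrative (made-up values, not taken from the paper): resample the datapoints with replacement, refit the log-linear trend each time, and report a percentile range of growth rates instead of a single point estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical series: year and a metric that grows roughly exponentially.
years = np.array([2014, 2015, 2016, 2017, 2018, 2019, 2020])
metric = np.array([5.0, 9.0, 20.0, 35.0, 80.0, 150.0, 300.0])

# Bootstrap: resample (year, metric) pairs with replacement and refit
# the log-linear trend each time, collecting the slope (growth rate).
slopes = []
for _ in range(1000):
    idx = rng.choice(len(years), size=len(years), replace=True)
    if len(np.unique(years[idx])) < 2:
        continue  # need at least two distinct years to fit a line
    slope, _ = np.polyfit(years[idx], np.log10(metric[idx]), deg=1)
    slopes.append(slope)

# A range of plausible growth rates; a single outlier datapoint shifts
# some resamples but cannot dominate the whole interval.
low, high = np.percentile(slopes, [5, 95])
```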
I will probably not do it right now for this dataset, though I’d be interested in having other people try that if they are so inclined!
OK, sounds good! I know someone who might be interested...
Another, very similar thing that would be good is to just delete all the non-record-setting data points and draw lines to connect the remaining dots.
Also, it would be cool if we could delete all the Mixture of Experts models to see what the “dense” version of the trend looks like.
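Filtering out the Mixture-of-Experts models becomes a one-liner once the dataset carries a flag for them. The `is_moe` column is hypothetical and the rows are illustrative:

```python
import pandas as pd

# Hypothetical dataframe with a flag marking Mixture-of-Experts models.
df = pd.DataFrame({
    "model":  ["GPT-3", "Switch-C", "Gopher", "GLaM"],
    "params": [1.75e11, 1.57e12, 2.8e11, 1.2e12],
    "is_moe": [False, True, False, True],
})

# Drop the sparse (MoE) models to look at the dense trend alone.
dense = df[~df["is_moe"]]
```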
This is now fixed; see the updated graphs. We have also updated the eyeball estimates accordingly.