It looks like the AlphaGo models play a huge role in the “trend” of large-scale models.
In your spreadsheet, AlphaGo Master (January 2017) was much larger than anything that came before it (22x the training compute of the previous record-holder). AlphaGo Zero roughly doubled that later the same year (October 2017) and remained the largest model by training compute for almost four years, until August 2021. By late 2021 the trend had caught up, and five models have now exceeded AlphaGo Zero, all of them language models, led by Megatron-Turing NLG 530B (October 2021), which used 4x the training compute.
The trend looked steeper when AlphaGo Master and AlphaGo Zero sat near the end of the trend line, as they did when OpenAI analyzed it in 2018. Now that those models fall in the first half of the "large-scale era", its trend looks shallower. To me they look off-trend, even compared to the other "large-scale" models: DeepMind was far more willing to throw compute at the AlphaGo models than anyone has been with any model since.
Thanks for the comment! I am personally sympathetic to the view that AlphaGo Master and AlphaGo Zero are off-trend.
In the regression over all models, including them does not change the median slope, but it drastically increases the noise. You can see this for yourself in the visualization by selecting the option 'big_alphago_action = remove' (see the table below for a comparison of the large-model trend regressed without vs. with the big AlphaGo models).
In Appendix B we study the effect of removing AlphaGo Zero and AlphaGo Master when studying record-setting models. The upper bound of the slope changes dramatically, and the R² fit is much better when we exclude them; see Table 6, reproduced below.
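The effect described here (two far-above-trend points leaving the fitted slope roughly similar while badly degrading the fit quality) is easy to sketch with a toy regression. The numbers below are invented for illustration only, not the paper's actual data: log10 training compute vs. year, with two AlphaGo-like outliers sitting well above an otherwise tight linear trend.

```python
import numpy as np

# Hypothetical data (NOT the paper's figures): log10 training compute (FLOP)
# by publication year, with two AlphaGo-like outliers far above the trend.
years = np.array([2016.0, 2016.5, 2017.1, 2017.8, 2018.5,
                  2019.2, 2020.0, 2020.8, 2021.5])
log_flop = np.array([22.0, 22.3, 23.9, 24.2, 23.2,
                     23.5, 23.8, 24.1, 24.4])
is_outlier = np.array([False, False, True, True, False,
                       False, False, False, False])

def fit(x, y):
    """OLS fit y = a*x + b; returns slope (OOMs/year) and R^2."""
    a, b = np.polyfit(x, y, 1)
    resid = y - (a * x + b)
    r2 = 1.0 - resid.var() / y.var()
    return a, r2

slope_all, r2_all = fit(years, log_flop)
slope_excl, r2_excl = fit(years[~is_outlier], log_flop[~is_outlier])

print(f"with outliers:    slope = {slope_all:.2f} OOMs/yr, R^2 = {r2_all:.2f}")
print(f"without outliers: slope = {slope_excl:.2f} OOMs/yr, R^2 = {r2_excl:.2f}")
```

With the outliers included, the slope stays in the same ballpark but the R² drops sharply, which is the qualitative pattern Table 6 reports.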
Thanks for the comment! That sounds like a good and fair analysis/explanation to me.