As I mentioned on Twitter, it’s amazing that they wrote an entire paper trying to estimate performance scaling with compute, and ignored what looks like the entire literature doing actual controlled, highly precise experiments on scaling up fixed architectures (no citations to any of them that I could see) in favor of grabbing random datapoints from the overall literature.
And the rest is not much better, like the de rigueur ‘green’ CO2 estimates (as if training DL actually emitted CO2, as if ‘green’ approaches aren’t just doomed from the start as the most efficient NNs always start from research on the very large models they would like to rule out, as if large high-performance NNs aren’t used in the real world in any way and do not replace even more CO2-intensive systems like say humans, as if CO2 costs are even the most important cost to begin with...). This isn’t a paper that needs any extensive critique, let us say.
I believe Gwern had some harsh words for this paper. (See below) I’d be interested to see a response from fans of the paper.
Gwern asks, “Why would you do that and ignore (mini literature review follows):”
Thompson did not ignore the papers Gwern cites. A number of them are in Thompson’s tables comparing prior work on scaling. Did Gwern tweet this criticism without even reading Thompson’s paper?
I did read it, and he did ignore them. Do you really think I criticized a paper publicly in harsh terms for not citing 12 different papers without even checking the bibliography or Ctrl-F-ing the titles/authors? Please look at the first 2020 paper version I was criticizing on 16 July 2020, when I wrote that comment, and don’t lazily misread the version posted 2 years later on 27 July 2022 which, not being a time traveler, I obviously could not have read or have been referring to (and which may well have included those refs because of my comments there & elsewhere).
(Not that I am impressed by their round 2 stuff which they tacked on—but at least now they acknowledge that prior scaling research exists and try to defend their very different approach at all.)
I stand corrected. Please forgive me.