“The Computational Limits of Deep Learning” by Neil C. Thompson, Kristjan Greenewald, Keeheon Lee, and Gabriel F. Manso
NB: As best I can tell, this is a preprint that has not been peer-reviewed or accepted for publication, so more than usual you’ll have to make your own judgements about the quality of the results.
Abstract:
Deep learning’s recent history has been one of achievement: from triumphing over humans in the game of Go to world-leading performance in image recognition, voice recognition, translation, and other tasks. But this progress has come with a voracious appetite for computing power. This article reports on the computational demands of Deep Learning applications in five prominent application areas and shows that progress in all five is strongly reliant on increases in computing power. Extrapolating forward this reliance reveals that progress along current lines is rapidly becoming economically, technically, and environmentally unsustainable. Thus, continued progress in these applications will require dramatically more computationally-efficient methods, which will either have to come from changes to deep learning or from moving to other machine learning methods.
A few additional details: they look at papers in ML to see how much compute was required to get their results, and extrapolate the trend lines to suggest we’re nearing the limits of what is economically feasible under the current regime. They believe this implies we’ll have to get more efficient if we want continued progress, such as by building more specialized and efficient hardware or by improving algorithms. My takeaway is that they believe most of the low-hanging fruit in ML gains has already been picked, and additional gains in capabilities will not come as easily as past gains.
The straightforward implication for safety is that, if this is true, we are less near x-risk territory than it might appear if you were to look only at the “numerator” of the trend lines (what we can do) without considering the “denominator” (how much it costs). Not that we are necessarily dramatically far from x-risk territory with ML, mind you, only that it’s not obviously very near-term, since the economic realities of deploying this technology will soon shift to naturally slow immediate progress absent significant effort or innovation.
I believe Gwern had some harsh words for this paper (see below). I’d be interested to see a response from fans of the paper.
Gwern asks: “Why would you do that and ignore (mini literature review follows):”
Thompson did not ignore the papers Gwern cites. A number of them are in Thompson’s tables comparing prior work on scaling. Did Gwern tweet this criticism without even reading Thompson’s paper?
I did read it, and he did ignore them. Do you really think I criticized a paper publicly in harsh terms for not citing 12 different papers without even checking the bibliography or Ctrl-F-ing the titles/authors? Please look at the first 2020 paper version I was criticizing on 16 July 2020, when I wrote that comment, and don’t lazily misread the version posted 2 years later on 27 July 2022, which, not being a time traveler, I obviously could not have read or been referring to (and which may well have included those refs because of my comments there & elsewhere).
(Not that I am impressed by their round 2 stuff which they tacked on—but at least now they acknowledge that prior scaling research exists and try to defend their very different approach at all.)
I stand corrected. Please forgive me.