I think this is plausible, but maybe a bit misleading in terms of real-world implications for AGI power/importance.
Looking at the scaling laws observed for language model pretraining, we see strongly sublinear improvements in pretraining performance for linear increases in model size. In figure 3.8 of the GPT-3 paper, we likewise see that zero/few/many-shot transfer performance on the SuperGLUE benchmark scales sublinearly with model size.
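(For concreteness, here is a minimal sketch of the power-law form these scaling laws take. The constants α_N ≈ 0.076 and N_c ≈ 8.8e13 are the approximate parameter-count fit from Kaplan et al. (2020), which the GPT-3 paper builds on; they are not read off figure 3.8 itself.)

```python
# Sketch of the Kaplan et al. (2020) parameter-count scaling law:
#   L(N) = (N_c / N) ** alpha_N   (loss in nats per token)
ALPHA_N = 0.076   # empirical exponent from Kaplan et al. (2020)
N_C = 8.8e13      # critical parameter count from the same paper

def pretraining_loss(n_params: float) -> float:
    """Predicted pretraining loss for a model with n_params non-embedding parameters."""
    return (N_C / n_params) ** ALPHA_N

# Doubling model size multiplies the loss by 2**-0.076 ~= 0.949,
# i.e. each doubling buys only a ~5% loss reduction -- strongly sublinear.
for n in [1e9, 2e9, 4e9, 8e9]:
    print(f"{n:.0e} params -> loss {pretraining_loss(n):.3f}")
```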
However, the economic usefulness of a system depends on a lot more than just parameter count. Consider that gorillas have 56% as many cortical neurons as humans (9.1 vs 16.3 billion; see this list), but a human is much more than twice as economically useful as a gorilla. Similarly, a merely human-level AGI that was completely dedicated to accomplishing a given goal would likely be far more effective than a human. E.g., see the appendix of this Gwern post (under “On the absence of true fanatics”) for an example of how 100 perfectly dedicated (but otherwise ordinary) fanatics could likely destroy Goldman Sachs, if each were fully willing to dedicate years of hard work and sacrifice their lives to do so.
Don’t forget https://www.gwern.net/Complexity-vs-AI, which deals with hippke’s argument more generally. We could also point out that scaling is not itself fixed, as both constant factors & exponents improve over time; see the various experience-curve & algorithmic-progress datasets. (To paraphrase Eliezer, the IQ necessary to destroy the world drops by 10% after every doubling of cumulative research effort.)
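(A toy sketch of that experience-curve framing, purely illustrative: the 10%-per-doubling rate is Eliezer’s quip rather than a measured constant, and the Wright’s-law form and starting values below are my assumptions.)

```python
import math

# Toy Wright's-law model of the paraphrase: the capability threshold
# drops 10% with every doubling of cumulative research effort, i.e.
#   threshold(E) = threshold_0 * (E / E_0) ** b,  with b = log2(0.9).
B = math.log2(0.9)   # ~ -0.152

def threshold(effort: float, threshold_0: float = 100.0, effort_0: float = 1.0) -> float:
    """Capability threshold after `effort` units of cumulative research effort."""
    return threshold_0 * (effort / effort_0) ** B

# Each doubling of cumulative effort multiplies the threshold by 0.9.
for doublings in range(5):
    effort = 2 ** doublings
    print(f"{effort:>2}x effort -> threshold {threshold(effort):.1f}")
```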