As of right now, I expect we have at least a decade, perhaps two, until we get an AI that generalizes at the level of human intelligence (which is what I consider AGI). This is a controversial statement in these social circles, and I don’t have the bandwidth or resources to write a concrete and detailed argument, so I’ll simply state an overview here.
Scale is the key variable driving progress to AGI. Human ingenuity is irrelevant. Lots of people believe they know the one last piece of the puzzle to get AGI, but I increasingly expect the missing pieces to be too alien for most researchers to stumble upon just by thinking about things without doing compute-intensive experiments.
Scale will increasingly require more and larger datacenters and a lot of power. Humanity’s track record at accomplishing megaprojects is abysmal. If we find ourselves needing to build city-sized datacenters (with all the infrastructure required to maintain and supply them), I expect humanity will take twice the initially estimated time and resources to build something with 80% of the planned capacity.
So the main questions for me, given my current model, are these:
How many OOMs of optimization power would you need for your search process to stumble upon a neural network model (or, more accurately, an algorithm) that is just general enough to start improving itself? (To be clear, I expect this level of generalization to be reached when we create AI systems that can do ML experiments autonomously.)
How much more difficult will each additional OOM be to achieve? (For example, if the resources and time needed to build the infrastructure grow exponentially, that would offset the exponential increase in the optimization power provided.)
Both questions are very hard to answer with the rigor I’d consider adequate given their importance. If pressed, however: my intuition is that we’d need at least three OOMs, and that the difficulty of each additional OOM grows exponentially, which I approximate as a doubling of the time each one takes. Given that Epoch’s historical trends imply roughly two years per OOM, I’d expect we have at least 2 + 4 + 8 = 14 more years before the labs stumble upon a proto-Clippy.
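For concreteness, here is a minimal sketch of that arithmetic, with the starting pace (two years per OOM, per Epoch’s trend) and the per-OOM doubling made explicit as adjustable assumptions rather than fixed facts:

```python
def years_until_threshold(ooms_needed=3, years_per_first_oom=2.0, difficulty_growth=2.0):
    """Toy timeline model: each successive OOM of optimization power
    takes `difficulty_growth` times longer than the previous one."""
    per_oom = [years_per_first_oom * difficulty_growth ** k for k in range(ooms_needed)]
    return per_oom, sum(per_oom)

per_oom, total = years_until_threshold()
print(per_oom, total)  # [2.0, 4.0, 8.0] 14.0 -- the "at least 14 years" figure above
```

Either assumption moves the total a lot: four OOMs at the same doubling rate would give 2 + 4 + 8 + 16 = 30 years, while a constant two years per OOM would give only 6.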
The current scaling speed is driven by increasing funding for training projects, which isn’t sustainable without continued success. Without that funding growth, the pace falls back to the much slower FLOP/dollar trend of improving the cost efficiency of compute by making better AI accelerators. The 2 + 4 + 8 years framing might describe a gradual increase in funding, but there are still about 2 OOMs of training compute beyond the original GPT-4 that are already baked into the scale of the datacenters currently being built and haven’t yet produced deployed models. We’ll only observe this in full by late 2026, so current capabilities don’t yet reflect the compute that will be available before a possible scaling slowdown.
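To make the contrast between the two regimes concrete, here’s a rough sketch of how long a single additional OOM of training compute would take under each; both rates are assumptions for illustration (the first matches the ~2 years per OOM figure used above, the second is a placeholder for the FLOP/dollar trend, not a sourced number):

```python
# Illustrative comparison only; both rates are assumed, not sourced.
funding_driven_ooms_per_year = 0.5    # ~2 years per OOM, the pace assumed earlier in the thread
cost_efficiency_ooms_per_year = 0.15  # placeholder for the slower FLOP/dollar trend alone

one_more_oom = 1.0
print(one_more_oom / funding_driven_ooms_per_year)   # 2.0 years while funding keeps growing
print(one_more_oom / cost_efficiency_ooms_per_year)  # ~6.7 years on compute cost efficiency alone
```

The point is only that the fallback regime is several times slower, not that these particular numbers are right.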
You say “Human ingenuity is irrelevant. Lots of people believe they know the one last piece of the puzzle to get AGI, but I increasingly expect the missing pieces to be too alien for most researchers to stumble upon just by thinking about things without doing compute-intensive experiments.” and you link https://tsvibt.blogspot.com/2024/04/koan-divining-alien-datastructures-from.html for “too alien for most researchers to stumble upon just by thinking about things without doing compute-intensive experiments”.
I feel like that post and that statement are in contradiction or tension, or at best orthogonal.
I think Mesa is saying something like “The missing pieces are too alien for us to expect to discover them by thinking/theorizing, but we’ll brute-force the AI into finding/growing those missing pieces by dumping more compute into it anyway,” and Tsvi’s koan post is meant to illustrate how difficult it would be to think oneself into those missing pieces.