Types of takeoff
When I first heard and thought about AI takeoff I found the argument convincing that as soon as an AI passed IQ 100, takeoff would become hyper exponentially fast. Progress would speed up, which would then compound on itself etc. However there other possibilities.
AGI is a barrier that requires >200 IQ to pass unless we copy biology?
Progress could be discontinuous, there could be IQ thresholds required to unlock better methods or architectures. Say we fixed our current compute capability, and with fixed human intelligence we may not be able to figure out the formula for AGI, in a similar way that the combined human intelligence hasn’t cracked many hard problems even with decades and the worlds smartest minds working on them (maths problems, Quantum gravity...). This may seem unlikely for AI, but to illustrate the principle, say we only allowed IQ<90 people to work on AI. Progress would stall. So IQ <90 software developers couldn’t unlock IQ>90 AI. Can IQ 160 developers with our current compute hardware unlock >160 AI?
To me the reason we don’t have AI now is that the architecture is very data inefficient and worse at generalization than say the mammalian brain, for example a cortical column. I expect that if we knew the neural code and could copy it, then we would get at least to very high human intelligence quickly as we have the compute.
From watching AI over my career it seems to be that even the highest IQ people and groups cant make progress by themselves without data, compute and biology to copy for guidance, in contrast to other fields. For example Einstein predicted gravitational waves long before they where discovered, but Turing or Von Neumann didn’t publish the Transformer architecture or suggest backpropagation. If we did not have access to neural tissue, would we still not have artificial NN? In a related note, I think there is an XKCD cartoon that says something like the brain has to be so complex that it cannot understand itself.
(I believe now that progress in theoretical physics and pure maths is slowing to a stall as further progress requires intellectual capacity beyond the combined ability of humanity. Without AI there will be no major advances in physics anymore even with ~100 years spent on it.)
After AGI is there another threshold?
Lets say we do copy biology/solve AGI and with our current hardware can get >10,000 AGI agents with >= IQ of the smartest humans. They then optimize the code so there is 100K agents with the same resources. but then optimization stalls. The AI wouldn’t know if it was because it had optimized as much as possible, or because it lacked the ability to find a better optimization.
Does our current system scale to AGI with 1GW/1 million GPU?
Lets say we don’t copy biology, but scaling our current systems to 1GW/1 million GPU and optimizing for a few years gets us to IQ 160 at all tasks. We would have an inferior architecture compensated by a massive increase in energy/FLOPS as compared to the human brain. Progress could theoretically stall at upper level human IQ for a time rather then takeoff. (I think this isn’t very likely however) There would of course be a significant overhang where capabilities would increase suddenly when the better architecture was found and applied to the data center hosting the AI.
Related note—why 1GW data centers won’t be a consistent requirement for AI leadership.
Based on this, then a 1GW or similar data center isn’t useful or necessary for long. If it doesn’t give a significant increase in capabilities, then it won’t be cost effective. If it does, then it would optimize itself so that such power isn’t needed anymore. Only in a small range of capability increase does it actually stay around.
To me its not clear the merits of the Pause movement and training compute caps. Someone here made the case that compute caps could actually speed up AGI as people would then pay more attention to finding better architectures rather than throwing resources into scaling existing inferior ones. However all things considered I can see a lot of downsides from large data centers and little upside. I see a specific possibility where they are build, don’t give the economic justification, decrease in value a lot, then are sold to owners that are not into cutting edge AI. Then when the more efficient architecture is discovered, they are suddenly very powerful without preparation. Worldwide caps on total GPU production would also help reduce similar overhang possibilities.
Yes the human brain was built using evolution, I have no disagreement that give us 100-1000 years with just tinkering etc we would likely get AGI. Its just that in our specific case we have bio to copy and it will get us there much faster.