Why I Think AI Takeoff Will Be Abrupt
(note: Quickly written. I’ve attempted to number my arguments “formally”, but I have no training in this format. Edits/suggestions welcome.)
1. Ability to generalize is “lumpy”, i.e. unpredictable: sometimes lots of input leads to little progress, and sometimes a little input leads to lots of progress.
2. Takeoff is threshold-based: when machines gain enough ability to generalize, takeoff will happen.
thus
3. As we increase machines’ ability to generalize, there is a high chance that some small amount of input pushes them over the takeoff threshold. (A toy sketch of this claim appears after the evidence list below.)
Argument for claim 1:
4. Lumpy input→output curves exist in individual domains, as a result of forming new and powerful abstractions in those domains.
5. Cognitive architecture components are abstractions of the kind in (4), but ones that generalize across domains.
6. Cognitive architecture is itself a “domain” in the sense of (4), and is thus lumpy.
thus
7. (Claim 1 restated:) If we get a new abstraction in the form of a cognitive architecture component that generalizes across domains, we will see rapid cross-domain progress, and thus the ability to generalize is lumpy.
Evidence for claim 4 — “narrow” lumpy generality; specifically, that a new and powerful abstraction gives rise to abrupt domain-specific performance improvements:
8. Anecdotally, humans have a lumpy learning experience: it is common for people to talk about “aha moments”, “eureka moments”, and so on.
9. AlphaGo Zero seems somehow relevant (I have trouble pointing at precisely how, but it convinces me regardless).
10. The “Grokking” paper (thanks to Quintin Pope and other commenters!).
Evidence for claim 5 — cognitive architecture components generalize:
11. Chimps → humans: humans’ cognitive architecture is very general and allows us to quickly form a broad set of domain-specific abstractions that perform extremely well at little energetic cost.
Evidence for claim 2 — takeoff is threshold-based:
12. Chimps → humans: chimps didn’t take over the world, but humans did.
13. (The usual arguments for hardware overhang, the ability to scale and copy what works, etc.)
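To make the shape of claims 1–3 concrete, here is a minimal toy simulation. The threshold, probabilities, and gain sizes are arbitrary assumptions of mine, not anything derived from the argument; the only point is that lumpiness plus a threshold tends to produce an abrupt-looking crossing.

```python
import random

def simulate(threshold=100.0, lump_prob=0.05, seed=0):
    """Toy model: each unit of research input usually gives a small gain in
    generality, but occasionally (with probability lump_prob) gives a large
    'lumpy' gain from a new abstraction. Returns the size of the step that
    finally crosses the takeoff threshold."""
    rng = random.Random(seed)
    generality = 0.0
    while generality < threshold:
        if rng.random() < lump_prob:
            gain = rng.uniform(10.0, 40.0)  # rare large jump (new abstraction)
        else:
            gain = rng.uniform(0.0, 1.0)    # ordinary incremental progress
        generality += gain
    return gain

# Even though only ~5% of steps are large jumps, the step that actually
# crosses the threshold is a large jump in most runs.
crossing = [simulate(seed=s) for s in range(1000)]
share = sum(g > 5.0 for g in crossing) / len(crossing)
print(f"{share:.0%} of runs crossed the threshold on a jump larger than 5")
```

Nothing here depends on the exact numbers; the qualitative point is just that the threshold-crossing step is size-biased toward the rare large gains.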
I don’t agree with the abrupt takeoff assumption. I think a lot of the evidence people use to support that assumption falls apart after further scrutiny. Some examples:
If point 9 refers to the grokking phenomenon, then you should know that grokking does not actually happen very suddenly; it only looks that way in the paper because the x-axes use a base-10 log scale. In Figure 1, grokking starts to happen roughly 3% of the way through the training process.
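As a minimal worked example with made-up numbers (not the paper's actual curves or step counts): a validation-accuracy rise that occupies a sizable chunk of the run in raw steps shrinks, and gets pushed to the right, on a base-10 log axis.

```python
import math

# Made-up numbers, chosen only to illustrate the axis effect: suppose
# validation accuracy rises from 5% to 95% between step 30,000 (3% of the
# run) and step 450,000 (45% of the run) of a 1,000,000-step training run.
total_steps = 1_000_000
rise_start, rise_end = 30_000, 450_000

linear_fraction = (rise_end - rise_start) / total_steps
log_fraction = (math.log10(rise_end) - math.log10(rise_start)) / math.log10(total_steps)

print(f"transition covers {linear_fraction:.0%} of a linear x-axis")
print(f"but only {log_fraction:.0%} of a base-10 log x-axis, near its right edge")
```

Same underlying curve; only the axis changes, which is enough to make a gradual rise look like a sharp late jump.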
The Big-Bench paper suggests that the discontinuous improvements seen in language modeling are mostly measurement issues: more precise quantification of model capabilities points to smoother arcs of improvement across capabilities.
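A minimal sketch of the measurement point, with made-up numbers (the 10-token exact-match task and the linearly improving per-token accuracy are assumptions of mine, not anything from the Big-Bench paper): if the underlying metric improves smoothly with scale but the benchmark scores exact match on a multi-token answer, the benchmark curve looks discontinuous even though nothing discontinuous is happening underneath.

```python
# Assume (illustratively) that per-token accuracy on a task grows linearly
# with log10(parameter count), and that the benchmark requires an exact match
# on a 10-token answer.
for exponent in range(6, 14):                           # 1e6 .. 1e13 parameters
    n_params = 10 ** exponent
    p_token = min(0.99, 0.10 + 0.12 * (exponent - 6))   # smooth underlying metric
    exact_match = p_token ** 10                          # all 10 tokens correct
    print(f"{n_params:>18,d} params | per-token {p_token:.2f} | "
          f"exact match {exact_match:.4f}")

# Per-token accuracy rises steadily, while exact match sits near zero for most
# of the range and only becomes noticeable at the largest scales - an apparent
# discontinuity created by the metric, not by the model.
```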
The “sharp left turn” in human capabilities relative to evolution seems entirely explained by the fact that the inner learning process (an organism’s within-lifetime learning) takes billions of steps for each outer step of evolution, but the organism then dies and all of the inner learner’s progress is lost. Human culture partly corrected that issue by allowing information to accumulate across generations, and once it did, the much greater optimization power of the inner learning process immediately let the inner learner outstrip the outer optimizer. There is no need to hypothesize extreme returns on small improvements in generality. See also my comment on whether to expect a sharp left turn in AI training (no, because we won’t spend our compute as stupidly as evolution did).
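A crude toy model of that inner/outer point (every number is an arbitrary assumption, used only to show the shape of the argument): each generation, the outer optimizer takes one small step, the organism takes many within-lifetime learning steps, and at death either none or some fraction of the learned capability is passed on.

```python
def run(generations=50, lifetime_steps=1000, outer_gain=0.01,
        inner_gain=0.001, culture_retention=0.0):
    """Caricature of outer (evolutionary) vs. inner (within-lifetime) learning.
    Per generation: the genome improves a little (one outer step), the organism
    learns a lot within its lifetime (many inner steps), and at death a fraction
    `culture_retention` of that learned capability survives into the next
    generation as culture."""
    genome, culture = 0.0, 0.0
    for _ in range(generations):
        genome += outer_gain                               # one outer step
        learned = lifetime_steps * inner_gain              # many inner steps
        capability = genome + culture + learned
        culture = culture_retention * (culture + learned)  # what survives death
    return capability

print("no cultural transmission:  ", run(culture_retention=0.0))
print("with cultural transmission:", run(culture_retention=0.9))
# Without culture, the inner learner's work is discarded every generation, so
# final capability is just one lifetime's learning plus a small genome term.
# With culture, accumulated within-lifetime learning dwarfs the genome's
# contribution even though neither step size changed.
```

The jump in this toy comes from letting an already-powerful inner optimizer accumulate, not from any extreme return on a small gain in generality, which is the commenter's point.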
Thanks! This is very helpful, and yes, I did mean to refer to grokking! Will update the post.
This has been my vague intuition as well, and I’m confused as to where exactly people think this argument goes wrong. So I would appreciate some rebuttals to this.
For 9, are you thinking of grokking?
See my comment.
Thanks! Am probably convinced by the third point, unsure about the others due to not having much time to think at the moment.
I think it would be a good idea to ask the question at the ongoing thread on AGI safety questions.