Eliezer also strongly believes that discrete jumps will happen. But the crux for him AFAIK is absolute capability and absolute speed of capability gain in AGI systems, not discontinuity per se (and not particular methods for improving capability, like recursive self-improvement). Hence in So Far: Unfriendly AI Edition Eliezer lists his key claims as:
(1) “Orthogonality thesis”,
(2) “Instrumental convergence”,
(3) “Rapid capability gain and large capability differences”,
(A) superhuman intelligence makes things break that don’t break at infrahuman levels,
(B) “you have to get [important parts of] the design right the first time”,
(C) “if something goes wrong at any level of abstraction, there may be powerful cognitive processes seeking out flaws and loopholes in your safety measures”, and the meta-level
(D) “these problems don’t show up in qualitatively the same way when people are pursuing their immediate incentives to get today’s machine learning systems working today”.
From Sam Harris’ interview of Eliezer (emphasis added):
Eliezer: [...] I think that artificial general intelligence capabilities, once they exist, are going to scale too fast for that to be a useful way to look at the problem. AlphaZero going from 0 to 120 mph in four hours or a day—that is not out of the question here. And even if it’s a year, a year is still a very short amount of time for things to scale up.
[...] I’d say this is a thesis of capability gain. This is a thesis of how fast artificial general intelligence gains in power once it starts to be around, whether we’re looking at 20 years (in which case this scenario does not happen) or whether we’re looking at something closer to the speed at which Go was developed (in which case it does happen) or the speed at which AlphaZero went from 0 to 120 and better-than-human (in which case there’s a bit of an issue that you better prepare for in advance, because you’re not going to have very long to prepare for it once it starts to happen).
[...] Why do I think that? It’s not that simple. I mean, I think a lot of people who see the power of intelligence will already find that pretty intuitive, but if you don’t, then you should read my paper Intelligence Explosion Microeconomics about returns on cognitive reinvestment. It goes through things like the evolution of human intelligence and how the logic of evolutionary biology tells us that when human brains were increasing in size, there were increasing marginal returns to fitness relative to the previous generations for increasing brain size. Which means that it’s not the case that as you scale intelligence, it gets harder and harder to buy. It’s not the case that as you scale intelligence, you need exponentially larger brains to get linear improvements.
At least something slightly like the opposite of this is true; and we can tell this by looking at the fossil record and using some logic, but that’s not simple.
Sam: Comparing ourselves to chimpanzees works. We don’t have brains that are 40 times the size or 400 times the size of chimpanzees, and yet what we’re doing—I don’t know what measure you would use, but it exceeds what they’re doing by some ridiculous factor.
Eliezer: And I find that convincing, but other people may want additional details.
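For readers who do want a little more detail, here is a toy way to put numbers on the exchange above. The notation and the rough brain-mass figures are the editor's, not anything taken from the interview or from Intelligence Explosion Microeconomics:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}

% Editor's toy illustration; the notation and the rough brain-mass figures
% are assumptions, not taken from the interview or from the paper.

Needing ``exponentially larger brains for linear improvements'' would mean
capability $c$ grows only logarithmically with brain size $b$:
\[
  b(c) \propto e^{kc}
  \quad\Longleftrightarrow\quad
  c(b) \propto \tfrac{1}{k}\,\ln b .
\]
The hominid record does not look like that. With rough adult brain masses
of about $0.4$\,kg (chimpanzee) and $1.4$\,kg (human), the hardware ratio
is only
\[
  \frac{1.4\ \text{kg}}{0.4\ \text{kg}} \approx 3.5,
\]
while the capability gap (language, science, cumulative culture) is
qualitatively enormous. And for brain size to keep increasing under
natural selection, the marginal fitness return on extra brain tissue had
to stay positive, net of its metabolic cost, as brains grew.

\end{document}
```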
[...] AlphaZero seems to me like a genuine case in point. That is showing us that capabilities that in humans require a lot of tweaking and that human civilization built up over centuries of masters teaching students how to play Go, and that no individual human could invent in isolation… [...] AlphaZero blew past all of that in less than a day, starting from scratch, without looking at any of the games that humans played, without looking at any of the theories that humans had about Go, without looking at any of the accumulated knowledge that we had, and without very much in the way of special-case code for Go rather than chess—in fact, zero special-case code for Go rather than chess. And that in turn is an example that refutes another thesis about how artificial general intelligence develops slowly and gradually, which is: “Well, it’s just one mind; it can’t beat our whole civilization.”
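The “zero special-case code” point is about the shape of the training procedure: one generic loop of self-play, search, and learning, with the game entering only through its rules. Below is a structural sketch of that loop in the editor's words; it paraphrases the published AlphaZero scheme rather than reproducing DeepMind's code, and `rules`, `net`, and `search` are hypothetical interfaces standing in for the game rules, the policy/value network, and the guided tree search.

```python
# Structural sketch of an AlphaZero-style training loop (editor's paraphrase of
# the published scheme, not DeepMind's code). `rules`, `net`, and `search` are
# hypothetical interfaces. The point of the sketch: nothing below is specific
# to Go or chess except the `rules` object handed in at the top; there are no
# human games, no opening books, and no hand-written evaluation function.

import random


def self_play_game(rules, net, search):
    """Play one game against itself; return (state, visit_counts, outcome) examples."""
    history = []
    state = rules.initial_state()
    while not rules.is_terminal(state):
        visit_counts = search(state, rules, net)        # tree search guided by net's policy/value
        history.append((state, visit_counts))
        move = max(visit_counts, key=visit_counts.get)  # exploration noise omitted for brevity
        state = rules.apply(state, move)
    outcome = rules.outcome(state)                      # e.g. +1 / 0 / -1 from player one's view
    return [(s, counts, outcome) for s, counts in history]


def train(rules, net, search, iterations=1000, games_per_iteration=100):
    """Generic loop: generate data purely by self-play, then fit the net to it."""
    replay = []
    for _ in range(iterations):
        for _ in range(games_per_iteration):
            replay.extend(self_play_game(rules, net, search))
        batch = random.sample(replay, min(len(replay), 4096))
        # The net learns to predict the search's move distribution (policy target)
        # and the eventual game result (value target) from each position.
        net.fit(batch)
    return net
```

Swapping Go for chess means swapping only the `rules` object, which is the sense in which there was zero game-specific code.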
I would say that there’s a bunch of technical arguments which you walk through, and then after walking through these arguments you assign a bunch of probability, maybe not certainty, to artificial general intelligence that scales in power very fast—a year or less. And in this situation, if alignment is technically difficult, if it is easy to screw up, if it requires a bunch of additional effort—in this scenario, if we have an arms race between people who are trying to get their AGI first by doing a little bit less safety because from their perspective that only drops the probability a little; and then someone else is like, “Oh no, we have to keep up. We need to strip off the safety work too. Let’s strip off a bit more so we can get in the front.”—if you have this scenario, and by a miracle the first people to cross the finish line have actually not screwed up and they actually have a functioning powerful artificial general intelligence that is able to prevent the world from ending (and you do have to prevent the world from ending), you are in a terrible, terrible situation. You’ve got your one miracle. And this follows from the rapid capability gain thesis and at least the current landscape for how these things are developing.
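The race dynamic in that last paragraph can be made concrete with a deliberately crude toy simulation (the editor's illustration, not a model taken from the interview): each team that shaves safety effort speeds itself up a little, but because the fastest team is the one whose system matters, the field as a whole ends up gambling on the least careful entrant.

```python
# Crude toy model of the race described above (editor's illustration only).
# Each team keeps some fraction of the safety work; less safety means a faster
# (noisy) expected finish, and the winner's retained safety fraction is treated
# as its chance of not screwing up.

import random


def p_safe_winner(safety_levels, trials=50_000):
    """Estimate the probability that the first team to finish also gets safety right."""
    safe_wins = 0
    for _ in range(trials):
        finish_times = [random.gauss(1.0 + s, 0.3) for s in safety_levels]
        winner = finish_times.index(min(finish_times))
        if random.random() < safety_levels[winner]:
            safe_wins += 1
    return safe_wins / trials


if __name__ == "__main__":
    print(p_safe_winner([0.9, 0.9, 0.9]))  # everyone stays careful
    print(p_safe_winner([0.9, 0.6, 0.3]))  # rivals strip safety to get in front
```

In this toy setup, a uniformly careful field gets a safe winner roughly nine times in ten, while a field where two rivals strip safety to get ahead usually hands the win to the least careful team and the chance of a safe outcome drops to well under half.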
The question is simply “Can we do cognition of this quality at all?” [...] The speed and quantity of cognition isn’t the big issue; getting to that quality at all is the question. Once you’re there, you can solve any problem which can realistically be done with non-exponentially-vast amounts of that exact kind of cognition.
See also: