This section seemed like an instance of you and Eliezer talking past each other in a way that wasn’t locating a mathematical model containing the features you both believed were important (e.g. things could go “whoosh” while still being continuous):
[Christiano][13:46]
Even if we just assume that your AI needs to go off in the corner and not interact with humans, there’s still a question of why the self-contained AI civilization is making ~0 progress and then all of a sudden very rapid progress
[Yudkowsky][13:46]
unfortunately a lot of what you are saying, from my perspective, has the flavor of, “but can’t you tell me about your predictions earlier on of the impact on global warming at the Homo erectus level”
you have stories about why this is like totally not a fair comparison
I do not share these stories
[Christiano][13:46]
I don’t understand either your objection nor the reductio
like, here’s how I think it works: AI systems improve gradually, including on metrics like “How long does it take them to do task X?” or “How high-quality is their output on task X?”
[Yudkowsky][13:47]
I feel like the thing we know is something like, there is a sufficiently high level where things go whooosh humans-from-hominids style
[Christiano][13:47]
We can measure the performance of AI on tasks like “Make further AI progress, without human input”
Any way I can slice the analogy, it looks like AI will get continuously better at that task
My claim is that the timescale of AI self-improvement, at the point it takes over from humans, is the same as the previous timescale of human-driven AI improvement. If it was a lot faster, you would have seen a takeover earlier instead.
This claim is true in your model. It also seems true to me about hominids, that is I think that cultural evolution took over roughly when its timescale was comparable to the timescale for biological improvements, though Eliezer disagrees
I thought Eliezer’s comment “there is a sufficiently high level where things go whooosh humans-from-hominids style” was missing the point. I think it might have been good to offer some quantitative models at that point though I haven’t had much luck with that.
I can totally grant there are possible models for why the AI moves quickly from “much slower than humans” to “much faster than humans,” but I wanted to get some model from Eliezer to see what he had in mind.
(I find fast takeoff from various frictions more plausible, so that the question mostly becomes one about how close we are to various kinds of efficient frontiers, and where we respectively predict civilization to be adequate/inadequate or progress to be predictable/jumpy.)
It seems to me that Eliezer’s model of AGI is a bit like an engine, where if any important part is missing, the entire engine doesn’t move. You can move a broken steam locomotive as fast as you can push it, maybe 1km/h. The moment you insert the missing part, the steam locomotive accelerates up to 100km/h. Paul is asking “when does the locomotive move at 20km/h” and Eliezer says “when the locomotive is already at full steam and accelerating to 100km/h.” There’s no point where the locomotive is moving at 20km/h and not accelerating, because humans can’t push it that fast, and once the engine is working, it’s already accelerating to a much faster speed.
In Paul’s model, there IS such a thing as 95% AGI, and it’s 80% or 20% or 2% as powerful on some metric we can measure, whereas in Eliezer’s model there’s no such thing as 95% AGI. The 95% AGI is like a steam engine that’s missing its pistons, or some critical valve, and so it doesn’t provide any motive power at all. It can move as fast as humans can push it, but it doesn’t provide any power of its own.
And then Paul’s response to Eliezer is like “but engines don’t just appear without precedent, there’s worse partial versions of them beforehand, much more so if people are actually trying to do locomotion; so even if knocking out a piece of the AI that FOOMs would make it FOOM much slower, that doesn’t tell us much about the lead-up to FOOM, and doesn’t tell us that the design considerations that go into the FOOMer are particularly discontinuous with previously explored design considerations”?
Right, and history sides with Paul. The earliest steam engines were missing key insights and so operated slowly, used their energy very inefficiently, and were limited in what they could do. The first steam engines were used as pumps, and it took a while before they were powerful enough to even move their own weight (locomotion). Each successive invention, from Savery to Newcomen to Watt, dramatically improved the efficiency of the engine, and over time engines could do more and more things, from pumping to locomotion to machining to flight. It wasn’t a case of one sudden innovation producing an engine that could do all of these things, including even lifting itself against the pull of Earth’s gravity. It took time, and progress on smooth metrics, before we had extremely powerful and useful engines that powered the industrial revolution. That’s why the industrial revolution(s) took hundreds of years. It wasn’t one sudden insight that made it all click.
To which my Eliezer-model’s response is “Indeed, we should expect that the first AGI systems will be pathetic in relative terms, comparing them to later AGI systems. But the impact of the first AGI systems in absolute terms is dependent on computer-science facts, just as the impact of the first nuclear bombs was dependent on facts of nuclear physics. Nuclear bombs have improved enormously since Trinity and Little Boy, but there is no law of nature requiring all prototypes to have approximately the same real-world impact, independent of what the thing is a prototype of.”
My main concern is that progress on the frontier tends to be bursty.
There are many metrics of AI performance on particular tasks where performance does indeed increase fairly continuously on the larger scale, but not in detail. Over the scale of many years it goes from abysmal to terrible to merely bad to nearly human to worse than human in some ways but better than human in others, and then to superhuman. Each of these transitions is often a sharp jump, but you see steady progress if you plot it on a graph. When you combine with having thousands of types of tasks, you end up with an overview of even smoother progress over the whole field.
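As a rough illustration of the smoothing effect described above, here is a minimal Python sketch. It is a toy model under stated assumptions, not a claim about real benchmarks: each task improves only through a handful of equal-sized sharp jumps at random times, the tasks are independent, and the names `simulate_task`, `largest_jump`, `aggregate`, and the constants are all hypothetical choices made for this example.

```python
import random


def simulate_task(steps, n_jumps, rng):
    """One task's performance over time: flat except for a few sharp jumps.

    Assumption: every jump has the same size, so performance ends near 1.0.
    """
    jump_times = set(rng.sample(range(1, steps), n_jumps))
    level = 0.0
    series = []
    for t in range(steps):
        if t in jump_times:
            level += 1.0 / n_jumps
        series.append(level)
    return series


def largest_jump(series):
    """Largest single-step increase in a performance series."""
    return max(b - a for a, b in zip(series, series[1:]))


def aggregate(all_series):
    """Field-wide performance: the mean across tasks at each time step."""
    n = len(all_series)
    return [sum(values) / n for values in zip(*all_series)]


# Hypothetical parameters chosen only for illustration.
STEPS, N_JUMPS, N_TASKS = 200, 5, 1000

rng = random.Random(0)  # fixed seed so the run is reproducible
tasks = [simulate_task(STEPS, N_JUMPS, rng) for _ in range(N_TASKS)]
field = aggregate(tasks)

single_jump = largest_jump(tasks[0])  # burstiness of one task
field_jump = largest_jump(field)      # burstiness of the field average

print(f"largest jump, single task:   {single_jump:.3f}")
print(f"largest jump, field average: {field_jump:.3f}")
```

In this toy model the averaged curve moves in much smaller steps than any individual task, which is the sense in which the field can look smooth while each task is bursty. The sketch says nothing about whether a narrow task such as “designing better AIs” behaves like one of the bursty curves or like the average, and it assumes independence between tasks, which a single shared breakthrough would violate.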
There are three problems I’m worried about.
The first is that “designing better AIs” may turn out to be a relatively narrow task, and subject to a lot more burstiness than broad spectrum performance that could steadily increase world GDP.
The second is that for purposes of the future of humanity, only the last step from human-adjacent to strictly superhuman really matters. On the scale of intelligence for all the beings we know about, chimpanzees are very nearly human, but the economic effect of chimpanzees is essentially zero.
The third is that we are nowhere near fully exploiting the hardware we have for AI, and I expect that to continue for quite a while.
I think any two of these three are enough for a fast takeoff with little warning.