Thanks for the write-up. I have very little knowledge in this field, but I’m confused on this point:
> 34. Coordination schemes between superintelligences are not things that humans can participate in (eg because humans can’t reason reliably about the code of superintelligences); a “multipolar” system of 20 superintelligences with different utility functions, plus humanity, has a natural and obvious equilibrium which looks like “the 20 superintelligences cooperate with each other but not with humanity”.
Yes. I am convinced that claims like ‘oh, we will be fine because the AGIs will want to establish proper rule of law,’ or that we could somehow usefully be part of such deals, are nonsense. I do think that the statement on its own is unconvincing to someone who isn’t already convinced and isn’t inclined to be. I agree with it because I was already convinced, but unlike many of the points, which should be shorter, this one should probably have been longer.
Can you link to or explain what convinced you of this?
To me, part of it seems dependent on take-off speed. In slower take-off worlds, it seems that agents would develop in a world in which laws/culture/norms were enforced at each step of the intelligence development process. Thus at each stage of development, AI agents would be operating in a competitive/cooperative world, eventually leading to a world of competition between many superintelligent AI agents with established Schelling points of cooperation that human agents could still participate in.
On the other hand, in faster/hard take-off worlds, I agree that cooperation would not be possible, because the AI (or the few multipolar AIs) would obviously have no incentive to cooperate with much less powerful agents like humans.
Maybe there is an assumption of a hard take-off that I’m missing? Is this a part of M3?
> In slower take-off worlds, it seems that agents would develop in a world in which laws/culture/norms were enforced at each step of the intelligence development process. Thus at each stage of development, AI agents would be operating in a competitive/cooperative world, eventually leading to a world of competition between many superintelligent AI agents with established Schelling points of cooperation that human agents could still participate in.
Suppose that many different actors have AGI systems; the systems have terminal goals like ‘maximize paperclips’, and these goals imply ‘kill any optimizers that don’t share my goals, if you find a way to do so without facing sufficiently-bad consequences’ (because your EV is higher if there are fewer optimizers trying to push the universe in different directions than what you want).
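As a back-of-the-envelope version of that parenthetical (my own toy model, with made-up numbers, not anything anyone has formally endorsed): if N optimizers with incompatible goals each end up steering roughly 1/N of the contested resources, then removing a rival increases your expected share even after paying a modest cost of conflict.

```python
# Toy expected-value sketch (illustrative only). Assumptions: N optimizers
# with incompatible goals split contested resources roughly evenly, and a
# conflict burns a fraction `cost` of those resources before the remaining
# N-1 optimizers re-split what's left.

def ev_of_eliminating_rival(n_optimizers: int, cost: float) -> float:
    """Change in this agent's expected resource share from removing one rival."""
    current_share = 1 / n_optimizers
    share_after_attack = (1 - cost) / (n_optimizers - 1)
    return share_after_attack - current_share

# With 20 rival optimizers and a 1% cost of conflict, the change is still
# positive, so the attack is worth it on this toy model:
print(ev_of_eliminating_rival(20, 0.01))  # ~ +0.0021
```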
The systems nonetheless behave in prosocial ways, because they’re weak and wouldn’t win a conflict against humans. Instead, the AGI systems participate in a thriving global economy that includes humans as well as all the competing AGIs; and all parties accept the human-imposed legal environment, since nobody can just overthrow the humans.
One day, one of the AGI systems improves to the point where it unlocks a new technology that can reliably kill all humans, as well as destroying all of its AGI rivals. (E.g., molecular nanotechnology.) I predict that regardless of how well-behaved it’s been up to that point, it uses the technology and takes over. Do you predict otherwise?
Alternative scenario: One day, one of the AGI systems unlocks a new technology that can reliably kill all humans, but it isn’t strong enough to destroy rival AGI systems. In that case, by default I predict that it kills all humans and then carries on collaborating or competing with the other AGI systems in the new humanless equilibrium.
Alternative scenario 2: The new technology can kill all AGI systems as well as all humans, but the AGI made a binding precommitment to not use such technologies (if it finds them) against any agents that (a) are smart enough to inspect its source code and confidently confirm that it has made this precommitment, and (b) have verifiably made the same binding precommitment. Some or all of the other AGI systems may meet this condition, but humans don’t, so you get the “AGI systems coordinate, humans are left out” equilibrium Eliezer described.
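To make scenario 2 concrete, here is a minimal sketch in the spirit of program-equilibrium toy models. It is my own illustration: the function names and the "clause string" check are invented for the example, and real verification would be far harder than a substring match. The point is just that condition (a) excludes humans by construction: there is no source to inspect, so no pact to join.

```python
# Toy sketch: an AGI holds fire only against agents whose source it can
# inspect (condition a) and which verifiably carry the same commitment
# clause (condition b). Humans fail condition (a).

import inspect

CLAUSE = "no first use against agents that verifiably carry this clause"

def committed_agi(opponent) -> str:
    """Commitment: no first use against agents that verifiably carry this clause."""
    try:
        opponent_source = inspect.getsource(opponent)  # condition (a): inspect the code
    except (TypeError, OSError):
        return "attack"  # opponent can't be verified at all
    if CLAUSE in opponent_source:  # condition (b): same clause, verifiably present
        return "hold fire"
    return "attack"

def another_committed_agi(opponent) -> str:
    """Commitment: no first use against agents that verifiably carry this clause."""
    # Same verification logic as committed_agi.
    try:
        opponent_source = inspect.getsource(opponent)
    except (TypeError, OSError):
        return "attack"
    return "hold fire" if CLAUSE in opponent_source else "attack"

def human(opponent) -> str:
    # Humans can't read or verify superintelligent code, and their own
    # "source" carries no machine-checkable clause.
    return "hold fire"

print(committed_agi(another_committed_agi))  # hold fire: the AGIs coordinate
print(committed_agi(human))                  # attack: humans are left out
```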
This seems like a likely outcome of multipolar AGI worlds to me, and I don’t see how it matters whether there was a prior “Schelling point” or human legal code. AGIs can just agree to new rules/norms.
Alternative scenario 3: The AGI systems don’t even need a crazy new technology, because their collective power ends up being greater than humanity’s, and they agree to a “coordinate with similarly smart agents against weaker agents” pact. Again, I don’t see how it changes anything if they first spend eight years embedded in a human economy and human legal system, before achieving enough collective power or coordination ability to execute this. If a human-like legal system is useful, you can just negotiate a new one that goes into effect once the humans are dead.
> One day, one of the AGI systems improves to the point where it unlocks a new technology that can reliably kill all humans, as well as destroying all of its AGI rivals. (E.g., molecular nanotechnology.) I predict that regardless of how well-behaved it’s been up to that point, it uses the technology and takes over. Do you predict otherwise?
I agree with this, given your assumptions. But this seems like a fast take-off scenario, right? My main question wasn’t addressed: are we assuming a fast take-off? I didn’t see that explicitly discussed.
My understanding is that common law isn’t easy to change, even if individual agents would prefer to change it. This is why such norms are Nash equilibria: no single agent gains by unilaterally deviating while everyone else keeps enforcing them. Of course, if there’s a fast enough take-off, then this is irrelevant.
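To spell out what I mean (with toy payoffs I just made up): if everyone else keeps enforcing the existing norm, a lone deviator gets punished, so following the old norm is a best response to itself even when every agent would prefer a different norm; switching requires coordination.

```python
# Toy coordination game (illustrative payoffs only): payoff to one agent
# as a function of (my_action, what_everyone_else_does).
payoffs = {
    ("follow", "follow"): 3,    # status quo norm holds
    ("deviate", "follow"): 1,   # lone deviator is sanctioned by the rest
    ("follow", "deviate"): 0,   # the norm collapses around you
    ("deviate", "deviate"): 2,  # everyone switches to a new norm together
}

def best_response(others_action: str) -> str:
    return max(["follow", "deviate"], key=lambda a: payoffs[(a, others_action)])

# "follow" is the best response when others follow, so the old norm persists;
# reaching (deviate, deviate) needs coordination, which is exactly the step a
# fast enough take-off lets a single AI skip.
print(best_response("follow"))   # follow
print(best_response("deviate"))  # deviate
```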
I would define hard takeoff as “progress in cognitive ability from pretty-low-impact AI to astronomically high-impact AI is discontinuous, and fast in absolute terms”.
Unlocking a technology that lets you kill other powerful optimizers (e.g., nanotech) doesn’t necessarily require fast or discontinuous improvements to systems’ cognition. E.g., humans invented nuclear weapons just via accumulating knowledge over time; the invention wasn’t caused by us surgically editing the human brain a few years prior to improve its reasoning. (Though software improvements like ‘use scientific reasoning’, centuries prior, were obviously necessary.)