This all assumes that AGI does whatever its supposed operator wants it to do, and that other parties believe as much? I think the first part of this is very false, though the second part alas seems very realistic, so I think this misses the key thing that makes an AGI arms race lethal.
I expect that a dignified apocalypse looks like, “We could do limited things with this software and hope to not destroy the world, but as we ramp up the power and iterate the for-loops more times, the probability of destroying the world goes up along a logistic curve.” In “relatively optimistic” scenarios it will be obvious to operators and programmers that this curve is being ascended—that is, running the for-loops with higher bounds will produce an AGI with visibly greater social sophistication, increasing big-picture knowledge, visible crude attempts at subverting operators or escaping or replicating outside boxes, etc. We can then imagine the higher-ups demanding that crude patches be applied to get rid of the visible problems in order to ramp up the for-loops further, worrying that, if they don’t do this themselves, the Chinese will do it first with their stolen copy of the code. Somebody estimates a risk probability; somebody else tells them too bad, they need to take 5% more risk in order to keep up with the arms race. This resembles a nuclear arms race and deployment scenario where, even though there’s common knowledge that nuclear winter is a thing, you still end up with nuclear winter, because people are instructed to incrementally deploy another 50 nuclear warheads at the cost of a 5% increase in the probability of triggering nuclear winter, and then the other side does the same. But this is at least a relatively more dignified death by poor Nash equilibrium, where people are taking everything as seriously as they took nuclear war back in the days when Presidents weren’t retired movie actors.
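To make the shape of this compounding-risk argument concrete, here is a minimal, purely illustrative sketch in Python. The logistic parameters, the two hypothetical labs, and the size of each capability increment are all invented for illustration; nothing here is drawn from the comment above.

```python
import math

def p_doom(capability, midpoint=10.0, steepness=0.8):
    """Illustrative logistic curve: the chance that a single run at a given
    capability level destroys the world. All parameters are made up."""
    return 1.0 / (1.0 + math.exp(-steepness * (capability - midpoint)))

# Two hypothetical rival labs, each unwilling to fall behind the other.
capability = {"lab_a": 5.0, "lab_b": 5.0}
survival = 1.0  # probability that no run so far has ended the world

for round_num in range(8):
    for lab in capability:
        # Each turn, a lab ramps up just far enough to edge past its rival.
        rival_level = max(level for name, level in capability.items() if name != lab)
        capability[lab] = max(capability[lab], rival_level) + 1.0
        per_run_risk = p_doom(capability[lab])
        survival *= 1.0 - per_run_risk
        print(f"round {round_num}, {lab}: capability={capability[lab]:.1f}, "
              f"per-run risk={per_run_risk:.2f}, cumulative survival={survival:.2f}")
```

Whether the real curve is logistic at all, and where its steep region sits, is precisely what nobody in the scenario knows; the point is only that a sequence of individually “acceptable” risk increments multiplies out to near-certain ruin.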
In less optimistic scenarios, ones that realistically reflect the levels of understanding actually being displayed by programmers and managers in the most powerful organizations today, the programmers themselves just patch away the visible signs of impending doom and keep going, thinking that they have “debugged the software” rather than merely eliminated the visible warning signs. They stay in denial, for internal political reasons, about how this is climbing a logistic probability curve towards ruin, or about how fast that curve is being climbed. They don’t have a lot of mental fun thinking about the doom they’re heading into, and they ward that thought off by saying, “But if we slow down, our competitors will catch up, and we don’t trust them to play nice,” along, of course, with “Well, if Yudkowsky was right, we’re all dead anyways, so we may as well assume he was wrong.” And so they skip straight to the fun part of running the AGI’s for-loops with as much computing power as is available to do the neatest possible things, and we die in a less dignified fashion.
My point is that the situation you depict, with multiple organizations worried about what other organizations will successfully do with an AGI operated at maximum power and believed to do whatever its operator wants, reflects a scenario where everybody dies really fast, because they all share a mistaken optimistic belief about what happens when you operate AGIs at increasing capability. The real lethality of the arms race is that blowing past hopefully-visible warning signs, or patching them out, and running your AGI at increasing power creates an increasing risk of the whole world ending immediately. Your scenario is one where people don’t understand that and think that AGIs do whatever their operators want, so it’s a scenario where the outcome of the multipolar tensions is instant death as soon as the computing resources are sufficient for lethality.
Thanks for your comment.
If someone wants to estimate the overall existential risk attached to AGI, then it seems fitting that they would estimate the existential risk attached to each of the scenarios where we have 1) only unaligned AGI, 2) only aligned AGI, or 3) both. The scenario you portray is a subset of 1), and I find it plausible. But most relevant discussion on this forum is devoted to 1), so I wanted to think about 2). If some non-zero probability is attached to 2), that should be a useful exercise.
I thought it was clear from the intro and the section heading that I was referring to Aligned AGI. And of course, exploring a scenario doesn’t mean I think it is the only scenario that could materialise.
My point is that plausible scenarios for Aligned AGI give you AGI that remains aligned only when run within power bounds, and this seems to me like one of the largest facts affecting the outcome of arms-race dynamics.
Thanks for the clarification. If that’s the plausible scenario for Aligned AGI, then I was drawing a sharper line between Aligned and Unaligned than was warranted. I will edit some part of the text on my website to reflect that.