A. Contra “superhuman AI systems will be ‘goal-directed’”
I somewhat agree, see Consequentialism & Corrigibility. I’m a bit unclear on whether this is intended as an argument for “AGI almost definitely won’t have a zealous drive to control the universe” versus “AGI won’t necessarily have a zealous drive to control the universe”. I agree with the latter but not the former.
Also, the more different groups make AGIs, the more likely it is that someone will make one with a “zealous drive to control the universe”. Then we have to think about whether the non-zealous ones will have solved the problem posed by the zealous ones. In this context, there starts to be a contradiction between “we don’t need to worry about the non-zealous ones because they won’t be doing hardcore long-term consequentialist planning” versus “we don’t need to worry about the zealous ones because the non-zealous ones are so powerful and foresightful that, whatever plan the latter might come up with, the former can preemptively think of it and defend against it”. More on this topic in a forthcoming post hopefully in the next couple weeks. (EDIT—I added the link)
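As a toy illustration of the “more groups, more risk” point: here’s a minimal sketch assuming each of N groups independently has some small per-group probability p of building a zealous AGI (both numbers are made up purely for illustration).

```python
# Toy model: if each of N groups independently has probability p of
# building a "zealous" AGI, the chance that at least one exists is
# 1 - (1 - p)^N, which climbs quickly as N grows.

def p_at_least_one_zealous(p: float, n_groups: int) -> float:
    """Probability that at least one of n_groups builds a zealous AGI."""
    return 1 - (1 - p) ** n_groups

# Illustrative numbers only: even a 2% per-group chance adds up fast.
for n in (1, 10, 50, 200):
    print(n, round(p_at_least_one_zealous(0.02, n), 3))  # 0.02, 0.183, 0.636, 0.982
```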
B. Contra “goal-directed AI systems’ goals will be bad”
I somewhat agree, see Section 14.6 here. Comments above also apply here, e.g. it’s not obvious that docile, helpful, human-norm-following AGIs will actually do what’s necessary to defend against zealous universe-controlling AGIs; again, wait for my forthcoming post.
C. Contra “superhuman AI would be sufficiently superior to humans to overpower humanity”
I mostly see these comments as arguments that “AI that can overpower humanity” might happen a bit later than one might otherwise expect, rather than arguments that it’s not going to happen at all. For example, if collaborative groups of humans are more successful than individual humans, well, sooner or later we’re going to have collaborative groups of AIs too. By the time we have a whole society of trillions of AIs, that argument stops feeling very reassuring. (The ability of AIs to self-replicate seems particularly relevant here.) If humans-using-tools are powerful, well, sooner or later (I would argue sooner) AIs are going to be using tools too. (And inventing new tools.) The trust issue stops applying when we get to a world where AIs can start their own companies, etc., and thus only need to trust each other (and the “each other” might be copies of themselves). The headroom argument seems adjacent to the lump-of-labor fallacy.
Hmm, OK, I guess the real point of all that is to argue for slow takeoff which then implies that doom is unlikely? (“at some point AI systems would account for most of the cognitive labor in the world. But if there is first an extended period of more minimal advanced AI presence, that would probably prevent an immediate death outcome, and improve humanity’s prospects for controlling a slow-moving AI power grab.”) Again, I’m not quite sure what we’re arguing. I think there’s still serious x-risk regardless of slow vs fast takeoff, and I think there’s still “less than certain doom” regardless of slow vs fast takeoff. In fact, I’m not even confident that x-risk is lower under slow takeoff than fast.
Well anyway, I have an object-level belief that there are already way more than enough GPUs on the planet to support AIs that can overpower humanity—see here—and I think that will be much more true by the time we have real-deal AGIs (which I, for one, expect probably won’t arrive until after 2030). I agree that this is a relevant empirical question though.
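As a rough illustration of the kind of back-of-envelope reasoning behind that belief (the numbers below are placeholder assumptions of mine, not figures from the linked post, so treat the output as an order-of-magnitude sketch at best):

```python
# Back-of-envelope sketch with assumed, placeholder numbers: compare a guess
# at total worldwide GPU compute against a guess at the compute needed to run
# one human-brain-equivalent AGI, to see roughly how many such AGIs existing
# hardware could support in principle.

ASSUMED_GLOBAL_GPU_FLOPS = 1e21   # assumed total sustained FLOP/s across all deployed GPUs
ASSUMED_FLOPS_PER_AGI    = 1e15   # assumed FLOP/s to run one human-brain-equivalent AGI

supportable_agis = ASSUMED_GLOBAL_GPU_FLOPS / ASSUMED_FLOPS_PER_AGI
print(f"~{supportable_agis:.0e} human-brain-equivalent AGIs under these assumptions")
```

Under those (debatable) assumptions you get on the order of a million human-brain-equivalent AGIs runnable on today’s hardware, which is the sense in which compute may not be the binding constraint.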
The idea that a superhuman AI would be able to rapidly destroy the world seems prima facie unlikely, since no other entity has ever done that.
I think there’s pretty good direct reason to believe that it is currently possible to start lots of simultaneous deadly pandemics, crop diseases, etc., with an amount of competence already available to small teams of humans or maybe even individual humans. But we don’t currently have ongoing deliberate pandemics. I consider this pretty strong evidence that nobody on Earth with even moderate competence is trying to “destroy the world”, so to speak. So the fact that nobody has succeeded in doing so doesn’t provide much evidence about the tractability of doing it. (Again, more on this topic in a forthcoming post.)
Yeah, I suspect the largest bottleneck there is that trying to destroy the world is so strongly against human values that there are ~0 people (who aren’t severely mentally ill) who are genuinely trying to do that.