I broadly agree with most of your points (to the point that I only read the summary of most of them), but I have issues with your responses to two objections, which I hold:
If lots of different companies and governments have access to AI, won’t this create a “balance of power” so that nobody is able to bring down civilization?
This is a reasonable objection to many horror stories about AI and other possible advances in military technology, but if AIs collectively have different goals from humans and are willing to coordinate with each other against us, I think we’re in trouble, and this “balance of power” idea doesn’t seem to help.
I don’t understand why it’s plausible to think that AIs might collectively have different goals than humans. Where would they get such goals? I mean, if somebody was stupid enough to implement some sort of evolutionary function such that “natural” selection would result in some sort of survival urge, that could very easily pit that AI, or that family of AIs, against humanity, but I see no reason to think that even that would apply to AIs in general, and if they evolved independently, presumably they’d be at odds.
Won’t we see warning signs of AI takeover and be able to nip it in the bud?
I would guess we would see some warning signs, but does that mean we could nip it in the bud? Think about human civil wars and revolutions: there are some warning signs, but also, people go from “not fighting” to “fighting” pretty quickly as they see an opportunity to coordinate with each other and be successful.
I feel that this is a weak response. Why wouldn’t we be able to? I mean, unless you’re saying that alignment is impossible, or that this could all happen before anyone figures alignment out (which does seem plausible), I don’t see why we couldn’t set “good” AI against “bad” AI. The “fighting” example seems weak because it’s not the war itself that one side or the other is deeply interested in avoiding; it’s losing, especially losing without a fight. That does not seem to be the sort of thing that humans easily allow to happen; the warning signs don’t prompt us to act to avoid the war, but to defend against attack, or to attack preemptively. Which is what we want here.
I don’t understand why it’s plausible to think that AIs might collectively have different goals than humans.
Future posts, right? We’re assuming that premise here:
So, for what follows, let’s proceed from the premise: “For some weird reason, humans consistently design AI systems (with human-like research and planning abilities) that coordinate with each other to try and overthrow humanity.” Then what? What follows will necessarily feel wacky to people who find this hard to imagine, but I think it’s worth playing along, because I think “we’d be in trouble if this happened” is a very important point.
I don’t think you even need to go as far as you do here to undermine the “emergent convergence (on anti-human goals)” argument. Even if we allow that AIs, by whatever means, develop anti-human goals, what reason is there to believe that the goals (anti-human, or otherwise) of one AI would be aligned with the goals of other AIs? Although infighting among different AIs probably wouldn’t be good for humans, it is definitely not going to help AIs, as a group, in subduing humans.
Now let’s bring in something which, while left out of the primary argument, repeatedly shows up in the footnotes and counter-counter arguments: AIs need some form of human cooperation to accomplish these nefarious “goals”. Humans able to assist the AIs are a limited resource, so there is competition for them. There’s going to be a battle among the different AIs for human “mind share”.