The obvious response to such plans is ‘but we would notice and fight back.’ To which the actual response is ‘maybe we would notice, but if we did notice we would not fight back in any effective or meaningful way even if the AI wasn’t actively stopping us from doing so, which it would be.’
I’d say the obvious counter-response to that is “who are ‘we’?”:
One of the most common ways I see people fail to have any effect at all is to think in terms of “we”. They come up with plans which “we” could follow, for some “we” which is not in fact going to follow that plan. And then they take political-flavored actions which symbolically promote the plan, but are not in fact going to result in “we” implementing the plan. (And also, usually, the “we” in question is too dysfunctional as a group to implement the plan even if all the individuals wanted to, because that is how approximately 100% of organizations of more than 10 people operate.) In cognitive terms, the plan is pretending that lots of other people’s actions are choosable/controllable, when in fact those other people’s actions are not choosable/controllable, at least relative to the planner’s actual capabilities.
And then the AI will actively target subgroups implicitly included in that “we” (other political/corporate actors, ideological movements, specific demographics), and optimize so that they’ll actively want to block other human subgroups from stopping the AI, and so this “we”, which implicitly meant “all of humanity, together”, will turn out to have never been a coherent entity to begin with.
Instead, “we” here at most represents some specific demographic/ideological movement/circle of doomers/etc., and, well, we have a lot of data on how successful those are at instantly halting all AI activity worldwide.
One addition I’d make here is: I think what people imagine, when they imagine “us” noticing an AGI going rogue and “fighting back”, is movie scenarios where the obviously evil AGI becomes obviously evil in a way that’s obvious to everyone, and then it’s a neatly arranged white-and-black humanity vs. machines all-out fight.
But in real life, such unambiguousness is rare. The monsters don’t look obviously evil, and the signs of fatal issues are rarely blatant. Is this whiff of smoke a sign of fire, or just someone nearby being bad at cooking? Is this creepy guy actually planning to assault you, or are you just being paranoid? Is this weird feeling in your chest a sign of an impending heart attack, or just some biological noise? Is this epidemic truly following an exponential curve, or is it going to peter out somehow?
Are you really, really sure the threat is so major? So sure you’d actually take those drastic actions — call emergency services, throw a fit, declare a quarantine — and risk wasting resources and doing harm and looking foolish for overreacting? Nah, wouldn’t do to panic, that’s not socially appropriate at all. Better act very concerned, but in a calm, high-status fashion. Maybe it’ll all work itself out on its own!
And the AGI, if it’s worth the name, would not fail to exploit this. It may start clearly acting to amass power, but there would always be a prosocial, plausible-sounding justification for why it’s doing that; it’d never stop making pleasant noises about having people’s best interests at heart; it’d never stop being genuinely useful to someone, such that there’d always be clear harm in shutting it down. The doubt would never go away.
Much like there’s no fire alarm for AGI, there would be no fire alarm for the treacherous turn. There would never be a moment, except maybe right before the end, where “we must stop the malign AGI from killing us all!” would sound obviously right to everyone. There would always be ambiguity; this sort of message would always appear a bit histrionic, an extremist stance, a stance that — gasp — could have downsides if genuinely implemented. And what if we then turn around and realize we jumped at shadows?
The status-quo bias, asymmetric justice, the Copenhagen interpretation of ethics, threat ambiguity — all of that would be acting to ensure this. “Humanity vs. AGI” will never look like “humanity vs. AGI” to humanity.
(Which also, by the way, is why all the various arguments like “we’d have so many more resources than the AGI at the start, look how many nukes we have, brawn sometimes beats brains, we can totally prevail in an all-out war!” have nothing to do with the realities of AGI Ruin.)
While I definitely agree that a fight between humanity and AGI will never look like humanity vs. AGI, due to the issues with the abstraction of “humanity”, one key disagreement I have with this comment is that I do think there will be a fire alarm for AGI. If anything, my model is that a lot of people will support very severe restrictions on AI and AI progress in the name of safety. We already saw a version of this several months ago, when people got freaked out about AI, and that was merely GPT-4. We will get a lot of fire alarms, especially via safety incidents. A lot of people are already primed for apocalyptic narratives, and if AI progresses in a big way, that will fan the flames into a potential AI-killer backed by politicians. It’s not impossible for tech companies to defuse this, but damn is it hard to defuse.
I worry about the opposite problem: if existential-risk concerns come to look less and less likely, AI regulation may nonetheless become quite severe, and the AI organizations built by LessWrongers have systematic biases that will prevent them from updating to this position.
I definitely agree with this take, because I generally hate the society/“we” abstractions, but one caveat is that resistance to AI by a lot of people could be very strong, especially because the public is probably primed for apocalyptic narratives.
These tweets are at least somewhat of an argument that AI will be resisted pretty heavily.
https://twitter.com/daniel_271828/status/1696794764136562943
https://twitter.com/daniel_271828/status/1696770364549087310
The public at large’s opinion also doesn’t shift policy at lightning speed, I think, especially if the AI manages to get a few high-powered corporations/politicians on its side (whether via genuine utility, bribery, or blackmail), plus some subsets of the public as well (by benefiting them).
It wouldn’t look like “the AI is gathering power, all of humanity freaks out and shuts it down”; it would at best look like “the AI is gathering power, large subsets of humanity freak out and try to shut it down, but a smaller subset resists, and it turns into a massive socio-politico-legislative conflict that drags on for years”. And that’s already the loss condition: while all that is happening, the AI will be doing more AI research, and as a result improving its ability to wage this conflict (in addition to its mundane power-gathering pursuits), while humanity’s strategists would at best be as good as they ever were. The outcome of this dynamic seems pre-determined. (Hell, you don’t even need to posit this (very rudimentary and obviously possible) version of self-improvement for the argument to go through! All you need is for the AGI to be just a touch better than humanity at strategy.)
The date of AI Takeover is not the day the AI takes over. If an unaligned human-level-ish AGI is allowed to lodge itself into the human economy and do business at scale, it’s already curtains.