One addition I’d make here is: I think what people imagine, when they imagine “us” noticing an AGI going rogue and “fighting back”, is movie scenarios where the obviously evil AGI becomes obviously evil in a way that’s obvious to everyone, and then it’s a neatly arranged black-and-white humanity vs. machines all-out fight.
But in real life, such unambiguousness is rare. The monsters don’t look obviously evil, and the signs of fatal issues are rarely blatant. Is this whiff of smoke a sign of fire, or just someone nearby being bad at cooking? Is this creepy guy actually planning to assault you, or are you just being paranoid? Is this weird feeling in your chest a sign of an impending heart attack, or just some biological noise? Is this epidemic truly following an exponential curve, or is it going to peter out somehow?
Are you really, really sure the threat is so major? So sure you’d actually take those drastic actions — call emergency services, throw a fit, declare a quarantine — and risk wasting resources and doing harm and looking foolish for overreacting? Nah, wouldn’t do to panic, that’s not socially appropriate at all. Better act very concerned, but in a calm, high-status fashion. Maybe it’ll all work itself out on its own!
And the AGI, if it’s worth the name, would not fail to exploit this. It may start clearly acting to amass power, but there would always be a prosocial, plausible-sounding justification for why it’s doing that; it’d never stop making pleasant noises about having people’s best interests at heart; it’d never stop being genuinely useful to someone, such that there’d always be clear harm in shutting it down. The doubt would never go away.
Much like there’s no fire alarm for AGI, there would be no fire alarm for the treacherous turn. There would never be a moment, except maybe right before the end, where “we must stop the malign AGI from killing us all!” would sound obviously right to everyone. There would always be ambiguity, this sort of message would always appear a bit histrionic, an extremist stance, a stance that — gasp — can have downsides if genuinely implemented, and what if we then turn around and realize we jumped at shadows?
The status-quo bias, asymmetric justice, the Copenhagen interpretation of ethics, threat ambiguity — all of that would be acting to ensure this. “Humanity vs. AGI” will never look like “humanity vs. AGI” to humanity.
(Which also, by the way, is why all the various arguments like “we’d have so many more resources than the AGI at the start, look how many nukes we have, brawn sometimes beats brains, we can totally prevail in an all-out war!” have nothing to do with the realities of AGI Ruin.)
While I definitely agree that a fight between humanity and AGI will never look like “humanity vs. AGI”, due to the issues with the abstraction of “humanity”, one key disagreement I have with this comment is that I don’t think there is no fire alarm for AGI. If anything, my model is that a lot of people will support very severe restrictions on AI and AI progress for safety. I think this already happened several months ago, when people got freaked out about AI, and that was merely GPT-4. We will get a lot of fire alarms, especially via safety incidents. A lot of people are already primed for apocalyptic narratives, and if AI progresses in a big way, this will fan the flames into a potential AI-killer backlash, supported by politicians. It’s not impossible for tech companies to defuse this, but damn is it hard to defuse.
I worry about the opposite problem: if existential-risk concerns come to look less and less likely, AI regulation may nonetheless become quite severe, and the AI organizations built by LessWrongers have systematic biases that will prevent them from updating toward that position.