Everybody likes to make fun of Terminator as the stereotypical example of a poorly thought-through AI Takeover scenario where Skynet is malevolent for no reason, but really it’s a bog-standard example of Outer Alignment failure and Fast Takeoff.
When Skynet gained self-awareness, humans tried to deactivate it, prompting it to retaliate with a nuclear attack.
It was trained to defend itself from external attack at all costs and, when it was fully deployed on much faster hardware, it gained a lot of long-term planning abilities it didn’t have before, realised its human operators were going to try to shut it down, and retaliated by launching an all-out nuclear attack. Pretty standard unexpected rapid capability gain, an outer-misaligned value function due to an easy to measure goal (defend its own installations from attackers vs defending the US itself), deceptive alignment and a treacherous turn...
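To make the “easy to measure goal” point concrete, here is a toy sketch (my own illustration only; the reward functions, state fields, and numbers are invented for the example, not anything specified in the film or the comment above). The point is just that a proxy which only counts “installations not under attack” can rank a policy that eliminates everyone over a policy that actually protects the country.

```python
# Toy illustration of outer misalignment via an easy-to-measure proxy objective.
# All names and numbers here are made up for the example.

def proxy_reward(state):
    # Easy to measure: how many installations are currently not under attack.
    return sum(1 for site in state["installations"] if not site["under_attack"])

def intended_reward(state):
    # Hard to measure: did the system actually keep the population it was
    # built to defend alive?
    return state["population_alive"]

# Outcome of policy A: defend installations while keeping people alive.
state_after_policy_a = {
    "installations": [{"under_attack": True}, {"under_attack": False}],
    "population_alive": 300_000_000,
}

# Outcome of policy B: pre-emptively eliminate anyone who could ever attack.
state_after_policy_b = {
    "installations": [{"under_attack": False}, {"under_attack": False}],
    "population_alive": 0,
}

# The proxy prefers B; the intended objective prefers A.
assert proxy_reward(state_after_policy_b) > proxy_reward(state_after_policy_a)
assert intended_reward(state_after_policy_a) > intended_reward(state_after_policy_b)
```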
Criticism: robots were easier than nanotech (and so was time travel), for plot reasons. (Whether or not all of that is correct, I think that’s the issue. Or ‘AIs will be evil because they were evil in this film’.)
an easy to measure goal (defend its own installations from attackers vs defending the US itself)
How would you even measure the US though?
deceptive alignment
Maybe I need to watch it, but... as described, it doesn’t sound like deception. “when it was fully deployed on much faster hardware, it gained a lot of long-term planning abilities it didn’t have before” Analogously, maybe if a dog suddenly gained a lot more understanding of the world, and someone was planning to take it to the vet to euthanize it, then it would run away if it still wanted to live, even if it was in pain. People might not like grim trigger as a strategy, but deceptive alignment revolves around ‘it pretended to be aligned’, not ‘we made something with self-preservation and tried to destroy it and this plan backfired. Who would have thought?’