My reply to Paul at the time:
If a misaligned AI had 1/trillion “protecting the preferences of whatever weak agents happen to exist in the world”, why couldn’t it also have 1/trillion other vaguely human-like preferences, such as “enjoy watching the suffering of one’s enemies” or “enjoy exercising arbitrary power over others”?
From a purely selfish perspective, I think I might prefer that a misaligned AI kills everyone, and to take my chances with continuations of myself (my copies/simulations) elsewhere in the multiverse, rather than face whatever the sum-of-desires of the misaligned AI decides to do with humanity. (With the usual caveat that I’m very philosophically confused about how to think about all of this.)
And his response was basically that he had already acknowledged my concern in his OP:
I’m not talking about whether the AI has spite or other strong preferences that are incompatible with human survival; I’m engaging specifically with the claim that AI is likely to care so little one way or the other that it would prefer to just use the humans for atoms.
Personally, I have a bigger problem with people (like Paul and Carl) who talk about AIs keeping people alive without talking about s-risks in the same breath, or who only mention them in a vague, easy-to-miss way, than I have with Eliezer not addressing Paul’s arguments.
Was my “An important caveat” parenthetical paragraph sufficient, or do you think I should have made it scarier?
Should have made it much scarier. “Superhappies” caring about humans “not in the specific way that the humans wanted to be cared for” sounds better or at least no worse than death, whereas I’m concerned about s-risks, i.e., risks of worse-than-death scenarios.
This is a difficult topic (in more ways than one). I’ll try to do a better job of addressing it in a future post.
To clarify, I don’t actually want you to scare people this way, because I don’t know if people can psychologically handle it or if it’s worth the emotional cost. I only bring it up myself to counteract people saying things like “AIs will care a little about humans and therefore keep them alive”, or when discussing technical solutions/ideas, etc.