“You perceive, of course, that this destroys the world.”
If the AI modifies humans so that humans want whatever happens to already exist (say, diffuse clouds of hydrogen), then this is clearly a failure scenario.
But what if the Dark Lords of the Matrix reprogrammed everyone to like murder, from the perspective of both the murderer and the murderee? Should the AI use everyone’s prior preferences as morality, and reprogram us again to hate murder? Should the AI use prior preferences, and forcibly stop everyone from murdering each other, even if this causes us a great deal of emotional trauma? Or should the AI recalibrate morality to everyone’s current preferences, and start creating lots of new humans to enable more murders?
So it comes down to the following question: should the AI set up its CEV-based utility function only once, when it is first initialized, over the population of humanity that exists at that time (or otherwise cache that state so that later CEV calculations can refer to it), or should it continuously recalibrate that utility function as humanity changes?
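To make the dichotomy concrete, here is a toy Python sketch of the two policies. This is purely my own illustration, not anything specified in the CEV paper; the names (`FrozenCEVAgent`, `LiveCEVAgent`, `extrapolate_volition`) and the trivial "averaging" stand-in for extrapolation are all made up for the example.

```python
# Toy sketch (not from the CEV paper): two hypothetical policies for when the
# AI fixes the population whose extrapolated volition defines its utility.

from copy import deepcopy


def extrapolate_volition(population):
    """Stand-in for the (unspecified) CEV extrapolation step.

    Here it just averages each person's stated preference weights; the real
    procedure is exactly the open question the CEV paper leaves unanswered.
    """
    keys = {k for person in population for k in person}
    return {k: sum(p.get(k, 0.0) for p in population) / len(population)
            for k in keys}


class FrozenCEVAgent:
    """Extrapolates once, at initialization, and caches the result."""

    def __init__(self, initial_population):
        self.snapshot = deepcopy(initial_population)
        self.utility_weights = extrapolate_volition(self.snapshot)

    def utility_weights_now(self, current_population):
        # Ignores any later reprogramming of humanity.
        return self.utility_weights


class LiveCEVAgent:
    """Re-extrapolates from whatever humanity currently wants."""

    def utility_weights_now(self, current_population):
        return extrapolate_volition(current_population)


# If the Dark Lords flip everyone's preferences, the two agents diverge:
before = [{"murder": -1.0}, {"murder": -1.0}]
after = [{"murder": +1.0}, {"murder": +1.0}]

frozen = FrozenCEVAgent(before)
live = LiveCEVAgent()
print(frozen.utility_weights_now(after))  # {'murder': -1.0}  (prior preferences)
print(live.utility_weights_now(after))    # {'murder': 1.0}   (current preferences)
```

The frozen agent corresponds to the "use everyone's prior preferences as morality" options above; the live agent corresponds to recalibrating morality to current preferences.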
Which of these approaches (or some third one I haven’t anticipated) does EY’s design use? I’m not able to pick it out of the CEV paper, though that’s likely because I don’t have the necessary technical background.
Edit: The gloss on “interpreted as we wish that interpreted”, part of the poetic summary description of CEV, seems to imply that the CEV will update itself to match humanity as humanity changes. So if the Dark Lords change our preferences so thoroughly that we can’t be coherently argued out of the change, we’d end up in the endless-murder scenario. Hopefully that doesn’t happen.