Suppose I ended up walking out the window. It would be the wrong action for me to take: I would have been foiled by a bunch of bad heuristics and biases I’d internalized over the course of my omnicidal plot. There would be no agent corresponding to me whose values would be satisfied by this.
It would be not unlike, say, manipulating and gaslighting someone until they decide to kill their entire family. This would be against the values the person would claim as their “truer” ones, but in the moment, under the psychological pressure and the influence of some convincing lies, it’d (incorrectly) feel to them like a good idea.