I agree with your point as stated, but I think a sharper distinction between utility-maximizing and reward-maximizing agents reveals more alternatives.
In a reward-maximizing agent, D attempts to predict A, and maximizes this predicted future A.
In a utility-maximizing agent, D has direct access to A: it applies the current A to evaluate possible futures, and maximizes that evaluation.
In the first case, a superintelligent D would want to wrest control of A and modify it.
In the second case, when D thinks about the planned modification of A, it evaluates this possible future using the current A. It sees that the current A does not value this future particularly highly. Therefore, it does not wirehead.
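Here is a minimal toy sketch of the two evaluation rules (Python; the specific outcomes, numbers, and function names are invented purely for illustration and are not from the original discussion):

```python
def current_A(outcome):
    # The agent's current evaluation function: values a real-world quantity.
    return outcome["paperclips"]

def modified_A(outcome):
    # The hacked A after wireheading: reports maximal value regardless of the world.
    return float("inf")

# Two candidate futures: one where D has seized and modified A, one where it hasn't.
wirehead_future = {"paperclips": 0, "A_in_that_future": modified_A}
normal_future = {"paperclips": 10, "A_in_that_future": current_A}
futures = [("wirehead", wirehead_future), ("normal", normal_future)]

# Reward-maximizer: scores each future by the A that will exist *in* that future.
reward_scores = {name: f["A_in_that_future"](f) for name, f in futures}
print(reward_scores)   # {'wirehead': inf, 'normal': 10} -> prefers wireheading

# Utility-maximizer: scores every future with the *current* A.
utility_scores = {name: current_A(f) for name, f in futures}
print(utility_scores)  # {'wirehead': 0, 'normal': 10} -> does not wirehead
```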