Why do nonmyopic agents end up power-seeking? Because the supervisor rates some states highly, and so the agent is incentivised to gain power in order to reach those states.
Why do myopic agents end up power-seeking? Because to train a competitive myopic agent, the supervisor will need to calculate how much approval they assign to actions based on how much those actions contribute to reaching valuable states. So the agent will be rewarded for taking actions which acquire it more power, since the supervisor will predict that those contribute to reaching valuable states.
(You might argue that, if the supervisor doesn’t want the agent to be power-seeking, they’ll only approve of actions which gain the agent more power in specified ways. But a reward function can equally penalise unauthorised power-gaining, assuming the supervisor is equally able to notice it in both cases.)
I now realise that I was thinking of myopic cognition, whereas you are talking about myopic training. Oops! This is obvious in hindsight (and now I’m wondering how I missed it), but maybe you could edit the post to draw a clear contrast?
Ah, makes sense. There’s already a paragraph on this (starting “I should note that so far”), but I’ll edit to mention it earlier.
This is likely the crux of our disagreement, but I don’t have time to reply right now. I hope to return to this.