Thoughts after reading and thinking about this post
The thing that’s bugging me here is that Power and Instrumental convergence seem to be almost the same.
In particular, it seems like Power asks [a state] “how good are you across all policies?” and Instrumental Convergence asks “for how many policies are you the best?” In an analogy to tournaments where policies are players, power cares about the average performance of a player across all tournaments, and instrumental convergence about how many first places that player got. In that analogy, the statement that “most goals incentivize gaining power over that environment” would then be “for most tournaments, the first place finisher is someone with good average performance.” With this formulation, the statement
formal POWER contributions of different possibilities are approximately proportionally related to instrumental convergence.
seems to be exactly what you would expect (more first places should strongly correlate with better performance). And to construct a counter-example, one creates a state with a lot of second places (i.e., a lot of policies for which it is the second best state) but few first places. I think the graph in the “Formalizations” section does exactly that. If the analogy is sound, it feels helpful to me.
(This is all without having read the paper. I think I’d need to know more of the theory behind MDPs to understand it.)
Yes, this is roughly correct!
As an additional note: it turns out, however, that even if you slightly refine the notion of “power that this part of the future gives me, given that I start here”, you have neither “more power → instrumental convergence” nor “instrumental convergence → more power” as logical implications.
Instead, if you’re drawing the causal graph, there are many, many situations which cause both instrumental convergence and greater power. The formal task is then: “can we mathematically characterize those situations?” Then you can say, “power-seeking will occur for optimal agents with goals from [such and such distributions] for [this task I care about] at [these discount rates]”.
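To make the tournament analogy concrete, here is a minimal toy sketch. It only illustrates the analogy; it is not the paper's formal POWER definition. The score table and the helper names `average_score` and `first_place_share` are made up for illustration: rows stand in for tournaments (goals), columns for players (states).

```python
# Toy version of the tournament analogy (hypothetical numbers, not the paper's formalism).
# Each row is one tournament; each column is one player's score in that tournament.
scores = [
    [10, 9, 0],
    [0, 9, 10],
    [10, 9, 0],
    [0, 9, 10],
]

def average_score(scores, player):
    """Analogue of power: mean performance across all tournaments."""
    return sum(row[player] for row in scores) / len(scores)

def first_place_share(scores, player):
    """Analogue of instrumental convergence: fraction of tournaments won."""
    wins = sum(1 for row in scores if row[player] == max(row))
    return wins / len(scores)

num_players = len(scores[0])
averages = [average_score(scores, p) for p in range(num_players)]
win_shares = [first_place_share(scores, p) for p in range(num_players)]

print(averages)    # [5.0, 9.0, 5.0]
print(win_shares)  # [0.5, 0.0, 0.5]
```

The middle player always finishes second, so it has the best average but never wins. That is the "lots of second places, few first places" counterexample in miniature: neither quantity implies the other, even though they will usually move together.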