Imagine that policies decompose into two components, π=ρ⊗σ. For instance, they may be different sets of parameters in a neural network. We can then talk about the effect of one of the components by considering how it influences the power/injectivity of the features with respect to the other component.
Suppose, for instance, that ρ is such that the policy just ends up twitching in a completely random way. Technically σ has a lot of effect too, in that it chaotically controls the pattern of the twitching, but in terms of the features f, the choice of σ makes basically no difference: f is roughly constant as a function of σ. This is a low-power situation, and if one actually specified what f is, then a TurnTrout-style argument could probably prove that such values of ρ would be avoided for power-seeking reasons. By contrast, if ρ made the policy act like an optimizer that optimizes a utility function over the features f, with the utility function specified by σ, then that would lead to a lot more power/injectivity.
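To make the twitching-versus-optimizer contrast concrete, here is a minimal toy sketch. Everything in it is an assumption made up for illustration: the world (four ±1 steps), the feature f (net displacement), the two named values of ρ, and the use of "number of distinct f values reachable by varying σ" as a crude stand-in for power/injectivity. It is not meant as the formal definition.

```python
from itertools import product

# Hypothetical toy world: a policy outputs a trajectory of 4 steps, each -1 or +1.
STEPS = 4
TRAJECTORIES = list(product([-1, 1], repeat=STEPS))

def f(trajectory):
    """Assumed feature: net displacement of the trajectory."""
    return sum(trajectory)

SIGMAS = range(4)  # sigma ranges over four settings (illustrative choice)

# Under rho = "twitch", sigma picks the twitching pattern; every pattern nets to 0.
TWITCH_PATTERNS = [(1, -1, 1, -1), (-1, 1, -1, 1), (1, -1, -1, 1), (-1, 1, 1, -1)]

# Under rho = "optimize", sigma picks the target of the utility u(x) = -|x - target|.
TARGETS = [-4, -2, 2, 4]

def policy(rho, sigma):
    """pi = rho (x) sigma: rho fixes the kind of behaviour, sigma fills in the details."""
    if rho == "twitch":
        return TWITCH_PATTERNS[sigma]
    if rho == "optimize":
        target = TARGETS[sigma]
        # Pick a trajectory whose feature value maximizes the sigma-specified utility.
        return max(TRAJECTORIES, key=lambda t: -abs(f(t) - target))
    raise ValueError(f"unknown rho: {rho!r}")

def distinct_trajectories(rho):
    """Raw behaviours reachable by varying sigma with rho held fixed."""
    return {policy(rho, sigma) for sigma in SIGMAS}

def reachable_features(rho):
    """Feature values reachable by varying sigma (crude proxy for power w.r.t. sigma)."""
    return {f(policy(rho, sigma)) for sigma in SIGMAS}

for rho in ("twitch", "optimize"):
    print(rho, len(distinct_trajectories(rho)), sorted(reachable_features(rho)))
```

In this toy setup, σ changes the raw behaviour under both values of ρ, but only under the optimizer-like ρ does it change f, which is the sense in which that ρ gives σ more power over the features.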
That said, I wonder if there’s a limit to this style of argument. Too much noninjectivity would require crazy interaction effects to fill out the space in a Hilbert-curve-style way, which I’d guess would be hard to optimize?
Actually, upon thinking further, I don’t think this argument works, at least not as it is currently written.