This in turn leads to one of the strongest results of Alex's paper: for any "well-behaved" distribution over reward functions, if the environment has the sort of symmetry I mentioned, then for at least half of the permutations of this distribution, at least half of the probability mass will be on reward functions for which the optimal policy is power-seeking.
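Schematically, the claim has roughly the following shape (this notation is a sketch, not the paper's; $\Phi$, $\phi \cdot \mathcal{D}$, and the "power-seeking" predicate stand in for the paper's precise definitions):

$$
\Big|\Big\{ \phi \in \Phi \;:\; \Pr_{R \sim \phi \cdot \mathcal{D}}\big[\text{the optimal policy for } R \text{ is power-seeking}\big] \ge \tfrac12 \Big\}\Big| \;\ge\; \tfrac12\, |\Phi|,
$$

where $\Phi$ is the set of state permutations realizing the environmental symmetry and $\phi \cdot \mathcal{D}$ is the pushforward of $\mathcal{D}$ under the action $(\phi \cdot R)(s) := R(\phi^{-1}(s))$.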
Clarification:
The instrumental convergence (formally, optimality probability) results apply to all distributions over reward functions. So the "important" part of my results applies to permutations of arbitrary distributions; no well-behavedness is required.
The formal-POWER results apply to bounded distributions over reward functions. This guarantees that POWER's expectation is well-defined (see the sketch of the definition below).
The paper isn't currently very clear on that point; it's mentioned only in footnote 1 on page 6.
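For reference, the paper's definition of POWER has (up to notational details; see the paper for the official statement) the following shape, which shows why boundedness matters:

$$
\mathrm{POWER}_{\mathcal{D}}(s, \gamma) \;:=\; \frac{1-\gamma}{\gamma}\, \mathbb{E}_{R \sim \mathcal{D}}\!\left[ V^*_R(s, \gamma) - R(s) \right],
$$

where $V^*_R(s, \gamma)$ is the optimal value of state $s$ for reward function $R$ at discount rate $\gamma$. If $\mathcal{D}$ is bounded, the integrand is bounded, so the expectation is finite and well-defined.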