Do I understand correctly that, in general, the elements of A, B, and C are achievable probability distributions over the set of n possible outcomes? (But that in the examples given with the deterministic environments, these are all standard basis vectors / one-hot vectors / deterministic distributions?)
And, in the case where these outcomes are deterministic, and A and B are disjoint, and A is much larger than B, then given a utility function on the possible outcomes in A or B, a random permutation of this utility function will, with high probability, have the optimal (or a weakly optimal) outcome be in A? (Specifically, if I haven't messed up: asymptotically (as $|B|$ goes to infinity), if $\frac{|B|^2}{|A|+1} \to 0$ then the probability of there being something in A which is weakly better than anything in B goes to 1, and if $\frac{|B|^2}{|A|+1} \to r$ then the probability goes to at least $e^{-r}$, I think? Coming from

$$\binom{|A|}{|B|} \frac{|B|! \, |A|!}{(|A|+|B|)!} = \frac{|A|!}{(|A|-|B|)!} \cdot \frac{|A|!}{(|A|+|B|)!} = \prod_{k=0}^{|B|-1} \frac{|A|-k}{|A|+|B|-k} = \prod_{k=0}^{|B|-1} \left(1 - \frac{|B|}{|A|+|B|-k}\right) \,.)$$
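(If it's useful, here's a quick Monte Carlo sanity check of that product in Python. The function names are just mine, and I'm interpreting the product as the probability that the top $|B|$ utility ranks of a random permutation all land in A, which lower-bounds the probability that A contains a weakly optimal outcome:)

```python
import random

def product_formula(a, b):
    """P(the top b ranks of a uniformly random ordering of a+b outcomes
    all fall in A), i.e. prod_{k=0}^{b-1} (a - k) / (a + b - k)."""
    p = 1.0
    for k in range(b):
        p *= (a - k) / (a + b - k)
    return p

def monte_carlo(a, b, trials=200_000, seed=0):
    """Estimate the same probability by shuffling a+b labeled outcomes
    and checking whether the b highest-utility slots all went to A."""
    rng = random.Random(seed)
    items = ["A"] * a + ["B"] * b
    hits = 0
    for _ in range(trials):
        rng.shuffle(items)
        # treat the first b positions of the shuffled order as the top-b ranks
        if all(x == "A" for x in items[:b]):
            hits += 1
    return hits / trials

a, b = 50, 3
print(product_formula(a, b))  # closed form
print(monte_carlo(a, b))      # simulation; should agree to ~2 decimal places
```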
While I’d readily believe it, I don’t really understand why this extends to the case where the elements of A and B aren’t deterministic outcomes but distributions over outcomes. Maybe I need to review some of the prior posts.
Like, what if every element of A was a probability distribution over 3 different observation-histories (each with probability 1/3), and every element of B was a probability distribution over 2 different observation-histories (each with probability 1/2)? (E.g., if one changes pixel 1 at time 1, then in addition to the state of the pixel grid, one observes at random either an orange light or a purple light, while if one instead changes pixel 2 at time 1, then in addition to the pixel grid state, one observes at random either a red, green, or blue light.) Then no permutation of the set of observation-histories would convert any element of A into an element of B, nor vice versa.
Do I understand correctly that, in general, the elements of A, B, and C are achievable probability distributions over the set of n possible outcomes? (But that in the examples given with the deterministic environments, these are all standard basis vectors / one-hot vectors / deterministic distributions?)
Yes.
Then no permutation of the set of observation-histories would convert any element of A into an element of B, nor vice versa.
Nice catch. In the stochastic case, you do need a permutation-enforced similarity, as you say (see Definition 6.1, similarity of visit distribution sets, in the paper). The theorems won't apply for all A, B, because that would prove way too much.