They would change quantitatively, but the upshot would probably be similar. For example, for the Kolmogorov prior, you could prove theorems like “for every reward function that <doesn’t do the thing>, there are N reward functions that <do the thing> that each have at most a small constant more complexity” (since you can construct them by taking the original reward function and then applying the relevant permutation / moving through the orbit, and that second step has constant K-complexity). Alex sketches out a similar argument in this post.
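For reference, the standard Kolmogorov-complexity fact this construction leans on (my gloss, not a quote from the post) is that composing a reward function $R$ with a fixed permutation $\sigma$ adds at most the permutation’s own description length:

$$K(\sigma \cdot R) \le K(R) + K(\sigma) + O(1),$$

so the claim above amounts to the relevant permutations having short descriptions.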
I don’t see how this works. If you need n bits to specify the permutation for that “second step”, the probability of each of the N reward functions will be smaller than the original one’s by a factor of $2^{-n}$. So you need n to be smaller than $\log_2(N)$, which is impossible?
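In symbols (my restatement of the objection, assuming a prior that weights a reward function $R$ roughly by $2^{-K(R)}$): if each constructed function $R_i$ costs n extra bits, then

$$\sum_{i=1}^{N} P(R_i) \approx N \cdot 2^{-n} \, P(R),$$

which matches or exceeds $P(R)$ only when $n \le \log_2(N)$, whereas picking out one of N distinct permutations already takes about $\log_2(N)$ bits.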
So you need n to be smaller than $\log_2(N)$, which is impossible?
You’re right; in my previous comment it should be “at most a small constant more complexity + $O(\log N)$”, to specify the number of times that the permutation has to be applied.
I still think the upshot is similar; in this case it’s “power-seeking is at best a little less probable than non-power-seeking, but you should really not expect that the bound is tight”.
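A hedged sketch of the amended bound: if each constructed reward function costs at most $c + \log_2(N)$ extra bits (a constant $c$ for the fixed construction plus $\log_2(N)$ to pick out which element of the orbit), then each gets prior weight at least roughly $2^{-c} P(R) / N$, so

$$\sum_{i=1}^{N} P(R_i) \gtrsim 2^{-c} \, P(R),$$

i.e. the constructed reward functions collectively retain at least a $2^{-c}$ fraction of the original’s weight. How large $c$ actually is becomes the point of contention below.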
I still don’t see how this works. The “small constant” here is actually the length of a program that needs to contain a representation of the entire MDP (because the program needs to simulate the MDP for each possible permutation). So it’s not a constant; it’s an unbounded integer.
Even if we restrict the discussion to a given very simple MDP, the program needs to contain way more than 100 bits (just to represent the MDP + the logic that checks whether a given permutation satisfies the relevant condition). So the probability of the POWER-seeking reward functions that are constructed here is smaller than $2^{-100}$ times the probability of the non-POWER-seeking reward functions. [EDIT: I mean, the probability of the constructed reward functions may happen to be larger, but the proof sketch doesn’t show that it is.]
(As an aside, the permutations that we’re dealing with here are equal to their own inverse, so it’s not useful to apply them multiple times.)
contain a representation of the entire MDP (because the program needs to simulate the MDP for each possible permutation)
We aren’t talking about MDPs; we’re talking about a broad class of environments which are represented via joint probability distributions over actions and observations. See post title.
it’s an unbounded integer.
I don’t follow.
the program needs to contain way more than 100 bits
See the arguments in the post Rohin linked for why this argument is gesturing at something useful even if it takes some more bits.
But IMO the basic idea in this case is that you can construct reasonably simple utility functions like “utility 1 if the history has the agent taking action $a_i$ at time step $t$ given action-observation history prefix $h_{1:t-1}$, and 0 otherwise.” This is reasonably short, and you can apply it for all actions and time steps.
Sure, the complexity will vary a little bit (probably later time steps will be more complex), but basically you can produce reasonably simple programs which make any sequence of actions optimal. And so I agree with Rohin that simplicity priors on u-AOH will quantitatively, but not qualitatively, affect the conclusions for the generic u-AOH case. [EDIT: this reasoning is different from the one Rohin gave, TBC]
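To make that construction concrete, here is a minimal sketch (my own illustration, not code from the post) of a utility function over action-observation histories that pays 1 exactly when the agent takes a designated action at a designated time step after a designated prefix, and 0 otherwise; the history encoding and the names `make_indicator_utility`, `target_prefix`, `target_action`, and `target_step` are assumptions made for the example.

```python
from typing import Any, Sequence, Tuple

# A history is a sequence of (action, observation) pairs.
History = Sequence[Tuple[Any, Any]]

def make_indicator_utility(target_prefix: History, target_action: Any, target_step: int):
    """Return a u-AOH that is 1 iff the agent takes `target_action` at time step
    `target_step` (1-indexed) after the action-observation prefix `target_prefix`."""
    def utility(history: History) -> int:
        if len(history) < target_step:
            return 0
        # The prefix h_{1:t-1} must match exactly.
        if tuple(history[: target_step - 1]) != tuple(target_prefix):
            return 0
        action_at_t, _observation = history[target_step - 1]
        return 1 if action_at_t == target_action else 0
    return utility

# Usage: reward taking action "a2" at step 3 after a particular two-step prefix.
u = make_indicator_utility([("a1", "o1"), ("a1", "o2")], "a2", 3)
print(u([("a1", "o1"), ("a1", "o2"), ("a2", "o5")]))  # 1
print(u([("a1", "o1"), ("a1", "o2"), ("a3", "o5")]))  # 0
```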
(As an aside, the permutations that we’re dealing with here are equal to their own inverse, so it’s not useful to apply them multiple times.)
Yes, good point. I retract the original claim.
(You’re right: what you do here is search for the k-th permutation satisfying the theorem’s requirements, where k is the specified number.)
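A minimal sketch of that indexing trick (my own illustration; `satisfies_requirements` stands in for whatever condition the theorem imposes): the program enumerates permutations in a fixed order and outputs the k-th one that passes the check, so the only per-permutation information it needs is the index k, which takes about $\log_2(N)$ bits.

```python
from itertools import permutations

def kth_valid_permutation(n_states: int, k: int, satisfies_requirements) -> tuple:
    """Enumerate permutations of range(n_states) in lexicographic order and
    return the k-th one (0-indexed) satisfying the given predicate."""
    count = 0
    for perm in permutations(range(n_states)):
        if satisfies_requirements(perm):
            if count == k:
                return perm
            count += 1
    raise ValueError("fewer than k+1 permutations satisfy the requirements")

# Usage: the 2nd (0-indexed) permutation of 4 elements that moves element 0.
print(kth_valid_permutation(4, 2, lambda p: p[0] != 0))  # (1, 2, 0, 3)
```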