Ofer comments on Power-seeking for successive choices

Ofer 13 Aug 2021 18:12 UTC
LW: 1 AF: 1
AF

> So I think it is an accurate description, in that it flags that “options” is not just the normal intuitive version of options.

I think the quoted description is not at all what the theorems in the paper show, no matter what concept the word “options” (in scare quotes) refers to. In order to apply the theorems we need to show that an involution with certain properties exists, not that <some set of things after action 1> is larger than <some set of things after action 2>.

To be more specific, the concept that the word “options” refers to here is recurrent state distributions. If the quoted description were roughly correct, there would not be a problem with applying the theorems in stochastic environments. But in fact the theorems can almost never be applied in stochastic environments. For example, suppose action 1 leads to more available “options”, and action 2 causes “immediate death” with probability 0.7515746, and that precise probability does not appear in any transition that follows action 1. We cannot apply the theorems because no involution with the necessary properties exists.
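
The obstruction in the example above can be sketched with a brute-force search. This is a minimal illustration under loudly stated assumptions: the 4-state toy environment, the helper names, and the simplified condition (a state involution that maps every distribution reachable after action 2 onto some distribution reachable after action 1) are all hypothetical stand-ins, not the paper's exact conditions.

```python
from itertools import permutations

DEATH_PROB = 0.7515746  # the probability used in the example above


def is_involution(perm):
    """True if applying the state permutation twice gives the identity."""
    return all(perm[perm[s]] == s for s in range(len(perm)))


def permute(dist, perm):
    """Relabel states: the mass on state s moves to state perm[s]."""
    out = [0.0] * len(dist)
    for s, p in enumerate(dist):
        out[perm[s]] = p
    return tuple(out)


def close(d1, d2, tol=1e-12):
    """Entry-wise comparison of two distributions over states."""
    return all(abs(a - b) <= tol for a, b in zip(d1, d2))


def find_involution(after_a2, after_a1):
    """Return a state involution embedding after_a2 into after_a1, else None."""
    n = len(after_a1[0])
    for perm in permutations(range(n)):
        if not is_involution(perm):
            continue
        if all(any(close(permute(d, perm), e) for e in after_a1)
               for d in after_a2):
            return perm
    return None


# Hypothetical toy: state 0 is "death", states 1-3 are other absorbing states.
AFTER_A1 = [(0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0), (0.0, 0.0, 0.0, 1.0)]
DETERMINISTIC_A2 = [(1.0, 0.0, 0.0, 0.0)]                   # certain death
STOCHASTIC_A2 = [(DEATH_PROB, 1.0 - DEATH_PROB, 0.0, 0.0)]  # death w.p. 0.7515746

print(find_involution(DETERMINISTIC_A2, AFTER_A1))
print(find_involution(STOCHASTIC_A2, AFTER_A1))
```

In the deterministic variant the search finds the swap of states 0 and 1. In the stochastic variant it returns `None`, because relabeling states can never turn a distribution with mass 0.7515746 on one state into any of the distributions available after action 1, even though action 1 still has more “options”.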
- TurnTrout 13 Aug 2021 18:29 UTC
  LW: 6 AF: 5
  AF Parent
  You’re being unhelpfully pedantic. The quoted portion even includes the phrase “As a quick summary (read the paper and sequence if you want more details)”! This reads to me as an attempted pre-emption of “gotcha” comments.
  The phenomena you discuss are explained in the paper (EDIT: top of page 9), and in other posts, and discussed at length in other comment threads. But this post isn’t about the stochastic sensitivity issue, and I don’t think it should have to talk about the sensitivity issue.
  - Ofer 16 Aug 2021 12:21 UTC
    LW: 1 AF: 1
    AF Parent
    
    > The phenomena you discuss are explained in the paper (EDIT: top of page 9), and in other posts, and discussed at length in other comment threads. But this post isn’t about the stochastic sensitivity issue, and I don’t think it should have to talk about the sensitivity issue.
    
    I noticed that after my previous comment you’ve edited your comment to include the page number and the link. Thanks.
    
    I still couldn’t find in the paper (top of page 9) an explanation for the “stochastic sensitivity issue”. Perhaps you were referring to the following:
    
    > randomly generated MDPs are unlikely to satisfy our sufficient conditions for POWER-seeking tendencies
    
    But the issue is with stochastic MDPs, not randomly generated MDPs.
    
    Re the linked post section, I couldn’t find there anything about stochastic MDPs.
    - TurnTrout 16 Aug 2021 16:26 UTC
      LW: 2 AF: 2
      AF Parent
      For (3), environments which “almost” have the right symmetries should also “almost” obey the theorems. To give a quick, non-legible sketch of my reasoning:
      For the uniform distribution over reward functions on the unit hypercube ($[0, 1]^{|S|}$), optimality probability should be Lipschitz continuous on the available state visit distributions (in some appropriate sense). Then if the theorems are “almost” obeyed, instrumentally convergent actions still should have extremely high probability, and so most of the orbits still have to agree.
      So I don’t currently view (3) as a huge deal. I’ll probably talk more about that another time.
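
      One possible way to write down the continuity claim sketched above, offered only as a hedged reading of “in some appropriate sense”. The perturbed set $\tilde{F}$, the index-wise pairing of options, the choice of norm, and the constant $K$ are all assumptions introduced for illustration; nothing in this thread establishes the bound.

      ```latex
      % R is a reward function drawn uniformly from the unit hypercube.
      % F = {f_1, ..., f_n} are the available state visit distributions;
      % A indexes the ones reached through the action of interest.
      p(A \mid F) \;=\; \Pr_{R \sim \mathrm{Unif}([0,1]^{|S|})}
          \left[ \max_{i \in A} f_i^\top R \;\ge\; \max_{j \le n} f_j^\top R \right]

      % Conjectured Lipschitz-style bound (assumption, not a proven result):
      \left| p(A \mid F) - p(A \mid \tilde{F}) \right|
          \;\le\; K \cdot \max_{i \le n} \lVert f_i - \tilde{f}_i \rVert_1
      ```

      Any bound of this shape would need to exclude degenerate cases such as duplicated options, where ties are broken by an arbitrarily small perturbation and the probability can jump.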
      - Ofer 18 Aug 2021 16:39 UTC
        LW: 1 AF: 1
        AF Parent
        That quote does not seem to mention the “stochastic sensitivity issue”. In the post that you linked to, “(3)” refers to:
        
        > Not all environments have the right symmetries
        > But most ones we think about seem to
        
        So I’m still not sure what you meant when you wrote “The phenomena you discuss are explained in the paper (EDIT: top of page 9), and in other posts, and discussed at length in other comment threads.”
        
        (Again, I’m not aware of any previous mention of the “stochastic sensitivity issue” other than in my comment here.)
  - Ofer 13 Aug 2021 19:05 UTC
    −1 points
    AF Parent
    
    > The phenomena you discuss are explained in the paper, and in other posts, and discussed at length in other comment threads.
    
    I haven’t found an explanation about the “stochastic sensitivity issue” in the paper, can you please point me to a specific section/page/quote? All that I found about this in the paper was the sentence:
    
    > Our theorems apply to stochastic environments, but we present a deterministic case study for clarity.
    
    (I’m also not aware of previous posts/threads that discuss this, other than my comment here.)
    
    I brought up this issue as a demonstration of the implications of incorrectly assuming that the theorems in the paper apply when there are more “options” available after action 1 than after action 2.
    
    (I argue that this issue shows that the informal description in the OP does not correctly describe the theorems in the paper, and it’s not just a matter of omitting details.)