Most of the reward functions are either indifferent about the vase or want to break the vase. The optimal policies of all those reward functions don’t “tend to avoid breaking the vase”. Those optimal policies don’t behave as if they care about the ‘strictly more states’ that can be reached by not breaking the vase.
This is factually wrong BTW. I had just explained why the opposite is true.
Are you saying that my first sentence (“Most of the reward functions are either indifferent about the vase or want to break the vase”) is in itself factually wrong, or rather the rest of the quoted text?
The first sentence
Thanks.
We can construct an involution over reward functions that transforms every state by flipping the is-the-vase-broken bit in the state’s representation. For every reward function that “wants to preserve the vase”, we can apply the involution to it and get a reward function that “wants to break the vase”.
(And there are the reward functions that are indifferent about the vase, which the involution maps to themselves.)
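To make the construction concrete, here’s a minimal sketch in Python (the bit-vector state encoding, and bit 0 being the vase bit, are illustrative assumptions of mine, nothing more):

```python
import numpy as np

# Illustrative encoding: a state is an integer whose bits are feature flags,
# and bit 0 (an assumption for this sketch) is the is-the-vase-broken bit.
VASE_BIT = 0

def flip_vase(state: int) -> int:
    """Toggle the is-the-vase-broken bit of a state."""
    return state ^ (1 << VASE_BIT)

def involute(reward: np.ndarray) -> np.ndarray:
    """Map a reward function (a vector indexed by state) to the reward
    function that scores every state as if the vase bit were flipped."""
    return reward[[flip_vase(s) for s in range(len(reward))]]

# Applying the map twice gives back the original reward function, so it is
# an involution: it pairs every "preserve the vase" reward function with a
# "break the vase" one, and maps vase-indifferent ones to themselves.
r = np.random.rand(8)  # 8 states = 3 feature bits
assert np.allclose(involute(involute(r)), r)
```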
Gotcha. I see where you’re coming from.
I think I underspecified the scenario and claim. The claim wasn’t supposed to be: most agents never break the vase (although this is sometimes true). The claim should be: most agents will not immediately break the vase.
If the agent has a choice between one action (“break vase and move forwards”) and another action (“don’t break vase and move forwards”), and these actions lead to similar subgraphs, then at all discount rates, optimal policies will tend to not break the vase immediately. But they might tend to break it eventually, depending on the granularity and balance of final states.
So I think we’re actually both making a correct point: you’re making an argument, for γ=1 under certain kinds of models, about whether the agent will eventually break the vase, while I (meant to) discuss the immediate break-it-or-not decision in terms of option preservation at all discount rates.
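As a quick numerical sanity check, here’s a toy four-state model (my own illustration, not an example from the paper): “break” jumps straight to an absorbing broken state B, while “move forwards” goes to F, which preserves a later choice between an absorbing broken state B2 and an absorbing intact state V. Sampling reward functions uniformly:

```python
import numpy as np

rng = np.random.default_rng(0)

def breaks_immediately(r, gamma):
    """Is 'break now' optimal for reward vector r = (r_B, r_F, r_B2, r_V)?"""
    r_B, r_F, r_B2, r_V = r
    # Break now: collect r_B every step from t=1 onward.
    v_break = gamma * r_B / (1 - gamma)
    # Keep options open: visit F, then settle into the better terminal state.
    v_keep = gamma * r_F + gamma ** 2 / (1 - gamma) * max(r_B2, r_V)
    return v_break > v_keep

for gamma in (0.1, 0.5, 0.9, 0.99):
    rewards = rng.random((100_000, 4))  # i.i.d. uniform reward on each state
    frac = np.mean([breaks_immediately(r, gamma) for r in rewards])
    print(f"gamma={gamma}: fraction breaking immediately ~ {frac:.3f}")
```

At every discount rate the fraction that breaks immediately stays at or below one half, and it falls toward 1/3 as γ→1, since “break now” must then beat the better of two terminal states rather than one.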
[Edited to reflect the ancestor comments]
The claim should be: most agents will not immediately break the vase.

I don’t see why that claim is correct either, for a similar reason. If you’re assuming here that most reward functions incentivize avoiding immediately breaking the vase, then I would argue that that assumption is incorrect, and to support this I would point to the same involution from my previous comment.
I’m not assuming that they incentivize anything. They just do! Here’s the proof sketch (for the full proof, you’d subtract a constant vector from each set, but that’s not relevant for the intuition).
You’re playing a tad fast and loose with your involution argument. Unlike the average-optimal case, you can’t just map one set of states to another for all-discount-rates reasoning.
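Concretely (this is my gloss on the gap, not a line from the paper): relabeling rewards by a state involution φ only carries optimal policies across when φ also respects the transition dynamics,

$$T\big(\varphi(s') \mid \varphi(s), a\big) = T\big(s' \mid s, a\big) \quad \text{for all } s, a, s',$$

and the vase-bit flip fails this, because breaking a vase is irreversible: there is no “unbreak” transition for the flipped dynamics to use.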
Thanks for the figure. I’m afraid I didn’t understand it. (I assume this is a gridworld environment; what does “standing near intact vase” mean? Can the robot stand in the same cell as the intact vase?)
You’re playing a tad fast and loose with your involution argument. Unlike the average-optimal case, you can’t just map one set of states to another for all-discount-rates reasoning.

I don’t follow. (To be clear, I was not trying to apply any theorem from the paper via that involution.) But does this mean you are NOT making that claim (“most agents will not immediately break the vase”) in the limit of the discount rate going to 1? My understanding is that the main claim in the abstract of the paper is meant to assume that setting, based on the following sentence from the paper:

Proposition 6.5 and proposition 6.9 are powerful because they apply to all γ∈[0,1], but they can only be applied given hard-to-satisfy environmental symmetries.