I think I underspecified the scenario and claim. The claim wasn’t supposed to be: most agents never break the vase (although this is sometimes true). The claim should be: most agents will not immediately break the vase.
If the agent has a choice between one action (“break vase and move forwards”) and another action (“don’t break vase and move forwards”), and these actions lead to similar subgraphs, then at all discount rates, optimal policies will tend to not break the vase immediately. But they might tend to break it eventually, depending on the granularity and balance of final states.
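Here’s a toy numerical check of that claim (a five-state deterministic MDP I made up for this comment; the states and numbers are illustrative, not an example from the paper). Breaking immediately collapses the future into a single absorbing state, while avoiding preserves the option to break later plus two other final states, so the break-now action should come out optimal for a minority of uniformly sampled reward functions at every discount rate:

```python
import numpy as np

# Toy deterministic MDP (my own illustration, not from the paper).
# State 0: start, vase intact.   State 1: vase broken (absorbing).
# State 2: moved forward, vase intact.   States 3, 4: other absorbing
# final states, reachable only while the vase is still intact.
transitions = {
    0: [1, 2],        # action 0: break and move; action 1: avoid and move
    1: [1],           # broken: the only option is to stay
    2: [1, 2, 3, 4],  # can still break later, wait, or settle elsewhere
    3: [3],
    4: [4],
}

def breaks_immediately(r, gamma, iters=1000):
    """True iff 'break and move' is optimal at state 0 for state rewards r."""
    v = np.zeros(5)
    for _ in range(iters):  # value iteration
        v = np.array([r[s] + gamma * max(v[t] for t in transitions[s])
                      for s in range(5)])
    # Both initial actions collect r[0] now; they differ only in landing
    # in state 1 (break) versus state 2 (avoid).
    return v[1] > v[2]

rng = np.random.default_rng(0)
samples = rng.uniform(size=(1000, 5))  # rewards drawn uniformly from [0,1]^5
for gamma in (0.1, 0.5, 0.9, 0.99):
    frac = np.mean([breaks_immediately(r, gamma) for r in samples])
    print(f"gamma = {gamma}: breaks immediately for {frac:.0%} of rewards")
```

In this toy, as γ → 1 the break-now fraction approaches the chance that the broken-vase state carries the highest reward among the four reachable loop states, i.e. about a quarter.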
So I think we’re actually both making a correct point, but you’re making an argument about γ=1 under certain kinds of models and about whether the agent will eventually break the vase. I (meant to) discuss the immediate break-it-or-not decision in terms of option preservation at all discount rates.
> The claim should be: most agents will not immediately break the vase.
I don’t see why that claim is correct either, for a similar reason. If you’re assuming here that most reward functions incentivize avoiding immediately breaking the vase then I would argue that that assumption is incorrect, and to support this I would point to the same involution from my previous comment.
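To spell out the shape of that involution argument as I understand it (in my own notation): let σ be a state permutation that exchanges each vase-intact state with a corresponding vase-broken state while preserving the transition structure, and define the involution on reward functions

$$(\phi \cdot R)(s) := R(\sigma(s)).$$

Then φ maps {R : immediately breaking is optimal} onto {R : not immediately breaking is optimal}, and it preserves any σ-invariant distribution over reward functions, so the two sets have equal measure; neither behavior is favored by “most” reward functions.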
I’m not assuming that they incentivize anything. They just do! Here’s the proof sketch (for the full proof, you’d subtract a constant vector from each set, but that’s not relevant for the intuition).
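In symbols, the intuition is roughly this (stated loosely; φ here is an involutive state permutation). Let $F_{\text{break}}, F_{\text{avoid}} \subseteq \mathbb{R}^{|S|}$ be the sets of discounted visit-distribution vectors available after the two actions, and suppose $\phi \cdot F_{\text{break}} \subseteq F_{\text{avoid}}$: the intact-vase subgraph contains a copy of the broken-vase subgraph, plus extra options. If breaking is strictly optimal for reward vector $r$, then for the permuted reward $\phi \cdot r$,

$$\max_{f \in F_{\text{avoid}}} f^\top (\phi \cdot r) \;\ge\; \max_{f \in F_{\text{break}}} f^\top r \;>\; \max_{f \in F_{\text{avoid}}} f^\top r \;\ge\; \max_{f \in F_{\text{break}}} f^\top (\phi \cdot r),$$

so avoiding is strictly optimal for $\phi \cdot r$. Since $r \mapsto \phi \cdot r$ is injective and measure-preserving, at every γ the set of reward functions that strictly avoid is at least as large as the set that strictly break. And the containment only runs one way, because breaking destroys options.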
You’re playing a tad fast and loose with your involution argument. Unlike the average-optimal case, you can’t just map one set of states to another for all-discount-rates reasoning.
Thanks for the figure. I’m afraid I didn’t understand it. (I assume this is a gridworld environment; what does “standing near intact vase” mean? Can the robot stand in the same cell as the intact vase?)
> You’re playing a tad fast and loose with your involution argument. Unlike the average-optimal case, you can’t just map one set of states to another for all-discount-rates reasoning.
I don’t follow. (To be clear, I was not trying to apply any theorem from the paper via that involution.) But does this mean you are NOT making that claim (“most agents will not immediately break the vase”) in the limit of the discount rate going to 1? My understanding is that the main claim in the abstract of the paper is meant to assume that setting, based on the following sentence from the paper:
> Proposition 6.5 and proposition 6.9 are powerful because they apply to all γ∈[0,1], but they can only be applied given hard-to-satisfy environmental symmetries.
Gotcha. I see where you’re coming from.