Rohin Shah comments on Environmental Structure Can Cause Instrumental Convergence

Rohin Shah 25 Jun 2021 15:32 UTC
LW: 2 AF: 2
AF
Thanks for the correction.
That one in particular isn’t a counterexample as stated, because you can’t construct a subgraph isomorphism for it. When writing this I thought that actually meant I didn’t need more of a caveat (contrary to what I said to you earlier), but now thinking about it a third time I really do need the “no cycles” caveat. The counterexample is:
Z <--> S --> A --> B
With every state also having a self-loop.
In this case, the involution {S:B, Z:A}, would suggest that S --> Z would have more options than S --> A, but optimal policies will take S --> A more often than S --> Z.
(The theorem effectively says “no cycles” by conditioning on the policy being S --> Z or S --> A, in which case the S --> Z --> S --> S --> … possibility is not actually possible, and the involution doesn’t actually go through.)
EDIT: I’ve changed to say that the actions lead to disjoint parts of the state space.
- TurnTrout 25 Jun 2021 15:50 UTC
  LW: 4 AF: 4
  AF Parent
  My take on it has been, the theorem’s bottleneck assumption implies that you can’t reach S again after taking action a1 or a2, which rules out cycles.
  - Rohin Shah 25 Jun 2021 15:57 UTC
    LW: 2 AF: 2
    AF Parent
    Yeah actually that works too
- Ofer 25 Jun 2021 20:36 UTC
  LW: 1 AF: 1
  AF Parent
  
  That one in particular isn’t a counterexample as stated, because you can’t construct a subgraph isomorphism for it.
  
  Probably not an important point, but I don’t see why we can’t use the identity isomorphism (over the part of the state space that a1 leads to).
  - Rohin Shah 26 Jun 2021 7:41 UTC
    LW: 2 AF: 2
    AF Parent
    This particular argument is not talking about farsightedness—when we talk about having more options, each option is talking about the entire journey and exact timesteps, rather than just the destination. Since all the “journeys” starting with the S --> Z action go to Z first, and all the “journeys” starting with the S --> A action go to A first, the isomorphism has to map A to Z and vice versa, so that $ϕ (T (S, a_{1})) = T (S, a_{2})$ .
    (What assumption does this correspond to in the theorem? In the theorem, the involution has to map $F_{a}$ to a subset of $F_{a^{'}}$ ; every possibility in $F_{a_{1}}$ starts with A, and every possibility in $F_{a_{2}}$ starts with Z, so you need to map A to Z.)