Pattern comments on Environmental Structure Can Cause Instrumental Convergence

Pattern 10 Aug 2021 20:49 UTC
2 points
For example, when γ≈1, most reward functions provably incentivize not immediately dying in Pac-Man.
Do they incentivize beating the level? Or is that treated like dying—irreversible, and undesirable, with a preference for waiting, say, to choose which order to eat the last two pieces in? (What about eating ghosts, which eventually come back? Eating the power up that enables eating ghosts, but is consumed in the process?)
- TurnTrout 10 Aug 2021 21:24 UTC
  2 points
  Parent
  Under the natural Pac-Man model (where different levels have different mechanics), then yes, agents will tend to want to beat the level—because at any point in time, most of the remaining possibilities are in future levels, not the current one.
  Eating ghosts is more incidental; the agent will probably tend to eat ghosts as an instrumental move for beating the level.
  - Pattern 11 Aug 2021 16:19 UTC
    2 points
    Parent
    Perhaps I was over interpreting the diagram with the ‘wait’ option, then.
    - TurnTrout 11 Aug 2021 16:36 UTC
      2 points
      Parent
      Can you say more? I don’t think there’s a way to “wait” in Pac-Man, although I suppose you could always loop around the level in a particular repeating fashion such that you keep revisiting the same state.
      - Pattern 11 Aug 2021 16:45 UTC
        2 points
        Parent
        loop around the level in a particular repeating fashion
        That’s what I meant by wait.
        On second thought, it can’t be a ‘wait to choose between 2+ options unless there are 2+ options’, because the end of the level isn’t a choice between 2 things. (Although if we pay attention to the last 2 things Pac-Man has to eat, then there’s a choice between the order to eat them in, but that leads to the same state, so it probably doesn’t matter.)
        
        Mostly I was trying to figure out how this generalizes, because it seemed like it was as much about winning as losing (because both end the game):
        A portion of a Tic-Tac-Toe game-tree against a fixed opponent policy. Whenever we make a move that ends the game, we can’t go anywhere else – we have to stay put. Then most reward functions incentivize the green actions over the black actions: average-reward optimal policies are particularly likely to take moves which keep the game going. The logic is that any
        lose-immediately-with-given-black-move
        reward function can be permuted into a
        stay-alive-with-green-move
        reward function.