For example, when γ≈1, most reward functions provably incentivize not immediately dying in Pac-Man.
Do they incentivize beating the level? Or is that treated like dying—irreversible, and undesirable, with a preference for waiting, say, to choose which order to eat the last two pieces in? (What about eating ghosts, which eventually come back? Eating the power up that enables eating ghosts, but is consumed in the process?)
Under the natural Pac-Man model (where different levels have different mechanics), yes: agents will tend to want to beat the level, because at any point in time, most of the remaining possibilities are in future levels, not the current one.
Eating ghosts is more incidental; the agent will probably tend to eat ghosts as an instrumental move for beating the level.
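To make the "most remaining possibilities are elsewhere" intuition concrete, here is a rough Monte Carlo sketch under assumptions of my own, not the post's formal setup. One action ends the game in a single terminal state. The other keeps the game going and leaves `num_future_outcomes` distinct terminal states reachable. Rewards are drawn i.i.d. uniform on [0, 1], and an average-reward optimal agent is assumed to care only about the best terminal state it can still reach. The function name and parameters are hypothetical.

```python
import random


def fraction_preferring_continue(num_future_outcomes, num_samples=100_000, seed=0):
    """Estimate how often a random reward function prefers to keep playing.

    Toy assumption: rewards on terminal states are i.i.d. uniform on [0, 1],
    and the agent's long-run value is the best terminal reward still reachable.
    """
    rng = random.Random(seed)
    prefer_continue = 0
    for _ in range(num_samples):
        # Reward of the single terminal state reached by ending the game now.
        end_now = rng.random()
        # Best reward among the terminal states still reachable if we continue.
        best_later = max(rng.random() for _ in range(num_future_outcomes))
        if best_later > end_now:
            prefer_continue += 1
    return prefer_continue / num_samples


if __name__ == "__main__":
    for k in (1, 2, 3, 10):
        print(k, fraction_preferring_continue(k))
```

In this toy model the exact fraction is k / (k + 1) for k reachable future outcomes, so the preference for continuing strengthens as more outcomes stay reachable. It says nothing about Pac-Man's actual dynamics; it only illustrates the counting argument.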
Can you say more? I don’t think there’s a way to “wait” in Pac-Man, although I suppose you could always loop around the level in a particular repeating fashion such that you keep revisiting the same state.
loop around the level in a particular repeating fashion
That’s what I meant by wait.
On second thought, it can’t be a ‘wait to choose between 2+ options unless there are 2+ options’, because the end of the level isn’t a choice between 2 things. (Although if we pay attention to the last 2 things Pac-Man has to eat, then there’s a choice between the order to eat them in, but that leads to the same state, so it probably doesn’t matter.)
Mostly I was trying to figure out how this generalizes, because it seemed like it was as much about winning as losing (because both end the game):
A portion of a Tic-Tac-Toe game-tree against a fixed opponent policy. Whenever we make a move that ends the game, we can’t go anywhere else – we have to stay put. Then most reward functions incentivize the green actions over the black actions: average-reward optimal policies are particularly likely to take moves which keep the game going. The logic is that any
Perhaps I was overinterpreting the diagram with the ‘wait’ option, then.