You are ignoring an important detail.
1) In the (standard?) interpretation of an extensive form of a game, a strategy is a probability distribution on functions from information sets to outputs.
2) In another interpretation, a strategy is a function from information sets to independent probability distributions on outputs.
These are different. I think that you are assuming the second interpretation, but I also believe that the first interpretation is more common.
These two interpretations are in fact very different. In the first interpretation, there is never any incentive to use randomness in a one-player game, and it is not possible to win the absent-minded driver problem.
In some sense, it does not matter: you can take a problem modeled as one and change it to be modeled as the other, at least approximately if you want to keep the game graph finite. However, when taking a real-world problem and coming up with a game model, you have to be aware of which interpretation you are using.
I feel like which interpretation is more accurate is an actual empirical question that I do not know the answer to. It boils down to the following question:
I have copied you, and put the two identical copies in identical situations. You must both choose A or B. If you choose the same thing, you lose; if you choose different things, you win.
Under one interpretation, you choose the same thing and both lose no matter what. Under the other interpretation, you can independently choose randomly, and win with probability 1⁄2.
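To make the gap concrete, here is a small simulation of that copy game (the function names are mine; a single pure strategy stands in for one draw from the interpretation-1 distribution, which both copies then share):

```python
import random

def copy_game(strategy, trials=100_000):
    """Two identical copies each pick 'A' or 'B'; they win iff the
    picks differ.  `strategy` is called once per copy."""
    wins = 0
    for _ in range(trials):
        a, b = strategy(), strategy()
        if a != b:
            wins += 1
    return wins / trials

# Interpretation 1 (mixed): a function from information sets to outputs
# is sampled up front, so both copies run the SAME pure strategy and
# always match.
mixed = copy_game(lambda: "A")          # win rate 0

# Interpretation 2 (behavioral): each copy randomizes independently at
# the information set.
behavioral = copy_game(lambda: random.choice("AB"))   # win rate ~1/2
```

Under interpretation 1, no distribution over pure strategies helps, since whatever function is drawn is run by both copies.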
The first option is standard. When the second interpretation comes up, those strategies are referred to as behavior strategies.
If every information set is visited at most once in the course of play, then the game satisfies no-absent-mindedness and every behavior strategy can be represented as a standard mixed strategy (but some mixed strategies don’t have equivalent behavior strategies).
Kuhn’s theorem says the game has perfect recall (roughly players never forget anything and there is a clear progression of time) if and only if mixed and behavior strategies are equivalent.
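The easy direction of that equivalence can be sketched directly: under no-absent-mindedness, a behavior strategy becomes a mixed strategy by taking the product of the local distributions. A rough sketch (the dictionary representation is my own, not standard notation):

```python
from itertools import product

def behavior_to_mixed(behavior):
    """Given a behavior strategy {info_set: {action: prob}}, build the
    equivalent mixed strategy: a distribution over pure strategies
    (assignments info_set -> action), with probability equal to the
    product of the local probabilities at each information set."""
    info_sets = list(behavior)
    mixed = {}
    for actions in product(*(behavior[i] for i in info_sets)):
        pure = tuple(zip(info_sets, actions))
        prob = 1.0
        for i, a in pure:
            prob *= behavior[i][a]
        mixed[pure] = prob
    return mixed

# Example: randomize 50/50 at I1 and 30/70 at I2.
b = {"I1": {"L": 0.5, "R": 0.5}, "I2": {"U": 0.3, "D": 0.7}}
m = behavior_to_mixed(b)
```

The reverse direction (mixed to behavior) is the one that fails without perfect recall, since a distribution over pure strategies can correlate choices in ways no independent local randomization reproduces.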
Thank you, I did not know the terminology.
The types of games we care about, the ones that push us to use UDT, do not have perfect recall, so whether or not behavior strategies are possible is an important question. It also feels like an empirical question.
I think interpretation 1 is usually called “mixed strategy” and interpretation 2 is “behavioral strategy”. In the post I was indeed assuming interpretation 2. Thanks for pointing that out! I have edited the post accordingly.
I do not think it is something you should just assume. I think it is an empirical question. I think that behavioral strategies might not be realistic, because they seem to depend on non-determinism.
Well, in the Absent-Minded Driver problem it seems reasonable to allow the driver to flip a coin whenever he’s faced with a choice. Why do you think that’s unrealistic?
Hmm. I was thinking that determinism requires that you get the same output in the same situation, but I guess I was not accounting for the fact that we do not require the two nodes in the information set to be the same situation; we only require that they be indistinguishable to the agent.
It does seem realistic to have the absent-minded driver flip a coin (although perhaps it is better to model that as a third option, flipping a coin, which points to a chance node).
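For concreteness, here is the usual Absent-Minded Driver calculation. The payoff numbers (0 for exiting at the first intersection, 4 for exiting at the second, 1 for never exiting) are the standard ones from the literature; the thread does not fix them:

```python
def amd_expected_payoff(p):
    """Absent-Minded Driver: at each intersection (indistinguishable to
    the driver) he continues with probability p.  Payoffs (assumed):
    exit first = 0, exit second = 4, never exit = 1."""
    return (1 - p) * 0 + p * (1 - p) * 4 + p * p * 1

# Pure (deterministic) strategies: always exit (p=0) or never (p=1).
best_pure = max(amd_expected_payoff(0), amd_expected_payoff(1))   # 1

# Behavioral optimum: maximize 4p - 3p^2, i.e. p = 2/3, payoff 4/3.
behavioral = amd_expected_payoff(2 / 3)
```

The coin flip strictly beats every deterministic strategy here, which is exactly why the problem needs interpretation 2 (or an explicit chance node) to be winnable.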
On the other hand, if I am a deterministic Turing machine, and Omega simulates me and puts a dollar in whichever of two boxes he predicts I will not pick, then I cannot win this game unless I have an outside source of randomness.
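A sketch of that box game. Modeling Omega's simulation and the real run as two separate calls to the agent is my own simplification; for a randomized agent it amounts to assuming the coin lives outside the simulation:

```python
import random

def omega_box_game(agent):
    """Omega simulates the agent, then puts the dollar in the box it
    predicts the agent will NOT pick.  The agent wins iff its actual
    pick is the box with the dollar."""
    predicted = agent()                      # Omega's simulation
    dollar_box = "left" if predicted == "right" else "right"
    actual = agent()                         # the real choice
    return 1 if actual == dollar_box else 0

# A deterministic agent always loses: the real run repeats the simulation.
assert omega_box_game(lambda: "left") == 0
assert omega_box_game(lambda: "right") == 0

# With an outside source of randomness, Omega's simulation no longer
# pins down the actual choice, and the agent wins about half the time.
coin_agent = lambda: random.choice(["left", "right"])
win_rate = sum(omega_box_game(coin_agent) for _ in range(100_000)) / 100_000
```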
It seems like in different situations, you want different models. It seems to me like you have two different types of agents: a deterministic dUDT agent and a randomized rUDT agent. We should be looking at both, because they are not the same. I also do not know which one I am as a human.
By asking about the Absent-Minded Driver with a coin, you were phrasing the problem so that it does not matter, because an rUDT agent is just a dUDT agent which has access to a fair coin that he can flip any number of times at no cost.
I agree that there is a difference, and I don’t know which model describes humans better. It doesn’t seem to matter much in any of our toy problems though, apart from AMD where we really want randomness. So I think I’m going to keep the post as is, with the understanding that you can remove randomness from the model if you really want to.
I agree that that is a good solution. Since adding randomness to a node is something that can be done in a formulaic way, it makes sense to have information sets that are just labeled as “you can use behavioral strategies here.” It also makes sense to have them labeled as such by default.
I do not think that agents wanting but not having randomness is any more pathological than Newcomb’s problem (although that is already pretty pathological).
I suppose there is a third interpretation, in which the player can mix and match between the two types of randomness, and make the options at some instances of an information set correlated with the options at other instances of an information set. This is not useful in a one-player game, however, because interpretation-1-style randomness does not help at all there, so if the game is one-player, you might as well just call this interpretation 2.