Vladimir_Nesov comments on The Absent-Minded Driver

Vladimir_Nesov 16 Sep 2009 20:29 UTC
0 points
What do you mean? p is the only control parameter… You consider a set of “global” mixed strategies, indexed by p, and pick the one that leads to the best outcome, without worrying about where the mind doing this calculation is currently located or under what conditions you are thinking this thought.
- SilasBarta 16 Sep 2009 20:36 UTC
  0 points
  
  What do you mean? p is the only control parameter…
  
  Perhaps, but it’s an innovation to think of the problem in terms of “solving for the random fraction of times I’m going to do them”. That is, even considering that you should insert randomness between your decision and what you actually do is an insight. What focused your attention on optimizing with respect to p?
  - Vladimir_Nesov 16 Sep 2009 20:42 UTC
    1 point
    Mixed strategy is a standard concept, so here we are considering the set S of all (global) mixed strategies available in the game. When you search for the best strategy, you are maximizing the expected payoff over S, looking for the mixed strategy that gives the best payoff. What UDT says is that you should do just that, even when you are deciding what to do in a situation where some of the options have already run out, and, as here, even when you have no idea where you are. “The best strategy” quite literally means
    
    $s^{*} = \arg\max_{s \in S} EU(s)$
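    A brute-force sketch of this argmax, under assumptions not stated in the comment: S is replaced by a finite grid of mixed strategies, each identified by its probability of continuing past an intersection (the same at both intersections, since the driver can't tell where he is), and the payoffs are the ones implied by the expected-utility formula in this thread (0 for exiting at the first intersection, 4 at the second, 1 for driving past both). The names `expected_utility`, `strategy_grid` and `best_strategy` are hypothetical.

```python
from fractions import Fraction

# Payoffs implied by the EU formula in this thread (labels are assumptions):
PAYOFF_EXIT_FIRST = 0    # exit at the first intersection
PAYOFF_EXIT_SECOND = 4   # exit at the second intersection
PAYOFF_PASS_BOTH = 1     # continue past both intersections


def expected_utility(continue_prob):
    """EU of the mixed strategy that continues with probability continue_prob
    at every intersection (the driver cannot tell the intersections apart)."""
    p = continue_prob
    return ((1 - p) * PAYOFF_EXIT_FIRST
            + p * ((1 - p) * PAYOFF_EXIT_SECOND + p * PAYOFF_PASS_BOTH))


def strategy_grid(n):
    """A finite stand-in for S: continuing probabilities 0, 1/n, ..., 1."""
    return [Fraction(k, n) for k in range(n + 1)]


def best_strategy(strategies):
    """s* = argmax over the given strategies of expected_utility."""
    return max(strategies, key=expected_utility)


best = best_strategy(strategy_grid(300))
print(best, expected_utility(best))
```

    Exact `Fraction` arithmetic avoids ties caused by rounding; a grid step of 1/300 happens to contain the exact optimum, which a coarser grid would only approximate.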
    
    The only parameter of a given strategy is the probability p of continuing past an intersection rather than exiting, so it’s natural to index the strategies by that probability. This indexing is a mapping $t:[0,1]\to S$ that assigns a mixed strategy to each value of the continuing probability. We can now rewrite the expected utility maximization in terms of probability:
    
    $s^{*} = t(p^{*}), \quad p^{*} = \arg\max_{p \in [0,1]} EU(t(p))$
    
    For the strategy corresponding to continuing probability p, it’s easy to express the expected utility:
    
    $EU(t(p)) = (1-p)\cdot 0 + p\cdot\big((1-p)\cdot 4 + p\cdot 1\big) = p^2 + 4p(1-p)$
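    A quick Monte Carlo check of this expression, as a sketch: `drive`, `simulated_eu` and `closed_form_eu` are hypothetical names, and the fixed seed is only for reproducibility. Each simulated trip continues at each intersection with probability p, so the average payoff should approach $p^2+4p(1-p)$.

```python
import random


def drive(p, rng):
    """Simulate one trip: continue at each intersection with probability p.
    The same p is used at both, because the driver cannot tell them apart."""
    if rng.random() >= p:
        return 0          # exited at the first intersection
    if rng.random() >= p:
        return 4          # exited at the second intersection
    return 1              # continued past both


def simulated_eu(p, trials=100_000, seed=0):
    """Average payoff over many simulated trips with a seeded generator."""
    rng = random.Random(seed)
    return sum(drive(p, rng) for _ in range(trials)) / trials


def closed_form_eu(p):
    return p**2 + 4 * p * (1 - p)


for p in (0.0, 0.5, 2 / 3, 1.0):
    print(f"p={p:.3f}  simulated={simulated_eu(p):.3f}  formula={closed_form_eu(p):.3f}")
```

    Using a dedicated `random.Random` instance per estimate keeps runs reproducible without touching the module-level generator.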
    
    We can now find the optimal strategy as
    
    $s^{*} = t(p^{*}), \quad p^{*} = \arg\max_{p \in [0,1]} \left(p^2 + 4p(1-p)\right)$
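    Carrying out this maximization explicitly, as a worked sketch added for illustration rather than part of the original comment: expanding gives $4p - 3p^2$, a concave quadratic, so the first-order condition picks out the unique maximizer on $[0,1]$.

```latex
\begin{aligned}
EU(t(p)) &= p^2 + 4p(1-p) = 4p - 3p^2,\\
\frac{d}{dp}\,EU(t(p)) &= 4 - 6p = 0 \;\Rightarrow\; p^{*} = \tfrac{2}{3},\\
\frac{d^2}{dp^2}\,EU(t(p)) &= -6 < 0 \quad\text{(so } p^{*}\text{ is a maximum)},\\
EU(t(p^{*})) &= 4\cdot\tfrac{2}{3} - 3\cdot\tfrac{4}{9} = \tfrac{4}{3},
\quad\text{versus } EU(t(0)) = 0,\; EU(t(1)) = 1.
\end{aligned}
```

    So under these payoffs the best strategy continues with probability 2/3, which beats both pure strategies.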
    - SilasBarta 16 Sep 2009 22:11 UTC
      1 point
      Okay, that’s making more sense: the part where you get to treating p as a real-valued parameter is what I was interested in.
      
      But do you do the same thing when applying UDT to Newcomb’s problem? Do you consider it a necessary part of UDT that you take p (with 0<=p<=1) as a continuous parameter to maximize over, where p is the probability of one-boxing?
      - Vladimir_Nesov 17 Sep 2009 2:40 UTC
        1 point
        Fundamentally, this depends on the setting: you might not be given a random number generator (randomness is defined with respect to the game), in which case strategies that depend on a random value won’t be in the set of strategies to choose from. In Newcomb’s problem, the usual setting is either that you have to be fairly deterministic or Omega punishes you (so a small probability of two-boxing may even be preferable to pure one-boxing, or not, depending on Omega’s strategy), or that Omega is placed so that your strategy is always deterministic from its point of view (effectively taking mixed strategies out of the set of allowed ones).
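        A sketch of the “depending on Omega’s strategy” point, under assumptions that are not in the thread: the standard 1,000 / 1,000,000 box amounts, and two hypothetical Omega policies (`strict_omega`, `tolerant_omega`) that decide whether to fill box B from your one-boxing probability. It illustrates the idea only and does not model any particular formulation of Newcomb’s problem.

```python
from fractions import Fraction

# Assumed standard amounts: box A always holds SMALL; box B holds BIG if filled.
SMALL = 1_000
BIG = 1_000_000


def expected_payoff(p_one_box, fill_prob):
    """EU of one-boxing with probability p_one_box, given a hypothetical Omega
    policy fill_prob(p) = probability that box B was filled."""
    q = fill_prob(p_one_box)
    # Box B is filled before you act, so both actions face the same q;
    # two-boxing (probability 1 - p) additionally collects box A.
    return q * BIG + (1 - p_one_box) * SMALL


def strict_omega(p):
    """Hypothetical: fills B only for a pure one-boxer (punishes any randomness)."""
    return 1 if p == 1 else 0


def tolerant_omega(p):
    """Hypothetical: fills B if you one-box with probability at least 0.9."""
    return 1 if p >= Fraction(9, 10) else 0


grid = [Fraction(k, 100) for k in range(101)]
for omega in (strict_omega, tolerant_omega):
    best = max(grid, key=lambda p: expected_payoff(p, omega))
    print(omega.__name__, best, expected_payoff(best, omega))
```

        Against `strict_omega` pure one-boxing (p = 1) is optimal, while against `tolerant_omega` the best grid point is p = 9/10, which gains an extra 100 from occasionally two-boxing.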