You cannot assume any α, and choose p based on it, for α depends on p. You just introduced a time loop into your example.
Indeed, though it doesn’t have to be a time loop, just a logical dependency. Your expected payoff is α[p^2+4(1-p)p] + (1-α)[p+4(1-p)]. Since you will make the same decision both times, the only coherent state is α=1/(p+1). Thus expected payoff is (8p-6p^2)/(p+1), whose maximum is at about p=0.53. What went wrong this time? Well, while this is what you should use to answer bets about your payoff (assuming such bets are offered independently at every intersection), it is not the quantity you should maximize: it double counts the path where you visit both X and Y, which involves two instances of the decision but pays off only once.
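If you want to check the arithmetic, here's a quick grid-search sketch (plain Python, nothing more than a sanity check) comparing the betting expectation above with the actual planning payoff; the two peak at different values of p.

```python
# Sanity check: per-intersection betting expectation vs. planning payoff.

def betting_expectation(p):
    # alpha-weighted expectation with alpha = 1/(p+1), i.e. (8p - 6p^2)/(p + 1)
    return (8 * p - 6 * p ** 2) / (p + 1)

def planning_payoff(p):
    # expected payoff of the whole trip: p^2 + 4p(1 - p)
    return p ** 2 + 4 * p * (1 - p)

grid = [i / 10000 for i in range(10001)]
print(max(grid, key=betting_expectation))  # ~0.528
print(max(grid, key=planning_payoff))      # ~0.667, i.e. p = 2/3
```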
Mod parents WAY up! I should’ve tried to solve this problem on my own, but I wasn’t expecting it to be solved in the comments like that!
Awesome. I’m steadily upgrading my expected utilities of handing decision-theory problems to Less Wrong.
EDIT 2016: Wei Dai below is correct; this was my first time encountering this problem, and I misunderstood the point Wei Dai was trying to make.
You make it sound as if you expect to expect a higher utility in the future than you currently expect...
The parents that you referred to are now at 17 and 22 points, which seems a bit mad to me. Spotting the errors in P&R’s reasoning isn’t really the problem. The problem is to come up with a general decision algorithm that both works (in the sense of making the right decisions) and (if possible) makes epistemic sense.
So far, we know that UDT works but it doesn’t compute or make use of “probability of being at X” so epistemically it doesn’t seem very satisfying. Does TDT give the right answer when applied to this problem? If so, how? (It’s not specified formally enough that I can just apply it mechanically.) Does this problem suggest any improvements or alternative algorithms?
Again, that seems to imply that the problem is solved, and I don’t quite see how the parent comments have done that.
The problem is to come up with a general decision algorithm that both works (in the sense of making the right decisions) and (if possible) makes epistemic sense.

I presented a solution in a comment here which I think satisfies these: it gives the right answer, consistently handles the case of “partial knowledge” about one’s intersection, and correctly characterizes your epistemic condition in the absent-minded case.
I don’t see why the problem is not solved. The probability of being at X depends directly on how I am deciding whether to turn. So I cannot possibly use that probability to decide whether to turn; I need to decide on how I will turn first, and then I can calculate the probability of being at X. This results in the original solution.
This also shows that Eliezer was mistaken in claiming that any algorithm involving randomness can be improved by making it deterministic.
And then you can correct for the double-counting. When would you like to count your chickens? It’s safe to count them at X or Y.
If you count them at X, then how much payoff do you expect at the end? Relative to when you’ll be counting your payoff, the relative likelihood that you are at X is 1. And the expected payoff if you are at X is p^2 + 4p(1-p). This gives a total expected payoff of P(X) E(X) = 1 × (p^2 + 4p(1-p)) = p^2 + 4p(1-p).

If you count them at Y, then how much payoff do you expect at the end? Relative to when you’ll be counting your payoff, the relative likelihood that you are at Y is p. And the expected payoff if you are at Y is p + 4(1-p). This gives a total expected payoff of P(Y) E(Y) = p × (p + 4(1-p)) = p^2 + 4p(1-p), the same total as counting at X.
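A small sketch (plain Python, just checking the arithmetic) confirming that the two bookkeeping choices agree for every p and are both maximised at the planning-optimal p = 2/3:

```python
def count_at_X(p):
    # P(X) = 1, expected payoff at X = p^2 + 4p(1 - p)
    return 1 * (p ** 2 + 4 * p * (1 - p))

def count_at_Y(p):
    # P(reach Y) = p, expected payoff at Y = p + 4(1 - p)
    return p * (p + 4 * (1 - p))

grid = [i / 10000 for i in range(10001)]
assert all(abs(count_at_X(p) - count_at_Y(p)) < 1e-12 for p in grid)
print(max(grid, key=count_at_X))  # ~0.6667, i.e. p = 2/3
```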
I’m annoyed that English requires a tense on all verbs. “You are” above should be tenseless.
EDIT: formatting
One way to describe this is to note that choosing the action that maximises the expectation of value as computed at an intersection is not the same as choosing the action that can be expected to produce the most value. So choosing p=0.53 maximises our in-the-moment expectations, not the value we actually expect to produce.
The site doesn’t seem to want to let me edit the comment above, but I could have explained this more clearly. The figure (8p-6p^2)/(p+1) is actually a weighted mean of Ex and Ey, where these are the expected values at X and Y respectively. Specifically, this value is:
(1*Ex+p*Ey)/(1+p)
Now, the expected value calculated from the planning-optimal decision is just Ex. We shouldn’t be surprised that the weighted mean is quite a different value.
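To put a number on the gap, here's a tiny illustration (plain Python, evaluated at the planning-optimal p = 2/3 purely for concreteness) of how far the weighted mean sits from Ex:

```python
p = 2 / 3                      # planning-optimal continuation probability
Ex = p ** 2 + 4 * p * (1 - p)  # expected value at X: 4/3
Ey = p + 4 * (1 - p)           # expected value at Y: 2
weighted_mean = (1 * Ex + p * Ey) / (1 + p)
print(Ex, Ey, weighted_mean)   # 1.333..., 2.0, 1.6
```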
Since you will make the same decision both times, the only coherent state is α=1/(p+1).

I’m curious how you arrived at this. Shouldn’t it be α = (1/2)p + (1 - p) = 1 - p/2? (The other implies that you are a thirder in the Sleeping Beauty Problem; didn’t Nick Bostrom have the last word on that one?) The payoff becomes α[p^2+4p(1-p)] + (1-α)[p+4(1-p)] = (1 - p/2)(4p − 3p^2) + (p/2)(4 − 3p) = 6p - (13/2)p^2 + (3/2)p^3, which has a (local) maximum around p = 0.577.
The conclusion remains the same, of course.
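For the curious, a quick grid search (plain Python, just arithmetic) confirming that this cubic does peak near p = 0.577 on [0, 1]:

```python
def halfer_payoff(p):
    # (1 - p/2)(4p - 3p^2) + (p/2)(4 - 3p), expanded: 6p - (13/2)p^2 + (3/2)p^3
    return 6 * p - 6.5 * p ** 2 + 1.5 * p ** 3

grid = [i / 10000 for i in range(10001)]
print(max(grid, key=halfer_payoff))  # ~0.577
```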
alpha = 1/(p+1) because the driver is at Y p times for every 1 time the driver is at X; so (times the driver is at X) / (times the driver is at X or Y) = 1/(p+1).
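If a frequency check helps, here's a small simulation (plain Python, illustrative only): among all intersection-visits, the fraction that happen at X does come out near 1/(p+1).

```python
import random

def fraction_at_X(p, trials=200_000):
    at_X = at_Y = 0
    for _ in range(trials):
        at_X += 1                # every trip passes through X
        if random.random() < p:  # with probability p the driver continues on to Y
            at_Y += 1
    return at_X / (at_X + at_Y)

p = 2 / 3
print(fraction_at_X(p), 1 / (p + 1))  # both ~0.6
```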
The problem with pengvado’s calculation isn’t double counting. It purports to give an expected payoff when made at X, which doesn’t count the expected payoff at Y. The problem is that it doesn’t really give an expected payoff. alpha purports to be the probability that you are at X; yet the calculation must be made at X, not at Y (where alpha will clearly be wrong). This means we can’t speak of a “probability of being at X”; alpha simply is 1 if we use this equation and believe it gives us an expected value.
Or look at it this way: Before you introduce alpha into the equation, you can solve it and get the actual optimal value for p. Once you introduce alpha into the equation, you guarantee that the driver will have false beliefs some of the time, because alpha = 1/(p+1), and so the driver can’t have the correct alpha both at X and at Y. You have added a source of error, and will not find the optimal solution.
If you want to find the value of p that leads to the optimal decision, you need to look at the impact of choosing one p or another on expected value, not just consider the expectation measured at the end. As it stands, the calculation maximises expectations, not value created, with trips that pass through both X and Y being double-counted.
I’m a “who’s offering the bet”er on Sleeping Beauty (which Bostrom has said is consistent with, though not identical to, his own model). And in this case I specified bets offered and paid separately at each intersection, which corresponds to the thirder conclusion.
The paper covered that, but good point.