loqi comments on The Absent-Minded Driver

loqi 25 Sep 2009 7:07 UTC
0 points
I’m pretty sure Wei Dai is correct. I’ll try and explain it differently. Here’s a rendering of the problem in some kind of pseudolisp:
```
(start (p q)
  (0.4 "uninformative-x"
    (p "continue"
      (p "continue" 1)
      (else "exit" 4))
    (else "exit" 0))
  (0.4 "uninformative-y"
    (q "continue"
      (q "continue" 1)
      (else "exit" 4))
    (else "exit" 0))
  (0.2 "informative"
    (p "continue"
      (q "continue" 1)
      (else "exit" 4))
    (else "exit" 0)))
```
Now evaluate with the strategy under discussion, (start 1 0):
```
(0.4 "uninformative-x"
  (1 "continue"
    (1 "continue" 1)
    (0 "exit" 4))
  (0 "exit" 0))
(0.4 "uninformative-y"
  (0 "continue"
    (0 "continue" 1)
    (1 "exit" 4))
  (1 "exit" 0))
(0.2 "informative"
  (1 "continue"
    (0 "continue" 1)
    (1 "exit" 4))
  (0 "exit" 0))
```
Prune the zeros:
```
(0.4 "uninformative-x"
  (1 "continue"
    (1 "continue" 1)))
(0.4 "uninformative-y"
  (1 "exit" 0))
(0.2 "informative"
  (1 "continue"
    (1 "exit" 4)))
```
Combine the linear paths:
```
(0.4 "uninformative-x/continue/continue" 1)
(0.4 "uninformative-y/exit" 0)
(0.2 "informative/continue/exit" 4)
```
You seem to be treating .2/.4/.4 as being continue-exit/exit-exit/continue-continue, which isn’t the right way to look at it.

I’d be interested in seeing what you think is wrong with the above derivation, ideally in terms of the actual decision problem at hand. Remember, p and q are decision parameters. They parameterize an agent, not an expectation. When p and q are 0 or 1, the agent is essentially a function of type “Bool → Bool”. How could a stateless agent of that type implement a better strategy than limiting itself to those three options?
- SilasBarta 25 Sep 2009 12:26 UTC
  0 points
  Parent
  Again, what’s wrong with that derivation is it leaves out the possibility of “disinformative”, and therefore assumes more knowledge about your intersection than you can really have. (By zeroing the probability of “Y then X” it concentrates the probability mass in a way that decreases the entropy of the variable more than your knowledge can justify.)
  
  In writing the world-program in a way that categorizes your guess as “informative”, you’re implicitly assuming some memory of what you drew before: “Okay, so now I’m on the second one, which shows the Y-card …”
  
  Now, can you explain what’s wrong with my derivation?
  - loqi 25 Sep 2009 18:00 UTC
    1 point
    Parent
    Again, what’s wrong with that derivation is it leaves out the possibility of “disinformative”
    
    By “disinformative”, do you mean that intersection X has hint Y and vice versa? This is not possible in the scenario Wei Dai describes.
    
    In writing the world-program in a way that categorizes your guess as “informative”
    
    Ah, this seems to be a point of confusion. The world program does not categorize your guess, at all. The “informative” label in my derivation refers to the correctness of the provided hints. Whether or not the hints are both correct is a property of the world.
    
    you’re implicitly assuming some memory of what you drew before: “Okay, so now I’m on the second one, which shows the Y-card …”
    
    No, I am merely examining the possible paths from the outside. You seem to be confusing the world program with the agent. In the “informative/continue/exit” case, I am saying “okay, so now the driver is on the second one”. This does not imply that the driver is aware of this fact.
    
    Now, can you explain what’s wrong with my derivation?
    
    I think so. You’re approaching the problem from a “first-person perspective”, rather than using the given structure of the world, so you’re throwing away conditional information under the guise of implementing a stateless agent. But the agent can still look at the entire problem ahead of time and make a decision incorporating this information without actually needing to remember what’s happened once he begins.
    
    At the first intersection, the state space of the world (not the agent) hasn’t yet branched, so your approach gives the correct answer. At the second intersection, we (the authors of the strategy, not the agent) must update your “guess odds” conditional on having seen X at the first intersection.
    
    Your tree was:
    
    (0.4 "exit" 0) (0.6 "continue" (0.6 "exit" 4) (0.4 "continue" 1))
    The outer probabilities are correct, but the inner probabilities haven’t been conditioned on seeing X at the first intersection. Two out of three times that the agent sees X at the first intersection, he will see X again at the second intersection. So, assuming the p=1 q=0 strategy, the statement “Given .2/.4/.4, you will see Y 60% of the time at Y” is false.
    - SilasBarta 25 Sep 2009 19:51 UTC
      2 points
      Parent
      
      You’re approaching the problem from a “first-person perspective”, rather than using the given structure of the world, so you’re throwing away conditional information under the guise of implementing a stateless agent. But the agent can still look at the entire problem ahead of time and make a decision incorporating this information without actually needing to remember what’s happened once he begins.
      
      Okay, this is where I think the misunderstanding is. When I posited the variable r, I posited it to mean the probability of correctly guessing the intersection. In other words, you receive information at that point such that it moves your estimate of which intersection you’re at, accounting for other inferences you may have made about the problem, including from examining it from the outside and setting your p, to r. So the way the r is defined, it screens off knowledge gained from deciding to use p and q.
      
      Now, this might not be a particularly relevant generalization of the problem, I now grant that. But under the premises, it’s correct. A better generalization would be to find out your probability distribution across X and Y (given your choice of p), and then assume someone gives you b bits of evidence (decrease in the KL Divergence of your estimate from the true distribution), and find the best strategy from there.
      
      And for that matter Wei_Dai’s solution, given his way of incorporating partial knowledge of one’s intersection, is also correct, and also probably not the best way to generalize the problem because it basically asks, “what strategy should you pick, given that you have a probably t of not being an absent-minded driver, and a probability 1 - t of being an absent-minded driver?”
      - loqi 25 Sep 2009 21:18 UTC
        0 points
        Parent
        
        And for that matter Wei_Dai’s solution, given his way of incorporating partial knowledge of one’s intersection, is also correct
        
        Thanks, this clarifies the state of the discussion. I was basically arguing against the assertion that it was not.
        
        and also probably not the best way to generalize the problem because it basically asks, “what strategy should you pick, given that you have a probably t of not being an absent-minded driver, and a probability 1 - t of being an absent-minded driver?”
        
        I don’t think I understand this. The resulting agent is always stateless, so it is always an absent-minded driver.
        
        Are you looking for a way of incorporating information “on-the-fly” that the original strategy couldn’t account for? I could be missing something, but I don’t see how this is possible. In order for some hint H to function as useful information, you need to have estimates for P(H|X) and P(H|Y) ahead of time. But with these estimates on hand, you’ll have already incorporated them into your strategy. Therefore, your reaction to the observation of H or the lack thereof is already determined. And since the agent is stateless, the observation can’t affect anything beyond that decision.
        
        It seems that there is just “no room” for additional information to enter this problem except from the outside.