JGWeissman comments on The Absent-Minded Driver

JGWeissman 16 Sep 2009 21:15 UTC
5 points
There are 3 possible sequences of actions:
1. exit at the first intersection
2. continue at the first intersection, exit at the second intersection
3. continue at the first intersection, continue at the third intersection
The payoffs are such that sequence 2 is the best available, but, having complete knowledge of your decision and no knowledge of which intersection you are at makes sequence 2 impossible. By sacrificing that knowledge, you make available the best sequence, though you can no longer be sure you will take it. In this case, the possibility of performing the best sequence is more valuable than full knowledge of which sequence you actually perform.
- SilasBarta 17 Sep 2009 20:59 UTC
  4 points
  Parent
  By the way, I went ahead and calculated what effect probabilistic knowledge of one’s current intersection has on the payoff, so as to know the value of this knowledge. So I calculated the expected return, given that you have a probability r (with 0.5<r<=1) of correctly guessing what intersection you’re in, and choose the optimal path based on whichever is most likely.
  
  In the original problem the max payoff (under p=2/3) is ⁴⁄₃. I found that to beat that, you only need r to be greater than 52.05%, barely better than chance. Alternately, that’s only 0.0012 bits of the 1 bit of information contained by the knowledge of which intersection you’re at! (Remember that if you have less than .0012 bits, you can just revert to the p=2/3 method from the original problem, which is better than trying to use your knowledge.)
  
  Proof: At X, you have probability r of continuing, then at Y you have a probability r of exiting and (1-r) of continuing.
  
  Thus, EU(r) = r(4r + (1-r) ) = r(3r+1).
  
  Then solve for when EU(r)=4/3, the optimum in the fully ignorant case, which is at r = 0.5205.
  What links here?
  - SilasBarta's comment on The Absent-Minded Driver by Wei Dai (17 Sep 2009 21:05 UTC; 3 points)
  - Wei Dai 17 Sep 2009 22:30 UTC
    14 points
    Parent
    You made a mistake here, which is assuming that when you guess you are at X, you should choose CONTINUE with probability 1, and when you guess you are at Y, you should choose EXIT with probability 1. In fact you can improve your expected payoff using a mixed strategy, in which case you can always do better when you have more information.
    
    Here’s the math. Suppose when you are at an intersection, you get a clue that reads either ‘X’ or ‘Y’. This clue is determined by a dice roll at START. With probability .49, you get ‘X’ at both intersections. With probability .49, you get ‘Y’ at both intersections. With probability .02, you get ‘X’ at the X intersection, and ‘Y’ at the Y intersection.
    
    Now, at START, your decision consists of a pair of probabilities, where p is your probability to CONTINUE after seeing ‘X’, and q is your probability to CONTINUE after seeing ‘Y’. Your expected payoff is:
    
    .02 * (p*q + 4*(p*(1-q))) + .49 * (p*p + 4*(p*(1-p))) + .49 * (q*q + 4*(q*(1-q)))
    
    which is maximized at p=0.680556, q=0.652778. And your expected payoff is 1.33389 which is > ⁴⁄₃.
    What links here?
    SilasBarta's comment on The Absent-Minded Driver by Wei Dai (17 Sep 2009 21:05 UTC; 3 points)
    SilasBarta's comment on Anthropic reasoning and correlated decision making by Stuart_Armstrong (24 Sep 2009 20:16 UTC; 1 point)
    - SilasBarta 17 Sep 2009 23:11 UTC
      0 points
      Parent
      Wow, good catch! (In any case, I had realized that if you have probability less than 52.05%, you shouldn’t go with the most likely, but rather, revert the original p=2/3 method at the very least.)
      
      The formula you gave for the mixed strategy (with coefficients .02, .49, .49) corresponds to a 51% probability of guessing right at any given light. (If the probability of guessing right is r, the coefficients should be 2r-1,1-r,1-r.) It actually raises the threshold for which choosing based on the most probable, with no other randomness, becomes the better strategy, but not by much—just to about 52.1%, by my calculations.
      
      So that means the threshold is 0.0013 bits instead of 0.0012 :-P
      
      (Yeah, I did guess and check because I couldn’t think of a better way on this computer.)
      - Wei Dai 17 Sep 2009 23:20 UTC
        4 points
        Parent
        I think you might still be confused, but the nature of your confusion isn’t quite clear to me. Are you saying that if r>52.1%, the best strategy is a pure one again? That’s not true. See this calculation with coefficients .2, .4, .4.
        
        ETA: Also, I think talking about this r, which is supposed to be a guess of “being at X”, is unnecessarily confusing, because how that probability should be computed from a given problem statement, and whether it’s meaningful at all, are under dispute. I suggest thinking in terms of what you should plan to do when you are at START.
        SilasBarta 18 Sep 2009 0:55 UTC
        0 points
        Parent
        
        I think you might still be confused, but the nature of your confusion isn’t quite clear to me. Are you saying that if r>52.1%, the best strategy is a pure one again? That’s not true. See this calculation with coefficients .2, .4, .4.
        
        That yields a payoff of 1.42, which is less than what the pure strategy gives in the equivalent case corresponding to .2/.4/.4, which is a 60% chance of guessing right. Since the payoff is r*(3r+1), the situation you described has a payoff of 1.68 under a pure strategy of choosing based on your best guess.
        
        ETA: Also, I think talking about this r, which is supposed to be a guess of “being at X”, is unnecessarily confusing, because how that probability should be computed from a given problem statement, and whether it’s meaningful at all, are under dispute.
        
        I specifically avoided defining r as the probability of “being at X”; r is the probability of guessing correctly (and therefore of picking the best option as if it were true), whichever signal you’re at, and it’s equivalent to choosing 2r-1,r-1,r-1 as the coefficients in your phrasing. The only thing possibly counterintuitive is that your ignorance maximizes at r=0.5 rather than zero. Less than 50%, and you just flip your prediction.
        Wei Dai 18 Sep 2009 1:08 UTC
        2 points
        Parent
        
        Since the payoff is r*(3r+1), the situation you described has a payoff of 1.68 under a pure strategy of choosing based on your best guess.
        
        No, it doesn’t. This is what I meant by “r” being confusing. Given .2/.4/.4, if you always pick CONTINUE when you see hint ‘X’ and EXIT when you see hint ‘Y’, your expected payoff (computed at START) is actually:
        
        .4 0 + .4 1 + .2 * 4 = 1.2.
        SilasBarta 18 Sep 2009 1:22 UTC
        0 points
        Parent
        Incorrect. Given .2/.4/.4, you will see X 60% of the time at X, and Y 60% of the time at Y. So your payoff, computed at START, is:
        
        .4 * 0 + .6 * (4 *.6 + .4* 1) = 1.68
        
        You seem to be treating .2/.4/.4 as being continue-exit/exit-exit/continue-continue, which isn’t the right way to look at it.
        Wei Dai 18 Sep 2009 1:41 UTC
        2 points
        Parent
        Please go back to what I wrote before (I’ve changed the numbers to .2/.4/.4 below):
        
        Suppose when you are at an intersection, you get a clue that reads either ‘X’ or ‘Y’. This clue is determined by a dice roll at START. With probability .4, you get ‘X’ at both intersections. With probability .4, you get ‘Y’ at both intersections. With probability .2, you get ‘X’ at the X intersection, and ‘Y’ at the Y intersection.
        
        I’ll go over the payoff calculation in detail, but if you’re still confused after this, perhaps we should take it to private messages to avoid cluttering up the comments.
        
        Your proposed strategy is to CONTINUE upon seeing the hint ‘X’ and EXIT upon seeing the hint ‘Y’. With .4 probability, you’ll get ‘Y’ at both intersections, but you EXIT upon seeing the first ‘Y’ so you get 0 payoff in that case. With .4 probability, you get ‘X’ at both intersections, so you choose CONTINUE both times and end up at C for payoff 1. With .2 probability, you get ‘X’ at X and ‘Y’ at Y, so you choose CONTINUE and then EXIT for a payoff of 4, thus:
        
        .4 0 + .4 1 + .2 * 4 = 1.2
        SilasBarta 18 Sep 2009 4:52 UTC
        1 point
        Parent
        I’m not confused; I probably should have stopped you at your original derivation for the partial-knowledge case but didn’t want to check your algebra. And setting up problems like these is important and tricky, so this discussion belongs here.
        
        So, I think the problem with your setup is that you don’t make the outcome space fully symmetric because you don’t have an equal chance of drawing Y at X and X at Y (compared to your chance of drawing X at X and Y at Y).
        
        To formalize it for the general case of partial knowledge, plus probabilistic knowledge given action, we need to look at four possibilities: Drawing XY, XX, YY, and YX, only the first of which is correct. If, as I defined it before, the probability of being right at any given exit is r, the corresponding probabilities are: r^2, r(1-r), r(1-r), and (1-r)(1-r).
        
        So then I have the expected payoff as a function of p, q, and r as:
        
        (r^2)(p*q + 4p*(1 - q)) + r(1 - r)(p^2 + 4p(1 - p)) +r(1 - r)(q^2 + 4q(1 - q)) + (1 - r)(1 - r)(p*q + 4q(1 - p))
        
        This nicely explains the previous results:
        
        The original problem is the case of complete ignorance, r=1/2, which has a maximum ⁴⁄₃ where p and q are such that they average out to choosing “continue” at one’s current intersection ²⁄₃ of the time. (And this, I think, shows you how to correctly answer while explicitly and correctly representing your probability of being at a given intersection.)
        
        The case of (always) continuing on (guessing) X and not continuing on (guessing) Y corresponds to p=1 and q=0, which reduces to r*(3r+1), the equation I originally had.
        
        Furthermore, it shows how to beat the payoff of ⁴⁄₃ when your r is under 52%. For 51% (the original case you looked at), the max payoff is 1.361 {p=1, q=0.306388} (don’t know how to show it on Alpha, since you have to constrain p and q to 0 thru 1).
        
        Also, it shows I was in error about the 52% threshold, and the mixed strategy actually dominates all the way up to about r = 61%, at which point of course p=1 and q has shrunk to zero (continue when you see X, exit when you see Y). This corresponds to 32 millibits, much higher than my earlier estimate of 1.3 millibits.
        
        Interesting!
        What links here?
        SilasBarta's comment on Anthropic reasoning and correlated decision making by Stuart_Armstrong (24 Sep 2009 20:16 UTC; 1 point)
        SilasBarta's comment on The Absent-Minded Driver by Wei Dai (18 Sep 2009 15:53 UTC; 1 point)
        SilasBarta's comment on The Absent-Minded Driver by Wei Dai (23 Sep 2009 21:30 UTC; 0 points)
        SilasBarta 22 Sep 2009 21:56 UTC
        −10 points
        Parent
        What a crock. I presented my reasoning clearly and showed how it seamlessly and correctly handles the various nuances of the situation, including partial knowledge. If I’m wrong, I’m wrong for a non-obvious reason, and no, Wei_Dai hasn’t shown what’s wrong with this specific handling of the problem.
        
        Whoever’s been modding me down on this thread, kindly explain yourself. And if that person is Wei_Dai: shame on you. Modding is not a tool for helping you win arguments.
        What links here?
        Z_M_Davis's comment on The Absent-Minded Driver by Wei Dai (23 Sep 2009 3:38 UTC; 5 points)
        Expand this thread
        Z_M_Davis 22 Sep 2009 23:03 UTC
        7 points
        Parent
        Downvoted for complaining about being downvoted and for needless speculation about the integrity of other commenters. (Some other contributions to this thread have been upvoted.)
        SilasBarta 22 Sep 2009 23:09 UTC
        −4 points
        Parent
        I’m not complaining about being downvoted. I’m complaining about
        
        a) being downvoted
        
        b) on an articulate, relevant post
        
        c) without an explanation
        
        In the absence of any one of those, I wouldn’t complain. I would love to hear where I’m wrong, because it’s far from obvious. (Yes, the exchange seems tedious and repetitive, but I present new material here.)
        
        And I wasn’t speculating; I was just reminding the community of the general lameness of downvoting someone you’re in an argument with, whether or not that’s Wei_Dai.
        anonym 23 Sep 2009 3:45 UTC
        8 points
        Parent
        Imagine the noise if everybody complained whenever they were downvoted and believed that all 3 of your criteria were applicable.
        SilasBarta 23 Sep 2009 4:10 UTC
        −4 points
        Parent
        I’m not going by my beliefs. Take yours, or the proverbial “reasonable person’s” judgment. Would you or that person judge b) as being true?
        
        Are a) and c) in dispute? Again, my concern is actually not with being downmodded (I would have dropped this long ago if it were); it’s with the lack of an explanation. If no one can be bothered to respond to such a post that spells out its reasoning so clearly and claims to have solved the dilemma—fine, but leave it alone. If you’re going to make the effort, try to make sense too.
        wedrifid 23 Sep 2009 6:44 UTC
        2 points
        Parent
        
        And I wasn’t speculating; I was just reminding the community of the general lameness of downvoting someone you’re in an argument with, whether or not that’s Wei_Dai.
        
        I’m far more likely to downvote someone I’m in an argument with. Mostly because I am actually reading their posts in detail and am far more likely to notice woo.
        SilasBarta 23 Sep 2009 12:12 UTC
        −8 points
        Parent
        Then why not just vote up your own comments? After all, you must have even more insight into those, right? It’s not like you’re going to be swayed by personal investment in not losing face or anything.
        
        Yeah, I know, there’s that pesky thing about how you can’t upvote your own comments. Pff. There’s no such thing as fair, right? Just use a different account. Sheesh.
        Vladimir_Nesov 23 Sep 2009 20:52 UTC
        3 points
        Parent
        You must be an inherently angry person. Or an evil mutant. Something like that =)
        wedrifid 23 Sep 2009 12:56 UTC
        1 point
        Parent
        In cases where I believe a post of mine has been unjustly downvoted the only thing stopping me from creating another account and upvoting myself is that I just don’t care enough to bother. Of course if there was any particular challenge involved in gaming the system in that way then that would perhaps be incentive enough...
        SilasBarta 23 Sep 2009 2:34 UTC
        −8 points
        Parent
        Okay, so far that’s 3-4 people willing to mod me down, zero people willing to point out the errors in a clearly articulated post.
        
        I’m sure we can do better than that, can’t we, LW?
        
        If it’s that bad, I’m sure one of you can type the two or three sentences necessary to effortlessly demolish it.
        
        ETA: Or not.
        Z_M_Davis 23 Sep 2009 3:38 UTC
        5 points
        Parent
        
        Okay, so far that’s 3-4 people willing to mod me down, zero people willing to point out the errors in a clearly articulated post.
        
        This seems like a non-sequitur to me. It’s your comment of 22 September 2009 09:56:05PM that’s sitting at −4; none of your clear and articulate responses to Dai have negative scores anymore.
        SilasBarta 23 Sep 2009 4:05 UTC
        −5 points
        Parent
        No non-sequitur. That’s still, um, zero explanation for the errors in a post that resolves all the issues of the AMD problem, and still at least 4 people modding me down for requesting that a downmod for that kind of post come with some sort of explanation.
        
        If there’s a non-sequitur, it’s the fact that the unjustified downmods were only corrected after I complained about them, and I got downmodded even more than before, and this sequence of events is used to justify the claim that my comments have gotten what they deserved.
        Douglas_Knight 23 Sep 2009 4:41 UTC
        9 points
        Parent
        1 or 2 people downmod you and you devote 6 posts to whining about it? This is a broadcast medium. Of course the 5 people who voted you down for wasting their time aren’t going to explain why the first 1 or 2 people didn’t like the first post.
        
        a post that resolves all the issues of the AMD problem
        
        It didn’t say that to me. So much for articulate.
        
        If it’s oh so important, don’t leave it buried at the bottom a thread of context. Write something new. Why should we care about your parameter, rather than Wei Dai’s? Why should we care about any parameter?
        What links here?
        SilasBarta's comment on Why the beliefs/values dichotomy? by Wei Dai (27 Oct 2009 15:26 UTC; 0 points)
        SilasBarta 23 Sep 2009 12:07 UTC
        0 points
        Parent
        
        If it’s oh so important, don’t leave it buried at the bottom a thread of context.
        
        It may surprise you to note that I linked to the comment from a very visible place in the discussion.
        
        Why should we care about your parameter, rather than Wei Dai’s? Why should we care about any parameter?
        
        Because Wei Dai asked for how to generate a solution that makes epistemic sense, and mine was the only one that accurately incorporated the concept of “probability of being at a given intersection”.
        
        And of course, Wei_Dai saw fit to use the p q r parameters just the same.
        Douglas_Knight 23 Sep 2009 22:57 UTC
        1 point
        Parent
        
        If it’s oh so important, don’t leave it buried at the bottom a thread of context.
        
        It may surprise you to note that I linked to the comment from a very visible place in the discussion.
        
        Exhuming it and putting it on display doesn’t solve the problem of context. People who clicked through (I speak from experience) didn’t see how it did what the link said it did. It was plausible that if I reread the thread it would mean something, but my vague memory of the thread was that it went off in a boring direction.
        
        My questions were not intended for you to answer here, yet further removed from the context where one might care about their answers. If you write something self-contained that explains why I should care about it, I’ll read it.
        SilasBarta 23 Sep 2009 23:09 UTC
        −6 points
        Parent
        So you were interested in seeing the solution, but not looking at the context of the thread for anything that wasn’t familiar? Doesn’t sound like much of an interest to me. If I had repeated myself with a separate self-contained explanation, you would be whining that I’m spamming the same thing all over the place.
        
        You weren’t aware that I put a prominent link to the discussion that resolves all the thread’s issues. That’s okay! Really! You don’t need to cover it up by acting like you knew about it all along. Wei_Dai cares about the new parameters q and r. The post I linked to explains what it accomplishes. Now, think up a new excuse.
        SilasBarta 23 Sep 2009 12:02 UTC
        0 points
        Parent
        
        If it’s oh so important, don’t leave it buried at the bottom a thread of context.
        
        It may surprise you to note that I linked to the comment from a very visible place in the discussion.
        
        Why should we care about your parameter, rather than Wei Dai’s? Why should we care about any parameter?
        
        Because Wei Dai asked for how to generate a solution that makes epistemic sense, and mine was the only one that accurately incorporated the concept of “probability of being at a given intersection”.
        
        And of course, Wei_Dai saw fit to use the p q r parameters just the same.
        wedrifid 23 Sep 2009 6:33 UTC
        2 points
        Parent
        No ‘justification’ necessary. ‘Deserved’ is irrelevant and there is no such thing as ‘fair’.
        
        If I didn’t accept that then LessWrong (and most of the universe) would be downright depressing. People are (often) stupid and votes here represent a very different thing than reward for insight.
        
        Just hold the unknown down-voters in silent contempt briefly then go along with your life. Plenty more upvotes will come. And you know, the votes that my comments receive don’t seem to be all that correlated with the quality of the contributed insight. One reason for this is that the more elusive an insight is the less likely it is to agree with what people already think. This phenomenon more or less drives status assignment in academia. The significance of votes I get here rather pales in comparison.
        
        Douglas makes a good suggestion. Want people to appreciate (or even just comprehend) what you are saying? Put in some effort to make a top level post. Express the problem, provide illustrations (verbal or otherwise) and explain your solution. You’ll get all sorts of status.
        SilasBarta 23 Sep 2009 12:17 UTC
        −2 points
        Parent
        In the past, I took a comment seriously from you that was satire. Is this one of those, too? Sometimes it’s hard to tell.
        
        If it’s serious, then my answer is that whatever “clutter” my comment here gave, it would give even more as a top level post, which probably can’t give more explanation than my post already did.
        
        By the way, just a “heads-up”: I count 6+ comments from others on meta-talk, 8+ down-mods, and 0 explanations for the errors in my solution. Nice work, guys.
        Z_M_Davis 23 Sep 2009 16:14 UTC
        13 points
        Parent
        
        I count 6+ comments from others on meta-talk, 8+ down-mods, and 0 [sic] explanations for the errors in my solution. Nice work, guys.
        
        If it is in fact the case that your complaints are legitimately judged a negative contribution, then you should expect to be downvoted and criticized on those particular comments, regardless of whether or not your solution is correct. There’s nothing contradictory about simultaneously believing both that your proposed solution is correct, and that your subsequent complaints are a negative contribution.
        
        I don’t feel like taking the time to look over your solution. Maybe it’s perfect. Wonderful! Spectacular! This world becomes a little brighter every time someone solves a math problem. But could you please, please consider toning down the hostility just a bit? These swipes at other commenters’ competence and integrity are really unpleasant to read.
        
        ADDENDUM: Re tone, consider the difference between “I wonder why this was downvoted, could someone please explain?” (which is polite) and “What a crock,” followed by shaming a counterfactual Wei Dai (which is rude).
        SilasBarta 23 Sep 2009 17:05 UTC
        −2 points
        Parent
        
        If it is in fact the case that your complaints are legitimately judged a negative contribution, then you should expect to be downvoted and criticized on those particular comments, regardless of whether or not your solution is correct. There’s nothing contradictory about simultaneously believing both that your proposed solution is correct, and that your subsequent complaints are a negative contribution.
        
        Actually, there is something contradictory when those whiny comments were necessary for the previous, relevant comments to get their deserved karma. Your position here is basically: “Yeah, we were wrong to accuse you of those crimes, but you were still a jerk for pulling all that crap about ‘I plead not guilty!’ and ‘I didn’t do it!’, wah, wah, wah...”
        
        At the very least, it should buy me more leeway than get for such a tone in isolation.
        
        But could you please, please consider toning down the hostility just a bit?
        
        Sure thing.
        
        I trust that other posters will be more judicious with their voting and responses as well.
        wedrifid 23 Sep 2009 13:14 UTC
        2 points
        Parent
        No satire. I just don’t find expecting the universe to ‘behave itself’ according to my ideals to be particularly pleasant so I don’t wast my emotional energy on it.
        
        I have upvoted your comment because it appears to be a useful (albeit somewhat information dense) contribution. I personally chose not to downvote your complaints because I do empathise with your frustration even if I don’t think your complaints are a useful way to get you what you want.
        orthonormal 23 Sep 2009 4:43 UTC
        6 points
        Parent
        I was the one who downvoted the parent, because you criticized Wei Dai’s correct solution by arguing about a different problem than the one you agreed to several comments upstream.
        
        Yes, you’d get a different solution if you assumed that the random variable for information gave independent readings at X and Y, instead of being engineered for maximum correlation. But that’s not the problem Wei Dai originally stated, and his solution to the original problem is unambiguously correct. (I suspect, but haven’t checked, that a mixed strategy beats the pure one on your problem setup as well.)
        
        I simply downvoted rather than commented, because (a) I was feeling tired and (b) your mistake seemed pretty clear to me. I don’t think that was a violation of LW custom.
        What links here?
        Z_M_Davis's comment on The Absent-Minded Driver by Wei Dai (23 Sep 2009 16:14 UTC; 13 points)
        SilasBarta 23 Sep 2009 11:55 UTC
        1 point
        Parent
        
        I was the one who downvoted the parent, because you criticized Wei Dai’s correct solution by arguing about a different problem than the one you agreed to several comments upstream
        
        I didn’t change the problem; I pointed out that he hadn’t been appropriately representing the existing problem when trying to generalize it to partial information. Having previously agreed with his (incorrect) assumptions in no way obligates me to persist in my error, especially when the exchange makes it clear!
        
        his solution to the original problem is unambiguously correct. (I suspect, but haven’t checked, that a mixed strategy beats the pure one on your problem setup as well.)
        
        Which original problem? (If the ABM problem as stated, then my solution is gives the same p=2/3 result. If it’s the partial knowledge variant, Wei_Dei doesn’t have an unambiguously correct solution when he fails to include the possibility of picking Y at X and X at Y like he did for the reverse.) Further, I do have the mixed strategy dominating—but only up to r = 61%. Feel free to find an optimum where one of p and q is not 1 or 0 while r is greater than 61%.
        
        Yes, you’d get a different solution if you assumed that the random variable for information gave independent readings at X and Y, instead of being engineered for maximum correlation.
        
        That wasn’t the reason for our different solutions.
        
        I simply downvoted rather than commented, because (a) I was feeling tired and (b) your mistake seemed pretty clear to me.
        
        Well, I hope you’re no longer tired, and you can check my approach one more time.
        loqi 25 Sep 2009 7:07 UTC
        0 points
        Parent
        I’m pretty sure Wei Dai is correct. I’ll try and explain it differently. Here’s a rendering of the problem in some kind of pseudolisp:
        
        (start (p q) (0.4 "uninformative-x" (p "continue" (p "continue" 1) (else "exit" 4)) (else "exit" 0)) (0.4 "uninformative-y" (q "continue" (q "continue" 1) (else "exit" 4)) (else "exit" 0)) (0.2 "informative" (p "continue" (q "continue" 1) (else "exit" 4)) (else "exit" 0)))
        Now evaluate with the strategy under discussion, (start 1 0):
        
        (0.4 "uninformative-x" (1 "continue" (1 "continue" 1) (0 "exit" 4)) (0 "exit" 0)) (0.4 "uninformative-y" (0 "continue" (0 "continue" 1) (1 "exit" 4)) (1 "exit" 0)) (0.2 "informative" (1 "continue" (0 "continue" 1) (1 "exit" 4)) (0 "exit" 0))
        Prune the zeros:
        
        (0.4 "uninformative-x" (1 "continue" (1 "continue" 1))) (0.4 "uninformative-y" (1 "exit" 0)) (0.2 "informative" (1 "continue" (1 "exit" 4)))
        Combine the linear paths:
        
        (0.4 "uninformative-x/continue/continue" 1) (0.4 "uninformative-y/exit" 0) (0.2 "informative/continue/exit" 4)
        
        You seem to be treating .2/.4/.4 as being continue-exit/exit-exit/continue-continue, which isn’t the right way to look at it.
        
        I’d be interested in seeing what you think is wrong with the above derivation, ideally in terms of the actual decision problem at hand. Remember, p and q are decision parameters. They parameterize an agent, not an expectation. When p and q are 0 or 1, the agent is essentially a function of type “Bool → Bool”. How could a stateless agent of that type implement a better strategy than limiting itself to those three options?
        SilasBarta 25 Sep 2009 12:26 UTC
        0 points
        Parent
        Again, what’s wrong with that derivation is it leaves out the possibility of “disinformative”, and therefore assumes more knowledge about your intersection than you can really have. (By zeroing the probability of “Y then X” it concentrates the probability mass in a way that decreases the entropy of the variable more than your knowledge can justify.)
        
        In writing the world-program in a way that categorizes your guess as “informative”, you’re implicitly assuming some memory of what you drew before: “Okay, so now I’m on the second one, which shows the Y-card …”
        
        Now, can you explain what’s wrong with my derivation?
        loqi 25 Sep 2009 18:00 UTC
        1 point
        Parent
        
        Again, what’s wrong with that derivation is it leaves out the possibility of “disinformative”
        
        By “disinformative”, do you mean that intersection X has hint Y and vice versa? This is not possible in the scenario Wei Dai describes.
        
        In writing the world-program in a way that categorizes your guess as “informative”
        
        Ah, this seems to be a point of confusion. The world program does not categorize your guess, at all. The “informative” label in my derivation refers to the correctness of the provided hints. Whether or not the hints are both correct is a property of the world.
        
        you’re implicitly assuming some memory of what you drew before: “Okay, so now I’m on the second one, which shows the Y-card …”
        
        No, I am merely examining the possible paths from the outside. You seem to be confusing the world program with the agent. In the “informative/continue/exit” case, I am saying “okay, so now the driver is on the second one”. This does not imply that the driver is aware of this fact.
        
        Now, can you explain what’s wrong with my derivation?
        
        I think so. You’re approaching the problem from a “first-person perspective”, rather than using the given structure of the world, so you’re throwing away conditional information under the guise of implementing a stateless agent. But the agent can still look at the entire problem ahead of time and make a decision incorporating this information without actually needing to remember what’s happened once he begins.
        
        At the first intersection, the state space of the world (not the agent) hasn’t yet branched, so your approach gives the correct answer. At the second intersection, we (the authors of the strategy, not the agent) must update your “guess odds” conditional on having seen X at the first intersection.
        
        Your tree was:
        
        (0.4 "exit" 0) (0.6 "continue" (0.6 "exit" 4) (0.4 "continue" 1))
        The outer probabilities are correct, but the inner probabilities haven’t been conditioned on seeing X at the first intersection. Two out of three times that the agent sees X at the first intersection, he will see X again at the second intersection. So, assuming the p=1 q=0 strategy, the statement “Given .2/.4/.4, you will see Y 60% of the time at Y” is false.
        Expand this thread
        SilasBarta 25 Sep 2009 19:51 UTC
        2 points
        Parent
        
        You’re approaching the problem from a “first-person perspective”, rather than using the given structure of the world, so you’re throwing away conditional information under the guise of implementing a stateless agent. But the agent can still look at the entire problem ahead of time and make a decision incorporating this information without actually needing to remember what’s happened once he begins.
        
        Okay, this is where I think the misunderstanding is. When I posited the variable r, I posited it to mean the probability of correctly guessing the intersection. In other words, you receive information at that point such that it moves your estimate of which intersection you’re at, accounting for other inferences you may have made about the problem, including from examining it from the outside and setting your p, to r. So the way the r is defined, it screens off knowledge gained from deciding to use p and q.
        
        Now, this might not be a particularly relevant generalization of the problem, I now grant that. But under the premises, it’s correct. A better generalization would be to find out your probability distribution across X and Y (given your choice of p), and then assume someone gives you b bits of evidence (decrease in the KL Divergence of your estimate from the true distribution), and find the best strategy from there.
        
        And for that matter Wei_Dai’s solution, given his way of incorporating partial knowledge of one’s intersection, is also correct, and also probably not the best way to generalize the problem because it basically asks, “what strategy should you pick, given that you have a probably t of not being an absent-minded driver, and a probability 1 - t of being an absent-minded driver?”
        loqi 25 Sep 2009 21:18 UTC
        0 points
        Parent
        
        And for that matter Wei_Dai’s solution, given his way of incorporating partial knowledge of one’s intersection, is also correct
        
        Thanks, this clarifies the state of the discussion. I was basically arguing against the assertion that it was not.
        
        and also probably not the best way to generalize the problem because it basically asks, “what strategy should you pick, given that you have a probably t of not being an absent-minded driver, and a probability 1 - t of being an absent-minded driver?”
        
        I don’t think I understand this. The resulting agent is always stateless, so it is always an absent-minded driver.
        
        Are you looking for a way of incorporating information “on-the-fly” that the original strategy couldn’t account for? I could be missing something, but I don’t see how this is possible. In order for some hint H to function as useful information, you need to have estimates for P(H|X) and P(H|Y) ahead of time. But with these estimates on hand, you’ll have already incorporated them into your strategy. Therefore, your reaction to the observation of H or the lack thereof is already determined. And since the agent is stateless, the observation can’t affect anything beyond that decision.
        
        It seems that there is just “no room” for additional information to enter this problem except from the outside.
- SilasBarta 16 Sep 2009 22:17 UTC
  0 points
  Parent
  
  In this case, the possibility of performing the best sequence is more valuable than full knowledge of which sequence you actually perform.
  
  Okay, that’s a more specific, helpful answer.