What I’m more interested in is: doesn’t the UDT/planning-optimal solution imply that injecting randomness can improve an algorithm, which is a big no-no? Because you’re saying (and you’re right AFAICT) that the best strategy is to randomly choose whether to continue, with a bias in favor of continuing.
Also, could someone go through the steps of how UDT generates this solution, specifically, how it brings to your attention the possibility of expressing the payoff as a function of p? (Sorry, but I’m a bit of a straggler on these decision theory posts.)
imply that injecting randomness can improve an algorithm
Hm. This would actually appear to be a distinct class of cases in which randomness is “useful”, but it’s important to note that a simple deterministic algorithm would do better if we were allowed to remember our own past actions—i.e. this is a very special case. I should probably think about it further, but offhand, it looks like the reason for randomized-optimality is that the optimal deterministic algorithm has been prohibited in a way that makes all remaining deterministic algorithms stupid.
More generally, game theory often produces situations where randomization is clearly the rational strategy when you do not know with certainty the other player’s action. The example given in this post is straightforwardly convertible into a game involving Nature, as many games do, where Nature plays first and puts you into one of two states; if you are in X and play Continue, Nature plays again and the game loops. The solution to that game is the same as that derived above, I believe.
The point is, for a broad class of games/decision problems where Nature has a substantial role, we might run into similar issues that can be resolved in a similar way.
I think the motivation for this problem is that memory in general is a limited resource, and a decision theory should be able to handle cases where recall is imperfect. I don’t believe that there was a deliberate attempt to prohibit the optimal deterministic algorithm in order to make randomized algorithms look good.
I don’t think that resolves the issue. As I demonstrated in this comment, if you have some probabilistic knowledge of which intersection you’re at, you can do better than the p=2/3 method. Specifically, as long as you have 0.0012 bits of information about which intersection you’re at (i.e. assign a greater than 52% chance of guessing correctly), you’re better off choosing based on what seems most likely.
However—and this is the kicker—that means that if you have between 0 and 0.0012 bits of information about your intersection, you’re best off throwing that information away entirely and going with the method that’s optimal for when you’re fully forgetful. So it’s still a case where throwing away information helps you.
ETA: False alarm; Wei_Dai corrects me here—you can still use your knowledge to do better than 4⁄3 when your probability of guessing right is between 50% and 52.05%.
It’s probably better not to think of this as a randomized algorithm. Here is a simpler example of what I mean.
Suppose you have two urns in front of you. One urn is full of N white marbles, and the other urn is empty. Your task is to take a marble out of the first urn, paint it either red or blue, place it in the second urn, and then repeat this process until the first urn is empty. Moreover, when you are done, something very close to two-thirds of the marbles in the second urn must be red.
The catch, of course, is that you have very poor short-term memory, so you can never remember how many marbles you’ve painted or what colors you’ve painted them.
The “randomized algorithm” solution would be to use a pseudo-random number generator to produce a number x between 0 and 1 for each marble, and to paint that marble red if and only if x < 2⁄3.
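A minimal sketch of that procedure in Python (the function name and the use of the standard random module are my own, not the commenter’s):

```python
import random

def paint_marbles(n, p_red=2/3, seed=0):
    """Paint n marbles with no memory of past choices: each marble is
    painted red independently with probability 2/3, blue otherwise."""
    rng = random.Random(seed)
    colors = []
    for _ in range(n):
        x = rng.random()                      # pseudo-random x in [0, 1)
        colors.append("red" if x < p_red else "blue")
    return colors

colors = paint_marbles(30_000)
print(colors.count("red") / len(colors))      # very close to 2/3 for large n
```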
But there is a non-random way to think of that procedure. Suppose instead that, before you start painting your N marbles, you set out a box of N poker chips, of which you know (that is, have reason to be highly confident) that very nearly two-thirds are red. You then proceed to paint marbles according to the following algorithm. After taking a marble in hand, you select a poker chip non-randomly from the box, and then paint the marble the same color as that poker chip.
This is a non-random algorithm that you can use with confidence, but which requires no memory. And, as I see it, the method with the pseudo-random number generator amounts to the same thing. By deciding to use the generator, you are determining N numbers: the next N numbers that the generator will produce. Moreover, if you know how the generator is constructed, you know (that is, have reason to be highly confident) that very nearly two-thirds of those numbers will be less than 2⁄3. To my mind, this is functionally identical to the poker chip procedure.
This is a non-random algorithm that you can use with confidence, but which requires no memory.
You mean it does not require memory in your brain, because you implemented your memory with the poker chips. It is quite convenient they were available.
My point is that it’s no more convenient than having the pseudo-random number generator available. I maintain that the generator is implementing your memory in functionally the same sense. For example, you are effectively guaranteed not to get the same number twice, just as you are effectively guaranteed not to get the same poker chip twice.
ETA: After all, something in the generator must be keeping track of the passage of the marbles for you. Otherwise the generator would keep producing the same number over and over.
Rather than using a PRNG (which, as you say, requires memory), you could use a source of actual randomness (e.g. quantum decay). Then you don’t really have extra memory with the randomized algorithm, do you?
I thought of this as well, but it does not really matter, because it is the ability to produce a different output at each event that gives part of the functionality of memory, that is, the ability to distinguish between events. Granted, this is not as effective as deterministically understood memory, where you know in advance which output you get at each event. Essentially, it is memory with the drawback that you don’t understand how it works to the extent that you are uncertain how it correlates with what you wanted to remember.
After all, something in the generator must be keeping track of the passage of the marbles for you. Otherwise the generator would keep producing the same number over and over.
Randomness ‘degenerates’ (perhaps by action of a malicious daemon) into non-randomness, and so it can do better and no worse than a non-random approach?
(If the environments and agents are identical down to the source of randomness, then the agent defaults to a pure strategy; but with ‘genuine’ randomness, random sources that are different between instances of the agent, the agent can actually implement the better mixed strategy?)
I’m having trouble parsing your questions. You have some sentences that end in question marks. Are you asking whether I agree with those sentences? I’m having trouble understanding the assertions made by those sentences, so I can’t tell whether I agree with them (if that was what you were asking).
The claim that I was making could be summed up as follows. I described an agent using a PRNG to solve a problem involving painting marbles. The usual way to view such a solution is as
deterministic amnesiac agent PLUS randomness.
My suggestion was instead to view the solution as
deterministic amnesiac agent
PLUS
a particular kind of especially limited memory
PLUS
an algorithm that takes the contents of that memory as input and produces an output that is almost guaranteed to have a certain property.
The especially limited memory is the part of the PRNG that remembers what the next seed should be. If there weren’t some kind of memory involved in the PRNG’s operation, the PRNG would keep using the same seed over and over again, producing the same “random” number again and again.
The algorithm is the algorithm that the PRNG uses to turn the first seed into a sequence of pseudo-random numbers.
The certain property of that sequence is the property of having two-thirds of its terms being less than 2⁄3.
I maintain that the generator is implementing your memory in functionally the same sense.
That is fair enough, though the reason I find the scenario at all interesting is that it illustrates the utility of a random strategy under certain conditions.
For me, finding an equivalent nonrandom strategy helps to dispel confusion.
I like your characterization above that the PRNG is “memory with the drawback that you don’t understand how it works to the extent that you are uncertain how it correlates with what you wanted to remember.” Another way to say it is that the PRNG gives you exactly what you need with near-certainty, while normal memory gives you extra information that happens to be useless for this problem.
What is “random” about the PRNG (the exact sequence of numbers) is extra stuff that you happen not to need. What you need from the PRNG (N numbers, of which two-thirds are less than 2⁄3) is not random but a near-certainty. So, although you’re using a so-called pseudo-random number generator, you’re really using an aspect of it that’s not random in any significant sense. For this reason, I don’t think that the PRNG algorithm should be called “random”, any more than is the poker chip algorithm.
Very clever! It is indeed true that if you forget all previous marble paintings, the best way to ensure that 2⁄3 get painted one color is to paint it that color with p = 2⁄3.
And interestingly, I can think of several examples of my own life when I’ve been in that situation. For example, when I’m playing Alpha Centauri, I want to make sure I have a good mix of artillery, infantry, and speeders, but it’s tedious to keep track of how many I have of each, so I just pick in a roughly random way, but biased toward those that I want in higher proportions.
I’m going to see if I can map the urn/marble-painting problem back onto the absent-minded driver problem.
If you’re allowed to use external memory, why not just write down how many you painted of each color? Note that memory is different from a random number generator; for example, a random number generator can be used (imperfectly) to coordinate with a group of people with no communication, whereas memory would require communication but could give perfect results.
Take 3 marbles out of the urn. Paint one of them blue, and then the other two red. Put all three in the second urn. Repeat.
Yeah, yeah—white marbles are half-medusas, so if you can see more than one you die, or something.
Taking more than one marble out of the urn at a time violates the task description. Don’t fight the hypothetical!
… That’s what the second paragraph is for...
Jaynes’ principle is that injecting randomness shouldn’t help you figure out what’s true; implementing a mixed strategy of action is a different matter.
Although the distinction is a bit too fuzzy (in my head) for comfort.
I’m concerned about more than just Jaynes’s principle. Injecting noise between your decision, and what you turn out to do, shouldn’t be able to help you either. Choosing “continue” 2⁄3 of the time is the same as “Always continue, but add a random disturbance that reverses this 1⁄3 of the time.”
How can that addition of randomness help you?
The randomness helps in this case because the strategy of determining your actions by which intersection you are at is not available.
Yes, the problem deletes the knowledge of what intersection I’m at. How does it help to further delete knowledge of what my decision is?
There are 3 possible sequences of actions:
exit at the first intersection
continue at the first intersection, exit at the second intersection
continue at the first intersection, continue at the second intersection
The payoffs are such that sequence 2 is the best available, but, having complete knowledge of your decision and no knowledge of which intersection you are at makes sequence 2 impossible. By sacrificing that knowledge, you make available the best sequence, though you can no longer be sure you will take it. In this case, the possibility of performing the best sequence is more valuable than full knowledge of which sequence you actually perform.
By the way, I went ahead and calculated what effect probabilistic knowledge of one’s current intersection has on the payoff, so as to know the value of this knowledge. So I calculated the expected return, given that you have a probability r (with 0.5<r<=1) of correctly guessing what intersection you’re in, and choose the optimal path based on whichever is most likely.
In the original problem the max payoff (under p=2/3) is 4⁄3. I found that to beat that, you only need r to be greater than 52.05%, barely better than chance. Alternately, that’s only 0.0012 bits of the 1 bit of information contained by the knowledge of which intersection you’re at! (Remember that if you have less than .0012 bits, you can just revert to the p=2/3 method from the original problem, which is better than trying to use your knowledge.)
Proof: At X, you have probability r of continuing, then at Y you have a probability r of exiting and (1-r) of continuing.
Thus, EU(r) = r(4r + (1-r) ) = r(3r+1).
Then solve for when EU(r)=4/3, the optimum in the fully ignorant case, which is at r = 0.5205.
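A quick numeric check of both figures quoted above (the 52.05% threshold and the 0.0012 bits), as a sketch rather than anything from the original comment:

```python
import math

# Solve EU(r) = r*(3r + 1) = 4/3, i.e. 3r^2 + r - 4/3 = 0, for the positive root.
r = (-1 + math.sqrt(1 + 16)) / 6
print(r)                                   # ~0.5205

# Information about the intersection relative to a fair coin: 1 - H(r) bits.
H = -(r * math.log2(r) + (1 - r) * math.log2(1 - r))
print(1 - H)                               # ~0.0012 bits
```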
You made a mistake here, which is assuming that when you guess you are at X, you should choose CONTINUE with probability 1, and when you guess you are at Y, you should choose EXIT with probability 1. In fact you can improve your expected payoff using a mixed strategy, in which case you can always do better when you have more information.
Here’s the math. Suppose when you are at an intersection, you get a clue that reads either ‘X’ or ‘Y’. This clue is determined by a dice roll at START. With probability .49, you get ‘X’ at both intersections. With probability .49, you get ‘Y’ at both intersections. With probability .02, you get ‘X’ at the X intersection, and ‘Y’ at the Y intersection.
Now, at START, your decision consists of a pair of probabilities, where p is your probability to CONTINUE after seeing ‘X’, and q is your probability to CONTINUE after seeing ‘Y’. Your expected payoff is:
.02 * (p*q + 4*(p*(1-q))) + .49 * (p*p + 4*(p*(1-p))) + .49 * (q*q + 4*(q*(1-q)))
which is maximized at p=0.680556, q=0.652778. And your expected payoff is 1.33389 which is > 4⁄3.
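A brute-force check of that maximization (a sketch; the coarse grid search is mine):

```python
# Expected payoff for the .49/.49/.02 hint distribution given above,
# with p = P(CONTINUE | hint 'X') and q = P(CONTINUE | hint 'Y').
def EU(p, q):
    return (0.02 * (p*q + 4*p*(1-q))
            + 0.49 * (p*p + 4*p*(1-p))
            + 0.49 * (q*q + 4*q*(1-q)))

best = max((EU(i/1000, j/1000), i/1000, j/1000)
           for i in range(1001) for j in range(1001))
print(best)          # ~(1.33389, 0.681, 0.653), just above 4/3
print(EU(2/3, 2/3))  # exactly 4/3: ignoring the hint recovers the original optimum
```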
Wow, good catch! (In any case, I had realized that if you have probability less than 52.05%, you shouldn’t go with the most likely, but rather, revert to the original p=2/3 method at the very least.)
The formula you gave for the mixed strategy (with coefficients .02, .49, .49) corresponds to a 51% probability of guessing right at any given light. (If the probability of guessing right is r, the coefficients should be 2r-1,1-r,1-r.) It actually raises the threshold for which choosing based on the most probable, with no other randomness, becomes the better strategy, but not by much—just to about 52.1%, by my calculations.
So that means the threshold is 0.0013 bits instead of 0.0012 :-P
(Yeah, I did guess and check because I couldn’t think of a better way on this computer.)
I think you might still be confused, but the nature of your confusion isn’t quite clear to me. Are you saying that if r>52.1%, the best strategy is a pure one again? That’s not true. See this calculation with coefficients .2, .4, .4.
ETA: Also, I think talking about this r, which is supposed to be a guess of “being at X”, is unnecessarily confusing, because how that probability should be computed from a given problem statement, and whether it’s meaningful at all, are under dispute. I suggest thinking in terms of what you should plan to do when you are at START.
I think you might still be confused, but the nature of your confusion isn’t quite clear to me. Are you saying that if r>52.1%, the best strategy is a pure one again? That’s not true. See this calculation with coefficients .2, .4, .4.
That yields a payoff of 1.42, which is less than what the pure strategy gives in the equivalent case corresponding to .2/.4/.4, which is a 60% chance of guessing right. Since the payoff is r*(3r+1), the situation you described has a payoff of 1.68 under a pure strategy of choosing based on your best guess.
ETA: Also, I think talking about this r, which is supposed to be a guess of “being at X”, is unnecessarily confusing, because how that probability should be computed from a given problem statement, and whether it’s meaningful at all, are under dispute.
I specifically avoided defining r as the probability of “being at X”; r is the probability of guessing correctly (and therefore of picking the best option as if it were true), whichever signal you’re at, and it’s equivalent to choosing 2r-1, 1-r, 1-r as the coefficients in your phrasing. The only thing possibly counterintuitive is that your ignorance maximizes at r=0.5 rather than zero. Less than 50%, and you just flip your prediction.
Since the payoff is r*(3r+1), the situation you described has a payoff of 1.68 under a pure strategy of choosing based on your best guess.
No, it doesn’t. This is what I meant by “r” being confusing. Given .2/.4/.4, if you always pick CONTINUE when you see hint ‘X’ and EXIT when you see hint ‘Y’, your expected payoff (computed at START) is actually:
.4 * 0 + .4 * 1 + .2 * 4 = 1.2
Incorrect. Given .2/.4/.4, you will see X 60% of the time at X, and Y 60% of the time at Y. So your payoff, computed at START, is:
.4 * 0 + .6 * (4 * .6 + .4 * 1) = 1.68
You seem to be treating .2/.4/.4 as being continue-exit/exit-exit/continue-continue, which isn’t the right way to look at it.
Please go back to what I wrote before (I’ve changed the numbers to .2/.4/.4 below):
Suppose when you are at an intersection, you get a clue that reads either ‘X’ or ‘Y’. This clue is determined by a dice roll at START. With probability .4, you get ‘X’ at both intersections. With probability .4, you get ‘Y’ at both intersections. With probability .2, you get ‘X’ at the X intersection, and ‘Y’ at the Y intersection.
I’ll go over the payoff calculation in detail, but if you’re still confused after this, perhaps we should take it to private messages to avoid cluttering up the comments.
Your proposed strategy is to CONTINUE upon seeing the hint ‘X’ and EXIT upon seeing the hint ‘Y’. With .4 probability, you’ll get ‘Y’ at both intersections, but you EXIT upon seeing the first ‘Y’ so you get 0 payoff in that case. With .4 probability, you get ‘X’ at both intersections, so you choose CONTINUE both times and end up at C for payoff 1. With .2 probability, you get ‘X’ at X and ‘Y’ at Y, so you choose CONTINUE and then EXIT for a payoff of 4, thus:
.4 * 0 + .4 * 1 + .2 * 4 = 1.2
I’m not confused; I probably should have stopped you at your original derivation for the partial-knowledge case but didn’t want to check your algebra. And setting up problems like these is important and tricky, so this discussion belongs here.
So, I think the problem with your setup is that you don’t make the outcome space fully symmetric because you don’t have an equal chance of drawing Y at X and X at Y (compared to your chance of drawing X at X and Y at Y).
To formalize it for the general case of partial knowledge, plus probabilistic knowledge given action, we need to look at four possibilities: Drawing XY, XX, YY, and YX, only the first of which is correct. If, as I defined it before, the probability of being right at any given exit is r, the corresponding probabilities are: r^2, r(1-r), r(1-r), and (1-r)(1-r).
So then I have the expected payoff as a function of p, q, and r as:
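The expression itself doesn’t seem to have survived here, so the following is a sketch of what it presumably was, assembled from the four cases just listed (weights r², r(1−r), r(1−r), (1−r)² for hint pairs XY, XX, YY, YX) and checked against the figures quoted in the next few paragraphs; treat it as my reconstruction rather than a quotation:

```python
def leg(c1, c2):
    # c1, c2: CONTINUE probabilities at the first and second intersection reached.
    # Exiting at X pays 0, exiting at Y pays 4, reaching C pays 1.
    return c1 * (1 - c2) * 4 + c1 * c2 * 1

def EU(p, q, r):
    # p = P(CONTINUE | hint 'X'), q = P(CONTINUE | hint 'Y'),
    # r = probability the hint at any single intersection is correct.
    return (r * r         * leg(p, q)      # hints X at X, Y at Y (both right)
            + r * (1 - r) * leg(p, p)      # hints X, X
            + r * (1 - r) * leg(q, q)      # hints Y, Y
            + (1 - r)**2  * leg(q, p))     # hints Y at X, X at Y (both wrong)

print(EU(2/3, 2/3, 0.5))                   # 4/3: the fully ignorant optimum
print(EU(1, 0, 0.6), 0.6 * (3 * 0.6 + 1))  # p=1, q=0 reduces to r*(3r+1)
print(max(EU(1, j / 1000, 0.51) for j in range(1001)))  # ~1.361 for r = 51%, matching the figure quoted below
```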
The original problem is the case of complete ignorance, r=1/2, which has a maximum 4⁄3 where p and q are such that they average out to choosing “continue” at one’s current intersection 2⁄3 of the time. (And this, I think, shows you how to correctly answer while explicitly and correctly representing your probability of being at a given intersection.)
The case of (always) continuing on (guessing) X and not continuing on (guessing) Y corresponds to p=1 and q=0, which reduces to r*(3r+1), the equation I originally had.
Furthermore, it shows how to beat the payoff of 4⁄3 when your r is under 52%. For 51% (the original case you looked at), the max payoff is 1.361 {p=1, q=0.306388} (don’t know how to show it on Alpha, since you have to constrain p and q to 0 thru 1).
Also, it shows I was in error about the 52% threshold, and the mixed strategy actually dominates all the way up to about r = 61%, at which point of course p=1 and q has shrunk to zero (continue when you see X, exit when you see Y). This corresponds to 32 millibits, much higher than my earlier estimate of 1.3 millibits.
What a crock. I presented my reasoning clearly and showed how it seamlessly and correctly handles the various nuances of the situation, including partial knowledge. If I’m wrong, I’m wrong for a non-obvious reason, and no, Wei_Dai hasn’t shown what’s wrong with this specific handling of the problem.
Whoever’s been modding me down on this thread, kindly explain yourself. And if that person is Wei_Dai: shame on you. Modding is not a tool for helping you win arguments.
Downvoted for complaining about being downvoted and for needless speculation about the integrity of other commenters. (Some other contributions to this thread have been upvoted.)
I’m not complaining about being downvoted. I’m complaining about
a) being downvoted
b) on an articulate, relevant post
c) without an explanation
In the absence of any one of those, I wouldn’t complain. I would love to hear where I’m wrong, because it’s far from obvious. (Yes, the exchange seems tedious and repetitive, but I present new material here.)
And I wasn’t speculating; I was just reminding the community of the general lameness of downvoting someone you’re in an argument with, whether or not that’s Wei_Dai.
I’m not going by my beliefs. Take yours, or the proverbial “reasonable person’s” judgment. Would you or that person judge b) as being true?
Are a) and c) in dispute? Again, my concern is actually not with being downmodded (I would have dropped this long ago if it were); it’s with the lack of an explanation. If no one can be bothered to respond to such a post that spells out its reasoning so clearly and claims to have solved the dilemma—fine, but leave it alone. If you’re going to make the effort, try to make sense too.
And I wasn’t speculating; I was just reminding the community of the general lameness of downvoting someone you’re in an argument with, whether or not that’s Wei_Dai.
I’m far more likely to downvote someone I’m in an argument with. Mostly because I am actually reading their posts in detail and am far more likely to notice woo.
Then why not just vote up your own comments? After all, you must have even more insight into those, right? It’s not like you’re going to be swayed by personal investment in not losing face or anything.
Yeah, I know, there’s that pesky thing about how you can’t upvote your own comments. Pff. There’s no such thing as fair, right? Just use a different account. Sheesh.
In cases where I believe a post of mine has been unjustly downvoted the only thing stopping me from creating another account and upvoting myself is that I just don’t care enough to bother. Of course if there was any particular challenge involved in gaming the system in that way then that would perhaps be incentive enough...
Okay, so far that’s 3-4 people willing to mod me down, zero people willing to point out the errors in a clearly articulated post.
This seems like a non-sequitur to me. It’s your comment of 22 September 2009 09:56:05PM that’s sitting at −4; none of your clear and articulate responses to Dai have negative scores anymore.
No non-sequitur. That’s still, um, zero explanation for the errors in a post that resolves all the issues of the AMD problem, and still at least 4 people modding me down for requesting that a downmod for that kind of post come with some sort of explanation.
If there’s a non-sequitur, it’s the fact that the unjustified downmods were only corrected after I complained about them, and I got downmodded even more than before, and this sequence of events is used to justify the claim that my comments have gotten what they deserved.
1 or 2 people downmod you and you devote 6 posts to whining about it? This is a broadcast medium. Of course the 5 people who voted you down for wasting their time aren’t going to explain why the first 1 or 2 people didn’t like the first post.
a post that resolves all the issues of the AMD problem
It didn’t say that to me. So much for articulate.
If it’s oh so important, don’t leave it buried at the bottom of a thread of context. Write something new. Why should we care about your parameter, rather than Wei Dai’s? Why should we care about any parameter?
If it’s oh so important, don’t leave it buried at the bottom of a thread of context.
It may surprise you to note that I linked to the comment from a very visible place in the discussion.
Why should we care about your parameter, rather than Wei Dai’s? Why should we care about any parameter?
Because Wei Dai asked for how to generate a solution that makes epistemic sense, and mine was the only one that accurately incorporated the concept of “probability of being at a given intersection”.
And of course, Wei_Dai saw fit to use the p q r parameters just the same.
If it’s oh so important, don’t leave it buried at the bottom of a thread of context.
It may surprise you to note that I linked to the comment from a very visible place in the discussion.
Exhuming it and putting it on display doesn’t solve the problem of context. People who clicked through (I speak from experience) didn’t see how it did what the link said it did. It was plausible that if I reread the thread it would mean something, but my vague memory of the thread was that it went off in a boring direction.
My questions were not intended for you to answer here, yet further removed from the context where one might care about their answers. If you write something self-contained that explains why I should care about it, I’ll read it.
So you were interested in seeing the solution, but not looking at the context of the thread for anything that wasn’t familiar? Doesn’t sound like much of an interest to me. If I had repeated myself with a separate self-contained explanation, you would be whining that I’m spamming the same thing all over the place.
You weren’t aware that I put a prominent link to the discussion that resolves all the thread’s issues. That’s okay! Really! You don’t need to cover it up by acting like you knew about it all along. Wei_Dai cares about the new parameters q and r. The post I linked to explains what it accomplishes. Now, think up a new excuse.
No ‘justification’ necessary. ‘Deserved’ is irrelevant and there is no such thing as ‘fair’.
If I didn’t accept that then LessWrong (and most of the universe) would be downright depressing. People are (often) stupid and votes here represent a very different thing than reward for insight.
Just hold the unknown down-voters in silent contempt briefly then go along with your life. Plenty more upvotes will come. And you know, the votes that my comments receive don’t seem to be all that correlated with the quality of the contributed insight. One reason for this is that the more elusive an insight is the less likely it is to agree with what people already think. This phenomenon more or less drives status assignment in academia. The significance of votes I get here rather pales in comparison.
Douglas makes a good suggestion. Want people to appreciate (or even just comprehend) what you are saying? Put in some effort to make a top level post. Express the problem, provide illustrations (verbal or otherwise) and explain your solution. You’ll get all sorts of status.
In the past, I took a comment seriously from you that was satire. Is this one of those, too? Sometimes it’s hard to tell.
If it’s serious, then my answer is that whatever “clutter” my comment gave here, a top-level post would give even more, and it probably couldn’t give more explanation than my comment already did.
By the way, just a “heads-up”: I count 6+ comments from others on meta-talk, 8+ down-mods, and 0 explanations for the errors in my solution. Nice work, guys.
I count 6+ comments from others on meta-talk, 8+ down-mods, and 0 [sic] explanations for the errors in my solution. Nice work, guys.
If it is in fact the case that your complaints are legitimately judged a negative contribution, then you should expect to be downvoted and criticized on those particular comments, regardless of whether or not your solution is correct. There’s nothing contradictory about simultaneously believing both that your proposed solution is correct, and that your subsequent complaints are a negative contribution.
I don’t feel like taking the time to look over your solution. Maybe it’s perfect. Wonderful! Spectacular! This world becomes a little brighter every time someone solves a math problem. But could you please, please consider toning down the hostility just a bit? These swipes at other commenters’ competence and integrity are really unpleasant to read.
ADDENDUM: Re tone, consider the difference between “I wonder why this was downvoted, could someone please explain?” (which is polite) and “What a crock,” followed by shaming a counterfactual Wei Dai (which is rude).
If it is in fact the case that your complaints are legitimately judged a negative contribution, then you should expect to be downvoted and criticized on those particular comments, regardless of whether or not your solution is correct. There’s nothing contradictory about simultaneously believing both that your proposed solution is correct, and that your subsequent complaints are a negative contribution.
Actually, there is something contradictory when those whiny comments were necessary for the previous, relevant comments to get their deserved karma. Your position here is basically: “Yeah, we were wrong to accuse you of those crimes, but you were still a jerk for pulling all that crap about ‘I plead not guilty!’ and ‘I didn’t do it!’, wah, wah, wah...”
At the very least, it should buy me more leeway than I would get for such a tone in isolation.
But could you please, please consider toning down the hostility just a bit?
Sure thing.
I trust that other posters will be more judicious with their voting and responses as well.
No satire. I just don’t find expecting the universe to ‘behave itself’ according to my ideals to be particularly pleasant, so I don’t waste my emotional energy on it.
I have upvoted your comment because it appears to be a useful (albeit somewhat information dense) contribution. I personally chose not to downvote your complaints because I do empathise with your frustration even if I don’t think your complaints are a useful way to get you what you want.
I was the one who downvoted the parent, because you criticized Wei Dai’s correct solution by arguing about a different problem than the one you agreed to several comments upstream.
Yes, you’d get a different solution if you assumed that the random variable for information gave independent readings at X and Y, instead of being engineered for maximum correlation. But that’s not the problem Wei Dai originally stated, and his solution to the original problem is unambiguously correct. (I suspect, but haven’t checked, that a mixed strategy beats the pure one on your problem setup as well.)
I simply downvoted rather than commented, because (a) I was feeling tired and (b) your mistake seemed pretty clear to me. I don’t think that was a violation of LW custom.
I was the one who downvoted the parent, because you criticized Wei Dai’s correct solution by arguing about a different problem than the one you agreed to several comments upstream
I didn’t change the problem; I pointed out that he hadn’t been appropriately representing the existing problem when trying to generalize it to partial information. Having previously agreed with his (incorrect) assumptions in no way obligates me to persist in my error, especially when the exchange makes it clear!
his solution to the original problem is unambiguously correct. (I suspect, but haven’t checked, that a mixed strategy beats the pure one on your problem setup as well.)
Which original problem? (If it’s the AMD problem as stated, then my solution gives the same p=2/3 result. If it’s the partial knowledge variant, Wei_Dai doesn’t have an unambiguously correct solution when he fails to include the possibility of picking Y at X and X at Y like he did for the reverse.) Further, I do have the mixed strategy dominating—but only up to r = 61%. Feel free to find an optimum where one of p and q is not 1 or 0 while r is greater than 61%.
Yes, you’d get a different solution if you assumed that the random variable for information gave independent readings at X and Y, instead of being engineered for maximum correlation.
That wasn’t the reason for our different solutions.
I simply downvoted rather than commented, because (a) I was feeling tired and (b) your mistake seemed pretty clear to me.
Well, I hope you’re no longer tired, and you can check my approach one more time.
You seem to be treating .2/.4/.4 as being continue-exit/exit-exit/continue-continue, which isn’t the right way to look at it.
I’d be interested in seeing what you think is wrong with the above derivation, ideally in terms of the actual decision problem at hand. Remember, p and q are decision parameters. They parameterize an agent, not an expectation. When p and q are 0 or 1, the agent is essentially a function of type “Bool → Bool”. How could a stateless agent of that type implement a better strategy than limiting itself to those three options?
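For concreteness, here is a sketch (mine, not pengvado’s) enumerating those stateless hint-to-action functions under the .2/.4/.4 setup; the best of them is the 1.2 that Wei_Dai computes above, versus about 1.42 for the best mixed strategy:

```python
from itertools import product

# World: with prob .4 the hints are ('X','X'), with .4 they are ('Y','Y'),
# with .2 they are ('X','Y'), i.e. correct at both intersections.
worlds = [(0.4, ('X', 'X')), (0.4, ('Y', 'Y')), (0.2, ('X', 'Y'))]

def drive(strategy, hints):
    # strategy: dict hint -> action. Exit at X pays 0, exit at Y pays 4, reaching C pays 1.
    if strategy[hints[0]] == 'EXIT':
        return 0
    return 4 if strategy[hints[1]] == 'EXIT' else 1

for on_x, on_y in product(['CONTINUE', 'EXIT'], repeat=2):
    s = {'X': on_x, 'Y': on_y}
    print(s, sum(w * drive(s, h) for w, h in worlds))
# Best pure strategy: CONTINUE on 'X', EXIT on 'Y', expected payoff 1.2.
```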
Again, what’s wrong with that derivation is that it leaves out the possibility of “disinformative”, and therefore assumes more knowledge about your intersection than you can really have. (By zeroing the probability of “Y then X” it concentrates the probability mass in a way that decreases the entropy of the variable more than your knowledge can justify.)
In writing the world-program in a way that categorizes your guess as “informative”, you’re implicitly assuming some memory of what you drew before: “Okay, so now I’m on the second one, which shows the Y-card …”
Now, can you explain what’s wrong with my derivation?
Again, what’s wrong with that derivation is it leaves out the possibility of “disinformative”
By “disinformative”, do you mean that intersection X has hint Y and vice versa? This is not possible in the scenario Wei Dai describes.
In writing the world-program in a way that categorizes your guess as “informative”
Ah, this seems to be a point of confusion. The world program does not categorize your guess, at all. The “informative” label in my derivation refers to the correctness of the provided hints. Whether or not the hints are both correct is a property of the world.
you’re implicitly assuming some memory of what you drew before: “Okay, so now I’m on the second one, which shows the Y-card …”
No, I am merely examining the possible paths from the outside. You seem to be confusing the world program with the agent. In the “informative/continue/exit” case, I am saying “okay, so now the driver is on the second one”. This does not imply that the driver is aware of this fact.
Now, can you explain what’s wrong with my derivation?
I think so. You’re approaching the problem from a “first-person perspective”, rather than using the given structure of the world, so you’re throwing away conditional information under the guise of implementing a stateless agent. But the agent can still look at the entire problem ahead of time and make a decision incorporating this information without actually needing to remember what’s happened once he begins.
At the first intersection, the state space of the world (not the agent) hasn’t yet branched, so your approach gives the correct answer. At the second intersection, we (the authors of the strategy, not the agent) must update your “guess odds” conditional on having seen X at the first intersection.
The outer probabilities are correct, but the inner probabilities haven’t been conditioned on seeing X at the first intersection. Two out of three times that the agent sees X at the first intersection, he will see X again at the second intersection. So, assuming the p=1 q=0 strategy, the statement “Given .2/.4/.4, you will see Y 60% of the time at Y” is false.
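A one-line check of that conditioning (my arithmetic, not part of the comment):

```python
# Hint pairs: P('X','X') = .4, P('Y','Y') = .4, P('X','Y') = .2.
p_X_first = 0.4 + 0.2                          # 'X' at the first intersection: 60%
print(0.4 / p_X_first)                         # P('X' again at Y | 'X' at X) = 2/3
print(0.2 / p_X_first)                         # P('Y' at Y | 'X' at X) = 1/3, not 60%
```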
You’re approaching the problem from a “first-person perspective”, rather than using the given structure of the world, so you’re throwing away conditional information under the guise of implementing a stateless agent. But the agent can still look at the entire problem ahead of time and make a decision incorporating this information without actually needing to remember what’s happened once he begins.
Okay, this is where I think the misunderstanding is. When I posited the variable r, I posited it to mean the probability of correctly guessing the intersection. In other words, you receive information at that point that moves your estimate of which intersection you’re at to r, after accounting for any other inferences you may have made about the problem, including from examining it from the outside and setting your p. So the way the r is defined, it screens off knowledge gained from deciding to use p and q.
Now, this might not be a particularly relevant generalization of the problem, I now grant that. But under the premises, it’s correct. A better generalization would be to find out your probability distribution across X and Y (given your choice of p), and then assume someone gives you b bits of evidence (decrease in the KL Divergence of your estimate from the true distribution), and find the best strategy from there.
And for that matter Wei_Dai’s solution, given his way of incorporating partial knowledge of one’s intersection, is also correct, and also probably not the best way to generalize the problem, because it basically asks, “what strategy should you pick, given that you have a probability t of not being an absent-minded driver, and a probability 1 - t of being an absent-minded driver?”
And for that matter Wei_Dai’s solution, given his way of incorporating partial knowledge of one’s intersection, is also correct
Thanks, this clarifies the state of the discussion. I was basically arguing against the assertion that it was not.
and also probably not the best way to generalize the problem because it basically asks, “what strategy should you pick, given that you have a probability t of not being an absent-minded driver, and a probability 1 - t of being an absent-minded driver?”
I don’t think I understand this. The resulting agent is always stateless, so it is always an absent-minded driver.
Are you looking for a way of incorporating information “on-the-fly” that the original strategy couldn’t account for? I could be missing something, but I don’t see how this is possible. In order for some hint H to function as useful information, you need to have estimates for P(H|X) and P(H|Y) ahead of time. But with these estimates on hand, you’ll have already incorporated them into your strategy. Therefore, your reaction to the observation of H or the lack thereof is already determined. And since the agent is stateless, the observation can’t affect anything beyond that decision.
It seems that there is just “no room” for additional information to enter this problem except from the outside.
Thanks! :-) But I still don’t understand what made you express the payoff as a function of p. Was it just something you thought of when applying UDT (perhaps after knowing that’s how someone else approached the problem), or is there something about UDT that required you to do that?
What do you mean? p is the only control parameter… You consider a set of “global” mixed strategies, indexed by p, and pick one that leads to the best outcome, without worrying about where your mind that does this calculation is currently located and under what conditions you are thinking this thought.
What do you mean? p is the only control parameter…
Perhaps, but it’s an innovation to think of the problem in terms of “solving for the random fraction of times I’m going to do them”. That is, even considering that you should add randomness in between your decision and what you do, is an insight. What focused your attention on optimizing with respect to p?
Mixed strategy is a standard concept, so here we are considering a set S of all (global) mixed strategies available for the game. When you are searching for the best strategy, you are maximizing the payoff over S. You are searching for the mixed strategy that gives the best payoff. What UDT tells you is that you should just do that, even if you are considering what to do in a situation where some of the options have run out, and, as here, even if you have no idea where you are. “The best strategy” quite literally means
s* = argmax_{s ∈ S} EU(s)
The only parameter for a given strategy is the probability of turning, so it’s natural to index the strategies by that probability. This indexing is a mapping t:[0,1]->S that places a mixed strategy in correspondence with a value of turning probability. Now, we can rewrite the expected utility maximization in terms of probability:
p* = argmax_{p ∈ [0,1]} EU(t(p))
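Concretely, for the original problem (no hints) the maximization over the index is a one-dimensional search; a sketch, with p taken as the probability of continuing at each intersection:

```python
# EU(p) = 4*p*(1-p) + p*p : exit at X pays 0, exit at Y pays 4, reaching C pays 1.
EU = lambda p: 4 * p * (1 - p) + p * p

best_p = max(range(1001), key=lambda i: EU(i / 1000)) / 1000
print(best_p, EU(best_p))    # ~2/3 and 4/3, the planning-optimal mixed strategy
```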
Okay, that’s making more sense—the part where you get to parameterizing p as a real is what I was interested in.
But do you do the same thing when applying UDT to Newcomb’s problem? Do you consider it a necessary part of UDT that you take p (with 0<=p<=1) as a continuous parameter to maximize over, where p is the probability of one-boxing?
Fundamentally, this depends on the setting—you might not be given a random number generator (randomness is defined with respect to the game), and so the strategies that depend on a random value won’t be available in the set of strategies to choose from. In Newcomb’s problem, the usual setting is that you have to be fairly deterministic or Omega punishes you (so that a small probability of two-boxing may even be preferable to pure one-boxing, or not, depending on Omega’s strategy), or Omega may be placed so that your strategy is always deterministic for it (effectively, taking mixed strategies out of the set of allowed ones).
S() is supposed to be an implementation of UDT. By looking at the world program P, it should determine that among all possible input-output mappings, those that return “EXIT” for 1⁄3 of all inputs (doesn’t matter which ones) maximize average payoff. What made me express the payoff as a function of p was stepping through what S is supposed to do as an implementation of UDT.
I’m still confused. Your response seems to just say, “I did it because it works.”—which is a great reason! But I want to know if UDT gave you more guidance than that.
Does UDT require that you look at the consequences of doing something p% of the time (irrespective of which ones), on all problems?
Basically, I’m in the position of that guy/gal that everyone here probably helped out in high school:
“How do you do the proof in problem 29?”
“Oh, just use identities 3 and 5, solve for t, and plug it back into the original equation.”
“But how did you know to do that?”
Does UDT require that you look at the consequences of doing something p% of the time (irrespective of which ones), on all problems?
No, UDT (at least in my formulation) requires that you look at all possible input-output mappings, and choose the one that is optimal. In this case it so happens that any function that returns “EXIT” for 1⁄3 of inputs is optimal.
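A sketch of what that enumeration could look like (the toy world program and the six-element input set are my own stand-ins, not Wei_Dai’s actual S and P): any mapping that returns EXIT on exactly 1/3 of the inputs attains the maximum average payoff of 4/3.

```python
from itertools import product

INPUTS = range(6)   # stand-in for the possible inputs S can receive at an intersection

def average_payoff(mapping):
    # mapping: dict input -> 'EXIT' or 'CONTINUE'. The driver gets an independent,
    # uniformly drawn input at each intersection and cannot tell X and Y apart.
    total = 0
    for i, j in product(INPUTS, repeat=2):     # input seen at X, input seen at Y
        if mapping[i] == 'EXIT':
            total += 0                          # exited at X
        elif mapping[j] == 'EXIT':
            total += 4                          # exited at Y
        else:
            total += 1                          # reached C
    return total / len(INPUTS) ** 2

best_by_exit_count = {}
for actions in product(['EXIT', 'CONTINUE'], repeat=len(INPUTS)):
    m = dict(zip(INPUTS, actions))
    k = actions.count('EXIT')
    best_by_exit_count[k] = max(best_by_exit_count.get(k, 0), average_payoff(m))
print(best_by_exit_count)   # the overall maximum, 4/3, occurs only at k = 2 (1/3 of inputs)
```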
What I’m more interested in is: doesn’t the UDT/planning-optimal solution imply that injecting randomness can improve an algorithm, which is a big no-no? Because you’re saying (and you’re right AFAICT) that the best strategy is to randomly choose whether to continue, with a bias in favor of continuing.
Also, could someone go through the steps of how UDT generates this solution, specifically, how it brings to your attention the possibility of expressing the payoff as a function of p? (Sorry, but I’m a bit of a straggler on these decision theory posts.)
Hm. This would actually appear to be a distinct class of cases in which randomness is “useful”, but it’s important to note that a simple deterministic algorithm would do better if we were allowed to remember our own past actions—i.e. this is a very special case. I should probably think about it further, but offhand, it looks like the reason for randomized-optimality is that the optimal deterministic algorithm has been prohibited in a way that makes all remaining deterministic algorithms stupid.
More generally, game theory often produces situations where randomization is clearly the rational strategy when you do not know with certainty the other player’s action. The example given in this post is straightforwardly convertible into a game involving Nature, as many games do, where Nature plays first and puts you into one of two states; if you are in X and play Continue, Nature plays again and the game loops. The solution to that game is the same as that derived above, I believe.
The point is, for a broad class of games/decision problems where Nature has a substantial role, we might run into similar issues that can be resolved a similar way.
I think the motivation for this problem is that memory in general is a limited resource, and a decision theory should be able to handle cases were recall is imperfect. I don’t believe that there was a deliberate attempt to prohibit the optimal deterministic algorithm in order to make randomized algorithms look good.
I don’t think that resolves the issue. As I demonstrated in this comment, if you have some probabilistic knowledge of which intersection you’re at, you can do better than the p=2/3 method. Specifically, as long as you have 0.0012 bits of information about which intersection you’re at (i.e. assign a greater than 52% chance of guessing correctly), you’re better off choosing based on what seems most likely.
However—and this is the kicker—that means that if you have between 0 and 0.0012 bits of information about your intersection, you’re best off throwing that information away entirely and going with the method that’s optimal for when you’re fully forgetful. So it’s still a case where throwing away information helps you.
ETA: False alarm; Wei_Dai corrects me here—you can still use your knowledge to do better than 4⁄3 when your probability of guessing right is between 50% and 52.05%.
It’s probably better not to think of this as a randomized algorithm. Here is a simpler example of what I mean.
Suppose you have two urns in front of you. One urn is full of N white marbles, and the other urn is empty. Your task is to take a marble out of the first urn, paint it either red or blue, place it in the second urn, and then repeat this process until the first urn is empty. Moreover, when you are done, something very close to two-thirds of the marbles in the second urn must be red.
The catch, of course, is that you have very poor short-term memory, so you can never remember how many marbles you’ve painted or what colors you’ve painted them.
The “randomized algorithm” solution would be to use a pseudo-random number generator to produce a number x between 0 and 1 for each marble, and to paint that marble red if and only if x < 2⁄3.
But there is a non-random way to think of that procedure. Suppose instead that, before you start painting your N marbles, you set out a box of N poker chips, of which you know (that is, have reason to be highly confident) that very nearly two-thirds are red. You then proceed to paint marbles according to the following algorithm. After taking a marble in hand, you select a poker chip non-randomly from the box, and then paint the marble the same color as that poker chip.
This is a non-random algorithm that you can use with confidence, but which requires no memory. And, as I see it, the method with the pseudo-random number generator amounts to the same thing. By deciding to use the generator, you are determining N numbers: the next N numbers that the generator will produce. Moreover, if you know how the generator is constructed, you know (that is, have reason to be highly confident) that very nearly two-thirds of those numbers will be less than 2⁄3. To my mind, this is functionally identical to the poker chip procedure.
You mean it does not require memory in your brain, because you implemented your memory with the poker chips. It is quite convenient they were available.
My point is that it’s no more convenient than having the pseudo-random number generator available. I maintain that the generator is implementing your memory in functionally the same sense. For example, you are effectively guaranteed not to get the same number twice, just as you are effectively guaranteed not to get the same poker chip twice.
ETA: After all, something in the generator must be keeping track of the passage of the marbles for you. Otherwise the generator would keep producing the same number over and over.
Rather than using a PRNG (which, as you say, requires memory), you could use a source of actual randomness (e.g. quantum decay). Then you don’t really have extra memory with the randomized algorithm, do you?
I thought of this as well, but it does not really matter because it is the ability to produce the different output in each case event that gives part of the functionality of memory, that is, the ability to distinguish between events. Granted, this is not as effective as deterministically understood memory, where you know in advance which output you get at each event. Essentially, it is memory with the drawback that you don’t understand how it works to the extent that you are uncertain how it correlates with what you wanted to remember.
Have you read this comment of mine from another branch of this conversation?
Randomness ‘degenerates’ (perhaps by action of a malicious daemon) into non-randomness, and so it can do better and no worse than a non-random approach?
(If the environments and agents are identical down to the source of randomness, then the agent defaults to a pure strategy; but with ‘genuine’ randomness, random sources that are different between instances of the agent, the agent can actually implement the better mixed strategy?)
I’m having trouble parsing your questions. You have some sentences that end in question marks. Are you asking whether I agree with those sentences? I’m having trouble understanding the assertions made by those sentences, so I can’t tell whether I agree with them (if that was what you were asking).
The claim that I was making could be summed up as follows. I described an agent using a PRNG to solve a problem involving painting marbles. The usual way to view such a solution is as
My suggestion was instead to view the solution as
The especially limited memory is the part of the PRNG that remembers what the next seed should be. If there weren’t some kind of memory involved in the PRNG’s operation, the PRNG would keep using the same seed over and over again, producing the same “random” number again and again.
The algorithm is the algorithm that the PRNG uses to turn the first seed into a sequence of pseudo-random numbers.
The certain property of that sequence is the property of having two-thirds of its terms being less than 2⁄3.
OK, that’s clearer. And different from what I thought you were saying.
That is fair enough, though the reason I find scenario at all interesting is that it illustrates the utility of a random strategy under certain conditions.
For me, finding an equivalent nonrandom strategy helps to dispel confusion.
I like your characterization above that the PRNG is “memory with the drawback that you don’t understand how it works to the extent that you are uncertain how it correlates with what you wanted to remember.” Another way to say it is that the PRNG gives you exactly what you need with near-certainty, while normal memory gives you extra information that happens to be useless for this problem.
What is “random” about the PRNG (the exact sequence of numbers) is extra stuff that you happen not to need. What you need from the PRNG (N numbers, of which two-thirds are less than 2⁄3) is not random but a near-certainty. So, although you’re using a so-called pseudo-random number generator, you’re really using an aspect of it that’s not random in any significant sense. For this reason, I don’t think that the PRNG algorithm should be called “random”, any more than is the poker chip algorithm.
Very clever! It is indeed true that if you forget all previous marble paintings, the best way to ensure that 2⁄3 get painted one color is to paint it that color with p = 2⁄3.
And interestingly, I can think of several examples of my own life when I’ve been in that situation. For example, when I’m playing Alpha Centauri, I want to make sure I have a good mix of artillery, infantry, and speeders, but it’s tedious to keep track of how many I have of each, so I just pick in a roughly random way, but biased toward those that I want in higher proportions.
I’m going to see if I can map the urn/marble-painting problem back onto the absent-minded driver problem.
If you’re allowed to use external memory, why not just write down how many you painted of each color? Note that memory is different from a random number generator; for example, a random number generator can be used (imperfectly) to coordinate with a group of people with no communication, whereas memory would require communication but could give perfect results.
Have you read this comment of mine from another branch of this conversation?
Take 3 marbles out of the urn. Paint one of them blue, and then the other two red. Put all three in the second urn. Repeat
Yeah, yeah—white marbles are half-medusas, so if you can see more than one you die, or something.
Taking more than one marble out of the urn at a time violates the task description. Don’t fight the hypothetical!
… That’s what the second paragraph is for...
Jaynes’ principle is that injecting randomness shouldn’t help you figure out what’s true; implementing a mixed strategy of action is a different matter.
Although the distinction is a bit too fuzzy (in my head) for comfort.
I’m concerned about more than just Jaynes’s principle. Injecting noise between your decision, and what you turn out to do, shouldn’t be able to help you either. Choosing “continue” 2⁄3 of the time is the same as “Always continue, but add a random disturbance that reverses this 1⁄3 of the time.”
How can that addition of randomness help you?
The randomness helps in this case because the strategy of determining your actions by which intersection you are at is not available.
Yes, the problem deletes the knowledge of what intersection I’m at. How does it help to further delete knowledge of what my decision is?
There are 3 possible sequences of actions:
exit at the first intersection
continue at the first intersection, exit at the second intersection
continue at the first intersection, continue at the third intersection
The payoffs are such that sequence 2 is the best available, but, having complete knowledge of your decision and no knowledge of which intersection you are at makes sequence 2 impossible. By sacrificing that knowledge, you make available the best sequence, though you can no longer be sure you will take it. In this case, the possibility of performing the best sequence is more valuable than full knowledge of which sequence you actually perform.
By the way, I went ahead and calculated what effect probabilistic knowledge of one’s current intersection has on the payoff, so as to know the value of this knowledge. So I calculated the expected return, given that you have a probability r (with 0.5<r<=1) of correctly guessing what intersection you’re in, and choose the optimal path based on whichever is most likely.
In the original problem the max payoff (under p=2/3) is 4⁄3. I found that to beat that, you only need r to be greater than 52.05%, barely better than chance. Alternately, that’s only 0.0012 bits of the 1 bit of information contained by the knowledge of which intersection you’re at! (Remember that if you have less than .0012 bits, you can just revert to the p=2/3 method from the original problem, which is better than trying to use your knowledge.)
Proof: At X, you have probability r of continuing, then at Y you have a probability r of exiting and (1-r) of continuing.
Thus, EU(r) = r(4r + (1-r) ) = r(3r+1).
Then solve for when EU(r)=4/3, the optimum in the fully ignorant case, which is at r = 0.5205.
You made a mistake here, which is assuming that when you guess you are at X, you should choose CONTINUE with probability 1, and when you guess you are at Y, you should choose EXIT with probability 1. In fact you can improve your expected payoff using a mixed strategy, in which case you can always do better when you have more information.
Here’s the math. Suppose when you are at an intersection, you get a clue that reads either ‘X’ or ‘Y’. This clue is determined by a dice roll at START. With probability .49, you get ‘X’ at both intersections. With probability .49, you get ‘Y’ at both intersections. With probability .02, you get ‘X’ at the X intersection, and ‘Y’ at the Y intersection.
Now, at START, your decision consists of a pair of probabilities, where p is your probability to CONTINUE after seeing ‘X’, and q is your probability to CONTINUE after seeing ‘Y’. Your expected payoff is:
.02 * (p*q + 4*(p*(1-q))) + .49 * (p*p + 4*(p*(1-p))) + .49 * (q*q + 4*(q*(1-q)))
which is maximized at p=0.680556, q=0.652778. And your expected payoff is 1.33389 which is > 4⁄3.
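In case anyone wants to reproduce these numbers, a brute-force check in Python (my own sketch, not part of the original derivation) finds the same optimum:

```python
# Expected payoff as a function of the two continue-probabilities:
# p after hint 'X', q after hint 'Y', with hint-pattern weights .02/.49/.49.
def payoff(p, q):
    return (0.02 * (p * q + 4 * p * (1 - q))
            + 0.49 * (p * p + 4 * p * (1 - p))
            + 0.49 * (q * q + 4 * q * (1 - q)))

# Coarse grid search over the unit square.
best = max((payoff(i / 1000, j / 1000), i / 1000, j / 1000)
           for i in range(1001) for j in range(1001))
print(best)                        # ~ (1.33389, 0.681, 0.653)
print(payoff(0.680556, 0.652778))  # ~ 1.33389, which beats 4/3
```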
Wow, good catch! (In any case, I had realized that if your probability of guessing right is less than 52.05%, you shouldn’t go with the most likely intersection, but rather revert to the original p=2/3 method at the very least.)
The formula you gave for the mixed strategy (with coefficients .02, .49, .49) corresponds to a 51% probability of guessing right at any given intersection. (If the probability of guessing right is r, the coefficients should be 2r-1, 1-r, 1-r.) It actually raises the threshold above which choosing based on the most probable intersection, with no added randomness, becomes the better strategy, but not by much—just to about 52.1%, by my calculations.
So that means the threshold is 0.0013 bits instead of 0.0012 :-P
(Yeah, I did guess and check because I couldn’t think of a better way on this computer.)
I think you might still be confused, but the nature of your confusion isn’t quite clear to me. Are you saying that if r>52.1%, the best strategy is a pure one again? That’s not true. See this calculation with coefficients .2, .4, .4.
ETA: Also, I think talking about this r, which is supposed to be a guess of “being at X”, is unnecessarily confusing, because how that probability should be computed from a given problem statement, and whether it’s meaningful at all, are under dispute. I suggest thinking in terms of what you should plan to do when you are at START.
That yields a payoff of 1.42, which is less than what the pure strategy gives in the equivalent case corresponding to .2/.4/.4, which is a 60% chance of guessing right. Since the payoff is r*(3r+1), the situation you described has a payoff of 1.68 under a pure strategy of choosing based on your best guess.
I specifically avoided defining r as the probability of “being at X”; r is the probability of guessing correctly (and therefore of picking the best option as if the guess were true), whichever intersection you’re at, and it’s equivalent to choosing 2r-1, 1-r, 1-r as the coefficients in your phrasing. The only thing possibly counterintuitive is that your ignorance is maximal at r=0.5 rather than at zero. Below 50%, you just flip your prediction.
No, it doesn’t. This is what I meant by “r” being confusing. Given .2/.4/.4, if you always pick CONTINUE when you see hint ‘X’ and EXIT when you see hint ‘Y’, your expected payoff (computed at START) is actually:
.4 * 0 + .4 * 1 + .2 * 4 = 1.2.
Incorrect. Given .2/.4/.4, you will see X 60% of the time at X, and Y 60% of the time at Y. So your payoff, computed at START, is:
.4 * 0 + .6 * (4 * .6 + .4 * 1) = 1.68
You seem to be treating .2/.4/.4 as being continue-exit/exit-exit/continue-continue, which isn’t the right way to look at it.
Please go back to what I wrote before (I’ve changed the numbers to .2/.4/.4 below):
I’ll go over the payoff calculation in detail, but if you’re still confused after this, perhaps we should take it to private messages to avoid cluttering up the comments.
Your proposed strategy is to CONTINUE upon seeing the hint ‘X’ and EXIT upon seeing the hint ‘Y’. With .4 probability, you’ll get ‘Y’ at both intersections, but you EXIT upon seeing the first ‘Y’ so you get 0 payoff in that case. With .4 probability, you get ‘X’ at both intersections, so you choose CONTINUE both times and end up at C for payoff 1. With .2 probability, you get ‘X’ at X and ‘Y’ at Y, so you choose CONTINUE and then EXIT for a payoff of 4, thus:
.4 * 0 + .4 * 1 + .2 * 4 = 1.2
I’m not confused; I probably should have stopped you at your original derivation for the partial-knowledge case but didn’t want to check your algebra. And setting up problems like these is important and tricky, so this discussion belongs here.
So, I think the problem with your setup is that you don’t make the outcome space fully symmetric because you don’t have an equal chance of drawing Y at X and X at Y (compared to your chance of drawing X at X and Y at Y).
To formalize it for the general case of partial knowledge, plus probabilistic knowledge given action, we need to look at four possibilities: Drawing XY, XX, YY, and YX, only the first of which is correct. If, as I defined it before, the probability of being right at any given exit is r, the corresponding probabilities are: r^2, r(1-r), r(1-r), and (1-r)(1-r).
So then I have the expected payoff as a function of p, q, and r as:
(r^2)(p*q + 4p(1 - q)) + r(1 - r)(p^2 + 4p(1 - p)) + r(1 - r)(q^2 + 4q(1 - q)) + (1 - r)^2(p*q + 4q(1 - p))
This nicely explains the previous results:
The original problem is the case of complete ignorance, r=1/2, which has a maximum 4⁄3 where p and q are such that they average out to choosing “continue” at one’s current intersection 2⁄3 of the time. (And this, I think, shows you how to correctly answer while explicitly and correctly representing your probability of being at a given intersection.)
The case of (always) continuing on (guessing) X and not continuing on (guessing) Y corresponds to p=1 and q=0, which reduces to r*(3r+1), the equation I originally had.
Furthermore, it shows how to beat the payoff of 4⁄3 when your r is above 50% but under 52%. For 51% (the original case you looked at), the max payoff is 1.361 {p=1, q=0.306388} (I don’t know how to show it on Alpha, since you have to constrain p and q to 0 thru 1).
Also, it shows I was in error about the 52% threshold, and the mixed strategy actually dominates all the way up to about r = 61%, at which point of course p=1 and q has shrunk to zero (continue when you see X, exit when you see Y). This corresponds to 32 millibits, much higher than my earlier estimate of 1.3 millibits.
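For anyone checking these claims, a short Python sketch of the formula above (code and names are mine) reproduces the numbers:

```python
# Expected payoff as a function of continue-probabilities p, q and the
# per-intersection guess accuracy r, with hint-pattern weights
# r^2 (X at X, Y at Y), r(1-r) (X,X), r(1-r) (Y,Y), (1-r)^2 (Y at X, X at Y).
def eu(p, q, r):
    return (r**2 * (p * q + 4 * p * (1 - q))
            + r * (1 - r) * (p**2 + 4 * p * (1 - p))
            + r * (1 - r) * (q**2 + 4 * q * (1 - q))
            + (1 - r)**2 * (p * q + 4 * q * (1 - p)))

def best_for(r, steps=400):
    grid = [i / steps for i in range(steps + 1)]
    return max((eu(p, q, r), p, q) for p in grid for q in grid)

print(best_for(0.50))  # ~ (1.333, ...): the fully ignorant optimum of 4/3
print(best_for(0.51))  # ~ (1.361, 1.0, ~0.31): the mixed strategy still helps
print(best_for(0.61))  # q has shrunk to ~0: the pure strategy takes over
```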
Interesting!
What a crock. I presented my reasoning clearly and showed how it seamlessly and correctly handles the various nuances of the situation, including partial knowledge. If I’m wrong, I’m wrong for a non-obvious reason, and no, Wei_Dai hasn’t shown what’s wrong with this specific handling of the problem.
Whoever’s been modding me down on this thread, kindly explain yourself. And if that person is Wei_Dai: shame on you. Modding is not a tool for helping you win arguments.
Downvoted for complaining about being downvoted and for needless speculation about the integrity of other commenters. (Some other contributions to this thread have been upvoted.)
I’m not complaining about being downvoted. I’m complaining about
a) being downvoted
b) on an articulate, relevant post
c) without an explanation
In the absence of any one of those, I wouldn’t complain. I would love to hear where I’m wrong, because it’s far from obvious. (Yes, the exchange seems tedious and repetitive, but I present new material here.)
And I wasn’t speculating; I was just reminding the community of the general lameness of downvoting someone you’re in an argument with, whether or not that’s Wei_Dai.
Imagine the noise if everybody complained whenever they were downvoted and believed that all 3 of your criteria were applicable.
I’m not going by my beliefs. Take yours, or the proverbial “reasonable person’s” judgment. Would you or that person judge b) as being true?
Are a) and c) in dispute? Again, my concern is actually not with being downmodded (I would have dropped this long ago if it were); it’s with the lack of an explanation. If no one can be bothered to respond to such a post that spells out its reasoning so clearly and claims to have solved the dilemma—fine, but leave it alone. If you’re going to make the effort, try to make sense too.
I’m far more likely to downvote someone I’m in an argument with. Mostly because I am actually reading their posts in detail and am far more likely to notice woo.
Then why not just vote up your own comments? After all, you must have even more insight into those, right? It’s not like you’re going to be swayed by personal investment in not losing face or anything.
Yeah, I know, there’s that pesky thing about how you can’t upvote your own comments. Pff. There’s no such thing as fair, right? Just use a different account. Sheesh.
You must be an inherently angry person. Or an evil mutant. Something like that =)
In cases where I believe a post of mine has been unjustly downvoted the only thing stopping me from creating another account and upvoting myself is that I just don’t care enough to bother. Of course if there was any particular challenge involved in gaming the system in that way then that would perhaps be incentive enough...
Okay, so far that’s 3-4 people willing to mod me down, zero people willing to point out the errors in a clearly articulated post.
I’m sure we can do better than that, can’t we, LW?
If it’s that bad, I’m sure one of you can type the two or three sentences necessary to effortlessly demolish it.
ETA: Or not.
This seems like a non-sequitur to me. It’s your comment of 22 September 2009 09:56:05PM that’s sitting at −4; none of your clear and articulate responses to Dai have negative scores anymore.
No non-sequitur. That’s still, um, zero explanation for the errors in a post that resolves all the issues of the AMD problem, and still at least 4 people modding me down for requesting that a downmod for that kind of post come with some sort of explanation.
If there’s a non-sequitur, it’s the fact that the unjustified downmods were only corrected after I complained about them, and I got downmodded even more than before, and this sequence of events is used to justify the claim that my comments have gotten what they deserved.
1 or 2 people downmod you and you devote 6 posts to whining about it? This is a broadcast medium. Of course the 5 people who voted you down for wasting their time aren’t going to explain why the first 1 or 2 people didn’t like the first post.
It didn’t say that to me. So much for articulate.
If it’s oh so important, don’t leave it buried at the bottom of a thread of context. Write something new. Why should we care about your parameter, rather than Wei Dai’s? Why should we care about any parameter?
It may surprise you to note that I linked to the comment from a very visible place in the discussion.
Because Wei Dai asked for how to generate a solution that makes epistemic sense, and mine was the only one that accurately incorporated the concept of “probability of being at a given intersection”.
And of course, Wei_Dai saw fit to use the p q r parameters just the same.
Exhuming it and putting it on display doesn’t solve the problem of context. People who clicked through (I speak from experience) didn’t see how it did what the link said it did. It was plausible that if I reread the thread it would mean something, but my vague memory of the thread was that it went off in a boring direction.
My questions were not intended for you to answer here, yet further removed from the context where one might care about their answers. If you write something self-contained that explains why I should care about it, I’ll read it.
So you were interested in seeing the solution, but not looking at the context of the thread for anything that wasn’t familiar? Doesn’t sound like much of an interest to me. If I had repeated myself with a separate self-contained explanation, you would be whining that I’m spamming the same thing all over the place.
You weren’t aware that I put a prominent link to the discussion that resolves all the thread’s issues. That’s okay! Really! You don’t need to cover it up by acting like you knew about it all along. Wei_Dai cares about the new parameters q and r. The post I linked to explains what it accomplishes. Now, think up a new excuse.
No ‘justification’ necessary. ‘Deserved’ is irrelevant and there is no such thing as ‘fair’.
If I didn’t accept that then LessWrong (and most of the universe) would be downright depressing. People are (often) stupid and votes here represent a very different thing than reward for insight.
Just hold the unknown down-voters in silent contempt briefly then go along with your life. Plenty more upvotes will come. And you know, the votes that my comments receive don’t seem to be all that correlated with the quality of the contributed insight. One reason for this is that the more elusive an insight is the less likely it is to agree with what people already think. This phenomenon more or less drives status assignment in academia. The significance of votes I get here rather pales in comparison.
Douglas makes a good suggestion. Want people to appreciate (or even just comprehend) what you are saying? Put in some effort to make a top level post. Express the problem, provide illustrations (verbal or otherwise) and explain your solution. You’ll get all sorts of status.
In the past, I took a comment seriously from you that was satire. Is this one of those, too? Sometimes it’s hard to tell.
If it’s serious, then my answer is that whatever “clutter” my comment here gave, it would give even more as a top level post, which probably can’t give more explanation than my post already did.
By the way, just a “heads-up”: I count 6+ comments from others on meta-talk, 8+ down-mods, and 0 explanations for the errors in my solution. Nice work, guys.
If it is in fact the case that your complaints are legitimately judged a negative contribution, then you should expect to be downvoted and criticized on those particular comments, regardless of whether or not your solution is correct. There’s nothing contradictory about simultaneously believing both that your proposed solution is correct, and that your subsequent complaints are a negative contribution.
I don’t feel like taking the time to look over your solution. Maybe it’s perfect. Wonderful! Spectacular! This world becomes a little brighter every time someone solves a math problem. But could you please, please consider toning down the hostility just a bit? These swipes at other commenters’ competence and integrity are really unpleasant to read.
ADDENDUM: Re tone, consider the difference between “I wonder why this was downvoted, could someone please explain?” (which is polite) and “What a crock,” followed by shaming a counterfactual Wei Dai (which is rude).
Actually, there is something contradictory when those whiny comments were necessary for the previous, relevant comments to get their deserved karma. Your position here is basically: “Yeah, we were wrong to accuse you of those crimes, but you were still a jerk for pulling all that crap about ‘I plead not guilty!’ and ‘I didn’t do it!’, wah, wah, wah...”
At the very least, it should buy me more leeway than I’d get for such a tone in isolation.
Sure thing.
I trust that other posters will be more judicious with their voting and responses as well.
No satire. I just don’t find expecting the universe to ‘behave itself’ according to my ideals to be particularly pleasant, so I don’t waste my emotional energy on it.
I have upvoted your comment because it appears to be a useful (albeit somewhat information dense) contribution. I personally chose not to downvote your complaints because I do empathise with your frustration even if I don’t think your complaints are a useful way to get you what you want.
I was the one who downvoted the parent, because you criticized Wei Dai’s correct solution by arguing about a different problem than the one you agreed to several comments upstream.
Yes, you’d get a different solution if you assumed that the random variable for information gave independent readings at X and Y, instead of being engineered for maximum correlation. But that’s not the problem Wei Dai originally stated, and his solution to the original problem is unambiguously correct. (I suspect, but haven’t checked, that a mixed strategy beats the pure one on your problem setup as well.)
I simply downvoted rather than commented, because (a) I was feeling tired and (b) your mistake seemed pretty clear to me. I don’t think that was a violation of LW custom.
I didn’t change the problem; I pointed out that he hadn’t been appropriately representing the existing problem when trying to generalize it to partial information. Having previously agreed with his (incorrect) assumptions in no way obligates me to persist in my error, especially when the exchange makes it clear!
Which original problem? (If it’s the AMD problem as stated, then my solution gives the same p=2/3 result. If it’s the partial-knowledge variant, Wei_Dai doesn’t have an unambiguously correct solution when he fails to include the possibility of picking Y at X and X at Y as he did for the reverse.) Further, I do have the mixed strategy dominating—but only up to r = 61%. Feel free to find an optimum where one of p and q is not 1 or 0 while r is greater than 61%.
That wasn’t the reason for our different solutions.
Well, I hope you’re no longer tired, and you can check my approach one more time.
I’m pretty sure Wei Dai is correct. I’ll try and explain it differently. Here’s a rendering of the problem in some kind of pseudolisp:
Now evaluate with the strategy under discussion, (start 1 0):
Prune the zeros:
Combine the linear paths:
I’d be interested in seeing what you think is wrong with the above derivation, ideally in terms of the actual decision problem at hand. Remember, p and q are decision parameters. They parameterize an agent, not an expectation. When p and q are 0 or 1, the agent is essentially a function of type “Bool → Bool”. How could a stateless agent of that type implement a better strategy than limiting itself to those three options?
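In case a concrete rendering helps, here is a rough Python sketch of the same evaluation (the structure and names are mine, and it’s only a stand-in for the pseudolisp): the world rolls the hint pattern at START, the stateless agent maps each hint to a continue-probability, and the p=1, q=0 strategy evaluates to 1.2, as above.

```python
# Hint-pattern weights: .2 -> ('X' at X, 'Y' at Y), .4 -> 'X' at both,
# .4 -> 'Y' at both.
PATTERNS = [(0.2, ('X', 'Y')), (0.4, ('X', 'X')), (0.4, ('Y', 'Y'))]

def expected_payoff(strategy):
    # strategy maps a hint to the probability of choosing CONTINUE there.
    total = 0.0
    for prob, (hint_at_x, hint_at_y) in PATTERNS:
        c1 = strategy[hint_at_x]  # P(CONTINUE at the first intersection)
        c2 = strategy[hint_at_y]  # P(CONTINUE at the second intersection)
        # EXIT at X -> 0; CONTINUE then EXIT at Y -> 4; CONTINUE twice -> 1.
        total += prob * c1 * ((1 - c2) * 4 + c2 * 1)
    return total

print(expected_payoff({'X': 1.0, 'Y': 0.0}))  # 1.2
```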
Again, what’s wrong with that derivation is that it leaves out the possibility of a “disinformative” hint, and therefore assumes more knowledge about your intersection than you can really have. (By zeroing the probability of “Y then X”, it concentrates the probability mass in a way that decreases the entropy of the variable more than your knowledge can justify.)
In writing the world-program in a way that categorizes your guess as “informative”, you’re implicitly assuming some memory of what you drew before: “Okay, so now I’m on the second one, which shows the Y-card …”
Now, can you explain what’s wrong with my derivation?
By “disinformative”, do you mean that intersection X has hint Y and vice versa? This is not possible in the scenario Wei Dai describes.
Ah, this seems to be a point of confusion. The world program does not categorize your guess, at all. The “informative” label in my derivation refers to the correctness of the provided hints. Whether or not the hints are both correct is a property of the world.
No, I am merely examining the possible paths from the outside. You seem to be confusing the world program with the agent. In the “informative/continue/exit” case, I am saying “okay, so now the driver is on the second one”. This does not imply that the driver is aware of this fact.
I think so. You’re approaching the problem from a “first-person perspective”, rather than using the given structure of the world, so you’re throwing away conditional information under the guise of implementing a stateless agent. But the agent can still look at the entire problem ahead of time and make a decision incorporating this information without actually needing to remember what’s happened once he begins.
At the first intersection, the state space of the world (not the agent) hasn’t yet branched, so your approach gives the correct answer. At the second intersection, we (the authors of the strategy, not the agent) must update your “guess odds” conditional on having seen X at the first intersection.
Your tree was:
The outer probabilities are correct, but the inner probabilities haven’t been conditioned on seeing X at the first intersection. Two out of three times that the agent sees X at the first intersection, he will see X again at the second intersection. So, assuming the p=1 q=0 strategy, the statement “Given .2/.4/.4, you will see Y 60% of the time at Y” is false.
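To spell out that conditioning step (my arithmetic, not a quote): P(‘X’ at the first intersection) = .4 + .2 = .6, and P(‘X’ at both intersections) = .4, so P(‘X’ at the second intersection | ‘X’ at the first) = .4 / .6 = 2/3. That is why, under the p=1, q=0 strategy, the driver who continues after seeing ‘X’ will see ‘X’ again (and wrongly continue) two times out of three.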
Okay, this is where I think the misunderstanding is. When I posited the variable r, I posited it to mean the probability of correctly guessing the intersection. In other words, you receive information at that point that moves your estimate of which intersection you’re at to r, after accounting for any other inferences you may have made about the problem, including from examining it from the outside and setting your p. So, as r is defined, it screens off knowledge gained from deciding to use p and q.
Now, this might not be a particularly relevant generalization of the problem, I now grant that. But under the premises, it’s correct. A better generalization would be to find out your probability distribution across X and Y (given your choice of p), and then assume someone gives you b bits of evidence (decrease in the KL Divergence of your estimate from the true distribution), and find the best strategy from there.
And for that matter, Wei_Dai’s solution, given his way of incorporating partial knowledge of one’s intersection, is also correct, and also probably not the best way to generalize the problem, because it basically asks, “what strategy should you pick, given that you have a probability t of not being an absent-minded driver, and a probability 1 - t of being an absent-minded driver?”
Thanks, this clarifies the state of the discussion. I was basically arguing against the assertion that it was not.
I don’t think I understand this. The resulting agent is always stateless, so it is always an absent-minded driver.
Are you looking for a way of incorporating information “on-the-fly” that the original strategy couldn’t account for? I could be missing something, but I don’t see how this is possible. In order for some hint H to function as useful information, you need to have estimates for P(H|X) and P(H|Y) ahead of time. But with these estimates on hand, you’ll have already incorporated them into your strategy. Therefore, your reaction to the observation of H or the lack thereof is already determined. And since the agent is stateless, the observation can’t affect anything beyond that decision.
It seems that there is just “no room” for additional information to enter this problem except from the outside.
Okay, that’s a more specific, helpful answer.
I just added to the post with an answer.
Thanks! :-) But I still don’t understand what made you express the payoff as a function of p. Was it just something you thought of when applying UDT (perhaps after knowing that’s how someone else approached the problem), or is there something about UDT that required you to do that?
What do you mean? p is the only control parameter… You consider a set of “global” mixed strategies, indexed by p, and pick one that leads to the best outcome, without worrying about where your mind that does this calculation is currently located and under what conditions you are thinking this thought.
Perhaps, but it’s an innovation to think of the problem in terms of “solving for the random fraction of times I’m going to do them”. That is, even considering that you should add randomness between your decision and what you do is an insight. What focused your attention on optimizing with respect to p?
Mixed strategy is a standard concept, so here we are considering the set S of all (global) mixed strategies available for the game. When you are searching for the best strategy, you are maximizing the payoff over S, i.e. searching for the mixed strategy that gives the best payoff. What UDT tells you is that you should just do that, even if you are considering what to do in a situation where some of the options have run out, and, as here, even if you have no idea where you are. “The best strategy” quite literally means
s* = argmax_{s ∈ S} EU(s)
The only parameter for a given strategy is the probability of continuing, so it’s natural to index the strategies by that probability. This indexing is a mapping t: [0,1] -> S that places each value of the continue-probability in correspondence with a mixed strategy. Now we can rewrite the expected utility maximization in terms of that probability:
s* = t(p*), where p* = argmax_{p ∈ [0,1]} EU(t(p))
For the strategy corresponding to continue-probability p, it’s easy to express the corresponding expected utility:
EU(t(p)) = (1-p)·0 + p·((1-p)·4 + p·1) = p^2 + 4p(1-p)
We now can find the optimal strategy as
s* = t(p*), where p* = argmax_{p ∈ [0,1]} (p^2 + 4p(1-p))
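(A quick numerical check, mine rather than part of the derivation, confirms the maximizer:)

```python
# Maximize EU(t(p)) = p^2 + 4p(1 - p) over p in [0, 1].
eu = lambda p: p**2 + 4 * p * (1 - p)
p_star = max((i / 10000 for i in range(10001)), key=eu)
print(p_star, eu(p_star))  # ~0.6667 and ~1.3333, i.e. p* = 2/3 and EU = 4/3
```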
Okay, that’s making more sense: the part where you parameterize the strategies by a real number p is what I was interested in.
But do you do the same thing when applying UDT to Newcomb’s problem? Do you consider it a necessary part of UDT that you take p (with 0<=p<=1) as a continuous parameter to maximize over, where p is the probability of one-boxing?
Fundamentally, this depends on the setting—you might not be given a random number generator (randomness is defined with respect to the game), and so the strategies that depend on a random value won’t be available in the set of strategies to choose from. In Newcomb’s problem, the usual setting is that you have to be fairly deterministic or Omega punishes you (so that a small probability of two-boxing may even be preferable to pure one-boxing, or not, depending on Omega’s strategy), or Omega may be placed so that your strategy is always deterministic for it (effectively, taking mixed strategies out of the set of allowed ones).
S() is supposed to be an implementation of UDT. By looking at the world program P, it should determine that among all possible input-output mappings, those that return “EXIT” for 1⁄3 of all inputs (it doesn’t matter which ones) maximize average payoff. What made me express the payoff as a function of p was stepping through what S is supposed to do as an implementation of UDT.
Does that make sense?
I’m still confused. Your response seems to just say, “I did it because it works.”—which is a great reason! But I want to know if UDT gave you more guidance than that.
Does UDT require that you look at the consequences of doing something p% of the time (irrespective of which ones), on all problems?
Basically, I’m in the position of that guy/gal that everyone here probably helped out in high school:
“How do you do the proof in problem 29?” “Oh, just used identities 3 and 5, solve for t, and plug it back into the original equation.” “But how did you know to do that?”
No, UDT (at least in my formulation) requires that you look at all possible input-output mappings, and choose the one that is optimal. In this case it so happens that any function that returns “EXIT” for 1⁄3 of inputs is optimal.
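Here is a small Python sketch of that claim under one reading of the setup (my assumption, not necessarily the original world program: S receives an input drawn uniformly and independently at each intersection, so only the fraction of inputs mapped to EXIT matters):

```python
from itertools import product

N = 6  # six distinguishable inputs, one drawn uniformly at each call to S

def avg_payoff(mapping):
    # mapping[i] is True if S returns "EXIT" on input i.
    exit_frac = sum(mapping) / N
    cont = 1 - exit_frac
    # EXIT at X -> 0; CONTINUE then EXIT at Y -> 4; CONTINUE twice -> 1.
    return cont * (exit_frac * 4 + cont * 1)

mappings = list(product([False, True], repeat=N))
best = max(avg_payoff(m) for m in mappings)
winners = [m for m in mappings if abs(avg_payoff(m) - best) < 1e-9]

# Every maximizing mapping returns "EXIT" on exactly 2 of the 6 inputs
# (a 1/3 fraction), and it doesn't matter which 2; the payoff is 4/3.
print(best, all(sum(m) == 2 for m in winners))
```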