Yup yup, you’re right, of course.
What I was trying to say, then, is that I don’t understand why there’s any debate about the validity of a decision theory that gets this wrong. I’m surprised everyone doesn’t just go, “Oh, obviously any decision theory that says two-boxing is ‘rational’ is an invalid theory.”
I’m surprised that this is a point of debate. I’m surprised, so I’m wondering, what am I missing?
Did I manage to make my question clearer like that?
I can say that for me personally, the hard part—that I did not get past till reading about it here—was noticing that there is actually such a variable as “what decision theory to use”; using a naive CDT sort of thing simply seemed rational /a priori/. Insufficient grasp of the nameless virtue, you could say.
Meaning you’re in the same boat as me? Confused as to why this ever became a point of debate in the first place?
...no? I didn’t realize that the decision theory could be varied, that the obvious decision theory could be invalid, so I hit a point of confusion with little idea what to do about it.
But you’re not saying that you would ever have actually decided to two-box rather than take box B if you found yourself in that situation, are you?
I mean, you would always have decided, if you found yourself in that situation, that you were the kind of person Omega would have predicted to choose box B, right?
I am still so majorly confused here. :P
I have no idea! IIRC I leaned towards one-boxing, but I was honestly confused about it.
Ahah. So do you remember if you were confused in yourself, for reasons generated by your own brain, or just by your knowledge that some experts were saying two-boxing was the ‘rational’ strategy?
It’s a good question. You aren’t missing anything. And “people are crazy, the world is mad” isn’t always sufficient. ;)
Ha! =]
Okay, I DO expect to see lots of ‘people are crazy, the world is mad’ stuff, yeah, I just wouldn’t expect to see it on something like this from the kind of people who work on things like Causal Decision Theory! :P
So I guess what I really want to do first is CHECK which option is really most popular among such people: two-boxing, or predictably choosing box B?
Problem is, I’m not sure how to perform that check. Can anyone help me there?
It is fairly hard to perform such checks. We don’t have many situations that are analogous to Newcomb’s problem. We don’t have perfect predictors, and most situations humans are in can be considered “iterated”. At least, we can consider most people to be applying their ‘iterated’ reasoning by mistake when we put them in one-off situations.
The closest analogy that we can get reliable answers out of is the ‘ultimatum game’ with high stakes… in which people really do refuse weeks’ worth of wages.
By the way, have you considered what you would do if the boxes were transparent? Just sitting there. Omega long gone and you can see piles of cash in front of you… It’s tricky. :)
Suppose my decision algorithm for the “both boxes are transparent” case is to take only box B if and only if it is empty, and to take both boxes if and only if box B has a million dollars in it. How does Omega respond? No matter how it handles box B, its implied prediction will be wrong.
Perhaps just as slippery, what if my algorithm is to take only box B if and only if it contains a million dollars, and to take both boxes if and only if box B is empty? In this case, anything Omega predicts will be accurate, so what prediction does it make?
Come to think of it, I could implement the second algorithm (and maybe the first) if a million dollars weighs enough compared to the boxes. Suppose my decision algorithm outputs: “Grab box B and test its weight, and maybe shake it a bit. If it clearly has a million dollars in it, take only box B. Otherwise, take both boxes.” If that’s my algorithm, then I don’t think the problem actually tells us what Omega predicts, and thus what outcome I’m getting.
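A rough consistency check makes the trouble explicit (the names below are purely illustrative, not part of the problem statement): in the naive transparent setup, a consistent outcome has to satisfy “box B is full exactly when the agent, seeing that state, takes only box B”, so we can just enumerate both possible states for each algorithm.

```python
# Illustrative sketch of the naive (circular) transparent Newcomb setup.
# A policy maps what the agent sees in box B ("full" or "empty") to an action.
# A state is consistent iff: B is full  <=>  the agent one-boxes given what it sees.

def consistent_fills(policy):
    consistent = []
    for b_full in (True, False):
        seen = "full" if b_full else "empty"
        if (policy(seen) == "one-box") == b_full:
            consistent.append(b_full)
    return consistent

take_b_only_iff_empty = lambda seen: "one-box" if seen == "empty" else "two-box"
take_b_only_iff_full = lambda seen: "one-box" if seen == "full" else "two-box"

print(consistent_fills(take_b_only_iff_empty))  # []            -> no consistent state; Omega must be wrong
print(consistent_fills(take_b_only_iff_full))   # [True, False] -> two consistent states; underdetermined
```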
The naive presentation of the transparent problem is circular, and for that reason ill-defined (what you do depends on what’s in the boxes, which depends on Omega’s prediction, which depends on what you do...). A plausible version of the transparent Newcomb’s problem involves Omega:
Predicting what you’d do if you saw box B full (and never mind the case where box B is empty).
Predicting what you’d do if you saw box B empty (and never mind the case where box B is full).
Predicting what you’d do in both cases, and filling box B if and only if you’d one-box in both of them.
Or variations of those. There’s no circularity when he only makes such “conditional” predictions.
He could use the same algorithms in the non-transparent case; they would usually reduce to the normal Newcomb’s problem, but would prevent you from doing any tricky business if you happen to bring an X-ray imager (or kitchen scales) and try to observe the state of box B.
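A follow-on sketch (again purely illustrative) of the three “conditional” rules listed above: each of them decides whether to fill box B from predictions about the agent’s policy alone, so the decision is well-defined for any policy and the circularity goes away.

```python
# Illustrative sketch of the three conditional-prediction rules above.
# Each rule is a total function of the agent's policy, so Omega always has
# a definite answer about whether to fill box B.

def omega_fills_b(policy, rule):
    if rule == "predict_if_seen_full":
        return policy("full") == "one-box"
    if rule == "predict_if_seen_empty":
        return policy("empty") == "one-box"
    if rule == "predict_both_cases":
        return policy("full") == "one-box" and policy("empty") == "one-box"
    raise ValueError(rule)

one_boxer = lambda seen: "one-box"
defiant = lambda seen: "one-box" if seen == "empty" else "two-box"  # the troublesome policy above

for rule in ("predict_if_seen_full", "predict_if_seen_empty", "predict_both_cases"):
    print(rule, omega_fills_b(one_boxer, rule), omega_fills_b(defiant, rule))
```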
Death by lightning.
I typically include disclaimers such as the above in a footnote or a more precisely targeted problem specification, so as to avoid any avoid-the-question technicalities. The premise is not that Omega is an idiot or a sloppy game-designer.
You took box B. Putting it down again doesn’t help you. Finding ways to be cleverer than Omega is not a winning solution to Newcomblike problems.
Box B appears full of money; however, after you take both boxes, you find that the money in Box B is Monopoly money. The money in Box A, though, remains genuine.
Box B appears empty; however, on opening it you find, written on the bottom of the box, the full details of a bank account opened by Omega, containing one million dollars, together with written permission for you to access said account.
In short, even with transparent boxes, there are a number of ways for Omega to lie to you about the contents of Box B, and in this manner control your choice. If Omega is constrained not to lie about the contents of Box B, then it gets a bit trickier; Omega can still maintain an over-90% success rate by presenting the same choice to plenty of other people with an empty box B (since most people will likely take both boxes if they know B is empty), as sketched below.
Or, alternatively, Omega can decide to offer you the choice at a time when Omega predicts you won’t live long enough to make it.
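A rough numerical sketch of the success-rate point (the 95% figure is just an assumption for the sake of the example, not a claim from the discussion): if Omega mostly hands out transparent empty-B choices and most people grab both boxes when they can see B is empty, then predicting “two-box” for all of them keeps the overall track record above 90% without Omega ever risking a full box B on a defiant chooser.

```python
import random

# Illustrative only: assume 95% of people take both boxes when they can see
# that box B is empty (an assumed figure). Omega offers the empty-B transparent
# choice to many such people and predicts "two-box" every time.

random.seed(0)
trials = 10_000
correct = sum(random.random() < 0.95 for _ in range(trials))
print(f"Omega's track record: {correct / trials:.1%}")  # comfortably above 90%
```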
That depends; instead of making a prediction here, Omega is controlling your choice. Whether you get the million dollars or not in this case depends on whether Omega wants you to have the million dollars or not, in furtherance of whatever other plans Omega is pursuing.
Omega doesn’t need to predict your choice; in the transparent-box case, Omega needs to predict your decision algorithm.
“The boxes are transparent” doesn’t literally mean “light waves pass through the boxes” given the description of the problem; it means “you can determine what’s inside the boxes without (and before) opening them”.
Responding by saying “maybe you can see into the boxes but you can’t tell if the money inside is fake” is being hyper-literal and ignoring what people really mean when they specify “suppose the boxes are transparent”.
Fair enough. I am at times overly literal.
In which case, if you are determined to show that Omega’s prediction is incorrect, and Omega can predict that determination, then the only way that Omega can avoid making an incorrect prediction is either to modify you in some manner (until you are no longer determined to make Omega’s prediction incorrect), or to deny you the chance to make the choice entirely.
For example, Omega might modify you by changing your circumstances: say, by giving a deadly disease to someone close to you, a disease which can be cured, but only at a total cost of all the money you are able to raise plus $1000. If Omega then offers the choice (with box B empty), most people would take both boxes, in order to be able to afford the cure.
Alternatively, given such a contrary precommitment, Omega may simply never offer you the choice at all, or might offer you the choice three seconds before you get struck by lightning.
“Omega puts money inside the boxes, you just never live to get it” is as outside the original problem as “the boxes are transparent, you just don’t understand what you’re seeing when you look in them” is outside the transparent problem. Just because the premise of the problem doesn’t explicitly say “… and you get the contents of the boxes” doesn’t mean the paradox can be resolved by saying you don’t get the contents of the boxes—that’s being hyper-literal again. Likewise, just because the problem doesn’t say “… and Omega can’t modify you to change your choice” doesn’t mean that the paradox can be resolved by saying that Omega can modify you to change your choice—the problem is about decision theory, and Omega doesn’t have capabilities that are irrelevant to what the problem is about.
The problem as stated, as far as I can tell, gives Omega three options:
Fail to correctly predict what the person will choose
Refuse to participate
Cheat
It is likely that Omega will try to correctly predict what the person will choose; that is, Omega will strive to avoid the first option. If Omega offers the choice to this hypothetical person in the first place, then Omega is not taking the second option.
That leaves the third option: to cheat. I expect that this is the choice that Omega will be most likely to take; one of the easiest ways to do this is by ignoring the spirit of the constraints and taking the exact literal meaning. (Another way is to creatively misunderstand the spirit of the rules as given.)
So I provided some suggestions with regard to how Omega might cheat, such as arranging that the decision is never made.
If you think that’s outside the problem, then I’m curious; what do you think Omega would do?
The point here is that the question is inconsistent. It is impossible for an Omega that can predict with high accuracy to exist; as you’ve correctly pointed out, it leads to a situation where Omega must either fail to predict correctly, refuse to participate, or cheat, all of which are out of bounds of the problem.
I don’t think it’s ever wise to ignore the possibility of a superintelligent AI cheating, in some manner.
If we ignore that possibility, then yes, the question would be inconsistent, which implies that if the situation were actually to appear to happen, then it would be quite likely that either:
The situation has been misunderstood; or
Someone is cheating
Since it is far easier for Omega, being an insane superintelligence, to cheat than it is for someone to cheat Omega, it seems likeliest that if anyone is cheating, then it is Omega.
After all, Omega had the option to refuse to participate, and did not take it.
The constraints aren’t constraints on Omega; the constraints are constraints on the reader—they tell the reader what he is supposed to use as the premises of the scenario. Omega cannot cheat unless the reader interprets the description of the problem to mean that Omega is willing to cheat. And if the reader does interpret it that way, it’s the reader, not Omega, who’s violating the spirit of the constraints and being hyper-literal.
I think that depending on the human’s intentions, and assuming the human is a perfect reasoner, the conditions of the problem are contradictory. Omega can’t always predict the human—it’s logically impossible.
In the first case, Omega does not offer you the deal, and you receive $0, proving that it is possible to do worse than a two-boxer.
In the second case, you are placed into a superposition of taking one box and both boxes, receiving the appropriate reward in each.
In the third case, you are counted as ‘selecting’ both boxes, since it’s hard to convince Omega that grabbing a box doesn’t count as selecting it.
The premise is that Omega offers you the deal. If Omega’s predictions are always successful because it won’t offer the deal when it can’t predict the result, you can use me as Omega and I’d do as well as him—I just never offer the deal.
The (non-nitpicked version of the) transparent box case shows what’s wrong with the concept: since your strategy might involve figuring out what Omega would have done, it may be in principle impossible for Omega to predict what you’re going to do, as Omega is indirectly trying to predict itself, leading to an undecidability paradox. The transparent boxes just make this simpler, because you can “figure out” what Omega would have done by looking into the transparent boxes.
Of course, if you are not a perfect reasoner, it might be possible that Omega can always predict you, but then the question is no longer “which choice should I make”, it’s “which choice should I make within the limits of my imperfect reasoning”. And answering that requires formalizing exactly how your reasoning is limited, which is rather hard.
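A toy illustration of that self-reference problem (hypothetical, nothing here comes from the problem statement): if the agent can effectively consult the same prediction process Omega uses, which is what the transparent boxes amount to, and then do the opposite, no predictor can be right about that agent.

```python
# Toy diagonalization sketch (illustrative only). A "predictor" is any function
# that maps an agent to a predicted action. The contrarian agent consults the
# predictor about itself and does the opposite, so whatever the predictor says
# about it is wrong.

def contrarian(predictor):
    prediction = predictor(contrarian)   # in effect, "look in the boxes"
    return "two-box" if prediction == "one-box" else "one-box"

def some_omega(agent):
    return "one-box"   # any fixed prediction rule will do for the demonstration

print(some_omega(contrarian))    # Omega's prediction: one-box
print(contrarian(some_omega))    # what the agent actually does: two-box
```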
Thanks, but I meant not a check on what these CDT-studying-type people would DO if actually in that situation, but a check on whether they actually say that two-boxing would be the “rational” thing to do in that hypothetical situation.
I haven’t considered your transparency question, no. Does that mean Omega did exactly what he would have done if the boxes were opaque, except that they are in fact transparent (a fact that did not figure into the prediction)? Because in that case I’d just see the million in B, and the thousand in A, and of course take ’em both.
Otherwise, Omega should be able to predict as well as I can that, if I knew the rules of this game were that box B would contain a million if I predictably chose to take only box B and leave A alone, and if both boxes were transparent (with this transparency figured into the prediction), then I would expect to see a million in box B, take it, and just walk away from the paltry thousand in A.
This make sense?
I think this is the position of classical theorists on self-modifying agents:
From Rationality, Dispositions, and the Newcomb Paradox:
“I conclude that the rational action for a player in the Newcomb Paradox is taking both boxes, but that rational agents will usually take only one box because they have rationally adopted the disposition to do so.”
They agree that agents who can self-modify will take one box. But they call that action “irrational”. So, the debate really boils down to the definition of the term “rational”—and is not really concerned with the decision that rational agents who can self-modify will actually take.
If my analysis here is correct, the dispute is really all about terminology.
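For what it’s worth, the usual expected-value arithmetic behind “the disposition pays” looks like this, assuming the standard $1,000 / $1,000,000 payoffs and a predictor whose prediction matches your actual choice with probability p:

```python
# Expected payoffs under the standard Newcomb payoffs, with a predictor that
# matches the agent's actual choice with probability p.

def ev_one_box(p):
    return p * 1_000_000                     # box B is full only if you were predicted to one-box

def ev_two_box(p):
    return p * 1_000 + (1 - p) * 1_001_000   # a correct prediction means box B is empty

for p in (0.5, 0.6, 0.9, 0.99):
    print(p, ev_one_box(p), ev_two_box(p))

# One-boxing pulls ahead once p exceeds 1_001_000 / 2_000_000 = 0.5005,
# whatever label we attach to the act.
```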