Newcomb II: Newer and Comb-ier
So, you’re walking along and a giant Greek letter appears. It snaps its serifs, and two boxes materialize before you with a puff of reality-defying smoke. The one on your left is clear as a blue sky and crammed full of cash. The one on your right is featureless and perfectly opaque.
“I’ve heard of you! You’re Omega. In the ’60s, you were flying around giving everyone money! I’ve thought about this and decided I would probably one-box. Although now that I’m here, a guaranteed pile of money sure looks nice...” You’re delighted by your good fortune, but indecision paralyzes you. You lean rightward, then glance back at the tempting transparent box.
“Whoa there, cowboy. If what you’ve just told me is true, your big payday in the right box is as empty as a politician’s promise.”
Your step halts. “Hold on. I thought this worked like...you stuff the right box with bills if, and only if, you predicted I’d pick that one alone.”
“I used to, sure. For a solid year, I was handing out cash like Halloween candy, about 3000 times a day. Got a good chunk of data, and then, yawn, it got boring.”
You frown. “Wait, what do you mean it got boring? Aren’t you a super-intelligence? Wasn’t it boringly predictable from the beginning? I heard you were right every time.”
“Well, predicting whether people will one-box or two-box is a no-brainer. Their emotional aftermath? Just as easy to guess. Lots of jolly one-boxers, plenty of grumpy two-boxers, and a sprinkle of joyous two-boxers. All too predictable. But boy, the mental gymnastics and philosophical essays people churned out? That was the real popcorn material. Some believed I was simply rewarding a form of irrationality, while others argued that whatever I rewarded was, by definition, rational.”
Your frown deepens. “Well, which is it? Is it more rational to one-box or to two-box?”
“LOL, no idea, dude. I don’t really have a concept of ‘rational’ or ‘irrational’. I just play my games, watch the fallout, and enjoy the show. It’s like you watching ripples in a pond. And right now, my current kick is to present you with a choice. The right box? As empty as a deserted ghost town. But this time, I’ve tossed in a new twist: do anything different from what you’d do in the classic Newcomb problem, and poof, you’re gone, dead, an ex-parrot. After a year of this, I’ll have my happy two-boxers, and my not-so-happy one-boxers. But what will really make my millennia will be the unhappy two-boxers and the ecstatic one-boxers. Which one will you be?”
“I understand the happy two-boxers. They have lots of money, and those who were proponents of causal decision theory will feel vindicated. And the unhappy one-boxers, I get. They had committed to a certain course of action which they thought would be positive and proved to be a bad idea. But what’s up with the other groups?”
“Well, here’s the thing. I’m not going to end up killing anyone; it turns out that superintelligences which are also Greek letters are pseudo-aligned by default. (You can ponder whether that’s me fibbing about my death threat, or if everyone will just believe me and end up making the consistent choice.) The unhappy two-boxers are an interesting bunch. They’ve convinced themselves that one-boxing was the pinnacle of rational thought and prided themselves on their rationality, only to discover their self-deception. They claimed one-boxing for social cred, but when the chips were down, they were always going to take both boxes. Now they’re filthy rich and facing a severe identity crisis! But the cream of the crop? The joyous one-boxers. They staked their lot on one-boxing, hoping for a big win, wealth, or something. And now, even though that very decision has backfired spectacularly (you haven’t counted, but that clear box is overflowing with riches), they still manage to persuade themselves they made the right call! Think about it: they essentially said, ‘in a situation where I could cause myself to have more utility, I precommit to not doing so’, and then, entirely predictably, ended up with less utility than they could have had. But they’re thrilled, because they think this is the true path to maximizing utility!” Omega rolls through the air, laughing uproariously. “I don’t think you have a word for the hyperemotion I get out of this. Oh, wait: amusement. That’s probably the right word. Well? What will it be, champ?”
---
TL;DR: Why on earth would you decide now to one-box in Newcomb-likes? There is a vanishingly small chance of that happening to you, and it seems equally plausible a priori that you could get rewarded for the opposite commitment. (In fact, more plausible: the position of ‘I will make the decisions which I expect to cause my utility to increase the most’ seems likely to...cause your utility to increase more than other positions.) Yes, if you hear about and believe there is an actual Omega doing this actual thing, go ahead and decide then (and not before), “If this particular thing which is happening to people does happen to me, I will one-box”, but your default should probably be two-boxing.
So if I understood this correctly, in this variant of Newcomb’s Problem (NP), which I’ll call Proxy Newcomb’s Problem (PNP), you get $1000 if you two-box in NP and also two-box in PNP, otherwise you get $0.
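As a rough sketch of that reading, the payoff rule can be written out as a small table. The $1000 figure comes from the sentence above; the function and variable names are made up for illustration, and treating the death threat as simply $0 is an assumption of this sketch rather than something the story specifies.

```python
from itertools import product

# Assumption: $1000 is the figure used in the comment above; the story itself
# never states how much the clear box holds.
CLEAR_BOX_DOLLARS = 1_000
CHOICES = ("one-box", "two-box")


def pnp_payoff(np_choice: str, pnp_choice: str) -> int:
    """Dollar payoff in PNP, given what you would do in NP and what you do in PNP."""
    if np_choice == "two-box" and pnp_choice == "two-box":
        return CLEAR_BOX_DOLLARS
    # Every other combination pays nothing: the opaque box is empty, and an
    # inconsistent choice is treated here as $0 rather than modelling the threat.
    return 0


# Enumerate all four (NP choice, PNP choice) combinations.
payoff_table = {
    (np_c, pnp_c): pnp_payoff(np_c, pnp_c)
    for np_c, pnp_c in product(CHOICES, repeat=2)
}

for (np_c, pnp_c), dollars in payoff_table.items():
    print(f"NP: {np_c:8s} PNP: {pnp_c:8s} -> ${dollars}")
```

Only one of the four cells is nonzero, which is why the interesting question is how the NP choice gets fixed, not what to do once it is.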
With UDT, you don’t need to precommit in advance, you just act according to a precommitment you should’ve made, from a state of knowledge that hasn’t updated on actuality of the current situation. The usual convention is to give the thought experiment (together with any relevant counterfactuals) a lot of probability, so that its implausibility doesn’t distract from the problem. This doesn’t simultaneously extend to other related thought experiments that are not part of this one.
More carefully, a thought experiment X (which could be PNP or NP) usually has a prior state of knowledge X1 where the players know the rules of the thought experiment and that it’s happening, but without yet specifying which of the possibilities within it take place. And also another possibly more narrow state of knowledge X2 that describes the specific situation within the thought experiment that’s taken as a point of view for its statement, what is being observed. To apply UDT to the thought experiment is to decide on a strategy from the state of knowledge X1, even as you currently occupy a different state of knowledge X2, and then enact the part of that strategy that pertains to the situation X2.
Here, we have two thought experiments, PNP and NP, but PNP references the strategy for NP. Usually, the UDT strategy for NP would be the strategy chosen from the state of knowledge NP1 (which is essentially the same as NP2, the distinction becomes important in cases like Transparent Newcomb’s Problem and for things like CDT). But in PNP, the UDT strategy is to be chosen in the state of knowledge PNP1, so it becomes unclear what “strategy in NP” means, because it’s unclear what state of knowledge the strategy in NP is to be chosen from. It can’t really be chosen from the state of knowledge PNP1, because then NP is not expected in reality. And in the prior state of knowledge where neither NP1 nor PNP1 are assumed, it becomes a competition between the tiny probabilities of NP and PNP. But if the strategy in NP is chosen from state of knowledge NP1, it’s not under control of the strategy for PNP chosen from state PNP1.
In other words, there doesn’t seem to be a way of communicating that PNP is happening and NP isn’t, to the hypothetical of NP happening, thus influencing what you do in counterfactual NP in order to do well in actual PNP. And absent that, you do in NP what you would do if it’s actual (from state of knowledge NP1), ignoring possibility of PNP (which NP1 doesn’t expect). If somehow knowledge of actuality of PNP is allowed to be added to NP1, and you control actions within NP from state of knowledge PNP1, then the correct strategy is obviously to two-box in both. But the problem statement is very confusing on this point.
Upon reflection, it was probably a mistake for me to write this phrased as a story/problem/thought experiment. I should probably have just written a shorter post titled something like “Newcomb’s problem provides no (interesting, non-trivial) evidence against using causal decision theory.” I had some fun writing this, though, and (mistakenly?) hoped that people would have fun reading it.
I think I disagree somewhat that “PNP references the strategy for NP”. I think many (most?) LW people have decided they are “the type of person who one-boxes in NP”, and believe that says something positive about them in their actual life. This post is an attempt to push back on that.
It seems from your comment that you think of “What I, Vladimir Nesov, would do in a thought experiment” as different from what you would actually do in real life (e.g., when you say “the problem statement is very confusing on this point”). I think of both as being much more closely tied.
Possibly the confusion comes from the difference between what you-VN-would-actually-do and what you think is correct/optimal/rational behavior? Like, in a thought experiment, you don’t actually try to imagine or predict what real-you would do, you just wonder what optimal behavior/strategy is? In that case, I agree that this is a confusing problem statement.
The point of UDT as I understand it is that you should be the sort of person who predictably one-boxes in NP. This seems incorrect to me. I think if you are the sort of person who one-boxes in a surprise NP, you will have worse outcomes in general, and that if you have a surprise NP, you should two-box. If you know you will be confronted with NP tomorrow, then sure, you should decide to one-box ahead of time. But I think deciding now to “be the sort of person who would one-box in NP” (or equivalently, deciding now to commit to a decision theory which will result in that) is a mistake.
Eliezer Yudkowsky and the whole UDT crowd seem to think that you should commit to a decision theory which seems like a bad one to me, on the basis that it would be rational to have precommitted if you end up in this situation. They seem to have convinced most LW people of this. I think they are wrong. I think CDT is a better decision theory which is more intuitive. I agree CDT gives a suboptimal outcome in surprise-NP, but I think any decision theory can give a good or bad outcome in corner-cases, along the lines of “You meet a superintelligent agent which will punish people who use (good decision theory) and reward those who use (bad decision theory).” Thus, NP shouldn’t count as a strike against CDT.
That was long and meandering… Can you succinctly explain a setup where two-boxers win?
How? Are you alluding to “Regret of Rationality” in https://www.lesswrong.com/posts/6ddcsdA2c2XpNpE5x/newcomb-s-problem-and-regret-of-rationality ?
Succinctly, if someone runs into an omega which says “I will give you $1,000,000 if you are someone who would have two-boxed in Newcomb. If you would have one-boxed, I will kill your family”, then the two-boxers have much better outcomes than the one-boxers. You may object that this seems silly and artificial. I think it is no more so than the original problem.
And yes—I think EY is very wrong in the post you link to, and this is a response to the consensus LW view that one-boxing is correct.
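To make the comparison concrete, here is a minimal sketch of the expected value of holding each disposition when you might meet either the classic Omega or the mirror-image one described above. Everything numeric is an assumption for illustration: the classic $1,000,000 and $1,000 box values are the conventional figures, the predictor is assumed to be perfect, `FAMILY_LOSS` is a hypothetical dollar-equivalent for the family being killed, and the probabilities are placeholders.

```python
# Conventional classic-Newcomb figures (assumed, not taken from the post).
NEWCOMB_OPAQUE = 1_000_000   # opaque box contents when one-boxing was predicted
NEWCOMB_CLEAR = 1_000        # clear box contents
# Mirror Omega from the comment above: pays would-be two-boxers.
MIRROR_REWARD = 1_000_000
# Hypothetical dollar-equivalent of the penalty for would-be one-boxers.
FAMILY_LOSS = -10_000_000


def expected_value(disposition: str, p_newcomb: float, p_mirror: float) -> float:
    """Expected payoff of a fixed disposition, assuming a perfect predictor.

    p_newcomb and p_mirror are the (assumed) chances of ever meeting the
    classic Omega and the mirror Omega respectively.
    """
    if disposition == "one-box":
        # Classic: predicted one-boxer gets the full opaque box.
        # Mirror: would-be one-boxers take the penalty.
        return p_newcomb * NEWCOMB_OPAQUE + p_mirror * FAMILY_LOSS
    if disposition == "two-box":
        # Classic: predicted two-boxer gets only the clear box.
        # Mirror: would-be two-boxers get the reward.
        return p_newcomb * NEWCOMB_CLEAR + p_mirror * MIRROR_REWARD
    raise ValueError(f"unknown disposition: {disposition!r}")


if __name__ == "__main__":
    for p_n, p_m in [(1e-9, 1e-9), (1e-9, 0.0)]:
        one = expected_value("one-box", p_n, p_m)
        two = expected_value("two-box", p_n, p_m)
        print(f"p_newcomb={p_n}, p_mirror={p_m}: one-box={one:.6g}, two-box={two:.6g}")
```

With these placeholder numbers the ranking flips depending on whether the mirror Omega is given any probability at all, so the sketch settles nothing by itself; it only shows that the answer is driven by the priors over which Omega you expect to meet.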
The post doesn’t seem to allow this possibility, it seems to say that the opaque box is empty. Relevant quote:
The intention was to portray the transparent box as having lots of money—call it $1,000,000.
Well, certainly in the setup you describe there is no reason to one-box. But that is not the Newcomb’s setup? So, you are solving a different problem, assuming it even needs solving.
Well, if you were confronted with Newcomb’s problem, would you one-box or two-box? How fully do you endorse your answer as being “correct” or maximally rational, or anything along those lines?
I’m not trying to argue against anyone who says they aren’t sure, but they think they would one-box or two-box in some hypothetical, or anyone who has thought carefully about the possible existence of unknown unknowns and come down on the “I have no idea what’s optimal, but I’ve predetermined to do X for the sake of predictability” side for either X.
I am arguing against people who think that Newcomb’s problem means causal decision theory is wrong, and that they have a better alternative. I think Newcomb’s provides no (interesting, nontrivial) evidence against CDT.
I don’t get it. If I make an inconsistent choice, is this Omega going to kill me or not? Or is this uncertainty part of the problem?
I think the in-story “you” believes they will be killed if they make an inconsistent choice (or at least thinks the chance is high enough that they choose consistently).
The point of the post isn’t so much the specific set up, as it is an attempt to argue that Newcomb’s problem doesn’t provide any reason to be against causal decision theory.