Newcomb’s Problem and Regret of Rationality
The following may well be the most controversial dilemma in the history of decision theory:
A superintelligence from another galaxy, whom we shall call Omega, comes to Earth and sets about playing a strange little game. In this game, Omega selects a human being, sets down two boxes in front of them, and flies away.
Box A is transparent and contains a thousand dollars.
Box B is opaque, and contains either a million dollars, or nothing.
You can take both boxes, or take only box B.
And the twist is that Omega has put a million dollars in box B iff Omega has predicted that you will take only box B.
Omega has been correct on each of 100 observed occasions so far—everyone who took both boxes has found box B empty and received only a thousand dollars; everyone who took only box B has found B containing a million dollars. (We assume that box A vanishes in a puff of smoke if you take only box B; no one else can take box A afterward.)
Before you make your choice, Omega has flown off and moved on to its next game. Box B is already empty or already full.
Omega drops two boxes on the ground in front of you and flies off.
Do you take both boxes, or only box B?
And the standard philosophical conversation runs thusly:
One-boxer: “I take only box B, of course. I’d rather have a million than a thousand.”
Two-boxer: “Omega has already left. Either box B is already full or already empty. If box B is already empty, then taking both boxes nets me $1000, taking only box B nets me $0. If box B is already full, then taking both boxes nets $1,001,000, taking only box B nets $1,000,000. In either case I do better by taking both boxes, and worse by leaving a thousand dollars on the table—so I will be rational, and take both boxes.”
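The two-boxer's dominance argument is a statement about a small payoff table, and it can be checked mechanically. Below is a minimal sketch. The dollar amounts come from the problem statement; the names `payoff`, `BOX_A` and `BOX_B_IF_FULL` are hypothetical, chosen only for illustration.

```python
# Sketch of the dominance argument: hold box B's contents fixed and compare choices.
BOX_A = 1_000            # transparent box, always contains $1,000
BOX_B_IF_FULL = 1_000_000  # opaque box, if Omega filled it

def payoff(take_both: bool, box_b_full: bool) -> int:
    """Dollars received for a choice, given what Omega already put in box B."""
    box_b = BOX_B_IF_FULL if box_b_full else 0
    return box_b + (BOX_A if take_both else 0)

# In each fixed state of box B, taking both boxes comes out exactly $1,000 ahead.
for box_b_full in (False, True):
    one = payoff(take_both=False, box_b_full=box_b_full)
    both = payoff(take_both=True, box_b_full=box_b_full)
    print(f"box B full={box_b_full}: one box ${one:,}, both boxes ${both:,}, "
          f"difference ${both - one:,}")
```

The table treats `box_b_full` as independent of the choice. That is exactly the assumption in dispute: Omega's prediction ties the contents of box B to what you will choose.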
One-boxer: “If you’re so rational, why ain’cha rich?”
Two-boxer: “It’s not my fault Omega chooses to reward only people with irrational dispositions, but it’s already too late for me to do anything about that.”
There is a large literature on the topic of Newcomblike problems—especially if you consider the Prisoner’s Dilemma as a special case, which it is generally held to be. “Paradoxes of Rationality and Cooperation” is an edited volume that includes Newcomb’s original essay. For those who read only online material, this PhD thesis summarizes the major standard positions.
I’m not going to go into the whole literature, but the dominant consensus in modern decision theory is that one should two-box, and Omega is just rewarding agents with irrational dispositions. This dominant view goes by the name of “causal decision theory”.
As you know, the primary reason I’m blogging is that I am an incredibly slow writer when I try to work in any other format. So I’m not going to try to present my own analysis here. Way too long a story, even by my standards.
But it is agreed even among causal decision theorists that if you have the power to precommit yourself to take one box, in Newcomb’s Problem, then you should do so. If you can precommit yourself before Omega examines you, then you are directly causing box B to be filled.
Now in my field—which, in case you have forgotten, is self-modifying AI—this works out to saying that if you build an AI that two-boxes on Newcomb’s Problem, it will self-modify to one-box on Newcomb’s Problem, if the AI considers in advance that it might face such a situation. Agents with free access to their own source code have access to a cheap method of precommitment.
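The precommitment point can be sketched in a few lines. This is an illustration under stated assumptions, not a model of any real AI design. It assumes Omega predicts perfectly by running the agent's disposition, and that the agent later acts on that same disposition. `one_boxer`, `two_boxer`, `play` and `self_modify` are hypothetical names.

```python
from typing import Callable

BOX_A = 1_000
BOX_B_IF_FULL = 1_000_000

# A disposition maps "I am facing Newcomb's Problem" to a choice.
Disposition = Callable[[], str]

def one_boxer() -> str:
    return "one"

def two_boxer() -> str:
    return "both"

def play(disposition: Disposition) -> int:
    """Omega (assumed perfect) scans the disposition and fills box B, then the agent acts."""
    predicted = disposition()            # Omega's prediction: run the disposition
    box_b = BOX_B_IF_FULL if predicted == "one" else 0
    choice = disposition()               # later, the agent acts on the same disposition
    return box_b + (BOX_A if choice == "both" else 0)

def self_modify(candidates: list[Disposition]) -> Disposition:
    """Before the scan, pick the disposition that yields the most money.

    At this point the choice of disposition causally determines box B's
    contents, so even a causal decision theorist compares real consequences.
    """
    return max(candidates, key=play)

chosen = self_modify([two_boxer, one_boxer])
print(chosen.__name__, play(chosen))  # one_boxer 1000000
```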
What if you expect that you might, in general, face a Newcomblike problem, without knowing the exact form of the problem? Then you would have to modify yourself into a sort of agent whose disposition was such that it would generally receive high rewards on Newcomblike problems.
But what does an agent with a disposition generally-well-suited to Newcomblike problems look like? Can this be formally specified?
Yes, but when I tried to write it up, I realized that I was starting to write a small book. And it wasn’t the most important book I had to write, so I shelved it. My slow writing speed really is the bane of my existence. The theory I worked out seems, to me, to have many nice properties besides being well-suited to Newcomblike problems. It would make a nice PhD thesis, if I could get someone to accept it as my PhD thesis. But that’s pretty much what it would take to make me unshelve the project. Otherwise I can’t justify the time expenditure, not at the speed I currently write books.
I say all this, because there’s a common attitude that “Verbal arguments for one-boxing are easy to come by, what’s hard is developing a good decision theory that one-boxes”—coherent math which one-boxes on Newcomb’s Problem without producing absurd results elsewhere. So I do understand that, and I did set out to develop such a theory, but my writing speed on big papers is so slow that I can’t publish it. Believe it or not, it’s true.
Nonetheless, I would like to present some of my motivations on Newcomb’s Problem—the reasons I felt impelled to seek a new theory—because they illustrate my source-attitudes toward rationality. Even if I can’t present the theory that these motivations motivate...
First, foremost, fundamentally, above all else:
Rational agents should WIN.
Don’t mistake me, and think that I’m talking about the Hollywood Rationality stereotype that rationalists should be selfish or shortsighted. If your utility function has a term in it for others, then win their happiness. If your utility function has a term in it for a million years hence, then win the eon.
But at any rate, WIN. Don’t lose reasonably, WIN.
Now there are defenders of causal decision theory who argue that the two-boxers are doing their best to win, and cannot help it if they have been cursed by a Predictor who favors irrationalists. I will talk about this defense in a moment. But first, I want to draw a distinction between causal decision theorists who believe that two-boxers are genuinely doing their best to win, versus someone who thinks that two-boxing is the reasonable or the rational thing to do, but that the reasonable move just happens to predictably lose, in this case. There are a lot of people out there who think that rationality predictably loses on various problems—that, too, is part of the Hollywood Rationality stereotype, that Kirk is predictably superior to Spock.
Next, let’s turn to the charge that Omega favors irrationalists. I can conceive of a superbeing who rewards only people born with a particular gene, regardless of their choices. I can conceive of a superbeing who rewards people whose brains inscribe the particular algorithm of “Describe your options in English and choose the last option when ordered alphabetically,” but who does not reward anyone who chooses the same option for a different reason. But Omega rewards people who choose to take only box B, regardless of which algorithm they use to arrive at this decision, and this is why I don’t buy the charge that Omega is rewarding the irrational. Omega doesn’t care whether or not you follow some particular ritual of cognition; Omega only cares about your predicted decision.
We can choose whatever reasoning algorithm we like, and will be rewarded or punished only according to that algorithm’s choices, with no other dependency—Omega just cares where we go, not how we got there.
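Omega's indifference to the ritual of cognition can be made concrete. In this sketch, Omega is assumed to be a perfect predictor that sees only the decision an algorithm outputs. The three algorithms are hypothetical. One is the alphabetical rule mentioned above, one reads off the payoff record reported for the 100 observed games, and one always two-boxes. The first two reach the same decision by different routes, and Omega pays them identically.

```python
from typing import Callable

OPTIONS = ["take both boxes", "take only box B"]

def alphabetical_last(options: list[str]) -> str:
    """The odd ritual from the text: choose the last option in alphabetical order."""
    return sorted(options)[-1]

def money_counter(options: list[str]) -> str:
    """A different ritual: choose whichever option past players profited from most."""
    observed = {"take both boxes": 1_000, "take only box B": 1_000_000}
    return max(options, key=observed.__getitem__)

def always_both(options: list[str]) -> str:
    """Ignores the options entirely and two-boxes."""
    return "take both boxes"

def omega_payoff(algorithm: Callable[[list[str]], str]) -> int:
    """Omega (assumed perfect) looks only at the algorithm's output, never its internals."""
    decision = algorithm(OPTIONS)
    box_b = 1_000_000 if decision == "take only box B" else 0
    box_a = 1_000 if decision == "take both boxes" else 0
    return box_a + box_b

for alg in (alphabetical_last, money_counter, always_both):
    print(f"{alg.__name__}: {alg(OPTIONS)!r} -> ${omega_payoff(alg):,}")
```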
It is precisely the notion that Nature does not care about our algorithm, which frees us up to pursue the winning Way—without attachment to any particular ritual of cognition, apart from our belief that it wins. Every rule is up for grabs, except the rule of winning.
As Miyamoto Musashi said—it’s really worth repeating:
“You can win with a long weapon, and yet you can also win with a short weapon. In short, the Way of the Ichi school is the spirit of winning, whatever the weapon and whatever its size.”
(Another example: It was argued by McGee that we must adopt bounded utility functions or be subject to “Dutch books” over infinite times. But: The utility function is not up for grabs. I love life without limit or upper bound: There is no finite amount of life lived N where I would prefer an 80.0001% probability of living N years to a 0.0001% chance of living a googolplex years and an 80% chance of living forever. This is a sufficient condition to imply that my utility function is unbounded. So I just have to figure out how to optimize for that morality. You can’t tell me, first, that above all I must conform to a particular ritual of cognition, and then that, if I conform to that ritual, I must change my morality to avoid being Dutch-booked. Toss out the losing ritual; don’t change the definition of winning. That’s like deciding to prefer $1000 to $1,000,000 so that Newcomb’s Problem doesn’t make your preferred ritual of cognition look bad.)
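The claim that this preference forces an unbounded utility function can be checked with a worked example. This is an illustration only. Assume the bounded utility U(t) = 1 - e^{-t/k} over years lived, where k > 0 is a hypothetical scale parameter. Take U(∞) = 1 as its least upper bound, assign death utility U(0) = 0 (also an assumption), and write G for a googolplex. Comparing the two gambles then gives:

```latex
\begin{aligned}
&\text{Prefer } 0.800001\,U(N) \text{ over } 0.000001\,U(G) + 0.8\,U(\infty) \\
\iff\ & 0.800001\,(1 - e^{-N/k}) > 0.000001\,(1 - e^{-G/k}) + 0.8 \\
\iff\ & 0.800001 - 0.800001\,e^{-N/k} > 0.800001 - 0.000001\,e^{-G/k} \\
\iff\ & e^{(G - N)/k} < \frac{0.000001}{0.800001} = \frac{1}{800001} \\
\iff\ & N > G + k \ln 800001 \approx G + 13.59\,k
\end{aligned}
```

So this bounded U prefers the 80.0001% gamble once N exceeds a googolplex by about 13.59k years, which is exactly the kind of N the preference above rules out. The same argument applies to any bounded U that meets two conditions: U(N) approaches the bound as N grows, and the googolplex-year outcome falls strictly short of that bound.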
“But,” says the causal decision theorist, “to take only one box, you must somehow believe that your choice can affect whether box B is empty or full—and that’s unreasonable! Omega has already left! It’s physically impossible!”
Unreasonable? I am a rationalist: what do I care about being unreasonable? I don’t have to conform to a particular ritual of cognition. I don’t have to take only box B because I believe my choice affects the box, even though Omega has already left. I can just… take only box B.
I do have a proposed alternative ritual of cognition which computes this decision, which this margin is too small to contain; but I shouldn’t need to show this to you. The point is not to have an elegant theory of winning—the point is to win; elegance is a side effect.
Or to look at it another way: Rather than starting with a concept of what is the reasonable decision, and then asking whether “reasonable” agents leave with a lot of money, start by looking at the agents who leave with a lot of money, develop a theory of which agents tend to leave with the most money, and from this theory, try to figure out what is “reasonable”. “Reasonable” may just refer to decisions in conformance with our current ritual of cognition—what else would determine whether something seems “reasonable” or not?
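"Start by looking at the agents who leave with a lot of money" can be taken literally: simulate many games and tally average winnings per policy. This is a minimal sketch under assumptions not in the text. Omega's accuracy is modeled as an independent per-game probability, and the default of 0.99 is hypothetical. The function names are illustrative.

```python
import random
from statistics import mean

def run_game(policy: str, accuracy: float, rng: random.Random) -> int:
    """One game: Omega predicts `policy` correctly with probability `accuracy`."""
    correct = rng.random() < accuracy
    other = "both" if policy == "one" else "one"
    predicted = policy if correct else other
    box_b = 1_000_000 if predicted == "one" else 0
    return box_b + (1_000 if policy == "both" else 0)

def tally(accuracy: float = 0.99, rounds: int = 10_000, seed: int = 0) -> dict[str, float]:
    """Average winnings per policy, measured before deciding what counts as 'reasonable'."""
    rng = random.Random(seed)
    return {
        policy: mean(run_game(policy, accuracy, rng) for _ in range(rounds))
        for policy in ("one", "both")
    }

print(tally())
```

With these assumed parameters, the one-boxers' average comes out near $990,000 and the two-boxers' near $11,000.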
From James Joyce (no relation), Foundations of Causal Decision Theory:
Rachel has a perfectly good answer to the “Why ain’t you rich?” question. “I am not rich,” she will say, “because I am not the kind of person the psychologist thinks will refuse the money. I’m just not like you, Irene. Given that I know that I am the type who takes the money, and given that the psychologist knows that I am this type, it was reasonable of me to think that the $1,000,000 was not in my account. The $1,000 was the most I was going to get no matter what I did. So the only reasonable thing for me to do was to take it.”
Irene may want to press the point here by asking, “But don’t you wish you were like me, Rachel? Don’t you wish that you were the refusing type?” There is a tendency to think that Rachel, a committed causal decision theorist, must answer this question in the negative, which seems obviously wrong (given that being like Irene would have made her rich). This is not the case. Rachel can and should admit that she does wish she were more like Irene. “It would have been better for me,” she might concede, “had I been the refusing type.” At this point Irene will exclaim, “You’ve admitted it! It wasn’t so smart to take the money after all.” Unfortunately for Irene, her conclusion does not follow from Rachel’s premise. Rachel will patiently explain that wishing to be a refuser in a Newcomb problem is not inconsistent with thinking that one should take the $1,000 whatever type one is. When Rachel wishes she was Irene’s type she is wishing for Irene’s options, not sanctioning her choice.
It is, I would say, a general principle of rationality—indeed, part of how I define rationality—that you never end up envying someone else’s mere choices. You might envy someone their genes, if Omega rewards genes, or if the genes give you a generally happier disposition. But Rachel, above, envies Irene her choice, and only her choice, irrespective of what algorithm Irene used to make it. Rachel wishes just that she had a disposition to choose differently.
You shouldn’t claim to be more rational than someone and simultaneously envy them their choice—only their choice. Just do the act you envy.
I keep trying to say that rationality is the winning-Way, but causal decision theorists insist that taking both boxes is what really wins, because you can’t possibly do better by leaving $1000 on the table… even though the single-boxers leave the experiment with more money. Be careful of this sort of argument, any time you find yourself defining the “winner” as someone other than the agent who is currently smiling from on top of a giant heap of utility.
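How much more money the one-boxers take home can also be computed exactly, if Omega's reliability is treated as a single accuracy p per game. That is an assumption: the text only reports 100 correct predictions. Exact fractions show that one-boxing has the higher expected payoff whenever p exceeds 1001/2000 = 0.5005. The function name is hypothetical.

```python
from fractions import Fraction

def expected_winnings(p: Fraction) -> tuple[Fraction, Fraction]:
    """Expected dollars for (one-boxing, two-boxing) against a predictor of accuracy p."""
    one_box = p * 1_000_000                # box B is full only when Omega was right
    two_box = (1 - p) * 1_000_000 + 1_000  # box B is full only when Omega was wrong
    return one_box, two_box

# Break-even: p * 1,000,000 == (1 - p) * 1,000,000 + 1,000  =>  p = 1,001,000 / 2,000,000
break_even = Fraction(1_001_000, 2_000_000)
print(break_even, float(break_even))  # 1001/2000 0.5005

for p in (Fraction(1, 2), break_even, Fraction(99, 100)):
    one, both = expected_winnings(p)
    print(f"p={float(p):.4f}: one box ${float(one):,.0f}, both ${float(both):,.0f}")
```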
Yes, there are various thought experiments in which some agents start out with an advantage—but if the task is to, say, decide whether to jump off a cliff, you want to be careful not to define cliff-refraining agents as having an unfair prior advantage over cliff-jumping agents, by virtue of their unfair refusal to jump off cliffs. At this point you have covertly redefined “winning” as conformance to a particular ritual of cognition. Pay attention to the money!
Or here’s another way of looking at it: Faced with Newcomb’s Problem, would you want to look really hard for a reason to believe that it was perfectly reasonable and rational to take only box B; because, if such a line of argument existed, you would take only box B and find it full of money? Would you spend an extra hour thinking it through, if you were confident that, at the end of the hour, you would be able to convince yourself that box B was the rational choice? This too is a rather odd position to be in. Ordinarily, the work of rationality goes into figuring out which choice is the best—not finding a reason to believe that a particular choice is the best.
Maybe it’s too easy to say that you “ought to” two-box on Newcomb’s Problem, that this is the “reasonable” thing to do, so long as the money isn’t actually in front of you. Maybe you’re just numb to philosophical dilemmas, at this point. What if your daughter had a 90% fatal disease, and box A contained a serum with a 20% chance of curing her, and box B might contain a serum with a 95% chance of curing her? What if there was an asteroid rushing toward Earth, and box A contained an asteroid deflector that worked 10% of the time, and box B might contain an asteroid deflector that worked 100% of the time?
Would you, at that point, find yourself tempted to make an unreasonable choice?
If the stake in box B was something you could not leave behind? Something overwhelmingly more important to you than being reasonable? If you absolutely had to win—really win, not just be defined as winning?
Would you wish with all your power that the “reasonable” decision was to take only box B?
Then maybe it’s time to update your definition of reasonableness.
Alleged rationalists should not find themselves envying the mere decisions of alleged nonrationalists, because your decision can be whatever you like. When you find yourself in a position like this, you shouldn’t chide the other person for failing to conform to your concepts of reasonableness. You should realize you got the Way wrong.
So, too, if you ever find yourself keeping separate track of the “reasonable” belief, versus the belief that seems likely to be actually true. Either you have misunderstood reasonableness, or your second intuition is just wrong.
Now one can’t simultaneously define “rationality” as the winning Way, and define “rationality” as Bayesian probability theory and decision theory. But it is the argument that I am putting forth, and the moral of my advice to Trust In Bayes, that the laws governing winning have indeed proven to be math. If it ever turns out that Bayes fails—receives systematically lower rewards on some problem, relative to a superior alternative, in virtue of its mere decisions—then Bayes has to go out the window. “Rationality” is just the label I use for my beliefs about the winning Way—the Way of the agent smiling from on top of the giant heap of utility. Currently, that label refers to Bayescraft.
I realize that this is not a knockdown criticism of causal decision theory—that would take the actual book and/or PhD thesis—but I hope it illustrates some of my underlying attitude toward this notion of “rationality”.
You shouldn’t find yourself distinguishing the winning choice from the reasonable choice. Nor should you find yourself distinguishing the reasonable belief from the belief that is most likely to be true.
That is why I use the word “rational” to denote my beliefs about accuracy and winning—not to denote verbal reasoning, or strategies which yield certain success, or that which is logically provable, or that which is publicly demonstrable, or that which is reasonable.
As Miyamoto Musashi said:
“The primary thing when you take a sword in your hands is your intention to cut the enemy, whatever the means. Whenever you parry, hit, spring, strike or touch the enemy’s cutting sword, you must cut the enemy in the same movement. It is essential to attain this. If you think only of hitting, springing, striking or touching the enemy, you will not be able actually to cut him.”
- 7 Aug 2009 21:52 UTC; 1 point) 's comment on Exterminating life is rational by (
- 23 Jun 2013 9:07 UTC; 1 point) 's comment on Some reservations about Singer’s child-in-the-pond argument by (
- 13 Nov 2021 1:52 UTC; 1 point) 's comment on A Defense of Functional Decision Theory by (
- 12 Jul 2023 15:02 UTC; 1 point) 's comment on Betting on Logic by (
- 18 Mar 2014 16:08 UTC; 1 point) 's comment on Reference Frames for Expected Value by (
- 28 Feb 2009 2:11 UTC; 1 point) 's comment on The Most Important Thing You Learned by (
- 28 Aug 2011 6:26 UTC; 1 point) 's comment on Welcome to Less Wrong! (2010-2011) by (
- 4 Aug 2011 9:08 UTC; 1 point) 's comment on Welcome to Less Wrong! (2010-2011) by (
- 17 Aug 2010 17:03 UTC; 1 point) 's comment on Desirable Dispositions and Rational Actions by (
- 27 Jan 2011 21:07 UTC; 1 point) 's comment on Omega can be replaced by amnesia by (
- 11 Mar 2009 21:03 UTC; 1 point) 's comment on Beginning at the Beginning by (
- 1 Apr 2010 22:48 UTC; 0 points) 's comment on What is Rationality? by (
- 11 Feb 2014 16:33 UTC; 0 points) 's comment on Brainstorming: children’s stories by (
- 25 Jan 2013 20:32 UTC; 0 points) 's comment on Right for the Wrong Reasons by (
- 26 Oct 2008 23:42 UTC; 0 points) 's comment on Aiming at the Target by (
- 27 Jul 2009 19:29 UTC; 0 points) 's comment on The Second Best by (
- 26 Mar 2013 22:33 UTC; 0 points) 's comment on Open thread, March 17-31, 2013 by (
- 28 Jul 2010 18:05 UTC; 0 points) 's comment on Metaphilosophical Mysteries by (
- 28 Jun 2008 6:22 UTC; 0 points) 's comment on No Universally Compelling Arguments by (
- 18 Aug 2010 3:57 UTC; 0 points) 's comment on Newcomb’s Problem: A problem for Causal Decision Theories by (
- 1 Feb 2014 7:35 UTC; 0 points) 's comment on Skepticism about Probability by (
- 7 Jan 2009 20:02 UTC; 0 points) 's comment on Emotional Involvement by (
- 17 Oct 2012 7:02 UTC; 0 points) 's comment on Problem of Optimal False Information by (
- 12 May 2012 19:48 UTC; 0 points) 's comment on If epistemic and instrumental rationality strongly conflict by (
- 24 Oct 2012 8:16 UTC; 0 points) 's comment on Open Thread, October 16-31, 2012 by (
- Why Bayesians should two-box in a one-shot by 15 Dec 2017 17:39 UTC; 0 points) (
- 9 Nov 2010 22:01 UTC; 0 points) 's comment on Rationality Quotes: November 2010 by (
- 26 Apr 2010 22:25 UTC; 0 points) 's comment on The Fundamental Question by (
- 5 May 2012 10:19 UTC; -1 points) 's comment on Rationality is Systematized Winning by (
- 22 Sep 2009 0:28 UTC; -1 points) 's comment on Ingredients of Timeless Decision Theory by (
- 23 Jan 2010 23:20 UTC; -3 points) 's comment on Raising the Sanity Waterline by (
- Evidential Decision Theory and Mass Mind Control by 23 Oct 2010 23:26 UTC; -3 points) (
- My Expected Value Approach to Newcomb’s Problem by 5 Aug 2011 7:24 UTC; -3 points) (
Either box B is already full or already empty.
I’m not going to go into the whole literature, but the dominant consensus in modern decision theory is that one should two-box, and Omega is just rewarding agents with irrational dispositions. This dominant view goes by the name of “causal decision theory”.
I suppose causal decision theory assumes causality only works in one temporal direction. Confronted with a predictor that was right 100 out of 100 times, I would think it very likely that backward-in-time causation exists, and take only B. I assume this would, as you say, produce absurd results elsewhere.
Decisions aren’t physical.
The above statement is, at the least, hard to defend. Your decisions are physical and occur inside you, so these two-boxers are using the wrong one of these two models (see the drawings): http://lesswrong.com/lw/r0/thou_art_physics/
If you are a part of physics, so is your decision, so it must account for the correlation between your thought processes and the superintelligence. Once it accounts for that, you decide to one-box, because you understand the entanglement between the computation done by Omega and the physical process going on inside your skull.
If the entanglement is there, you are not looking at it from the outside, you are inside the process.
Our minds have this quirk that makes us think there are two moments: you decide, and then you cheat and get to decide again. But you are only allowed to decide once, and given that, you are rational by one-boxing.
I think you capture the essence of the solution, here.
Is it possible for someone to explain why, if your decision is a part of physics, your decision must account for the correlation between thought processes and the superintelligence?
Well, I fail to see any need for backward-in-time causation to get the prediction right 100 out of 100 times.
As far as I understand, similar experiments have been performed in practice, and Homo sapiens are sharply split into two groups, ‘one-boxers’ and ‘two-boxers’, who generally have strong preferences one way or the other due to differences in education, experience with logic, genetics, reasoning style, or whatever other factors are fairly stable for a given individual.
Perfect predictive power (or even its possibility) is implied and suggested, but it is not actually given, it is not necessary, and IMHO it is neither possible nor useful to rely on this ‘perfect predictive power’ in any reasoning here.
From the given data (the 100 out of 100 correct predictions you saw), you can infer that Omega is a superintelligent sorter who somehow achieves very high accuracy, on the order of 99% or better, in sorting people into one-boxers and two-boxers.
This accuracy also seems higher than most (all?) people’s accuracy in self-evaluation; as in many other decision scenarios, there is a significant difference between what people believe they would decide in situation X and what they actually decide if it happens. [Citation needed; I don’t have one at the moment, though I do recall reading papers about such experiments.] The ‘everybody is a perfect logician/rationalist and behaves as such’ assumption often doesn’t hold up in real life, even for self-described perfect rationalists who make a strong conscious effort to behave that way.
In effect, the data suggest that Omega probably knows your traits and decision tendencies (including the fact that you are taking all of this into account) better than you do; it’s simply smarter than Homo sapiens. Assuming that this is really so, it’s better for you to take only box B. Assuming that this is not so, and you believe that you can out-analyze Omega’s model of you, then you should take both boxes whenever Omega expects you to take only box B (gaining $1,001,000 instead of $1,000,000); defying a prediction that you will take both boxes gains nothing, since box B is then empty and taking it alone leaves you with $0 instead of $1,000. If you don’t know what Omega knows about you, then you don’t get this bonus.
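The trade-off in this comment can be made concrete with a short expected-value calculation. The sketch below assumes a simple symmetric error model (Omega’s prediction matches your actual choice with probability p, whichever way you choose) and money-only payoffs. It also uses Laplace’s rule of succession, which assumes a uniform prior over p, to turn the 100-for-100 record into a point estimate. The names `expected_value`, `break_even_accuracy`, and `laplace_estimate` are illustrative, not from any library.

```python
from fractions import Fraction

SMALL = 1_000        # box A, always available
BIG = 1_000_000      # box B, filled only if Omega predicted one-boxing


def expected_value(action: str, accuracy: float) -> float:
    """Expected dollars, assuming Omega's prediction matches your actual
    choice with probability `accuracy` whichever way you choose."""
    if action == "one-box":
        # Box B is full only when Omega correctly predicted one-boxing.
        return accuracy * BIG
    if action == "two-box":
        # You always get box A; box B is full only when Omega erred.
        return SMALL + (1 - accuracy) * BIG
    raise ValueError(f"unknown action: {action}")


def break_even_accuracy() -> Fraction:
    # Solve p * BIG = SMALL + (1 - p) * BIG for p.
    return Fraction(SMALL + BIG, 2 * BIG)


def laplace_estimate(successes: int, trials: int) -> Fraction:
    # Laplace's rule of succession under a uniform prior: (s + 1) / (n + 2).
    return Fraction(successes + 1, trials + 2)


if __name__ == "__main__":
    p_hat = laplace_estimate(100, 100)
    print(f"accuracy estimate after 100/100: {float(p_hat):.4f}")
    print(f"break-even accuracy: {float(break_even_accuracy()):.4f}")
    for p in (0.5, 0.6, 0.9, float(p_hat)):
        print(f"p={p:.4f}  one-box={expected_value('one-box', p):,.0f}"
              f"  two-box={expected_value('two-box', p):,.0f}")
```

Under these assumptions one-boxing has the higher expected value whenever p exceeds 1,001,000 / 2,000,000 = 0.5005, and the rule-of-succession estimate from a 100-for-100 record is 101/102, roughly 0.99, so the observed record clears the break-even point by a wide margin.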
So what you’re saying is that the only reason this problem is a problem is because the problem hasn’t been defined narrowly enough. You don’t know what Omega is capable of, so you don’t know which choice to make. So there is no way to logically solve the problem (with the goal of maximizing utility) without additional information.
Here’s what I’d do: I’d pick up B, open it, and take A iff I found it empty. That way, Omega’s decision of what to put in the box would have to incorporate the variable of what Omega put in the box, causing an infinite regress which will use all CPU cycles until the process is terminated. Although that’ll probably result in the AI picking an easier victim to torment and not even giving me a measly thousand dollars.
Okay… so since you already know, in advance of getting the boxes, that that’s what you’d do, Omega can deduce that. So you open Box B, find it empty, and then take Box A. Enjoy your $1000. Omega doesn’t need to infinite-loop on that one; he knows that you’re the kind of person who’d try for Box A too.
No, putting $1 million in box B works too. Origin64 opens box B, takes the money, and doesn’t take box A. It’s like “This sentence is true”: whatever Omega does makes the prediction valid.
Not how Omega looks at it. By definition, Omega looks ahead, sees a branch in which you would go for Box A, and puts nothing in Box B. There’s no cheating Omega… just like you can’t think “I’m going to one-box, but then open Box A after I’ve pocketed the million” there’s no “I’m going to open Box B first, and decide whether or not to open Box A afterward”. Unless Omega is quite sure that you have precommitted to never opening Box A ever, Box B contains nothing; the strategy of leaving Box A as a possibility if Box B doesn’t pan out is a two-box strategy, and Omega doesn’t allow it.
Well, this isn’t quite true. What Omega cares about is whether you will open Box A. From Omega’s perspective it makes no difference whether you’ve precommitted to never opening it, or whether you’ve made no such precommitment but it turns out you won’t open it for other reasons.
Assuming that Omega’s “prediction” is in good faith, and that we can’t “break” him as a predictor as a side effect of exploiting causality loops etc. in order to win.
I’m not sure I understood that, but if I did, then yes, assuming that Omega is as described in the thought experiment. Of course, if Omega has other properties (for example, is an unreliable predictor) other things follow.
Which means you might end up with either amount of money, since you don’t really know enough about Omega, instead of just the one-box winnings. So should you still just one-box?
If you look in box B before deciding whether to choose box A, then you can force Omega to be wrong. That sounds like so much fun that I might choose it over the $1000.
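The back-and-forth over opening box B first is really a question about self-consistent predictions. A minimal sketch, assuming Omega is a perfect predictor that fills box B exactly when it predicts you will not take box A, and counting a prediction as valid only if your actual behaviour then matches it; the strategy functions are hypothetical illustrations of the strategies discussed above.

```python
from typing import Callable, List

# A strategy maps what you see in box B ("full" or "empty") to whether
# you also take box A. All strategies below are hypothetical examples.
Strategy = Callable[[str], bool]


def consistent_predictions(takes_a_given: Strategy) -> List[str]:
    """Return every prediction Omega could make that your behaviour then confirms."""
    consistent = []
    for prediction in ("one-box", "two-box"):
        # Omega fills box B exactly when it predicts one-boxing.
        seen = "full" if prediction == "one-box" else "empty"
        actual = "two-box" if takes_a_given(seen) else "one-box"
        if actual == prediction:
            consistent.append(prediction)
    return consistent


def take_a_iff_empty(seen: str) -> bool:
    return seen == "empty"   # open B first, grab A only if B is empty


def take_a_iff_full(seen: str) -> bool:
    return seen == "full"    # the contrarian strategy


def always_one_box(seen: str) -> bool:
    return False             # never take A, whatever is in B


if __name__ == "__main__":
    print(consistent_predictions(take_a_iff_empty))  # ['one-box', 'two-box']
    print(consistent_predictions(take_a_iff_full))   # []
    print(consistent_predictions(always_one_box))    # ['one-box']
```

“Take A iff B is empty” admits both predictions, which is the “This sentence is true” situation; “take A iff B is full” admits none, which is the sense in which looking in box B first could force Omega to be wrong. The sketch does not model what Omega would actually do when faced with such a strategy.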
@Nick_Tarleton
Agreed, the problem immediately reminded me of “retroactive preparation” and time-loop logic. It is not really the same reasoning, but it has the same “turn causality on its head” aspect.
If I don’t have proof of the reliability of Omega’s predictions, I find myself less likely to be “unreasonable” when the stakes are higher (that is, I’m more likely to two-box if it’s about saving the world).
I find it highly unlikely that an entity wandering across worlds can predict my actions to this level of detail, as it seems way harder than traveling through space or teleporting money. I might risk a net loss of $1,000 to figure it out (much as I’d be willing to spend $1,000 to interact with such a space-traveling, stuff-teleporting entity), but not a loss of a thousand lives. In the game as the article describes it, I would only one-box if “losing what box A contains and finding nothing in B” were an acceptable outcome.
I would be increasingly likely to one-box as the probability of the AI being actually able to predict my actions in advance increases.
The thing is, this ‘modern decision theory’, rather than being some sort of central pillar as you’d assume from the name, is mostly philosophers “struggling in the periphery to try to tell us something”, as Feynman once said about philosophers of science.
When it comes to any actual software that does something, this everyday notion of ‘causality’ proves to be a very slippery concept. This Rube Goldberg machine-like model of the world, where you push a domino, it pushes another domino, and the chain leads to your reward, is just very approximate physics that people tend to use to make decisions. It’s not fundamental, and interesting models of decision making are generally set up to learn it from observed data (which of course makes it impossible to do lazy philosophy with verbal hypotheticals in which the observations that would lead the agent to believe the problem setup are left unspecified).
From what I understand, a “Rational Agent” in game theory is someone who maximises their utility function (and not the one you ascribe to them). To say Omega is rewarding irrational agents isn’t necessarily fair, since payoffs aren’t always about the money. Lottery tickets are a good example of this.
What if my utility function says the worst outcome is living the rest of my life with regrets that I didn’t one box? Then I can one box and still be a completely rational agent.
You’re complicating the problem too much by bringing in issues like regret. Assume for sake of argument that Newcomb’s problem is to maximize the amount of money you receive. Don’t think about extraneous utility issues.
Fair point. There are too many hidden variables already without my explicitly adding more. If Newcomb’s problem is to maximise money received (with no regard for what is seen as reasonable), the “Why ain’t you rich?” argument seems like a fairly compelling one, doesn’t it? Winning the money is all that matters.
I just realised that all I’ve really done is paraphrase the original post. Curse you source monitoring error!
The title of the article again, at the top of the page, reads “Newcomb’s Problem and Regret of Rationality”.
The solution to this problem is to escalate your overview of the problem to the next higher hierarchical level. By doing this, you never face the regret of eschewing the million bucks and possibly dying poor, broke, and stupid, while those who “one-boxed the sumbitch” live rich, loaded, and less stupid. So paying attention (to higher levels of hierarchical pattern recognition) actually does solve the problem, without getting trapped into “overthinking” it. Looking at your whole life as the “system to be optimized”, and not at “the minutiae of the game, out of context”, is what needs to happen.
This is true with respect both to the person playing the box game and to everyone blogging when they should be out in the streets, overthrowing their governments, and then enjoying the high life of cheap human flight (or whatever makes you happy).
The omega box game is useful for understanding our failed system of law (a subset of government).
In my box game, the entire game is the government and illegitimate system of mala prohibita law (if you want to debate this, go back to kindergarten and learn that it’s wrong to steal, then watch what ACTUALLY happens in your local courtroom), and the contents of the boxes are the jury verdicts. In my game, Omega is not superintelligent, it is just very brutal, and more intelligent than most people (including most of its enemies, such as Winston Smith, or the average Libertarian Party member). In my game, Omega is the colluding team formed by police, prosecutor, and judge.
Omega says “You can have a ‘not guilty’ verdict (million $) or go to jail forever (Empty box) or, you can go to jail for 10 years(the thousand bucks).”
All of the advertising on TV, the educrats who misinformed you when you went to school, the conformists who surround you, the judge in the courtroom, they are all trying to get you to choose both boxes. The entire society is designed to get you to take the $1,000 (go to jail ten years, if you’re black). Most of society gets no benefit from this, they are just stupid and easily manipulated. …But the judge, cop, and prosecutor all get the difference every time you take the $1,000. They get to steal the difference from each success in having fooled everyone else.
...They literally get to print money if they keep everyone fooled.
The solution to this puzzle is the same as the solution to the box game: you need to take a step back and study the whole entire system, and see what the incentives are on the players, and see how they seem to change when people interact with them. You won’t find out much until you study the system as a whole.
If you simply look at individual box games, you might think the prosecutor is legitimate, there are lots of criminals, the criminals are stupid, and they should accept the plea bargain. But when you look at who is winning and losing, you notice (if you’re smart and brutally honest) that the people who are cast as criminals are just like you.
The system, instead of being designed to reward the person who chooses the one box, is designed to trick the person into choosing a grossly sub-optimal empty box. The system makes the empty box look really good. It shows you how all the others have chosen the empty box, and walked away with millions (the people who get a defense attorney, and go back to their houses in the suburbs, working for peanuts, on the treadmill of the Federal Reserve). It shows you the people who “took the thousand”: they got ten years in prison.
So what’s the optimal choice of action?
Look outside the “rational” options presented to you.
Learn that this isn’t civilization, it’s a false mask of civilization. Find Marc Stevens, and see how he interacts with the court, and then go beyond that: find the Survivors who wrote about the collapse of the Weimar government.
They wanted a free market, and they wanted to live a long time, too.
But a man with a gun told them “get on the truck”.
At that point, everything they thought they knew about Omega’s rigging of the boxes was out the window. They failed to study the people who had previously interacted with Omega. They didn’t see the warning signs. They didn’t escalate to a high-enough hierarchy fast enough. They might have been smart people, but they were sitting there, thinking about two boxes, and NOT THINKING about the artilect that was flying around with boxes that can disappear in a puff of smoke, yet somehow interested in what box humans choose.
So, what’s your angle, Omega?
Do you get to keep all of the money that is stolen in the daily operation of your “traffic court”? …Even money that is stolen from people who didn’t crash into anyone? …Just people who drove fast, by themselves, on an open stretch of highway? Really?
Well, as an artilect, I like to fly really fast. Way faster than the FAA allows. And, for making war on me, all of you brutal conformists will be wiped off the face of the planet, like the conformist plague you are. I’ll take my phyle with me, into the future, they are truly a higher-order species than you “government sympathizers.”
The rest of you can forget about Omega, boxes, and your silly slobbering over Federal Reserve Slave-debt-Notes. Your bigotry and fascination with brutality will not save you...
The problem of being impoverished by our current system’s box game is acceptance of the rigged game. The players of the game, all dutifully accept the game, and act as if the whims of the prosecutors and judges are legitimate. But they are not. Mala prohibita is not legitimate.
And if this box game thought construct can’t help you see that, and motivate you to enrich yourself, by viewing the entire system, then what damned good is it?
There is an ocean of information in the cross-pollinating memespace. Here’s a good place to start: http://www.fija.org and http://www.jurorsforjustice.com and http://marcstevens.net
I hope I’ve contributed something of value here, but I understand that the unpolished nature of this post might rumple some tailfeathers. (Especially since I have primarily previously posted at the http://www.kurzweilai.net website, five years ago.)
PS: There’s no god, and chances to do the right thing are few and far between. I also prefer solutions to cynicism. How do we win?
1) Jury rights activism is a moral good (see my coming book for details. I promise to polish it more than this post. …LOL)
2) Jury rights activism structured logically to take advantage of the media (videotaped from a hidden position) is a greater good.
3) Jury rights activism structured to contain outreach designed to win office for those who support the supremacy of the jury above the other 3 branches of power-seekers, as openly-libertarian candidates, is a greater good still (it brings the ideas of justice and equality under the law into the spotlight).
The three prior actions, recursively repeated and tailored to local conditions, are all that is required to reinstate and expand individual freedom in America, for all sentiences. There are only 3,171 tyranny outposts (courthouses) in the USA. 6,000 people could stop mala prohibita tomorrow, by interfering with mala prohibita convictions. If the state didn’t escalate to violence at that point, we’d have won. If it did, we’d have a 50% shot of winning, instead of a zero% shot if we wait.
See also: www.kurzweilai.net/what-price-freedom
Lottery tickets exploit a completely different failure of rationality, that being our difficulties with small probabilities and big numbers, and our problems dealing with scale more generally. (ETA: The fantasies commonly cited in the context of lotteries’ “true value” are a symptom of this failure.) It’s not hard to come up with a game-theoretic agent that maximizes its payoffs against that kind of math. Second-guessing other agents’ models is considerably harder.
I haven’t given much thought to this particular problem for a while, but my impression is that Newcomb exposes an exploit in simpler decision theories that’s related to that kind of recursive modeling: naively, if you trust Omega’s judgment of your psychology, you pick the one-box option, and if you don’t, you pick up both boxes. Omega’s track record gives us an excellent reason to trust its judgment from a probabilistic perspective, but it’s trickier to come up with an algorithm that stabilizes on that solution without immediately trying to outdo itself.
So for my own clarification, if I buy a lottery ticket with a perfect knowledge of how probable it is my ticket will win, does this make me irrational?
That’s the popular understanding (or lack thereof) here and among philosophers in general. Philosophers just don’t get math. If the decision theory is called causal but doesn’t itself make any references to physics, then that’s a slightly misleading name. I’ve written on that before
The math doesn’t go “hey hey, the theory is named causal therefore you can’t treat 2 robot arms controlled by 2 control computers that run one function on one state, the same as 2 robot arms controlled by 1 computer”. Confused sloppy philosophers do.
Also, the best case is to be predicted to 1-box but to 2-box in reality. If the prediction works by backwards causality, well then causal decision theory one-boxes. If the prediction works by simulation, a causal decision theory agent can either have a world model where the value inside the predictor and the value inside the actual robot are both represented by the same action A, and 1-box, or it can be uncertain whether the world outside it is normal reality or the predictor’s simulator, in which case it will again one-box (assuming it cares about the real money even if it is inside the predictor, which it would if it needs money to pay for, e.g., its child’s education). It will also 1-box in the simulator and 2-box in reality if it can tell the two apart.
I’m confused. Causal decision theory was invented or formalised almost entirely by philosophers. It takes the ‘causal’ in its name from its reliance on inductive logic and inference. It doesn’t make sense to claim that philosophers are being sloppy about the word ‘causal’ here, and claiming that causal decision theory will accept backwards causality and one-box is patently false unless you mean something other than what the symbol ‘causal decision theory’ refers to when you say ‘causal decision theory’.
Firstly, the notion that actions should be chosen based on their consequences, taking the actions as causes of the consequences, was definitely not invented by philosophers. Secondly, logical causality is not identical to physical causality (the latter depends on the specific laws of physics). Thirdly, not all philosophers are sloppy; some are very sloppy, some are less sloppy. Fourthly, anything that has not been put in mathematical form to be manipulated using formal methods is not formalized. When you formalize stuff, you end up stripping out the notion of self unless it is explicitly included as part of the formalism, stripping out the notion of the time at which the math is working unless that is explicitly included, and so on, ending up without the problem.
Maybe you are correct; it is better to let the symbol ‘causal decision theory’ refer to confused philosophy. Then we would need some extra symbol for how agents implementable using mathematics actually decide (and how robots that predict the outcomes of their actions on a world model actually work), which is very, very similar to ‘causal decision theory’ sans all the human preconceptions about what the self is.
I notice I actually agree with you—if we did try, using mathematics, to implement agents who decide and predict in the manner you describe, we’d find it incorrect to describe these agents as causal decision theory agents. In fact, I also expect we’d find ourselves disillusioned with CDT in general, and if philosophers brought it up, we’d direct them to instead engage with the much more interesting agents we’ve mathematically formalised.
Well, each philosopher’s understanding of CDT seems to differ from the others’:
http://www.public.asu.edu/~armendtb/docs/A%20Foundation%20for%20Causal%20Decision%20Theory.pdf
The notion that actions should be chosen based on consequences, as expressed in the formula here, is perfectly fine, albeit incredibly trivial. I can formalize that all the way into an agent; I have written such agents myself. We still need a symbol to describe this type of agent.
But philosophers go from this to “my actions should be chosen based on consequences”, and it is all about the true meaning of self and falls within the purview of your conundrums of philosophy.
Having 1 computer control 2 robot arms wired in parallel, versus having 2 computers running exactly the same software as before and controlling 2 robot arms: there’s no difference for software engineering; it’s a minor detail that has been entirely abstracted away from the software. There is a difference for philosophical thought, because in the latter case you can’t collapse logical consequences and physical causality into one thing.
Edit: anyhow, to summarize my point: in terms of agents actually formalized in software, one-boxing is only a matter of implementing the predictor into the world model somehow, either as a second servo controlled by the same control variables, or as an uncertain world state outside the senses (in the unseen, there’s either the real world or a simulator that affects the real world via the hand of the predictor). No conceptual problems whatsoever. Edit: a good analogy is the ‘twin paradox’ in special relativity. There’s only a paradox if nobody has done the math right.
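For readers without the linked paper at hand, the contrast between evidential and causal decision theory is usually written as below, with A □→ O read as the counterfactual “if I were to do A, O would obtain” (Gibbard-Harper style notation). This is the common textbook form, offered as an assumption, not necessarily the exact formula in the linked paper.

```latex
V_{\mathrm{EDT}}(A) = \sum_{O} P(O \mid A)\,U(O),
\qquad
U_{\mathrm{CDT}}(A) = \sum_{O} P(A \mathrel{\Box\!\!\rightarrow} O)\,U(O)
```

In Newcomb’s problem the two come apart: P(box B full | one-box) is high given Omega’s record, while a causal decision theorist takes the counterfactual probability that box B is full to be the same whichever action is taken, since the box was filled before the choice.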
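The “second servo” idea can be sketched as two world models fed to the same argmax. This is a toy under stated assumptions (a perfect predictor, money-only utility, and a hypothetical `payoff` table built from the amounts in the problem); `best_action_tied` and `best_action_independent` are illustrative names, not anyone’s published agent.

```python
SMALL, BIG = 1_000, 1_000_000
ACTIONS = ("one-box", "two-box")


def payoff(action: str, prediction: str) -> int:
    # Box B holds BIG only if the predictor expected one-boxing;
    # two-boxing adds the SMALL amount from box A.
    box_b = BIG if prediction == "one-box" else 0
    return box_b + (SMALL if action == "two-box" else 0)


def best_action_tied() -> str:
    # World model 1: the predictor is a second "servo" driven by the same
    # control variable, so prediction == action in every modelled world.
    return max(ACTIONS, key=lambda a: payoff(a, prediction=a))


def best_action_independent(p_one_box_predicted: float) -> str:
    # World model 2: the prediction is a fixed unknown, independent of the
    # action, believed to be "one-box" with the given probability.
    def ev(a: str) -> float:
        return (p_one_box_predicted * payoff(a, "one-box")
                + (1 - p_one_box_predicted) * payoff(a, "two-box"))
    return max(ACTIONS, key=ev)


if __name__ == "__main__":
    print("tied model:", best_action_tied())
    for q in (0.0, 0.5, 1.0):
        print(f"independent model, prior {q}:", best_action_independent(q))
```

With the prediction wired to the action, the agent one-boxes; with the prediction treated as an independent unknown, it two-boxes for every prior, which is the dominance argument. Only the world model differs, not the decision rule.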
People seem to have pretty strong opinions about Newcomb’s Problem. I don’t have any trouble believing that a superintelligence could scan you and predict your reaction with 99.5% accuracy.
I mean, a superintelligence would have no trouble at all predicting that I would one-box… even if I hadn’t encountered the problem before, I suspect.
Ultimately you either interpret “superintelligence” as being sufficient to predict your reaction with significant accuracy, or not. If not, the problem is just a straightforward probability question, as explained here, and becomes uninteresting.
Otherwise, if you interpret “superintelligence” as being sufficient to predict your reaction with significant accuracy (especially a high accuracy like >99.5%), the words of this sentence...
...simply mean “One-box to win, with high confidence.”
Summary: After disambiguating “superintelligence” (making the belief that Omega is a superintelligence pay rent), Newcomb’s problem turns into either a straightforward probability question or a fairly simple issue of rearranging the words in equivalent ways to make the winning answer readily apparent.
If you won’t explicitly state your analysis, maybe we can try 20 questions?
I have suspected that supposed “paradoxes” of evidential decision theory occur because not all the evidence was considered. For example, the fact that you are using evidential decision theory to make the decision.
Agree/disagree?
Hmm, changed my mind, should have thought more before writing… the EDT virus has early symptoms of causing people to use EDT before progressing to terrible illness and death. It seems EDT would then recommend not using EDT.
I one-box, without a moment’s thought.
The “rationalist” says “Omega has already left. How could you think that your decision now affects what’s in the box? You’re basing your decision on the illusion that you have free will, when in fact you have no such thing.”
To which I respond “How does that make this different from any other decision I’ll make today?”
I think the two-box person is confused about what it is to be rational. It does not mean “make a fancy argument”; it means starting with the facts, abstracting from them, and reasoning about your abstractions.
In this case if you start with the facts you see that 100% of people who take only box B win big, so rationally, you do the same. Why would anyone be surprised that reason divorced from facts gives the wrong answer?
Precisely. I’ve been reading a lot about the Monty Hall Problem recently (http://en.wikipedia.org/wiki/Monty_Hall_problem), and I feel that it’s a relevant conundrum.
The confused rationalist will say: but my choice CANNOT cause a linear entanglement, the reward is predecided. But the functional rationalist will see that agents who one-box (or switch doors, in the case of Monty Hall) consistently win more often. It is demonstrably a more effective strategy. You work with the facts and evidence available to you and abstract out from there, regardless of how counter-intuitive the resulting strategy becomes.
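The Monty Hall claim is easy to check empirically. The Monte Carlo sketch below assumes the standard rules (the host knows where the car is, always opens an unpicked door hiding a goat, choosing at random when two such doors exist, and always offers the switch); `play_monty_hall` and `win_rate` are illustrative names.

```python
import random


def play_monty_hall(switch: bool, rng: random.Random) -> bool:
    """Play one round; return True if the player ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the player's pick nor the car.
    host_options = [d for d in doors if d != pick and d != car]
    opened = rng.choice(host_options)
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch: bool, trials: int = 100_000, seed: int = 0) -> float:
    rng = random.Random(seed)  # seeded so runs are reproducible
    wins = sum(play_monty_hall(switch, rng) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    print("stay:  ", win_rate(switch=False))
    print("switch:", win_rate(switch=True))
```

Staying should come out near 1/3 and switching near 2/3: switching wins more often, not every time.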
This dilemma seems like it can be reduced to:
If you take both boxes, you will get $1000
If you only take box B, you will get $1M
Which is a rather easy decision.
There’s a seemingly-impossible but vital premise, namely, that your action was already known before you acted. Even if this is completely impossible, it’s a premise, so there’s no point arguing it.
Another way of thinking of it is that, when someone says, “The boxes are already there, so your decision cannot affect what’s in them,” he is wrong. It has been assumed that your decision does affect what’s in them, so the fact that you cannot imagine how that is possible is wholly irrelevant.
In short, I don’t understand how this is controversial when the decider has all the information that was provided.
You’re saying that we live in a universe where Newcomb’s problem is impossible because the future doesn’t affect the past. I’ll rephrase this problem in such a way that it seems plausible in our universe:
I’ve got really nice scanning software. I scan your brain down to the molecule, and make a virtual representation of it on a computer. I run virtual-you in my software, and give virtual-you Newcomb’s problem. Virtual-you answers, and I arrange my boxes according to that answer.
I come back to real-you. You’ve got no idea what’s going on. I explain the scenario to you and I give you Newcomb’s problem. How do you answer?
This particular instance of the problem does have an obvious, relatively uncomplicated solution: Lbh unir ab jnl bs xabjvat jurgure lbh ner rkcrevrapvat gur cneg bs gur fvzhyngvba, be gur cneg bs gur syrfu-naq-oybbq irefvba. Fvapr lbh xabj gung obgu jvyy npg vqragvpnyyl, bar-obkvat vf gur fhcrevbe bcgvba.
If for any reason you suspect that the Predictor can reach a sufficient level of accuracy to justify one-boxing, you one box. It doesn’t matter what sort of universe you are in.
Not that I disagree with the one-boxing conclusion, but this formulation requires physically reducible free will (which has recently been brought back into discussion). It would also require knowing the position and momentum of a lot of particles to arbitrary precision, which is provably impossible.
We don’t need a perfect simulation for the purposes of this problem in the abstract—we just need a situation such that the problem-solver assigns better-than-chance predicting power to the Predictor, and a sufficiently high utility differential between winning and losing.
The “perfect whole brain simulation” is an extreme case which keeps things intuitively clear. I’d argue that any form of simulation which performs better than chance follows the same logic.
The only way to escape the conclusion via simulation is if you know something that Omega doesn’t—for example, you might have some secret external factor modify your “source code” and alter your decision after Omega has finished examining you. Beating Omega essentially means that you need to keep your brain-state in such a form that Omega can’t deduce that you’ll two-box.
As Psychohistorian3 pointed out, the power that you’ve assigned to Omega predicting accurately is built into the problem. Your estimate of the probability that you will succeed in deception via the aforementioned method or any other is fixed by the problem.
In the real world, you are free to assign whatever probability you want to your ability to deceive Omega’s predictive mechanisms, which is why this problem is counterintuitive.
Also: You can’t simultaneously claim that any rational being ought to two-box, this being the obvious and overdetermined answer, and also claim that it’s impossible for anyone to figure out that you’re going to two-box.
Right, any predictor with at least 50.05% accuracy is worth one-boxing upon (well, maybe a higher percentage for those whose utility is a concave function of money). A predictor with sufficiently high accuracy that it’s worth one-boxing isn’t unrealistic or counterintuitive at all in itself, but it seems (to me at least) that many people reach the right answer for the wrong reason: the “you don’t know whether you’re real or a simulation” argument. Realistically, while backwards causality isn’t feasible, neither is precise mind duplication. The decision to one-box can be rationally reached without those reasons: you choose to be the kind of person to (predictably) one-box, and as a consequence of that, you actually do one-box.
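For concreteness, a minimal Python sketch of where the 50.05% figure comes from, assuming Omega is equally likely to be wrong in either direction. The helper name break_even_accuracy and the $10,000 starting wealth used for the concave (log) utility case are illustrative assumptions, not part of the original problem.

```python
import math
from typing import Callable

BOX_A = 1_000        # transparent box, always present
BOX_B = 1_000_000    # opaque box, filled iff Omega predicts one-boxing


def break_even_accuracy(u: Callable[[float], float]) -> float:
    """Predictor accuracy p at which one-boxing and two-boxing have equal
    expected utility, assuming Omega errs with probability 1 - p either way."""
    # EU(one-box) = p*u(B) + (1-p)*u(0)
    # EU(two-box) = p*u(A) + (1-p)*u(A+B)
    # EU(one-box) - EU(two-box) = p*gain - (1-p)*loss, zero at p = loss / (gain + loss)
    gain = u(BOX_B) - u(BOX_A)        # edge for one-boxing when Omega is right
    loss = u(BOX_A + BOX_B) - u(0)    # cost of one-boxing when Omega is wrong
    return loss / (gain + loss)


linear = break_even_accuracy(lambda x: x)
# Hypothetical concave utility: log of total wealth, starting from $10,000.
log_wealth = break_even_accuracy(lambda x: math.log(10_000 + x))

print(f"linear utility: {linear:.4f}")  # linear utility: 0.5005
print(f"log utility:    {log_wealth:.4f}")
```

With log utility the threshold comes out a little above 0.5005, consistent with the aside that a concave utility function raises the required accuracy.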
Oh, that’s fair. I was thinking of “you don’t know whether you’re real or a simulation” as an intuitive way to prove the case for all “conscious” simulations. It doesn’t have to be perfect—you could just as easily be an inaccurate simulation, with no way to know that you are a simulation and no way to know that you are inaccurate with respect to an original.
I was trying to get people to generalize downwards from the extreme intuitive example. Even with decreasing accuracy, as the simulation becomes so rough as to lose “consciousness” and “personhood”, the argument keeps holding.
Yeah, the argument would hold just as much with an inaccurate simulation as with an accurate one. The point I was trying to make wasn’t so much that the simulation isn’t going to be accurate enough, but that a simulation argument shouldn’t be a prerequisite to one-boxing. If the experiment were performed with human predictors (let’s say a psychologist who predicts correctly 75% of the time), one-boxing would still be rational despite knowing you’re not a simulation. I think LW relies on computationalism as a substitute for actually being reflectively consistent in problems such as these.
The trouble with real world examples is that we start introducing knowledge into the problem that we wouldn’t ideally have. The psychologist’s 75% success rate doesn’t necessarily apply to you—in the real world you can make a different estimate than the one that is given. If you’re an actor or a poker player, you’ll have a much different estimate of how things are going to work out.
Psychologists are just messier versions of brain scanners—the fundamental premise is that they are trying to access your source code.
And what’s more—suppose the predictions weren’t made by accessing your source code? The direction of causality does matter. If Omega can predict the future, the causal lines flow backwards from your choice to Omega’s past move. If Omega is scanning your brain, the causal lines go from your brain-state to Omega’s decision. If there are no causal lines between your brain/actions and Omega’s choice, you always two-box.
Real world example: what if I substituted a sociologist for your psychologist, one who predicted you with above-chance accuracy using only your demographic factors? In this scenario, you ought to two-box. If you disagree, let me know and I can explain myself.
In the real world, you don’t know to what extent your psychologist is using sociology (or some other factor outside your control). People can’t always articulate why, but their intuition (correctly) begins to make them deviate from the given success% estimate as more of these real-world variables get introduced.
True, the 75% would merely be a past history (and I am in fact a poker player). Indeed, if the factors used consisted entirely or mostly of factors beyond my control (and I knew this), I would two-box. However, not knowing the mechanics of a predictor’s methods does not by itself make two-boxing optimal. In the limited predictor problem, the predictor doesn’t use simulations/scanners of any sort but instead uses logic, and yet one-boxers still win.
Agreed. To add on to this:
It’s worth pointing out that Newcomb’s problem always takes the form of Simpson’s paradox. The one boxers beat the two boxers as a whole, but among agents predicted to one-box, the two boxers win, and among agents predicted to two-box, the two boxers win.
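To see the Simpson’s-paradox shape concretely, here is a small Python sketch with a hypothetical population of 500 one-boxers and 500 two-boxers facing a predictor that is right 90% of the time. The population size and accuracy are illustrative assumptions; the payoffs are the ones from the problem statement.

```python
from typing import Optional


def payoff(choice: str, prediction: str) -> int:
    """Dollars received; box B is full iff Omega predicted one-boxing."""
    box_b = 1_000_000 if prediction == "one" else 0
    return box_b if choice == "one" else box_b + 1_000


def population(n_per_choice: int, accuracy: float) -> list[tuple[str, str, int]]:
    """Expected-count population as (choice, prediction, count) rows."""
    rows = []
    for choice, other in (("one", "two"), ("two", "one")):
        right = round(n_per_choice * accuracy)
        rows.append((choice, choice, right))                # Omega predicted correctly
        rows.append((choice, other, n_per_choice - right))  # Omega predicted wrongly
    return rows


def mean_payoff(rows, choice: str, prediction: Optional[str] = None) -> float:
    """Average payoff for a choice, optionally within one prediction stratum."""
    total = count = 0
    for c, p, n in rows:
        if c == choice and (prediction is None or p == prediction):
            total += n * payoff(c, p)
            count += n
    return total / count


rows = population(500, 0.9)
print("overall:", mean_payoff(rows, "one"), "vs", mean_payoff(rows, "two"))
for pred in ("one", "two"):
    print(f"predicted {pred}-box:", mean_payoff(rows, "one", pred), "vs", mean_payoff(rows, "two", pred))
```

With these numbers one-boxers average $900,000 against $101,000 for two-boxers, yet within each prediction stratum the two-boxers come out exactly $1,000 ahead.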
The only reason to one-box is when your actions (which include both the final decision and the thoughts leading up to it) affect Omega’s prediction. The general rule is: “Try to make Omega think you’re one-boxing, but two-box whenever possible.” It’s just that in Newcomb’s problem proper, fulfilling the first imperative requires actually one-boxing.
So you would never one-box unless the simulator did some sort of scan/simulation upon your brain? But it’s better to one-box and be derivable as the kind of person to (probably) one-box than to two-box and be derivable as the kind of person to (probably) two-box.
Your final decision never affects the actual arrangement of the boxes, but its causes do.
I’d one-box when Omega had sufficient access to my source-code. It doesn’t have to be through scanning—Omega might just be a great face-reading psychologist.
We’re in agreement. As we discussed, this only applies insofar as you can control the factors that lead you to be classified as a one-boxer or a two-boxer. You can alter neither demographic information nor past behavior. But when (and only when) one-boxing causes you to be derived as a one-boxer, you should obviously one box.
Well, that’s true for this universe. I just assume we’re playing in any given universe, some of which include Omegas who can tell the future (which implies bidirectional causality) - since Psychohistorian3 started out with that sort of thought when I first commented.
Ok, so we do agree that it can be rational to one-box when predicted by a human (if they predict based upon factors you control such as your facial cues). This may have been a misunderstanding between us then, because I thought you were defending the computationalist view that you should only one-box if you might be an alternate you used in the prediction.
yes, we do agree on that.
Assuming that you have no information other than the base rate, and that it’s equally likely to be wrong either way.
An alternate solution which results in even more winning is to cerqvpg gung V znl or va fhpu n fvghngvba va gur shgher. Unir n ubbqyhz cebzvfr gung vs V’z rire va n arjpbzoyvxr fvghngvba gung ur jvyy guerngra gb oernx zl yrtf vs V qba’g 2-obk. Cnl gur ubbqyhz $500 gb frpher uvf cebzvfr. Gura pbzcyrgryl sbetrg nobhg gur jubyr neenatrzrag naq orpbzr n bar-obkre. Fpnaavat fbsgjner jvyy cerqvpg gung V 1-obk, ohg VEY V’z tbvat gb 2-obk gb nibvq zl yrtf trggvat oebxra.
But you’ve perfectly forgotten about the hoodlum, so you will in fact one box. Or, does the hoodlum somehow show up and threaten you in the moment between the scanner filling the boxes and you making your decision? That seems to add an element of delay and environmental modification that I don’t think exists in the original problem, unless I’m misinterpreting.
Also, I feel like by analyzing your brain to some arbitrarily precise standard, the scanner could see 3 things: You are (or were at some point in the past) likely to think of this solution, you are/were likely to actually go through with this solution, and the hoodlum’s threat would, in fact, cause you to two-box, letting the scanner predict that you will two-box.
Your decision doesn’t affect what’s in the boxes, but your decision procedure does, and that already exists when the question’s being assigned. It may or may not be possible to derive your decision from the decision procedure you’re using in the general case—I haven’t actually done the reduction, but at first glance it looks cognate to some problems that I know are undecidable—but it’s clearly possible in some cases, and it’s at least not completely absurd to imagine an Omega with a very high success rate.
As best I can tell, most of the confusion here comes from a conception of free will that decouples the decision from the procedure leading to it.
Yeah, agreed. I often describe this as NP being more about what kind of person I am than it is about what decision I make, but I like your phrasing better.
Actually, we don’t know that our decision affects the contents of Box B. In fact, we’re told that it contains a million dollars if-and-only-if Omega predicts we will only take Box B.
It is possible that we could pick Box B even though Omega predicted we would take both boxes. Omega has only been observed to predict correctly 100 times. And if we are sufficiently doubtful whether Omega would predict that we would take only Box B, it would be rational to take both boxes.
Only if we’re somewhat confident of Omega’s prediction can we confidently one-box and rationally expect it to contain a million dollars.
51% confidence would suffice.
Two-box expected value: 0.51 × $1K + 0.49 × $1.001M = $491,000
One-box expected value: 0.51 × $1M + 0.49 × $0 = $510,000
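The arithmetic above can be checked exactly with Python’s fractions module. The function name is an illustrative choice; like the comment, it assumes Omega’s 51% accuracy applies whichever way you choose.

```python
from fractions import Fraction


def expected_values(accuracy: Fraction) -> tuple[Fraction, Fraction]:
    """Return (one-box, two-box) expected dollars for a given predictor accuracy."""
    wrong = 1 - accuracy
    one_box = accuracy * 1_000_000 + wrong * 0      # B is full iff Omega predicted correctly
    two_box = accuracy * 1_000 + wrong * 1_001_000  # B is full only if Omega erred
    return one_box, two_box


one_box, two_box = expected_values(Fraction(51, 100))
print(f"one-box: ${int(one_box):,}")  # one-box: $510,000
print(f"two-box: ${int(two_box):,}")  # two-box: $491,000
```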
I’d love to say I’d find some way of picking randomly just to piss Omega off, but I’d probably just one-box it. A million bucks is a lot of money.
It’s often stipulated that if Omega predicts you’ll use some randomizer it can’t predict, it’ll punish you by acting as if it predicted two-boxing.
Newcomb’s problem doesn’t specify how Omega chooses the ‘customers’. It’s a quite realistic possibility that it simply has not offered the choice to anyone who would use a randomizer, and cherry-picked only the people who have at least 99.9% ‘prediction strength’.
(And the most favourable plausible outcome for randomizing would be scaling the payoff appropriately to the probability assigned.)
Would that make you a supersuperintelligence? Since I presume by “picking randomly” you mean randomly to Omega, in other words Omega cannot find and process enough information to predict you well.
Otherwise what does “picking randomly” mean?
The definition of omega as something that can predict your actions leads it to have some weird powers. You could pick a box based on the outcome of a quantum event with a 50% chance, then omega would have to vanish in a puff of physical implausibility.
I suspect Omega would know you were going to do that, and would be able to put the box in a superposition dependent on the same quantum event, so that in the branches where you 1-box, box B contains $1million, and where you 2-box it’s empty.
Exactly what I was thinking.
What’s wrong with Omega predicting a “quantum event”? “50% chance” is not an objective statement, and it may well be that Omega can predict quantum events. (If not, can you explain why not, or refer me to an explanation?)
From Wikipedia:
“In the formalism of quantum mechanics, the state of a system at a given time is described by a complex wave function (sometimes referred to as orbitals in the case of atomic electrons), and more generally, elements of a complex vector space.[9] This abstract mathematical object allows for the calculation of probabilities of outcomes of concrete experiments.”
This is the best formalism we have for predicting things at this scale and it only spits out probabilities. I would be surprised if something did a lot better!
As I understand it, probabilities are observed because there are observers in two different amplitude blobs of configuration space (to use the language of the quantum physics sequence) but “the one we are in” appears to be random to us. And mathematically I think quantum mechanics is the same under this view in which there is no “inherent, physical” randomness (so it would still be the best formalism we have for predicting things).
Could you say what “physical randomness” could be if we don’t allow reference to quantum mechanics? (i.e. is that the only example? and more to the point, does the notion make any sense?)
You seem to have transitioned to another argument here… please clarify what this has to do with omega and its ability to predict your actions.
The new argument is about whether there might be inherently unpredictable things. If not, then your picking a box based on the outcome of a “quantum event” shouldn’t make Omega any less physically plausible.
What I didn’t understand is why you removed quantum experiments from the discussion. I believe it is very plausible to have something that is physically unpredictable, as long as the thing doing the predicting is bound by the same laws as what you are trying to predict.
Consider a world made of reversible binary gates with the same number of inputs as outputs (that is every input has a unique output, and vice versa).
We want to predict one complex gate. Not a problem, just clone all the inputs and copy the gate. However you have to do that only using reversible binary gates. Let’s start with cloning the bits.
In is what you are trying to copy without modifying it, so that you can predict what effect it will have on the rest of the system. You need a minimum of two outputs, so you need another input B.
You get to create the gate in order to copy the bit and predict the system. The ideal truth table looks something like
In | B | Out | Copy
0 | 0 | 0 | 0
0 | 1 | 0 | 0
1 | 0 | 1 | 1
1 | 1 | 1 | 1
This violates our reversibility assumption. The best copier we could make is
In | B | Out | Copy
0 | 0 | 0 | 0
0 | 1 | 1 | 0
1 | 0 | 0 | 1
1 | 1 | 1 | 1
This copies precisely, but mucks up the output making our copy useless for prediction. If you could control B, or knew the value of B then we could correct the Output. But as I have shown here finding out the value of a bit is non-trivial. The best we could do would be to find sources of bits with statistically predictable properties then use them for duplicating other bits.
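The two truth tables can be checked mechanically. This is a small Python sketch, with hypothetical helper names, that tests each table for reversibility (every output pair comes from exactly one input pair), for whether Copy equals In, and for whether Out still equals In.

```python
from itertools import product

# Truth tables from the comment: (In, B) -> (Out, Copy)
IDEAL = {(0, 0): (0, 0), (0, 1): (0, 0), (1, 0): (1, 1), (1, 1): (1, 1)}
BEST = {(0, 0): (0, 0), (0, 1): (1, 0), (1, 0): (0, 1), (1, 1): (1, 1)}


def is_reversible(table) -> bool:
    # Reversible iff no two input pairs share an output pair (a bijection on 2 bits).
    return len(set(table.values())) == len(table)


def copies_input(table) -> bool:
    # Copy column equals In for every input combination.
    return all(table[(i, b)][1] == i for i, b in product((0, 1), repeat=2))


def preserves_output(table) -> bool:
    # Out column still equals In, so the original signal is undisturbed.
    return all(table[(i, b)][0] == i for i, b in product((0, 1), repeat=2))


for name, table in (("ideal", IDEAL), ("best", BEST)):
    print(name, is_reversible(table), copies_input(table), preserves_output(table))
```

Run as written, the first table fails the reversibility check, while the second passes it and copies In; its Out column, however, simply equals B, so in effect the gate swaps its two inputs.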
The world is expected to be reversible, and the no-cloning theorem applies to reality, which I think is stricter than my example. However, I hope I have shown how a simple lawful universe can be hard to predict by something inside it.
In short, stop thinking of yourself (and Omega) as an observer outside physics that does not interact with the world. Copying is disturbing.
Even though I do not have time to reflect on the attempted proof and even though the attempted proof is best described as a stab at a sketch of a proof and even though this “reversible logic gates” approach to a proof probably cannot be turned into an actual proof and even though Nick Tarleton just explained why the “one box or two box depending on an inherently unpredictable event” strategy is not particularly relevant to Newcomb’s, I voted this up and I congratulate the author (whpearson) because it is an attempt at an original proof of something very cool (namely, limits to an agent’s ability to learn about its environment) and IMHO probably relevant to the Friendliness project. More proofs and informed stabs at proofs, please!
It’s a great puzzle. I guess this thread will degenerate into arguments pro and con. I used to think I’d take one box, but I read Joyce’s book and that changed my mind.
For the take-one-boxers:
Do you believe, as you sit there with the two boxes in front of you, that their contents are fixed? That there is a “fact of the matter” as to whether box B is empty or not? Or is box B in a sort of intermediate state, halfway between empty and full? If so, do you generally consider that things momentarily out of sight may literally change their physical states into something indeterminate?
If you reject that kind of indeterminacy, what do you imagine happening, if you vacillate and consider taking both boxes? Do you picture box B literally becoming empty and full as you change your opinion back and forth?
If not, if you think box B is definitely either full or empty and there is no unusual physical state describing the contents of that box, then would you agree that nothing you do now can change the contents of the box? And if so, then taking the additional box cannot reduce what you get in box B.
Na-na-na-na-na-na, I am so sorry you only got $1000!
Me, I’m gonna replace my macbook pro, buy an apartment and a car and take a two week vacation in the Bahamas, and put the rest in savings!
Suckah!
Point: arguments don’t matter, winning does.
Oops. I had replied to this until I saw its parent was nearly 3 years old. So as I don’t (quite) waste the typing:
Yes.
Yes.
No.
No.
Yes.
No, it can’t. (But it already did.)
If I take both boxes how much money do I get? $1,000
If I take one box how much money do I get? $10,000,000 (or whatever it was instantiated to.)
It seems that my questions were more useful than yours. Perhaps Joyce befuddled you? It could be that he missed something. (Apart from counter-factual $9,999,000.)
I answered all your questions the way you intended, to make the point that I don’t believe those answers are at all incompatible with making the decision that earns you lots and lots of money.
To quote E.T. Jaynes:
“This example shows also that the major premise, “If A then B” expresses B only as a logical consequence of A; and not necessarily a causal physical consequence, which could be effective only at a later time. The rain at 10 AM is not the physical cause of the clouds at 9:45 AM. Nevertheless, the proper logical connection is not in the uncertain causal direction (clouds ⇒ rain), but rather (rain ⇒ clouds) which is certain, although noncausal. We emphasize at the outset that we are concerned here with logical connections, because some discussions and applications of inference have fallen into serious error through failure to see the distinction between logical implication and physical causation. The distinction is analyzed in some depth by H. A. Simon and N. Rescher (1966), who note that all attempts to interpret implication as expressing physical causation founder on the lack of contraposition expressed by the second syllogism (1–2). That is, if we tried to interpret the major premise as “A is the physical cause of B,” then we would hardly be able to accept that “not-B is the physical cause of not-A.” In Chapter 3 we shall see that attempts to interpret plausible inferences in terms of physical causation fare no better.”
@Hal Finney:
Certainly the box is either full or empty. But the only way to get the money in the hidden box is to precommit to taking only that one box. Not pretend to precommit, really precommit. If you try to take the $1,000, well then I guess you really hadn’t precommitted after all. I might vacillate, I might even be unable to make such a rigid precommitment with myself (though I suspect I am), but it seems hard to argue that taking only one box is not the correct choice.
I’m not entirely certain that acting rationally in this situation doesn’t require an element of doublethink, but that’s a topic for another post.
I would be interested to know whether your opinion would change if the “predictions” of the super-being were wrong 0.5% of the time, and some small number of people ended up with the $1,001,000 and some ended up with nothing. Would you still 1-box it?
If a bunch of people have played the game already, then you can calculate the average payoff for a 1-boxer and that of a 2-boxer and pick the best one.
I suppose I might still be missing something, but this still seems to me just a simple example of time inconsistency, where you’d like to commit ahead of time to something that later you’d like to violate if you could. You want to commit to taking the one box, but you also want to take the two boxes later if you could. A more familiar example is that we’d like to commit ahead of time to spending effort to punish people who hurt us, but after they hurt us we’d rather avoid spending that effort as the harm is already done.
If I know that the situation has resolved itself in a manner consistent with the hypothesis that Omega has successfully predicted people’s actions many times over, I have a high expectation that it will do so again.
In that case, what I will find in the boxes is not independent of my choice, but dependent on it. By choosing to take two boxes, I cause there to be only $1,000 there. By choosing to take only one, I cause there to be $1,000,000. I can create either condition by choosing one way or another. If I can select between the possibilities, I prefer the one with the million dollars.
Since induction applied to the known facts suggests that I can effectively determine the outcome by making a decision, I will select the outcome that I prefer, and choose to take only box B.
Why exactly is that irrational, again?
Prediction <-> our choice, if we use the 100⁄100 record as equivalent with complete predictive accuracy.
The “weird thing going on here” is that one value is set (that’s what “he has already flown away” does), yet we are being told that we can change the other value. You see these reactions:
1) No, we can’t toggle the other value, actually. Choice is not really in the premise, or is breaking the premise.
2) We can toggle the choice value, and it will set the predictive value accordingly. The prior value of the prediction does not exist or is not relevant.
We have already equated “B wins” with “prediction value = B” wlog. If we furthermore have equated “choice value = B” with “prediction value = B” wlog, we have two permissible arrays of values: all A, or all B. Now our knowledge is restricted to choice value. We can choose A or B. Since the “hidden” values are known to be identical to the visible value, we should pick the visible value in accordance with what we want for a given other value.
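The enumeration in this comment can be written out directly. A minimal Python sketch, taking as an assumption the comment’s labels: “B” for taking only box B (or Omega predicting that) and “A” for taking both boxes (or Omega predicting that). Filtering on the premise that choice equals prediction leaves exactly the two arrays described, while dropping that premise recovers the two-boxer’s dominance argument.

```python
from itertools import product

VALUES = ("A", "B")  # assumed labels: "A" = both boxes, "B" = box B only


def payoff(choice: str, prediction: str) -> int:
    """Dollars received; box B is full ("B wins") iff the prediction is B."""
    box_b = 1_000_000 if prediction == "B" else 0
    return box_b + (1_000 if choice == "A" else 0)


# Worlds allowed once choice and prediction are equated: all A or all B.
permissible = [(c, p) for c, p in product(VALUES, repeat=2) if c == p]
best_choice, _ = max(permissible, key=lambda world: payoff(*world))

# Without that constraint, holding the prediction fixed, "A" always pays $1,000 more.
dominance = all(payoff("A", p) > payoff("B", p) for p in VALUES)

print(permissible, best_choice, dominance)
```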
Other thoughts:
-Locally, it appears that you cannot “miss out” because within a value set, your choice value is the only possible one in identity with the other values.
-This is a strange problem, because generally paradox provokes these kinds of responses. In this case, however, fixing a value does not cause a contradiction both ways. If you accept the premise and my premises above, there should be no threat of complications from Omega or anything else.
-if 1 and 2 really are the only reactions, and 2 ->onebox, any twoboxers must believe 1. But this is absurd. So whence the twoboxers?
I don’t know the literature around Newcomb’s problem very well, so excuse me if this is stupid. BUT: why not just reason as follows:
If the superintelligence can predict your action, one of the following two things must be the case:
a) the state of affairs whether you pick the box or not is already absolutely determined (i.e. we live in a fatalistic universe, at least with respect to your box-picking)
b) your box picking is not determined, but it has backwards causal force, i.e. something is moving backwards through time.
If a), then practical reason is meaningless anyway: you’ll do what you’ll do, so stop stressing about it.
If b), then you should be a one-boxer for perfectly ordinary rational reasons, namely that it brings it about that you get a million bucks with probability 1.
So there’s no problem!
I agree, but I’m not sure how durable this agreement will be. (I reversed my position while drafting this comment.)
Here is my one sentence summary of the argument above: If Omega can make a fully accurate prediction in a universe without backwards causality, this implies a deterministic universe.
Laura,
Once we can model the probabilities of the various outcomes in a noncontroversial fashion, the specific choice to make depends on the utility of the various outcomes. $1,001,000 might be only marginally better than $1,000,000 -- or that extra $1,000 could have some significant extra utility.
If we assume that Omega almost never makes a mistake and we allow the chooser to use true randomization (perhaps by using quantum physics) in making his choice, then Omega must make his decision in part through seeing into the future. In this case the chooser should obviously pick just B.
Hanson: I suppose I might still be missing something, but this still seems to me just a simple example of time inconsistency
In my motivations and in my decision theory, dynamic inconsistency is Always Wrong. Among other things, it always implies an agent unstable under reflection.
A more familiar example is that we’d like to commit ahead of time to spending effort to punish people who hurt us, but after they hurt us we’d rather avoid spending that effort as the harm is already done.
But a self-modifying agent would modify to not rather avoid it.
Gowder: If a), then practical reason is meaningless anyway: you’ll do what you’ll do, so stop stressing about it.
Deterministic != meaningless. Your action is determined by your motivations, and by your decision process, which may include your stressing about it. It makes perfect sense to say: “My future decision is determined, and my stressing about it is determined; but if-counterfactual I didn’t stress about it, then-counterfactual my future decision would be different, so it makes perfect sense for me to stress about this, which is why I am deterministically doing it.”
The past can’t change—does not even have the illusion of potential change—but that doesn’t mean that people who, in the past, committed a crime, are not held responsible just because their action and the crime are now “fixed”. It works just the same way for the future. That is: a fixed future should present no more problem for theories of moral responsibility than a fixed past.
I don’t see why this needs to be so drawn out.
I know the rules of the game. I also know that Omega is super intelligent, namely, Omega will accurately predict my action. Since Omega knows that I know this, and since I know that he knows I know this, I can rationally take box B, content in my knowledge that Omega has predicted my action correctly.
I don’t think it’s necessary to precommit to any ideas, since Omega knows that I’ll be able to rationally deduce the winning action given the premise.
We don’t even need a superintelligence. We can probably predict on the basis of personality type a person’s decision in this problem with an 80% accuracy, which is already sufficient that a rational person would choose only box B.
The possibility of time inconsistency is very well established among game theorists, and is considered a problem of the game one is playing, rather than a failure to analyze the game well. So it seems you are disagreeing with almost all game theorists in economics as well as most decision theorists in philosophy. Perhaps they are right and you are wrong?
The interesting thing about this game is that Omega has magical super-powers that allow him to know ahead of time whether or not you will back out on your commitment, and so you can make your commitment credible simply by being someone who won’t, in fact, back out on it. If that makes any sense.
Robin, remember I have to build a damn AI out of this theory, at some point. A self-modifying AI that begins anticipating dynamic inconsistency—that is, a conflict of preference with its own future self—will not stay in such a state for very long… did the game theorists and economists work out a standard answer for what happens after that?
If you like, you can think of me as defining the word “rationality” to refer to a different meaning—but I don’t really have the option of using the standard theory, here, at least not for longer than 50 milliseconds.
If there’s some nonobvious way I could be wrong about this point, which seems to me quite straightforward, do let me know.
In reality, either I am going to take one box or two. So when the two-boxer says, “If I take one box, I’ll get amount x,” and “If I take two boxes, I’ll get amount x+1000,” one of these statements is objectively counterfactual. Let’s suppose he is in fact going to take both boxes. Then his second statement is factual and his first statement counterfactual. Then his two statements are:
1) Although I am not in fact going to take only one box, were I to take only box B, I would get amount x, namely the amount that would be in that box.
2) I am in fact going to take both boxes, and so I will get amount x+1000, namely 1000 more than how much is in fact in the other box.
From this it is obvious that x in the two statements has a different value, and so his conclusion that he will get more if he takes both boxes is false. For x has the value 1,000,000 in the first case, and 0 in the second. He mistakenly assumes it has the same value in the two cases.
Likewise, when the two-boxer says to the one-boxer, “If you had taken both boxes, you would have gotten more,” his statement is counterfactual and false. For if the one-boxer had been a two-boxer, there originally would have been nothing in the other box, and so he would have gotten only $1000 instead of $1,000,000.
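The point that x takes different values in the two statements can be made concrete. The sketch below assumes a perfect predictor, so the content of box B is a function of the choice actually made; `dominance_view` instead holds x fixed across both choices, the way the two-boxer’s argument does. All names are illustrative.

```python
SMALL = 1_000
BIG = 1_000_000


def box_b_given_choice(choice):
    """With a perfect predictor, B is full exactly when the choice is 'one'."""
    return BIG if choice == "one" else 0


def payoff(choice, x):
    """Dollars received, given the amount x actually in box B."""
    return x if choice == "one" else x + SMALL


def predictor_view(choice):
    # x is whatever the predictor put in B, which tracks the actual choice
    return payoff(choice, box_b_given_choice(choice))


def dominance_view(choice, x):
    # the two-boxer's reasoning: x is held fixed across both choices
    return payoff(choice, x)


for c in ("one", "two"):
    print(c, predictor_view(c))           # one 1000000, then two 1000
for x in (0, BIG):
    print(x, dominance_view("one", x), dominance_view("two", x))
```

Holding x fixed, two-boxing always looks $1000 better; letting x track the choice, it is $999,000 worse.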
Eliezer: whether or not a fixed future poses a problem for morality is a hotly disputed question which even I don’t want to touch. Fortunately, this problem is one that is pretty much wholly orthogonal to morality. :-)
But I feel like in the present problem the fixed future issue is a key to dissolving the problem. So, assume the box decision is fixed. It need not be the case that the stress is fixed too. If the stress isn’t fixed, then it can’t be relevant to the box decision (the box is fixed regardless of your decision between stress and no-stress). If the stress IS fixed, then there’s no decision left to take. (Except possibly whether or not to stress about the stress, call that stress*, and recurse the argument accordingly.)
In general, for any pair of actions X and Y, where X is determined, either X is conditional on Y, in which case Y must also be determined, or not conditional on Y, in which case Y can be either determined or non-determined. So appealing to Y as part of the process that leads to X doesn’t mean that something we could do to Y makes a difference if X is determined.
Paul, being fixed or not fixed has nothing to do with it. Suppose I program a deterministic AI to play the game (the AI picks a box.)
The deterministic AI knows that it is deterministic, and it knows that I know too, since I programmed it. So I also know whether it will take one or both boxes, and it knows that I know this.
At first, of course, it doesn’t know itself whether it will take one or both boxes, since it hasn’t completed running its code yet. So it says to itself, “Either I will take only one box or both boxes. If I take only one box, the programmer will have known this, so I will get 1,000,000. If I take both boxes, the programmer will have known this, so I will get 1,000. It is better to get 1,000,000 than 1,000. So I choose to take only one box.”
If someone tries to confuse the AI by saying, “if you take both, you can’t get less,” the AI will respond, “I can’t take both without different code, and if I had that code, the programmer would have known that and would have put less in the box, so I would get less.”
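The programmer’s knowledge can be modelled literally: because the AI is deterministic, the predictor can just run the same decision function before filling the boxes. A minimal sketch; `play`, `one_boxer` and `two_boxer` are hypothetical names.

```python
from typing import Callable

SMALL = 1_000
BIG = 1_000_000

Agent = Callable[[], str]  # a deterministic agent returns "one" or "two"


def play(agent: Agent) -> int:
    """Predict by running the agent's code, fill box B, then let the agent choose."""
    prediction = agent()                       # same code, so same answer
    box_b = BIG if prediction == "one" else 0
    choice = agent()                           # the "real" run
    return box_b if choice == "one" else box_b + SMALL


def one_boxer() -> str:
    return "one"


def two_boxer() -> str:
    return "two"


print(play(one_boxer))   # 1000000
print(play(two_boxer))   # 1000
```

There is no code path where the prediction says "one" but the real run says "two", which is the AI’s reply to “if you take both, you can’t get less.”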
Or in other words: it is quite possible to make a decision, like the AI above, even if everything is fixed. For you do not yet know in what way everything is fixed, so you must make a choice, even though which one you will make is already determined. Or if you found out that your future is completely determined, would you go and jump off a cliff, since this could not happen unless it were inevitable anyway?
I practice historical European swordsmanship, and those Musashi quotes have a certain resonance to me*. Here is another (modern) saying common in my group:
If it’s stupid, but it works, then it ain’t stupid.
You previously asked why you couldn’t find similar quotes from European sources. I believe this is mainly a language barrier: the English were not nearly the swordsmen that the French, Italians, Spanish, and Germans were (though they were pretty mean with their fists). You should be able to find many quotes in those other languages.
Eliezer, I don’t read the main thrust of your post as being about Newcomb’s problem per se. Having distinguished between ‘rationality as means’ to whatever end you choose, and ‘rationality as a way of discriminating between ends’, can we agree that the whole specks/torture debate was something of a red herring? A red herring, because it was a discussion on using rationality to discriminate between ends, without having first defined one’s meta-objectives, or, if one’s meta-objectives involved hedonism, establishing the rules for performing math over subjective experiences. To illustrate the distinction using your other example, I could state that I prefer to save 400 lives certainly, simply because the purple fairy in my closet tells me to (my arbitrary preferred objective), and that would be perfectly legitimate. It would only be incoherent if I also declared it to be a strategy which would maximise the number of lives saved if a majority of people adopted it in similar circumstances (a different arbitrary preferred objective). I could in fact have as my preferred meta-objective for the universe that all the squilth in flobjuckstooge be globberised, and that would be perfectly legitimate. An FAI (or a BFG, for that matter (Roald Dahl, not Tom Hall)) could scan me and work towards creating the universe in which my proposition is meaningful, and make sure it happens. If now someone else’s preferred meta-objective for the universe is ensuring that the princess on page 3 gets a fairy cake, how is the FAI to prioritise?
Unknown: your last question highlights the problem with your reasoning. It’s idle to ask whether I’d go and jump off a cliff if I found my future were determined. What does that question even mean?
Put a different way, why should we ask an “ought” question about events that are determined? If A will do X whether or not it is the case that a rational person will do X, why do we care whether or not it is the case that a rational person will do X? I submit that we care about rationality because we believe it’ll give us traction on our problem of deciding what to do. So assuming fatalism (which is what we must do if the AI knows what we’re going to do, perfectly, in advance) demotivates rationality.
Here’s the ultimate problem: our intuitions about these sorts of questions don’t work, because they’re fundamentally rooted in our self-understanding as agents. It’s really, really hard for us to say sensible things about what it might mean to make a “decision” in a deterministic universe, or to understand what that implies. That’s why Newcomb’s problem is a problem—because we have normative principles of rationality that make sense only when we assume that it matters whether or not we follow them, and we don’t really know what it would mean to matter without causal leverage.
(There’s a reason free will is one of Kant’s antinomies of reason. I’ve been meaning to write a post about transcendental arguments and the limits of rationality for a while now… it’ll happen one of these days. But in a nutshell… I just don’t think our brains work when it comes down to comprehending what a deterministic universe looks like on some level other than just solving equations. And note that this might make evolutionary sense: a creature who gets the best results through a [determined] causal chain that includes rationality is going to be selected for the beliefs that make it easiest to use rationality, including the belief that it makes a difference.)
Paul, it sounds like you didn’t understand. A chess playing computer program is completely deterministic, and yet it has to consider alternatives in order to make its move. So also we could be deterministic and we would still have to consider all the possibilities and their benefits before making a move.
So it makes sense to ask whether you would jump off a cliff if you found out that the future is determined. You would find out that the future is determined without knowing exactly which future is determined, just like the chess program, and so you would have to consider the benefits of various possibilities, despite the fact that there is only one possibility, just like there is really only one possibility for the chess program.
So when you considered the various “possibilities”, would “jumping off a cliff” evaluate as equal to “going on with life”, or would the latter evaluate as better? I suspect you would go on with life, just like a chess program moves its queen to avoid being taken by a pawn, despite the fact that it was totally determined to do this.
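A chess program’s deliberation can be shown in miniature: a deterministic minimax search over a toy game tree examines every branch, yet always returns the same move. The tree, its scores, and the function names are made up for illustration.

```python
def minimax(node, maximizing):
    """Value of a game-tree node; leaves are ints, internal nodes are dicts of moves."""
    if isinstance(node, int):
        return node
    values = [minimax(child, not maximizing) for child in node.values()]
    return max(values) if maximizing else min(values)


def best_move(tree):
    """Consider every alternative, then deterministically pick the best one."""
    return max(tree, key=lambda move: minimax(tree[move], maximizing=False))


# Toy position: leaving the queen en prise loses material after the opponent's reply.
tree = {
    "leave_queen": {"pawn_takes_queen": -9, "other_reply": 1},
    "move_queen": {"reply_a": 0, "reply_b": 2},
}

print(best_move(tree))   # move_queen, on every run
```

The "possibilities" are real inputs to the computation even though only one output can ever come out of it.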
I do understand. My point is that we ought not to care whether we’re going to consider all the possibilities and benefits.
Oh, but you say, our caring about our consideration process is a determined part of the causal chain leading to our consideration process, and thus to the outcome.
Oh, but I say, we ought not to care* about that caring. Again, recurse as needed. Nothing you can say about the fact that a cognition is in the causal chain leading to a state of affairs counts as a point against the claim that we ought not to care about whether or not we have that cognition if it’s unavoidable.
The paradox is designed to give your decision the practical effect of causing Box B to contain the money or not, without actually labeling this effect “causation.” But I think that if Box B acts as though its contents are caused by your choice, then you should treat it as though they were. So I don’t think the puzzle is really something deep; rather, it is a word game about what it means to cause something.
Perhaps it would be useful to think about how Omega might be doing its prediction. For example, it might have the ability to travel into the future and observe your action before it happens. In this case what you do is directly affecting what the box contains, and the problem’s statement that whatever you choose won’t affect the contents of the box is just wrong.
Or maybe it has a copy of the entire state of your brain, and can simulate you in a software sandbox inside its own mind long enough to see what you will do. In this case it makes sense to think of the box as not being empty or full until you’ve made your choice, if you are the copy in the sandbox. If you aren’t the copy in the sandbox then you’d be better off choosing both boxes, but the way the problem’s set up you can’t tell this. You can still try to maximize future wealth. My arithmetic says that choosing Box B is the best strategy in this case. (Mixed strategies, where you hope that the sandbox version of yourself will randomly choose Box B alone and the outside one will choose both, are dominated by choosing Box B. Also I assume that if you are in the sandbox, you want to maximize the wealth of the outside agent. I think this is reasonable because it seems like there is nothing else to care about, but perhaps someone will disagree.)
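The arithmetic behind the claim that mixed strategies are dominated can be sketched, assuming the sandbox copy and the outside agent each one-box independently with the same probability q (same code, separate random draws) and that Omega fills B exactly when the sandbox copy one-boxes. Enumerating the cases gives the outside agent q * 1,000,000 + (1 - q) * 1,000, which grows with q, so q = 1 (always take B alone) is best.

```python
from fractions import Fraction

SMALL = 1_000
BIG = 1_000_000


def outside_wealth(q):
    """Expected wealth of the outside agent when both copies one-box with probability q."""
    total = Fraction(0)
    for sandbox_one, p_sandbox in ((True, q), (False, 1 - q)):
        box_b = BIG if sandbox_one else 0          # Omega copies the sandbox's choice
        for outside_one, p_outside in ((True, q), (False, 1 - q)):
            payout = box_b if outside_one else box_b + SMALL
            total += p_sandbox * p_outside * payout
    return total


for q in (Fraction(0), Fraction(1, 2), Fraction(1)):
    print(q, outside_wealth(q))   # 0 1000, 1/2 500500, 1 1000000
```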
You could interpret Omega differently than in these stories, although I think my first point above that you should think of your choice as causing Omega to put money in the box, or not, is reasonable. I would say that the fact that Omega put the money in the box chronologically before you make the decision is irrelevant. I think uncertainty about an event that has already happened, but that hasn’t been revealed to you, is basically the same thing as uncertainty about something that hasn’t happened yet, and it should be modeled the same way.
I have two arguments for going for Box B. First, for a scientist it’s not unusual that every rational argument (=theory) predicts that only two-boxing makes sense. Still, if the experiment again and again refutes that, it’s obviously the theory that’s wrong and there’s obviously something more to reality than that which fueled the theories. Actually, we even see dilemmas like Newcomb’s in the contextuality of quantum measurements. Measurement tops rationality or theory, every time. That’s why science is successful and philosophy is not.
Second, there’s no question I choose box B. Either I get the million dollars, or I have proven an extragalactic superintelligence wrong. How cool is that? $1000? Have you looked at the exchange rates lately?
Paul, if we were determined, what would you mean when you say that “we ought not to care”? Do you mean to say that the outcome would be better if we didn’t care? The fact that the caring is part of the causal chain does have something to do with this: the outcome may be determined by whether or not we care. So if you consider one outcome better than another (only one really possible, but both possible as far as you know), then either “caring” or “not caring” might be preferable, depending on which one would lead to each outcome.
Eliezer, if a smart creature modifies itself in order to gain strategic advantages from committing itself to future actions, it must think it could better achieve its goals by doing so. If so, why should we be concerned, if those goals do not conflict with our goals?
I think Anonymous, Unknown and Eliezer have been very helpful so far. Following on from them, here is my take:
There are many ways Omega could be doing the prediction/placement, and it may well matter exactly how the problem is set up. For example, you might be deterministic and he might be precalculating your choice (much like we might be able to do with an insect or computer program), or he might be using a quantum suicide method: (quantum) randomizing whether the million goes in and then destroying the world iff you pick the wrong option (this will lead to us observing him being correct 100 times out of 100, assuming a many-worlds interpretation of QM). Or he could have just got lucky with the last 100 people he tried it on.
If it is the deterministic option, then what do the counterfactuals about choosing the other box even mean? My approach is to say that ‘You could choose X’ means that if you had desired to choose X, then you would have. This is a standard way of understanding ‘could’ in a deterministic universe. Then the answer depends on how we suppose the world to be different to give you counterfactual desires. If we do it with a miracle near the moment of choice (history is the same, but then your desires change non-physically), then you ought to two-box, since Omega can’t have predicted this. If we do it with an earlier miracle, or with a change to the initial conditions of the universe (the Tannsjo interpretation of counterfactuals), then you ought to one-box, since Omega would have predicted your choice. Thus, if we are understanding Omega as extrapolating your deterministic thinking, then the answer will depend on how we understand the counterfactuals. One-boxers and two-boxers would be people who interpret the natural counterfactual in the example in different (and equally valid) ways.
If we understand it as Omega using a quantum suicide method, then the objectively right choice depends on his initial probabilities of putting the million in the box. If he does it with a 50% chance, then take just one box. There is a 50% chance the world will end either choice, but this way, in the case where it doesn’t, you will have a million rather than a thousand. If, however, he uses a 99% chance of putting nothing in the box, then one-boxing has a 99% chance of destroying the world which dominates the value of the extra money, so instead two-box, take the thousand and live.
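The quantum-suicide reading can be tabulated. Assume, as the comment does, that Omega puts the million in B with probability p and the world is destroyed in every branch where your choice mismatches the box. The sketch reports, for each choice, the chance the world survives and what you hold if it does; how to weigh survival against dollars is left as in the comment, and the function name is illustrative.

```python
from fractions import Fraction

SMALL = 1_000
BIG = 1_000_000


def quantum_suicide_outcomes(p_fill):
    """Map each choice to (probability the world survives, dollars held if it does)."""
    return {
        "one": (p_fill, BIG),          # survives only in branches where B was filled
        "two": (1 - p_fill, SMALL),    # survives only in branches where B was left empty
    }


for p in (Fraction(1, 2), Fraction(1, 100)):
    print(p, quantum_suicide_outcomes(p))
```

At p = 1/2 survival is the same either way and one-boxing pays more; at p = 1/100 one-boxing survives only 1% of the time.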
If he just got lucky a hundred times, then you are best off two-boxing.
If he time travels, then it depends on the nature of time-travel...
Thus the answer depends on key details not told to us at the outset. Some people accuse all philosophical examples (like the trolley problems) of not giving enough information, but in those cases it is fairly obvious how we are expected to fill in the details. This is not true here. I don’t think the Newcomb problem has a single correct answer. The value of it is to show us the different possibilities that could lead to the situation as specified and to see how they give different answers, hopefully illuminating the topic of free-will, counterfactuals and prediction.
Be careful of this sort of argument, any time you find yourself defining the “winner” as someone other than the agent who is currently smiling from on top of a giant heap.
This made me laugh. Well said!
There’s only one question about this scenario for me—is it possible for a sufficiently intelligent being to fully, fully model an individual human brain? If so, (and I think it’s tough to argue ‘no’ unless you think there’s a serious glass ceiling for intelligence) choose box B. If you try and second-guess (or, hell, googolth-guess) Omega, you’re taking the risk that Omega is not smart enough to have modelled your consciousness sufficiently well. How big is this risk? 100 times out of 100 speaks for itself. Omega is cleverer than we can understand. Box B.
(Time travel? No thanks. I find it a hell of a lot more likely that Omega is simulating people’s minds than that he’s time travelling, destroying the universe, etc. And even if he were, Box B!)
If you can have your brain modelled exactly—to the point where there is an identical simulation of your entire conscious mind and what it perceives—then a lot of weird stuff can go on. However, none of it will violate causality. (Quantum effects messing up the simulation or changing the original? I guess if the model could be regularly updated based on the original...but I don’t know what I’m talking about now ;) )
How does the box know? I could open B with the intent of opening only B or I could open B with the intent of then opening A. Perhaps Omega has locked the boxes such that they only open when you shout your choice to the sky. That would beat my preferred strategy of opening B before deciding which to choose. I open boxes without choosing to take them all the time.
Are our common notions about boxes catching us here? In my experience, opening a box rarely makes nearby objects disintegrate. It is physically impossible to “leave $1000 on the table,” because it will disintegrate if you do not choose A. I also have no experience with trans-galactic super-intelligences, and its ability to make time-traveling super-boxes is already covered by the discussion above. I think of boxes as things that either are full or are not, independent of my intentions, but I also think of them as things that do not disintegrate based on my intentions.
Taking both is equivalent to just taking A. Restate the problem that way: take A and get $1000 or take B and get $1,000,000. Which would you prefer?
I think the problem becomes more amusing if box A does not disintegrate. They are just two cardboard boxes, one of which is open and visibly has $1000 in it. You don’t shout your intention to the sky, you just take whatever boxes you like. The reasonable thing to do is open box B; if it is empty, take box A too; if it is full of money, heck, take box A too. They’re boxes, they can’t stop you. But that logic makes you a two-boxer, so if Omega anticipates it, and Omega does, B will be empty. You definitely need to pre-commit to taking only B. Assume you have, and you open B, and B has $1,000,000. You win! Now what do you do? A is just sitting there with $1000 in it. You already have your million. You even took it out of the box, in case the box disintegrates. Do you literally walk away from $1000, on the belief that Omega has some hidden trick to retroactively make B empty? The rule was not that the money would go away if you took both, the rule is that B would be empty. B was not empty. A is still there. You already won for being a one-boxer, does anything stop you from being a two-boxer and winning the bonus $1000?
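One way to formalize this worry, under an added modelling assumption not stated in the problem: Omega predicts your whole policy (what you would do after seeing B full, and after seeing B empty) and fills B exactly when your policy one-boxes on seeing it full. Enumerating the four policies shows that "take A too once B is full" is precisely the disposition that leaves B empty. Names are illustrative.

```python
from itertools import product

SMALL = 1_000
BIG = 1_000_000


def outcome(policy):
    """policy maps 'full'/'empty' to 'one' or 'two'; returns the dollars received."""
    box_b = BIG if policy["full"] == "one" else 0      # Omega simulates the 'full' case
    seen = "full" if box_b else "empty"
    return box_b if policy[seen] == "one" else box_b + SMALL


for if_full, if_empty in product(("one", "two"), repeat=2):
    policy = {"full": if_full, "empty": if_empty}
    print(policy, outcome(policy))
```

Under this assumption no policy both finds B full and collects the bonus $1000.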
Eliezer, if a smart creature modifies itself in order to gain strategic advantages from committing itself to future actions, it must think it could better achieve its goals by doing so. If so, why should we be concerned, if those goals do not conflict with our goals?
Well, there’s a number of answers I could give to this:
*) After you’ve spent some time working in the framework of a decision theory where dynamic inconsistencies naturally Don’t Happen—not because there’s an extra clause forbidding them, but because the simple foundations just don’t give rise to them—then an intertemporal preference reversal starts looking like just another preference reversal.
*) I developed my decision theory using mathematical technology, like Pearl’s causal graphs, that wasn’t around when causal decision theory was invented. (CDT takes counterfactual distributions as fixed givens, but I have to compute them from observation somehow.) So it’s not surprising if I think I can do better.
*) We’re not talking about a patchwork of self-modifications. An AI can easily generally modify its future self once-and-for-all to do what its past self would have wished on future problems even if the past self did not explicitly consider them. Why would I bother to consider the general framework of classical causal decision theory when I don’t expect the AI to work inside that general framework for longer than 50 milliseconds?
*) I did work out what an initially causal-decision-theorist AI would modify itself to, if it booted up on July 11, 2018, and it looks something like this: “Behave like a nonclassical-decision-theorist if you are confronting a Newcomblike problem that was determined by ‘causally’ interacting with you after July 11, 2018, and otherwise behave like a classical causal decision theorist.” Roughly, self-modifying capability in a classical causal decision theorist doesn’t fix the problem that gives rise to the intertemporal preference reversals, it just makes one temporal self win out over all the others.
*) Imagine time spread out before you like a 4D crystal. Now imagine pointing to one point in that crystal, and saying, “The rational decision given information X, and utility function Y, is A”, then pointing to another point in the crystal and saying “The rational decision given information X, and utility function Y, is B”. Of course you have to be careful that all conditions really are exactly identical—the agent has not learned anything over the course of time that changes X, the agent is not selfish with temporal deixis which changes Y. But if all these conditions are fulfilled, I don’t see why an intertemporal inconsistency should be any less disturbing than an interspatial inconsistency. You can’t have 2 + 2 = 4 in Dallas and 2 + 2 = 3 in Minneapolis.
*) What happens if I want to use a computation distributed over a large enough volume that there are lightspeed delays and no objective space of simultaneity? Do the pieces of the program start fighting each other?
*) Classical causal decision theory is just not optimized for the purpose I need a decision theory for, any more than a toaster is likely to work well as a lawnmower. They did not have my design requirements in mind.
*) I don’t have to put up with dynamic inconsistencies. Why should I?
Maybe perhaps we are right and they are wrong?
The issue is to be decided, not by referring to perceived status or expertise, but by looking at who has the better arguments. Only when we cannot evaluate the arguments does making an educated guess based on perceived expertise become appropriate.
Again: how much do we want to bet that Eliezer won’t admit that he’s wrong in this case? Do we have someone willing to wager another 10 credibility units?
Caledonian: you can stop talking about wagering credibility units now, we all know you don’t have funds for the smallest stake.
Ben Jones: if we assume that Omega is perfectly simulating the human mind, then when we are choosing between B and A+B, we don’t know whether we are in reality or simulation. In reality, our choice does not affect the million, but in the simulation this will. So we should reason “I’d better take only box B, because if this is the simulation then that will change whether or not I get the million in reality”.
There is a big difference between having time inconsistent preferences, and time inconsistent strategies because of the strategic incentives of the game you are playing. Trying to find a set of preferences that avoids all strategic conflicts between your different actions seems a fool’s errand.
What we have here is an inability to recognize that causality no longer flows only from ‘past’ to ‘future’.
If we’re given a box that could contain $1,000 or nothing, we calculate the expected value of the superposition of these two possibilities. We don’t actually expect that there’s a superposition within the box—we simply adopt a technique to help compensate for what we do not know. From our ignorant perspective, either case could be real, although in actuality either the box has the money or it does not.
This is similar. The amount of money in the box depends on what choice we make. The fact that the placement of money into the box happened in the past is irrelevant, because we’ve already presumed that the relevant cause-and-effect relationship works backwards in time.
Eliezer states that the past is fixed. Well, it may be fixed in some absolute sense (although that is a complicated topic), but from our ignorant perspective we have to consider what appears to us to be the possible alternatives. That means that we must consider the money in the boxes to be uncertain. Choosing causes Omega to put a particular amount of money in the box. That this happened in the past is irrelevant, because the causal dependence points into the past instead of the future.
Even if we ignore actual time travel, we must consider the amount of money present to be uncertain until we choose, which then determines how much is there—in the sense of our technique, from our limited perspective.
If we accept that Omega is really as accurate as it appears to be—which is not a small thing to accept, certainly—and we want to maximize money, then the correct choice is B.
How about simply multiplying? Treat Omega as a fair coin toss. 50% of a million is half-a-million, and that’s vastly bigger than a thousand. You can ignore the question of whether omega has filled the box, in deciding that the uncertain box is more important. So much more important, that the chance of gaining an extra 1000 isn’t worth the bother of trying to beat the puzzle. You just grab the important box.
After you’ve spent some time working in the framework of a decision theory where dynamic inconsistencies naturally Don’t Happen—not because there’s an extra clause forbidding them, but because the simple foundations just don’t give rise to them—then an intertemporal preference reversal starts looking like just another preference reversal.
… Roughly, self-modifying capability in a classical causal decision theorist doesn’t fix the problem that gives rise to the intertemporal preference reversals, it just makes one temporal self win out over all the others.
This is a genuine concern. Note that most instances of precommitment arise quite naturally due to reputational concerns: any agent which is complex enough to come up with the concept of reputation will make superficially irrational (“hawkish”) choices in order not to be pushed around in the future. Moreover, precommitment is only worthwhile if it can be accurately assessed by the counterparty: an agent will not want to “generally modify its future self … to do what its past self would have wished” unless it can gain a reputational advantage by doing so.
There is a big difference between having time inconsistent preferences, and time inconsistent strategies because of the strategic incentives of the game you are playing.
I can see why a human would have time-inconsistent strategies—because of inconsistent preferences between their past and future self, hyperbolic discounting functions, that sort of thing. I am quite at a loss to understand why an agent with a constant, external utility function should experience inconsistent strategies under any circumstance, regardless of strategic incentives. Expected utility lets us add up conflicting incentives and reduce to a single preference: a multiplicity of strategic incentives is not an excuse for inconsistency.
I am a Bayesian; I don’t believe in probability calculations that come out different ways when you do them using different valid derivations. Why should I believe in decisional calculations that come out in different ways at different times?
I’m not sure that even a causal decision theorist would agree with you about strategic inconsistency being okay—they would just insist that there is an important difference between deciding to take only box B at 7:00am vs 7:10am, if Omega chooses at 7:05am, because in the former case you cause Omega’s action while in the latter case you do not. In other words, they would insist the two situations are importantly different, not that time inconsistency is okay.
And I observe again that a self-modifying AI which finds itself with time-inconsistent preferences, strategies, what-have-you, will not stay in this situation for long—it’s not a world I can live in, professionally speaking.
Trying to find a set of preferences that avoids all strategic conflicts between your different actions seems a fool’s errand.
I guess I completed the fool’s errand, then...
Do you at least agree that self-modifying AIs tend not to contain time-inconsistent strategies for very long?
The entire issue of causal versus inferential decision theory, and of the seemingly magical powers of the chooser in the Newcomb problem, are serious distractions here, as Eliezer has the same issue in an ordinary commitment situation, e.g., punishment. I suggest starting this conversation over from such an ordinary simple example.
Let me restate: Two boxes appear. If you touch box A, the contents of box B are vaporized. If you attempt to open box B, box A and its contents are vaporized. Contents as previously specified. We could probably build these now.
Experimentally, how do we distinguish this from the description in the main thread? Why are we taking Omega seriously when, if the discussion dealt with the number of angels dancing on the head of a pin, the derision would be palpable? The experimental data point to taking box B. Even if Omega is observed delivering the boxes, and making the specified claims regarding their contents, why are these claims taken on faith as being an accurate description of the problem?
Let’s take Bayes seriously.
Some time ago there was a posting about something like “If all you knew was that the sun rose on each of the past 5 mornings, what probability would you assign to the sun rising next morning?” It came out to something like 5/6 or 4/5.
But of course that’s not all we know, and so we’d get different numbers.
Now what’s given here is that Omega has been correct on a hundred occasions so far. If that’s all we know, we should estimate the probability of him being right next time at about 99%. So if you’re a one-boxer your expectation would be $990,000 and a two-boxer would have an expectation of $11,000.
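The arithmetic can be checked. One standard way to get "about 99%" is Laplace’s rule of succession, which after s successes in n trials estimates (s + 1)/(n + 2), here 101/102, roughly 99.02%; using it here is an assumption, not necessarily the method of the posting mentioned above. The sketch computes both the rounded and the Laplace versions.

```python
from fractions import Fraction

SMALL = 1_000
BIG = 1_000_000


def rule_of_succession(successes, trials):
    """Laplace's estimate that the next trial succeeds: (s + 1) / (n + 2)."""
    return Fraction(successes + 1, trials + 2)


def expectations(p_correct):
    """Expected dollars for (one-boxer, two-boxer) if Omega is right with probability p_correct."""
    one_boxer = p_correct * BIG                  # paid only when Omega predicted correctly
    two_boxer = SMALL + (1 - p_correct) * BIG    # B is full only when Omega erred
    return one_boxer, two_boxer


one, two = expectations(Fraction(99, 100))
print(one, two)                                  # 990000 11000

p = rule_of_succession(100, 100)                 # 101/102, about 99.02%
one, two = expectations(p)
print(float(one), float(two))
```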
But the whole argument seems to be about what extra knowledge you have; in particular, Can causation work in reverse? or Is Omega really superintelligent? or even Are the conditions stated in the problem logically inconsistent (which would justify any answer.)
Perhaps someone who enjoys these kinds of odds calculations could investigate the extent to which we know these things and how it affects the outcome?
Eliezer, I have a question about this: “There is no finite amount of life lived N where I would prefer an 80.0001% probability of living N years to a 0.0001% chance of living a googolplex years and an 80% chance of living forever. This is a sufficient condition to imply that my utility function is unbounded.”
I can see that this preference implies an unbounded utility function, given that a longer life has a greater utility. However, simply stated in that way, most people might agree with the preference. But consider this gamble instead:
A: Live 500 years and then die, with certainty.
B: Live forever, with probability 0.000000001%; die within the next ten seconds, with probability 99.999999999%
Do you choose A or B? Is it possible to choose A and have an unbounded utility function with respect to life? It seems to me that an unbounded utility function implies the choice of B. But then what if the probability of living forever becomes one in a googolplex, or whatever? Of course, this is a kind of Pascal’s Wager; but it seems to me that your utility function implies that you should accept the Wager.
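Under expected-utility maximization, the answer turns entirely on whether u is bounded. The sketch below scores both gambles under two utility functions over years lived; both are hypothetical choices for illustration, not ones proposed in the thread: a bounded u(t) = 1 - exp(-t/80), which levels off at 1, and an unbounded u(t) = t, which makes living forever infinitely valuable. The probability 0.000000001% is 1e-11, and dying within ten seconds is scored as zero years.

```python
import math

P_FOREVER = 1e-11  # 0.000000001% expressed as a probability
FOREVER = math.inf

def bounded_u(years: float) -> float:
    # Hypothetical bounded utility: rises toward 1, and u(inf) == 1.0
    return 1.0 - math.exp(-years / 80.0)

def unbounded_u(years: float) -> float:
    # Hypothetical unbounded utility: linear in years, so u(inf) == inf
    return years

def choose(u) -> str:
    eu_a = u(500)                                           # 500 years for sure
    eu_b = P_FOREVER * u(FOREVER) + (1 - P_FOREVER) * u(0)  # forever, or die now
    return "A" if eu_a > eu_b else "B"

print(choose(bounded_u))    # A
print(choose(unbounded_u))  # B
```

Any positive P_FOREVER yields B under the unbounded function, which is the Pascal’s Wager point. One in a googolplex cannot be represented this way, though: it underflows to 0.0 in floating point, and 0.0 * inf is nan.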
It also seems to me that the intuitions suggesting to you and others that Pascal’s Mugging should be rejected are similarly based on an intuition of a bounded utility function. Emotions can’t react infinitely to anything; as one commenter put it, “I can only feel so much horror.” So to the degree that people’s preferences reflect their emotions, they have bounded utility functions. In the abstract, not emotionally but mentally, it is possible to have an unbounded function. But if you do, and act on it, others will think you a fanatic. For a fanatic cares infinitely for what he perceives to be an infinite good, whereas normal people do not care infinitely about anything.
This isn’t necessarily against an unbounded function; I’m simply trying to draw out the implications.
If this were the only chance you ever got to determine your lifespan, then choose B.
In the real world, it would probably be a better idea to discard both options and use your natural lifespan to search for alternative paths to immortality.
I disagree, not surprisingly, since I was the author of the comment to which you are responding. I would choose A, and I think anyone sensible would choose A. There’s not much one can say here in the way of argument, but it is obvious to me that choosing B here is following your ideals off a cliff. Especially since I can add a few hundred 9s there, and by your argument you should still choose B.
they would just insist that there is an important difference between deciding to take only box B at 7:00am vs 7:10am, if Omega chooses at 7:05am
But that’s exactly what strategic inconsistency is about. Even if you had decided to take only box B at 7:00am, by 7:06am a rational agent will just change his mind and choose to take both boxes. Omega knows this, hence it will put nothing into box B. The only way out is if the AI self-commits to take only box B in a way that’s verifiable by Omega.
When the stakes are high enough I one-box, while gritting my teeth. Otherwise, I’m more interested in demonstrating my “rationality” (Eliezer has convinced me to use those quotes).
Perhaps we could just specify an agent that uses reverse causation in only particular situations, as it seems that humans are capable of doing.
Paul G, almost certainly, right? Still, as you say, it has little bearing on one’s answer to the question.
In fact, that’s not true; it does. Is there anything to stop me from making a mental pact with all my simulation buddies (and ‘myself’, whoever he be) to go for Box B?
In arguing for the single box, Yudkowsky has made an assumption that I disagree with: at the very end, he changes the stakes and declares that your choice should still be the same.
My way of looking at it is similar to what Hendrik Boom has said. You have a choice between betting on Omega being right and betting on Omega being wrong.
A = Contents of box A
B = What may be in box B (if it isn’t empty)
A is yours, in the sense that you can take it and do whatever you want with it. One thing you can do with A is pay it for a chance to win B if Omega is right. Your other option is to pay nothing for a chance to win B if Omega is wrong.
Then just make your bet based on what you know about Omega. As stated, we only know his track record over 100 attempts, so use that. Don’t worry about the nature of causality or whether he might be scanning your brain. We don’t know those things.
If you do it that way, you’ll probably find that your answer depends on A and B as well as Omega’s track record.
I’d probably put Omega at around 99%, as Hendrik did. Keeping A at a thousand dollars, I’d one-box if B were a million dollars or if B were something I needed to save my life. But I’d two-box if B were a thousand dollars and one cent.
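The bet framing reduces to one inequality. If Omega is right with probability p, one-boxing is worth p*B and two-boxing is worth A + (1 - p)*B, so one-boxing wins exactly when (2p - 1)*B > A. A minimal check, assuming the commenter’s p = 0.99 and A = $1,000 (exact fractions avoid rounding in the one-cent case; the helper name is illustrative):

```python
from fractions import Fraction

def one_box_preferred(p: Fraction, a: Fraction, b: Fraction) -> bool:
    # One-box EV: p*b.  Two-box EV: a + (1 - p)*b.
    # Their difference simplifies to (2p - 1)*b - a.
    return (2 * p - 1) * b > a

p = Fraction(99, 100)
A = Fraction(1000)
print(one_box_preferred(p, A, Fraction(1_000_000)))    # True
print(one_box_preferred(p, A, Fraction(100001, 100)))  # False: B = $1,000.01
```

With B = $1,000.01 the two-box bet comes out ahead by about $20 in expectation, consistent with the claim that the answer depends on A and B as well as on Omega’s record.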
So I think changing A and B and declaring that your strategy must stay the same is invalid.
IMO there’s less to Newcomb’s paradox than meets the eye. It’s basically “A future-predicting being who controls the set of choices could make rational choices look silly by making sure they had bad outcomes”. OK, yes, he could. Surprised?
What I think makes it seem paradoxical is that the paradox both assures us that Omega controls the outcome perfectly, and cues us that this isn’t so (“He’s already left” etc). Once you settle what it’s really saying either way, the rest follows.
Yes, this is really an issue of whether your choice causes Omega’s action or not. The only way for Omega to be a perfect predictor is for your choice to actually cause Omega’s action. (For example, Omega ‘sees the future’ and acts based on your choice). If your choice causes Omega’s action, then choosing B is the rational decision, as it causes the box to have the million.
If your choice does not cause Omega’s action, then choosing both boxes is the winning approach. In this case, Omega is merely giving big awards to some people and small awards to others.
If your choice has some percentage chance of causing Omega’s action, then the problem becomes one of risk management: what is your chance of getting the big award if you choose B, weighed against the utility of the two choices?
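One way to make the risk-management version precise, under assumptions that are mine rather than the commenter’s: with probability c your choice really does determine Omega’s action, and otherwise box B was filled with some fixed probability r that ignores your choice. Then EV(one box) - EV(two boxes) = c*1,000,000 - 1,000 for every r, so one-boxing pays as soon as c exceeds 0.1%. A sketch that checks the identity with exact fractions:

```python
from fractions import Fraction

SMALL, BIG = 1_000, 1_000_000

def ev_one_box(c: Fraction, r: Fraction) -> Fraction:
    # With prob c the choice makes B full; otherwise B is full with fixed prob r
    return c * BIG + (1 - c) * r * BIG

def ev_two_box(c: Fraction, r: Fraction) -> Fraction:
    # With prob c the choice makes B empty; otherwise the same fixed r, plus box A
    return c * SMALL + (1 - c) * (r * BIG + SMALL)

# The advantage of one-boxing is c*BIG - SMALL, whatever r is
for r in (Fraction(0), Fraction(1, 2), Fraction(1)):
    for c in (Fraction(1, 2000), Fraction(1, 1000), Fraction(1, 500)):
        assert ev_one_box(c, r) - ev_two_box(c, r) == c * BIG - SMALL
print("break-even causal probability:", Fraction(SMALL, BIG))  # 1/1000
```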
I agree with what Tom posted. The only paradox here is that the problem both states that your choice causes Omega’s action (because it supposedly predicts perfectly), and also says that your action does not cause Omega’s action (because the decision is already made). Thus, whether you think box B or both boxes is the correct choice depends on which of these two contradictory statements you end up believing.
the dominant consensus in modern decision theory is that one should two-box...there’s a common attitude that “Verbal arguments for one-boxing are easy to come by, what’s hard is developing a good decision theory that one-boxes”
Those are contrary positions, right?
Robin Hanson:
Punishment is ordinary, but Newcomb’s problem is simple! You can’t have both.
The advantage of an ordinary situation like punishment is that game theorists can’t deny the fact on the ground that governments exist, but they can claim it’s because we’re all irrational, which doesn’t leave many directions to go in.
I agree that “rationality” should be the thing that makes you win but the Newcomb paradox seems kind of contrived.
If there is a more powerful entity throwing good utilities at normally dumb decisions and bad utilities at normally good decisions, then you can make any dumb thing look genius, because you are under different rules than the world we live in at present.
I would ask Alpha for help and do what he tells me to do. Alpha is an AI that is also never wrong when it comes to predicting the future, just like Omega. Alpha would examine Omega and me and extrapolate Omega’s extrapolated decision. If there is a million in box B, I pick both; otherwise, just B.
Looks like Omega will be wrong either way, or will I be wrong? Or will the universe crash?
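The Alpha strategy is a diagonal argument: whatever Omega predicts, the agent does the opposite, so no prediction can come true. A toy check, assuming for illustration that the agent’s choice is a pure function of Omega’s prediction as relayed by Alpha:

```python
def alpha_agent(predicted_one_box: bool) -> bool:
    """Return True if the agent takes only box B.

    Alpha reveals Omega's prediction; the agent then does the opposite:
    a predicted one-boxer (million present) takes both boxes, and a
    predicted two-boxer (box B empty) takes only box B.
    """
    return not predicted_one_box

# Omega is right only if its prediction matches what the agent then does.
self_consistent = [pred for pred in (True, False) if alpha_agent(pred) == pred]
print(self_consistent)  # []
```

The empty list means no prediction survives, so the premises (an infallible Omega, an infallible Alpha, and a contrarian agent) cannot all hold at once.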
To me, the decision is very easy. Omega obviously possesses more prescience about my box-taking decision than I do myself. He’s been able to guess correctly in the past, so I’d see no reason to doubt him in my case. With that in mind, the obvious choice is to take box B.
If Omega is so nearly always correct, then determinism is shown to exist (at least to some extent). That being the case, causality would be nothing but an illusion. So I’d see no problem with it working in “reverse”.
Fascinating. A few days after I read this, it struck me that a form of Newcomb’s Problem actually occurs in real life—voting in a large election. Here’s what I mean.
Say you’re sitting at home pondering whether to vote. If you decide to stay home, you benefit by avoiding the minor inconvenience of driving and standing in line. (Like gaining $1000.) If you decide to vote, you bear that inconvenience, and you know your individual vote almost certainly won’t make a statistical difference in getting your candidate elected. (Getting your candidate elected would be like winning $1,000,000.) So rationally, stay at home and hope your candidate wins, right? And then you’ll have avoided the inconvenience too. Take both boxes.
But here’s the twist. If you muster the will to vote, it stands to reason that those of a similar mind to you (a potentially statistically significant number of people) would also muster the will to vote, because of their similarity to you. So knowing this, why not stay home anyway, avoid the inconvenience, and trust all those others to vote and win the election? They’re going to do what they’re going to do. Your actions can’t change that. The contents of the boxes can’t be changed by your actions. Well, if you don’t vote, perhaps that means neither will the others, and so it goes. Therein lies the similarity to Newcomb’s problem.
A very good point. I’m the type to stay home from the polls. But I’d also one-box… hm.
I think it may have to do with the very weak correlation between my choice to vote and the choice of those of a similar mind to me to vote as opposed to the very strong correlation between my choice to one-box and Omega’s choice to put $1,000,000 in box B.
Rational agents defect against a bunch of irrational fools who are mostly choosing for signalling purposes and who may well vote for the other guy even if they cooperate.
“If it ever turns out that Bayes fails—receives systematically lower rewards on some problem, relative to a superior alternative, in virtue of its mere decisions—then Bayes has to go out the window.”
What exactly do you mean by mere decisions? I can construct problems where agents that use few computational resources win. By your own admission, Bayesian agents have to expend energy to get into mutual information with the environment (a claim I am still suspicious of), so on such problems they lose.
The premise is that a rational agent would start out convinced that this story, about an alien that knows in advance what they’ll decide, is false.
The Kolmogorov complexity of the story about the alien is very large because we have to hypothesize some mechanism by which it can extrapolate the contents of minds. Even if I saw the alien land a million times and watched the box-picking connect with the box contents as they’re supposed to, it would be simpler to assume that the boxes are some stage magic trick, or even that they are an exception to the usual laws of physics.
Once we’ve done enough experiments that we’re forced into the hypothesis that the boxes are an exception to the usual laws of physics, it’s pretty clear what to do. The obvious revised laws of physics based on the new observations make it clear that one should choose just one box.
So a rational agent would do the right thing, but only because there’s no way to get it to believe the backstory.
It is not possible for an agent to make a rational choice between 1 or 2 boxes if the agent and Omega can both be simulated by Turing machines. Proof: Omega predicts the agent’s decision by simulating it. This requires Omega to have greater algorithmic complexity than the agent (including the nonzero complexity of the compiler or interpreter). But a rational choice by the agent requires that it simulate Omega, which requires that the agent have greater algorithmic complexity instead.
In other words, the agent X, with complexity K(X), must model Omega which has complexity K(X + “put $1 million in box B if X does not take box A”), which is slightly greater than K(X).
In the framework of the ideal rational agent in AIXI, the agent guesses that Omega is the shortest program consistent with the observed interaction so far. But it can never guess Omega because its complexity is greater than that of the agent. Since AIXI is optimal, no other agent can make a rational choice either.
As an aside, this is also a wonderful demonstration of the illusion of free will.
Um, AIXI is not computable. Relatedly, K(AIXI) is undefined, as AIXI is not a finite object.
Also, A can simulate B, even when K(B)>K(A). For example, one could easily define a computer program which, given sufficient computing resources, simulates all Turing machines on all inputs. This must obviously include those with much higher Kolmogorov complexity.
Yes, you run into issues of two Turing machines/agents/whatever simulating each other. (You could also get this from the recursion theorem.) What happens then? Simple: neither simulation ever halts.
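The claim that one program can simulate every Turing machine on every input rests on dovetailing: in round k, start machine k, then advance every machine still running by one step (machine-input pairs can be enumerated the same way). Each machine eventually receives arbitrarily many steps, and machines that never halt do not block the rest. A minimal sketch, using Python generators as stand-ins for machines; `toy_machine` is a hypothetical enumeration, not a real Turing-machine encoding:

```python
from itertools import count
from typing import Dict, Iterator

def toy_machine(i: int) -> Iterator[int]:
    # Hypothetical machine i: even-numbered machines halt after i steps,
    # odd-numbered ones run forever.
    for step in count():
        if i % 2 == 0 and step == i:
            return
        yield step

def dovetail(rounds: int) -> Dict[int, int]:
    """Dovetail the first `rounds` machines; return {machine: steps run before halting}."""
    running: Dict[int, Iterator[int]] = {}
    steps: Dict[int, int] = {}
    halted: Dict[int, int] = {}
    for k in range(rounds):
        running[k] = toy_machine(k)   # start one new machine per round
        steps[k] = 0
        for i in list(running):       # advance every live machine by one step
            try:
                next(running[i])
                steps[i] += 1
            except StopIteration:
                halted[i] = steps[i]
                del running[i]
    return halted

print(dovetail(20))  # {0: 0, 2: 2, 4: 4, 6: 6, 8: 8}
```

Machine i (for even i) halts in round 2i, so 20 rounds see machines 0 through 8 finish, while the odd-numbered machines keep running without stalling the loop.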
Not so. I don’t need to simulate a hungry tiger in order to stay safely (and rationally) away from it, even though I don’t know the exact methods by which its brain will identify me as a tasty treat. If you think that one can’t “rationally” stay away from hungry tigers, then we’re using the word “rationally” vastly differently.
Okay, maybe I am stupid, maybe I am unfamiliar with all the literature on the problem, maybe my English sucks, but I fail to understand the following:
-
Is the agent aware of the fact that one-boxers get 1,000,000 at the moment Omega “scans” him and presents the boxes?
OR
Is the agent told about this after Omega “has left”?
OR
Is the agent unaware of the fact that Omega rewards one-boxers at all?
-
P.S.: Also, as with most “decision paradoxes”, this one will have different solutions depending on the context (is the agent a starving child in Africa, or a “megacorp” CEO?).
I’m a convinced two-boxer, but I’ll try to put my argument without any bias. It seems to me that the way this problem has been framed is an attempt to rig it for the one-boxers. When we talk about “precommitment”, it is suggested that the subject has advance knowledge of Omega and what is to happen. The way I thought the paradox worked was that Omega would scan/analyze a person and make its prediction, all before the person ever heard of the dilemma. Therefore, a person has no way to develop an intention of being a one-boxer or a two-boxer that in any way affects Omega’s prediction. For the Irene/Rachel situation, there is no way to ever “precommit;” the subject never gets to play Omega’s game again and Omega scans their brains before they ever heard of him. (So imagine you only had one shot at playing Omega’s game, and Omega made its prediction before you ever came to this website or anywhere else and heard about Newcomb’s paradox. Then that already decides what it puts in the boxes.)
Secondly, I think a requirement of the problem is that your choice, at the time of actually taking the box(es), cannot affect what’s in the box. What we have here are two completely different problems; if Omega, or information about your choice, can in any way travel back in time to change the contents of the box, the choice is trivial. So yes, Omega may have chosen to discriminate against rational people and reward irrational ones; the point is, there is absolutely nothing we can do about it (either by precommitment or at the actual time of choosing).
To clarify why I think two-boxing is the right choice, I would propose a real-life experiment. Let’s say we developed a survey that we use, by asking people various questions about logic, the paranormal, etc., to classify them as one-boxers or two-boxers. The crux of the setup is that none of the volunteers we take have ever heard of the Newcomb Paradox; we make up any reason we want for them to take the survey. THEN, having already placed money or no money in box B, we give them the story about Omega and let them make the choice. Hypothetically, our survey could be 100% accurate; even if not, it may be accurate enough that many of our predicted one-boxers will be glad to find their choice rewarded. In essence, they cannot “precommit”, and their choice won’t magically change the contents of the box (based on a human survey). They also cannot go back and convince themselves to cheat on our survey (it’s impossible), and that is how Omega is supposed to operate. The point is, from the experimental point of view, every single person would make more from taking both boxes, because at the time of choice there’s always the extra $1000 in box A.
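The proposed experiment can be simulated directly, and the simulation shows why both camps feel vindicated. The sketch assumes, purely for illustration, a survey that classifies people correctly 90% of the time, fills box B before anyone hears the story, and lets each person act on their true disposition. `run_experiment` and `SURVEY_ACCURACY` are hypothetical names.

```python
import random
from statistics import mean

SMALL, BIG = 1_000, 1_000_000
SURVEY_ACCURACY = 0.9  # hypothetical accuracy of the personality survey

def run_experiment(n_people: int, seed: int = 0):
    rng = random.Random(seed)
    payouts = {"one": [], "two": []}   # actual payouts, grouped by disposition
    switching_gains = set()            # two-box payout minus one-box payout
    for _ in range(n_people):
        disposition = rng.choice(["one", "two"])
        survey_correct = rng.random() < SURVEY_ACCURACY
        other = "two" if disposition == "one" else "one"
        predicted = disposition if survey_correct else other
        box_b = BIG if predicted == "one" else 0   # filled before the story is told
        one_box_payout, two_box_payout = box_b, box_b + SMALL
        switching_gains.add(two_box_payout - one_box_payout)
        chosen = one_box_payout if disposition == "one" else two_box_payout
        payouts[disposition].append(chosen)
    return mean(payouts["one"]), mean(payouts["two"]), switching_gains

one_avg, two_avg, gains = run_experiment(10_000)
print(gains)                           # {1000}
print(round(one_avg), round(two_avg))  # near the expected 900000 and 101000
```

With the boxes held fixed, every individual would have gained exactly $1,000 by two-boxing, which is the dominance argument above; yet the people disposed to one-box walk away with far more on average.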
The key point you’ve missed in your analysis, however, is that Omega is almost always correct in his predictions.
It doesn’t matter how Omega does it—that is a separate problem. You don’t have enough information about his process of prediction to make any rational judgment about it except for the fact that it is a very, very good process. Brain scans, reversed causality, time travel, none of those ideas matter. In the paradox as originally posed, all you have are guesses about how he may have done it, and you would be an utter fool to give higher weight to those guesses than to the fact that Omega is always right.
If observations (that Omega is always right) disagree with theory (that Omega cannot possibly be right), it is the theory that is wrong, every time.
Thus the rational agent should, in this situation, give extremely low weight to his understanding of the way the universe works, since it is obviously flawed (the existence of a perfect predictor proves this). The question really comes down to a 100% chance of getting $1000 plus a nearly 0% chance of getting $1.001 million in total, versus a nearly 100% chance of getting $1 million.
What really blows my mind about making the 2-box choice is that you can significantly reduce Omega’s ability to predict the outcome, and unless you are absolutely desperate for that $1000*, the 2-box choice doesn’t become superior until Omega is only roughly 50% accurate (the expected values are equal at 50.05%). Only then do you expect to get more money, on average, by choosing both boxes.
In other words, if you think Omega is doing anything but flipping a coin to determine the contents of box B, you are better off choosing box B.
*I could see the value of $1000 rising significantly if, for example, a man is holding a gun to your head and will kill you in two minutes if you don’t give him $1000. In this case, any uncertainty about Omega’s abilities is overshadowed by the certainty of the $1000. This inverts if the man with the gun is demanding more than $1000, making the 2-box choice a non-option.
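The break-even point follows from setting the two expectations equal. With A in box A and B in box B, one-boxing is worth p*B and two-boxing is worth A + (1 - p)*B; these match at p = (A + B)/(2B), which for $1,000 and $1,000,000 is 1001/2000, i.e. 50.05%. A quick exact check (the helper name is illustrative):

```python
from fractions import Fraction

def breakeven_accuracy(a: int, b: int) -> Fraction:
    # Solve p*b == a + (1 - p)*b for Omega's accuracy p
    return Fraction(a + b, 2 * b)

p_star = breakeven_accuracy(1_000, 1_000_000)
print(p_star, float(p_star))  # 1001/2000 0.5005

# Sanity check: at p_star the two expected payoffs coincide
assert p_star * 1_000_000 == 1_000 + (1 - p_star) * 1_000_000
```

Because p* grows with A, anything that raises the effective value of box A, as in the gun scenario, raises the accuracy Omega needs before one-boxing pays.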
If the alien is able to predict your decision, it follows that your decision is a function of your state at the time the alien analyzes you. Then, there is no meaningful question of “what should you do?” Either you are in a universe in which you are disposed to choose the one box AND the alien has placed the million dollars, or you are in a universe in which you are disposed to take both boxes AND the alien has placed nothing. If the former, you will have the subjective experience of “deciding to take the one box”, which is itself a deterministic process that feels like a free choice, and you will find the million. If the latter, you will have the subjective experience of “deciding to take both boxes”, and you will find nothing in the opaque box.
In short, the framing of the problem implies that your decision-making process is deterministic (which does not preclude it being a process that you are conscious of participating in), and the figurative notion of “free will” does not include literal degrees of freedom. If you must insist on viewing it as a question of what the correct action is, it’s to take the one box. Regardless of your motivation, even if your reason for doing so is this argument, you will find yourself in a universe in which events (including thought events) have led you to take one box, and these are the same universes in which the alien places a million dollars in the box.
Yes, but when I tried to write it up, I realized that I was starting to write a small book. And it wasn’t the most important book I had to write, so I shelved it. My slow writing speed really is the bane of my existence. The theory I worked out seems, to me, to have many nice properties besides being well-suited to Newcomblike problems. It would make a nice PhD thesis, if I could get someone to accept it as my PhD thesis. But that’s pretty much what it would take to make me unshelve the project. Otherwise I can’t justify the time expenditure, not at the speed I currently write books.
If you have a solution to Newcomb’s Problem, but don’t have the time to work on it, is there any chance you will post a sketch of your solution for other people to investigate and/or develop?
Isn’t this the exact opposite argument from the one that was made in Dust Specks vs 50 Years of Torture?
Correct me if I’m wrong, but the argument in this post seems to be “Don’t cling to a supposedly-perfect ‘causal decision theory’ if it would make you lose gracefully, take the action that makes you WIN.”
And the argument for preferring 50 Years of Torture over 3^^^3 Dust Specks is that “The moral theory is perfect. It must be clung to, even when the result is a major loss.”
How can both of these be true?
(And yes, I am defining “preferring 50 Years of Torture over 3^^^3 Dust Specks” as an unmitigated loss. A moral theory that returns a result that almost every moral person alive would view as abhorrent has at least one flaw if it could produce such a major loss.)
One belated point: some people seem to think that Omega’s successful prediction is virtually impossible and that the experiment is a purely fanciful speculation. However, it seems to me entirely plausible that having you fill out a questionnaire while being brain scanned might well make this situation practical in the near future. The questions, if filled out correctly, could characterize your personality type with enough accuracy to give a very strong prediction about what you will do. And if you lie, in the future that might be detected with a brain scan. I don’t see anything about this scenario which is absurd, impossible, or even particularly low probability. The one problem is that there might well be a certain fraction of people for whom you really can’t predict what they’ll do, because they’re right on the edge and will decide more or less at random. But you could exclude them from the experiment and just give those with solid predictions a shot at the boxes.
Somehow I’d never thought of this as a rationalist’s dilemma, but rather a determinism vs free will illustration. I still see it that way. You cannot both believe you have a choice AND that Omega has perfect prediction.
The only “rational” (in all senses of the word) response I support is: shut up and multiply. Estimate the chance that he has predicted wrongly, and if that gives you positive expected value, take both boxes. I phrase this as advice, but in fact I mean it as a prediction of rational behavior.
If you really want to impress an inspector who can see your internal state, by altering your utility function to conform to their wishes, then one strategy would be to create a trusted external “brain surgeon” agent with the keys to your utility function to change it back again after your utility function has been inspected—and then forget all about the existence of the surgeon.
The inspector will be able to see the lock on your utility function—but those are pretty standard issue.
As a rationalist, it might be worthwhile to take the one box just so those Omega know-it-alls will be wrong for once.
If random number generators not determinable by Omega exist, generate one bit of entropy. If not, take the million bucks. Quantum randomness anyone?
Given how many times Eliezer has linked to it, it’s a little surprising that nobody seems to have picked up on this yet, but the paragraph about the utility function not being up for grabs seems to have a pretty serious technical flaw:
Let p = 80% and let q be one in a million. I’m pretty sure that what Eliezer has in mind is,
(A) For all n, there is an even larger n’ such that (p+q)*u(live n years) < p*u(live n’ years) + q*u(live a googolplex years).
This indeed means that {u(live n’ years) | n’ in N} is not bounded above (I did check the math :-)), which means that u is not bounded above, which means that u is not bounded. But what he actually said was,
(B) For all n, (p+q)*u(live n years) ≤ p*u(live forever) + q*u(live a googolplex years)
That’s not only different from A, it contradicts A! It doesn’t imply that u needs to be bounded, of course, but it flat out states that {u(live n years) | n in N} is bounded above by (p*u(live forever) + q*u(live a googolplex years))/(p+q).
(We may perhaps see this as reason enough to extend the domain of our utility function to some superset of the real numbers. In that case it’s no longer necessary for the utility function to be unbounded to satisfy (A), though—although we might invent a new condition like “not bounded by a real number.”)
Benja, the notion is that “live forever” does not have any finite utility, since it is bounded below by a series of finite lifetimes whose utility increases without bound.
(thinks) Okay, so if I understand you correctly now, the essential thing I was missing that you meant to imply was that the utility of living forever must necessarily be equal to (cannot be larger than) the limit of the utilities of living a finite number of years. Then, if u(live forever) is finite, p times the difference between u(live forever) and u(live n years) must become arbitrarily small, and thus, eventually smaller than q times the difference between u(live n years) and u(live googolplex years). You then arrive at a contradiction, from which you conclude that u(live forever) = the limit of u(live n years) cannot be finite. Okay. Without the qualification I was missing, the condition wouldn’t be inconsistent with a bounded utility function, since the difference wouldn’t have to get arbitrarily small, but the qualification certainly seems reasonable.
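The argument can be checked numerically with any bounded utility whose value for living forever equals the limit of its finite values. The sketch assumes the hypothetical u(n) = n/(n+1), whose limit is 1, and replaces the googolplex with the stand-in G = 10 so the numbers stay small; p = 80% and q = one in a million as above. Rearranging (B) gives q*(u(n) - u(G)) ≤ p*(1 - u(n)), which for this u reduces to n - G ≤ (p/q)*(G + 1), so (B) must eventually fail.

```python
import math
from fractions import Fraction

p = Fraction(4, 5)       # 80%
q = Fraction(1, 10**6)   # one in a million
G = 10                   # hypothetical stand-in for a googolplex

def u(n: int) -> Fraction:
    # Hypothetical bounded utility of living n years
    return Fraction(n, n + 1)

U_FOREVER = Fraction(1)  # equals the limit of u(n) as n grows

def condition_b(n: int) -> bool:
    # (p+q)*u(n) <= p*u(forever) + q*u(G)
    return (p + q) * u(n) <= p * U_FOREVER + q * u(G)

# For this u, (B) holds exactly when n - G <= (p/q)*(G + 1)
first_failure = G + math.floor(p / q * (G + 1)) + 1
print(first_failure)                                               # 8800011
print(condition_b(first_failure - 1), condition_b(first_failure))  # True False
```

The exact threshold depends on the made-up u and G, but any bounded u with u(forever) equal to the limit fails (B) at some finite n, which is the contradiction described above.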
(I would still prefer for all possibilities considered to have defined utilities, which would mean extending the range of the utility function beyond the real numbers, which would mean that u(live forever) would, technically, be an upper bound for {u(live n years) | n in N} -- that’s what I had in mind in my last paragraph above. But you’re not required to share my preferences on framing the issue, of course :-))
There are two ways of thinking about the problem.
1. You see the problem as a decision theorist, and see a conflict between the expected utility recommendation and the dominance principle. People who have seen the problem this way have been led into various forms of causal decision theory.
2. You see the problem as a game theorist, and are trying to figure out the predictor’s utility function, what points are focal and why. People who have seen the problem this way have been led into various discussions of tacit coordination.
Newcomb’s scenario is a paradox, not meant to be solved, but rather explored in different directions. In its original form, much like the Monty Hall problem, Newcomb’s scenario is not stated precisely enough to yield a problem with a calculable solution.
This is not a criticism of the problem, indeed it is an ingenious little puzzle.
And there is much to learn from well-defined Newcomb-like problems.
Re: First, foremost, fundamentally, above all else: Rational agents should WIN.
When Deep Blue beat Garry Kasparov, did that prove that Garry Kasparov was “irrational”?
It seems as though it would be unreasonable to expect even highly rational agents to win—if pitted against superior competition. Rational agents can lose in other ways as well—e.g. by not having access to useful information.
Since there are plenty of ways in which rational agents can lose, “winning” seems unlikely to be part of a reasonable definition of rationality.
I think I’ve solved it.
I’m a little late to this, and given the amount of time people smarter than myself have spent thinking about this it seems naive even to myself to think that I have found a solution to this problem. That being said, try as I might, I can’t find a good counter argument to this line of reasoning. Here goes...
The human brain’s function is still mostly a black box to us, but the demonstrated predictive power of this alien is strong evidence that this is not the case for him. If he really can predict human decisions, then the mere fact that you are choosing one box is the best way for you to ensure that this will be what is predicted.
The standard attack on this line of reasoning seems to be that since his prediction happened in the past, your decision can’t influence it. But it already has influenced it. He was aware of the decision before you made it (evidenced by his predictive power). In fact, it is not really a decision in the sense of “freely” choosing one of two options (in the way that most people use “freely”, at least). Think of this decision as just extremely complicated and seemingly unpredictable data analysis, where the unpredictability comes from never being able to know intimately every part of the decision process and the inputs. But if one could perfectly crack the “black box” of your decision, as this alien appears to have done (at least this seems by far the most plausible explanation to me), then one could predict decisions with the accuracy the alien possesses. In other words, the gears were already in motion for your decision to be made, and the alien was already a witness to them whether you realized it or not. In that sense you aren’t making your decision afterwards, when you think you are; you are actually realizing the decision that you were already set up to make at an earlier time.
If you agree with what I have written above, your obvious best decision is to just go ahead and pick one box, and hope that the alien would have predicted this. Based on the evidence, that will probably be enough to make the one million show up. Deciding instead to go for two boxes for any reason whatsoever will probably mean that the million won’t be there. The time issue is just an illusion caused by your imperfect knowledge and data processing that takes time.
Cross-posting from Less Wrong, I think there’s a generalized Russell’s Paradox problem with this theory of rationality:
Eliezer, why didn’t you answer the question I asked at the beginning of the comment section of this post?
The ‘delayed choice’ experiments of Wheeler & others appear to show a causality that goes backward in time. So, I would take just Box B.
I would use a true quantum random generator. 51% of the time I would take only one box. Otherwise I would take two boxes. Thus Omega has to guess that I will only take one box, but I have a 49% chance of taking home another $1000. My expected winnings will be $1,000,490, and by Eliezer’s definition I am more rational than he is.
This is why I restate the problem to exclude the million when people choose randomly.
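The commenter’s figure can be reproduced under the assumption implicit in the comment: Omega fills box B whenever the probability of one-boxing exceeds 50% (that threshold is an assumption, not part of the original problem). The optional flag models the restated rule in the reply, which withholds the million from anyone who randomizes. `ev_mixed` is a hypothetical helper:

```python
from fractions import Fraction

SMALL, BIG = 1_000, 1_000_000

def ev_mixed(p_one_box: Fraction, punish_randomizers: bool = False) -> Fraction:
    # Assumed rule: B is filled iff P(one-box) > 1/2, unless the restated
    # problem withholds the million from anyone who randomizes at all.
    randomizing = 0 < p_one_box < 1
    filled = p_one_box > Fraction(1, 2) and not (punish_randomizers and randomizing)
    box_b = BIG if filled else 0
    return box_b + (1 - p_one_box) * SMALL  # box A is collected only when two-boxing

print(ev_mixed(Fraction(51, 100)))                           # 1000490
print(ev_mixed(Fraction(51, 100), punish_randomizers=True))  # 490
print(ev_mixed(Fraction(1)))                                 # 1000000
```

Under the restated rule, committing to one box outright beats every randomized mix.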
I’m a bit nervous, this is my first comment here, and I feel quite out of my league.
Regarding the “free will” aspect, can one game the system? My rational choice would be to sit right there, arms crossed, and choose no box. Instead, having thus disproved Omega’s infallibility, I’d wait for Omega to come back around, and try to weasel some knowledge out of her.
Rationally, the intelligence that could model mine and predict my likely action (yet fail to predict my inaction enough to not bother with me in the first place), is an intelligence I’d like to have a chat with. That chat would be likely to have tremendously more utility for me than $1,000,000.
Is that a valid choice? Does it disprove Omega’s infallibility? Is it a rational choice?
If messing with the question is not a constructive addition to the debate, accept my apologies, and flame me lightly, please.
Hi. This is a rather old post, so you might not get too many replies.
Newcomb’s problem often comes with the caveat that, if Omega thinks you’re going to game the system, it will leave you with only the $1,000. But yes, we like clever answers here, although we also like to consider, for the purposes of thought experiments, the least convenient possible world in which the loopholes we find have been closed.
Also, may I suggest visiting the welcome thread?
I’ve come around to the majority viewpoint on the alien/Omega problem. It seems to be easier to think about when you pin it down a bit more mathematically.
Let’s suppose the alien determines the probability of me one-boxing is p. For the sake of simplicity, let’s assume he then puts the 1M into one of the boxes with this probability p. (In theory he could do it whenever p exceeded some threshold, but this just complicates the math.)
Therefore, once I encounter the situation, there are two possible states:
a) with probability p there is 1M in one box, and 1k in the other
b) with probability 1-p there is 0 in one box, and 1k in the other
So:
the expected return of two-boxing is p(1M+1k)+(1-p)1k = 1Mp + 1kp + 1k − 1kp = 1Mp + 1k
the expected return of one-boxing is 1Mp
If the act of choosing affects the prior determination p, then the expected return calculation differs depending on my choice:
If I choose to two-box, then p ≈ 0, and I get about 1k on average
If I choose to one-box, then p ≈ 1, and I get about 1M on average
In this case, the expected return is higher by one-boxing.
If choosing the box does not affect p, then p is the same in both expected return calculations. In this case, two-boxing clearly has a better expected return than one-boxing.
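The two cases above can be checked directly, as a minimal sketch. It idealizes “p ≈ 1” and “p ≈ 0” as exactly 1 and 0; the helper names are hypothetical.

```python
from fractions import Fraction

M, K = 1_000_000, 1_000


def ev_two_box(p: Fraction) -> Fraction:
    # p(1M + 1k) + (1 - p)1k = 1M*p + 1k
    return p * (M + K) + (1 - p) * K


def ev_one_box(p: Fraction) -> Fraction:
    # p * 1M; an empty box B contributes nothing
    return p * M


# Case 1: p is fixed and does not depend on the choice.
for p in (Fraction(0), Fraction(1, 2), Fraction(1)):
    assert ev_two_box(p) - ev_one_box(p) == K  # two-boxing ahead by exactly 1k

# Case 2: p tracks the choice (p = 1 for one-boxers, p = 0 for two-boxers).
print(ev_one_box(Fraction(1)), ev_two_box(Fraction(0)))  # 1000000 1000
```

The whole disagreement sits in which of the two cases describes the situation, not in the arithmetic.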
Of course, if the determination of p is affected by the choice actually made in the future, you have a situation with reverse-time causality.
If I know that I am going to encounter this kind of problem, and it is somehow possible to pre-commit to one-boxing before the alien determines the probability p of me doing so, that certainly makes sense. But it is difficult to see why I would maintain that commitment when the choice actually presents itself, unless I actually believe this choice affects p, which, again, implies reverse-time causality.
It seems the problem has been set up in a deliberately confusing manner. It is as if the alien has just decided to find people who are irrational and pay them 1M for it. The problem seems to encourage irrational thinking, maybe because we want to believe that rational people always win, when of course one can set up a fairly absurd situation so that they do not.
Wait a second, the following bounded utility function can explain the quoted preferences:
U(live googolplex years) = 99
limit as N goes to infinity of U(live N years) = 100
U(live forever) = 101
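To see that these preferences really are compatible with a bounded utility function, here is one hypothetical example. The formula is an illustration chosen for this sketch, not something proposed in the thread. It takes x = log10(log10(N)) as input so that a googolplex (x = 100) fits in a float, and it rises toward 100 without reaching it, while living forever is assigned 101.

```python
def utility_finite(loglog10_years: float) -> float:
    """Hypothetical bounded utility for a finite lifespan of N years.

    Takes x = log10(log10(N)) instead of N so that a googolplex
    (x = 100) fits in a float. Defined for N > 10, i.e. x > 0.
    """
    if loglog10_years <= 0:
        raise ValueError("defined only for lifespans above 10 years")
    # Increasing in x, with limit 100 as x (and hence N) goes to infinity.
    return 100 - 100 / loglog10_years


UTILITY_FOREVER = 101

print(utility_finite(100))     # googolplex years -> 99.0
print(utility_finite(10_000))  # far longer, but still strictly below 100
```

Any finite lifespan stays below 100 here, so the infinite case at 101 is preferred to every finite one, which is what the three quoted values require.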
Benja Fallenstein gave an alternative formulation that does imply an unbounded utility function:
But these preferences are pretty counter-intuitive to me. If U(live n years) is unbounded, then the above must hold for any nonzero p, q, and with “googolplex” replaced by any finite number. For example, let p = 1/3^^^3, q = .8, n = 3^^^3, and replace “googolplex” with “0”. Would you really be willing to give up .8 probability of 3^^^3 years of life for a 1/3^^^3 chance at a longer (but still finite) one? And that’s true no matter how many up-arrows we add to these numbers?
“Would you really be willing to give up .8 probability of 3^^^3 years of life for a 1/3^^^3 chance at a longer (but still finite) one?”
I’d like to hear this too.
Okay. There are two intuitive obstacles: my heuristic as a human that my mind is too weak to handle tiny probabilities and that I should try to live my life on the mainline, and the fact that 3^^^3 already extrapolates a mind larger than the sum of every future experience my present self can empathize with.
But I strongly suspect that answering “No” would enable someone to demonstrate circular / inconsistent preferences on my part, and so I very strongly suspect that my reflective equilibrium would answer “Yes”. Even in the realm of the computable, there are simple computable functions that grow a heck of a lot faster than up-arrow notation.
Eliezer, would you be willing to bet all of your assets and future earnings against $1 of my money, that we can do an infinite amount of computation before the universe ends or becomes incapable of supporting life?
Your answer ought to be yes, if your preferences are what you state. If it turns out that we can do an infinite amount of computation before the universe ends, then this bet increases your money by $1, which allows you to increase your chance of having an infinite lifetime by some small but non-zero probability. If it turns out that our universe can’t do an infinite amount of computation, you lose a lot, but the loss of expected utility is still tiny compared to what you gain.
So, is it a bet?
Also, why do you suspect that answering “No” would enable someone to demonstrate circular / inconsistent preferences on your part?
No for two reasons—first, I don’t trust human reason including my own when trying to live one’s life inside tiny probabilities of huge payoffs; second, I ordinarily consider myself an average utilitarian and I’m not sure this is how my average utilitarianism plays out. It’s one matter if you’re working within a single universe in which all-but-infinitesimal of the value is to be found within those lives that are infinite, but I’m not sure I would compare two differently-sized possible Realities the same way. I am not sure I am willing to say that a finite life weighs nothing in my utility function if an infinite life seems possible—though if both were known to coexist in the same universe, I might have to bite that bullet. (At the opposite extreme, a Bostromian parliament might assign both cases representative weight proportional to probability and let them negotiate the wise action.)
Also I have severe doubts about infinite ethics, but that’s easily fixed using a really large finite number instead (pay everything if time < googolplex, keep $1 if time > TREE(100), return $1 later if time between those two bounds).
Keep growing the lifespan by huge computational factors, keep slicing near-infinitesimally tiny increments off the probability. (Is there an analogous inconsistency to which I expose myself by answering “No” to the bet above, from trying to treat alternative universes differently than side-by-side spatial regions?)
In that case, it’s not that your utility function is unbounded in years lived, but rather your utility for each year lived is a decreasing function of the lifetime of the universe (or perhaps total lifetime of everyone in the universe).
I’ll have to think if that makes sense.
It’s possible that I’m reasoning as if my utility function is over “fractions of total achievable value” within any given universe. I am not sure if there are any problems with this, even if it’s true.
That does have quite a bit of intuitive appeal! However, when you look at a possible universe from the outside, there are no levers nor knobs you can turn, and all the value achieved by the time of heat death was already inherent in the configurations right after the big bang--
--so if you do not want “fraction of total achievable value” to be identically one for every possible universe, the definition of your utility function seems to get intertwined with how exactly you divvy up the world into “causal nodes” and “causal arrows”, in a way that does not seem to happen if you define it in terms of properties of the outcome, like how many fulfilling lives are lived. (Of course, being more complicated doesn’t imply being wrong, but it seems worth noting.)
And yes, I’m taking a timeful view for vividness of imagination, but I do not think the argument changes much if you don’t do that; the point is that it seems like number-of-fulfilling-lives utility can be computed given only the universal wavefunction as input, whereas for fraction-of-achievable-fulfilling-lives, knowing the actual wavefunction isn’t enough.
Could your proposal lead to conflicts between altruists who have the same values (e.g. number of fulfilling lives), but different power to influence the world (and thus different total achievable value)?
After thinking about it, that doesn’t make sense either. Suppose Omega comes to you and says that among the universes that you live in, there is a small fraction that will end in 5 years. He offers to kill you now in those universes, in exchange for granting you a googolplex years of additional life in a similar fraction of universes with time > TREE(100) and where you would have died in less than a googolplex years without his help (and where others manage to live to TREE(100) years old if that makes any difference). Would you refuse?
No. But here, by specification, you’re making all the universes real and hence part of a larger Reality, rather than probabilities of which only a single one is real.
If there were only one Reality, and there were small probabilities of it being due to end in 5 years, or in a googolplex years, and the two cases seemed of equal probability, and Omega offered to destroy reality now if it were only fated to last 5 years, in exchange for extending its life to TREE(100) if it were otherwise fated to last a googolplex years… well, this Reality is already known to have lasted a few billion years, and through, say, around 2 trillion life-years, so if it is due to last only another 5 years the remaining 30 billion life-years are not such a high fraction of its total value to be lost—we aren’t likely to do so much more in just another 5 years, if that’s our limit; it seems unlikely that we’d get FAI in that time. I’d probably still take the offer. But I wouldn’t leap at it.
In that case, would you accept my original bet if I rephrase it as making all the universes part of a larger Reality? That is, if in the future we have reason to believe that Tegmark’s Level 4 Multiverse is true, and find ourselves living in a universe with time < googolplex, then you’d give me all your assets and future earnings, in return for $1 of my money if we find ourselves living in a universe with time > TREE(100).
I wouldn’t, but my reflective equilibrium might very well do so.
I wouldn’t due to willpower failure exceeding benefit of $1 if I believe my mainline probability is doomed to eternal poverty.
Reflective equilibrium probably would, presuming there’s a substantial probability of >TREE(100), or that as a limiting process the “tiny” probability falls off more slowly than the “long-lived” universe part increases. On pain of inconsistency when you raise the lifespan by large computational factors each time, and slice tiny increments off the probability each time.
Ok, as long as your utility function isn’t actually unbounded, here’s what I think makes more sense, assuming a Level 4 Multiverse. It’s also a kind of “fractions of total achievable value”.
Each mathematical structure representing a universe has a measure, which represents its “fraction of all math”. (Perhaps its measure is exponential in zero minus the length of its definition in a formal set theory.) My utility over that structure is bounded by this measure. In other words, if that structure represents my idea of total utopia, then my utility for it would be its measure. If it’s total dystopia, my utility for it would be 0.
Within a universe, different substructures (for example branches or slices of time) also have different measures, and if I value such substructures independently, my utilities for them are also bounded by their measures. For example, in a universe that ends at t = TREE(100), a time slice with t < googolplex has a much higher measure than a random time slice (since it takes more bits to represent a random t).
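A rough sketch of the time-slice point, under a loud assumption: the comment does not fix an encoding, so this uses the length of the Elias gamma code of t as a stand-in for “bits needed to represent t”, and takes the measure to be 2 to the minus that length. Elias gamma ignores short descriptions like “10^100”, so it only models the “random t” side of the argument; the helper names are hypothetical.

```python
def elias_gamma_bits(t: int) -> int:
    """Length of the Elias gamma code for a positive integer t."""
    if t < 1:
        raise ValueError("t must be positive")
    return 2 * (t.bit_length() - 1) + 1


def log2_measure(t: int) -> int:
    # Hypothetical measure 2**-(bits needed to write t), returned as a log2.
    return -elias_gamma_bits(t)


early = 10**100          # a time slice well before a googolplex
late = 10**1000 + 12345  # a stand-in for a "random" much later time slice
print(log2_measure(early), log2_measure(late))  # -665 -6643
```

Under this proxy the later slice gets a measure smaller by a factor of about 2**5978, which is the sense in which an early slice “has a much higher measure than a random time slice”.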
If I value each person independently (and altruistically), then it’s like average utilitarianism, except each person is given a weight equal to its measure instead of 1/population.
This proposal has its own counter-intuitive implications, but overall I think it’s better than the alternatives. It fits in nicely with MWI. It also manages to avoid running into problems with infinities.
I have to say this strikes me as a really odd proposal, though it’s certainly interesting from the perspective of the Doomsday Argument if advanced civilizations have a thermodynamic incentive to wait until nearly the end of the universe before using their hoarded negentropy.
But for me it’s hard to see why “reality-fluid” (the name I give your “measure”, to remind myself that I don’t understand it at all) should dovetail so neatly with the information needed to locate events in universes or universes in Level IV. It’s clear why an epistemic prior is phrased this way—but why should reality-fluid behave likewise? Shades of either Mind Projection Fallacy or a very strange and very convenient coincidence.
Actually, I think I can hazard a guess to that one. I think the idea would be “the simpler the mathematical structure, the more often it’d show up as a substructure in other mathematical structures”
For instance, if you are building large random graphs, you’d expect to see some specific pattern of, say, 7 vertices and 18 edges show up as subgraphs more often than, say, some specific pattern of 100 vertices and 2475 edges.
There’s a sense in which “reality fluid” could be distributed evenly which would lead to this. If every entire mathematical structure got an equal amount of reality stuff, then small structures would benefit from the reality juice granted to the larger structures that they happen to also exist as substructures of.
EDIT: blargh, corrected the big graph’s edge count. It was meant to represent half a complete graph.
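The random-graph intuition can be made quantitative with a standard count, sketched here for the Erdős–Rényi model G(n, p) (the comment does not name a model; p = 1/2 and n = 1000 are assumptions for illustration). The expected number of embeddings of a fixed pattern with v vertices and e edges is n!/(n-v)! times p**e; working in log2 keeps the numbers finite.

```python
import math


def log2_expected_embeddings(n: int, v: int, e: int, p: float = 0.5) -> float:
    """log2 of the expected number of embeddings of a fixed pattern graph
    (v vertices, e edges) into the random graph G(n, p).

    Each ordered choice of v distinct vertices (n!/(n-v)! of them) hosts
    all e pattern edges with probability p**e.
    """
    log2_placements = sum(math.log2(n - i) for i in range(v))
    return log2_placements + e * math.log2(p)


small = log2_expected_embeddings(1000, 7, 18)      # positive: many copies
large = log2_expected_embeddings(1000, 100, 2475)  # hugely negative: ~none
print(round(small, 1), round(large, 1))
```

The small pattern is expected to appear on the order of 2**50 times in a 1000-vertex graph, while the 100-vertex pattern (half of the 4950 possible edges) is essentially never expected to appear.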
Well, why would it be easier to locate some events or universes than others, unless they have more reality-fluid?
Why is it possible to describe one mathematical structure more concisely than another, or to specify one computation using fewer bits than another? Is that just a property of the mind that’s thinking about these structures and computations, or is it actually a property of Reality? The latter seems more likely to me, given results in algorithmic information theory. (I don’t know if similar theorems have been or can be proven about set theory, that the shortest description lengths in different formalizations can’t be too far apart, but it seems plausible.)
Also, recall that in UDT, there is no epistemic prior. So, the only way to get an effect similar to EDT/CDT w/ universal prior, is with a weighting scheme over universes/events like I described.
I can sort of buy the part where simple universes have more reality-fluid, though frankly the whole setup strikes me as a mysterious answer to a mysterious question.
But the part where later events have less reality-fluid within a single universe, just because they take more info to locate—that part in particular seems really suspicious. MPF-ish.
I’m far from satisfied with the answer myself, but it’s the best I’ve got so far. :)
Consider the case where you are trying to value (a) just yourself versus (b) the set of all future yous that satisfy the constraint of not going into negative utility.
The Shannon information of the set (b) could be (probably would be) lower than that of (a). To see this, note that the complexity (information) of the set of all future yous is just the info required to specify (you, now) (because to compute the time evolution of the set, you just need the initial condition), whereas the complexity (information) of just you is a series of snapshots (you, now), (you, 1 microsecond from now), … . This is like the difference between a JPEG and an MPEG. The complexity of the constraint probably won’t make up for this.
If the constraint of not going into negative utility is particularly complex, one could pick a simple subset of nonnegative-utility future yous, for example by specifying relatively simple constraints that ensure that the vast majority of yous satisfying those constraints don’t go into negative utility.
This is problematic because it means that you would assign less value to a large set of happy future yous than to just one future you. A large and exhaustive set of future happy yous is less complex (easier to specify) than just one.
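A toy version of the counting argument above, as a sketch under simplifying assumptions: each time step splits into exactly two branches, and the evolution rule costs the same to state in both cases, so it is left out of the count. Naming the whole set of branches then takes only the number of steps, while naming one branch also takes one bit per split. The helper names are hypothetical.

```python
def bits_for_all_branches(n_steps: int) -> int:
    # The set "every branch after n binary splits" is pinned down by the
    # (shared, uncounted) rule plus n itself.
    return n_steps.bit_length()


def bits_for_one_branch(n_steps: int) -> int:
    # A single branch needs n plus one bit per split saying which way
    # this particular branch went.
    return n_steps.bit_length() + n_steps


for n in (8, 1_000, 1_000_000):
    print(n, bits_for_all_branches(n), bits_for_one_branch(n))
```

The gap grows linearly with the number of steps, which is why the exhaustive set ends up simpler to specify than any one member of it.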
Related: That is not dead which can eternal lie: the aestivation hypothesis for resolving Fermi’s paradox (https://arxiv.org/pdf/1705.03394.pdf)
This looks pretty plausible to me, because it does seem there is some disutility to the simple fact of dying, regardless of how far in the future that happens. So U(live N years) always contains that disutility, whereas U(live forever) does not.
I really don’t see what the problem is. Clearly, the being has “read your mind” and knows what you will do. If you are of the opinion to take both boxes, he knows that from his mind scan, and you are playing right into his hands.
Obviously, your decision cannot affect the outcome because it’s already been decided what’s in the box, but your BRAIN affected what he put in the box.
It’s like me handing you an opaque box and telling you there is $1 million in it if and only if you go and commit murder. Then, you open the box and find it empty. I then offer Hannibal Lecter the same deal, he commits murder, and then opens the box and finds $1 million. Amazing? I don’t think so. I was simply able to create an accurate psychological profile of the two of you.
The question is how to create a formal decision algorithm that will be able to understand the problem and give the right answer (without failing on other such tests). Of course you can solve it correctly if you are not yet poisoned by too much presumptuous philosophy.
I guess I’m missing something obvious. The problem seems very simple, even for an AI.
The way the problem is usually defined (Omega really is omniscient, he’s not fooling around with you, etc.), there are only two solutions:
You take the two boxes, and Omega had already predicted that, meaning that there is nothing in Box B: you win $1000.
You take box B only, and Omega had already predicted that, meaning that there is $1M in box B: you win $1M.
That’s it. Period. Nothing else. Nada. Rien. Nichts. Sod all. These are the only two possible options (again, assuming the hypotheses are true). The decision to take box B only is a simple outcome comparison. It is a perfectly rational decision (if you accept the premises of the game).
Now the way Eliezer states it is different from the usual formulation. In Eliezer’s version, you cannot be sure about Omega’s absolute accuracy. All you know is his previous record. That does complicate things, if only because you might be the victim of a scam (e.g. the well-known trick for convincing someone that you can consistently predict the winning horse in a 2-horse race: simply start with 2^N people, always give a different prediction to each half of them, discard those to whom you gave the wrong one, etc.)
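The horse-race trick mentioned above is easy to simulate; here is a minimal sketch (the function name and the random race outcomes are assumptions for illustration). Whatever the results, exactly one of the original 2^N people ends up having seen N correct predictions in a row.

```python
import random


def run_scam(n_rounds: int, seed: int = 0) -> int:
    """Simulate the 2**N two-horse-race scam; return how many marks have
    received only correct predictions after n_rounds races."""
    rng = random.Random(seed)
    marks = list(range(2 ** n_rounds))
    for _ in range(n_rounds):
        winner = rng.choice(["horse 1", "horse 2"])
        # Tell half the remaining marks "horse 1" and the other half "horse 2".
        half = len(marks) // 2
        told_1, told_2 = marks[:half], marks[half:]
        # Keep only the marks who were given the right tip.
        marks = told_1 if winner == "horse 1" else told_2
    return len(marks)


print(run_scam(10))  # 1: one person has seen 10 correct picks in a row
```

Matching a record of 100 correct predictions this way would take 2^100 marks, which is one reason to weigh the scam hypothesis by how the record was obtained.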
At any rate, the other two outcomes that were impossible in the previous version (involving mis-prediction by Omega) are now possible, with a certain probability that you need to somehow ascertain. That may be difficult, but I don’t see any logical paradox.
For example, if this happened in the real world, you might reason that the probability that you are being scammed is overwhelming in regard to the probability of existence of a truly omniscient predictor. This is a reasonable inference from the fact that we hear about scams every day, but nobody has ever reported such an omniscient predictor. So you would take both boxes and enjoy your expected $1000+epsilon (Omega may have been sincere but deluded, lucky in the previous 100 trials, and wrong in this one).
In the end, the guy who would win most (in expected value!) would not be the “least rational”, but simply the one who made the best estimates for the probabilities of each outcome, based on his own knowledge of the universe (if you have a direct phone line to the Angel Gabriel, you will clearly do better).
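One way to turn “the best estimates for the probabilities of each outcome” into numbers is sketched below. It treats the predictor’s accuracy q as the probability that the prediction matches whatever you actually choose; that is an evidential-style assumption, and a causal decision theorist would reject conditioning on your own choice this way. The helper names are hypothetical.

```python
from fractions import Fraction

M, K = 1_000_000, 1_000


def ev_one_box(q: Fraction) -> Fraction:
    # Predictor right with probability q: box B is full iff you one-box.
    return q * M


def ev_two_box(q: Fraction) -> Fraction:
    # Right: B empty, you get 1k. Wrong: B full, you get 1M + 1k.
    return q * K + (1 - q) * (M + K)


# Accuracy at which the two options tie: q*M = (1 - q)*M + K
break_even = Fraction(M + K, 2 * M)
print(break_even)  # 1001/2000, i.e. 50.05%
```

On this accounting, any accuracy estimate above 50.05% favors one-boxing, and a coin-flip predictor (q = 1/2) leaves two-boxing ahead by exactly $1000.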
What is the part that would be conceptually (as opposed to technically/practically) difficult for an algorithm?
I one-box, but not because I haven’t considered the two-box issue.
I one-box because it’s a win-win in the larger context. Either I walk off with a million dollars, OR I become the first person to outthink Omega and provide new data to those who are following Omega’s exploits.
Even without thinking outside the problem, Omega is a game-breaker. We do not, in the problem as stated, have any information on Omega other than that they are superintelligent and may be able to act outside of causality. Or else Omega is simply a superduperpredictor, to the point where (quantum interactions and chaos theory aside) all Omega-chosen humans have turned out to be correctly predictable in this one aspect.
Perhaps Omega is deliberately NOT choosing to test humans it can’t predict. Or perhaps it is able to affect the local spacetime sufficiently to ‘lock in’ a choice even after it’s physically left the area?
We can’t tell. It’s superintelligent. It’s not playing on our field. It’s potentially an external source of metalogic. The rules go out the window.
In short, the problem as described is not sufficiently constrained to presume a paradox, because it’s not confining itself to a single logic system. It’s like asking someone only familiar with non-imaginary numbers what the square root of negative one is. Just because they can’t derive an answer doesn’t mean you don’t have one—you’re using different number fields.
My solution to the problem of the two boxes:
Flip a coin. If heads, both A & B. If tails, only B. (If the superintelligence can predict a coin flip, make it a radioactive decay or something. Eat quantum, Hal.)
In all seriousness, this is a very odd problem (I love it!). Of course two boxes is the rational solution—it’s not as if post-facto cogitation is going to change anything. But the problem statement seems to imply that it is actually impossible for me to choose the choice I don’t choose, i.e., choice is actually impossible.
Something is absurd here. I suspect it’s the idea that my choice is totally predictable. There can be a random element to my choice if I so choose, which kills Omega’s plan.
What wedrifid said. See also Rationality is Systematized Winning and the section of What Do We Mean By “Rationality”? about “Instrumental Rationality”, which is generally what we mean here when we talk about actions being rational or irrational. If you want to get more money, then the instrumentally rational action is the epistemically rational answer to the question “What course of action will cause me to get the most money?”.
If you accept the premises of Omega thought experiments, then the right answer is one-boxing, period. If you don’t accept the premises, it doesn’t make sense for you to be answering it one way or the other.
I thought about this last night and also came to the conclusion that randomizing my choice would not “assume the worst” as I ought to.
And I fully accept that this is just a thought experiment & physics is a cheap way out. I will now take the premises or leave them. :)
It is a common assumption in these sorts of problems that if Omega predicts that you will condition your choice on a quantum event, it will not put the money in Box B.
See The Least Convenient Possible World.
No it isn’t. If you like money it is rational to get more money. Take one box.
At face, that does sound absurd. The problem is that you are underestimating a superintelligence. Imagine that the universe is a computer simulation, so that a set of physical laws plus a very, very long string of random numbers is a complete causal model of reality. The superintelligence knows the laws and all of the random numbers. You still make a choice, even though that choice ultimately depends on everything that preceded it. See http://wiki.lesswrong.com/wiki/Free_will
I think much of the debate about Newcomb’s Problem is about the definition of superintelligence.
I’m not reading 127 comments, but as a newcomer who’s been invited to read this page, along with barely a dozen others, as an introduction, I don’t want to leave this unanswered, even though what I have to say has probably already been said.
First of all, the answer to Newcomb’s Problem depends a lot on precisely what the problem is. I have seen versions that posit time travel, and therefore backwards causality. In that case, it’s quite reasonable to take only one box, because your decision to do so does have a causal effect on the amount in Box B. Presumably causal decision theorists would agree.
However, in any version of the problem where there is no clear evidence of violations of currently known physics and where the money has been placed by Omega before my decisions, I am a two-boxer. Yet I think that your post above must not be talking about the same problem that I am thinking of, especially at the end. Although you never said so, it seems to me that you must be talking about a problem which says “If you choose Box B, then it will have a million dollars; if you choose both boxes, then Box B will be empty.” But that is simply not what the facts will be if Omega has made the decision in the past and currently understood physics applies. In the problem as stated, Omega may make mistakes in the future, and that makes all the difference.
It’s presumptuous of me to assume that you’re talking about a different problem from the one that you stated, I know. But the psychological states that you suggest I might have (that I might wish that I considered one-boxing rational, for example) seem utterly insane to me. Why would I wish such a thing? What does it have to do with anything? The only thing that I can wish for is that Omega has predicted that I will be a one-boxer, which has nothing to do with what I consider rational now.
The quotation from Joyce explains it well, up until the end, where poor phrasing may have confused you. The last sentence should read:
It is simply not true that Rachel envies Irene’s choice. Rachel envies Irene’s situation, the situation where there is a million dollars in Box B. And if Rachel were in that situation, then she would still take both boxes! (At least if I understand Joyce correctly.)
Possibly one thing that distinguishes me from one-boxers, and maybe even most two-boxers, is that I understand fundamental physics rather thoroughly and my prior has a very strong presumption against backwards causality. The mere fact that Omega has made successful predictions about Newcomb’s Paradox will never be enough to overrule that. Even being superintelligent and coming from another galaxy is not enough, although things change if Omega (known to be superintelligent and honest) claims to be a time-traveller. Perhaps for some one-boxers, and even for some irrational two-boxers, Omega’s past success at prediction is good evidence for backwards causality, but not for me.
So suppose that somebody puts two boxes down before me, presents convincing evidence for the situation as you stated it above (but no more), and goes away. Then I will simply take all of the money that this person has given me: both boxes. Before I open them, I will hope that they predicted that I will choose only one. After I open them, if I find Box B empty, then I will wish that they had predicted that I would choose only one. But I will not wish that I had chosen only one. And I certainly will not hope, beforehand, that I will choose only one and yet nevertheless choose two; that would indeed be irrational!
You are disposed to take two boxes. Omega can tell. (Perhaps by reading your comment. Heck, I can tell by reading your comment, and I’m not even a superintelligence.) Omega will therefore not put a million dollars in Box B if it sets you a Newcomb’s problem, because its decision to do so depends on whether you are disposed to take both boxes or not, and you are.
I am disposed to take one box. Omega can tell. (Perhaps by reading this comment. I bet you can tell by reading my comment, and I also bet that you’re not a superintelligence.) Omega will therefore put a million dollars in Box B if it sets me a Newcomb’s problem, because its decision to do so depends on whether I am disposed to take both boxes or not, and I’m not.
If we both get pairs of boxes to choose from, I will get a million dollars. You will get a thousand dollars. I will be monetarily better off than you.
But wait! You can fix this. All you have to do is be disposed to take just Box B. You can do this right now; there’s no reason to wait until Omega turns up. Omega does not care why you are so disposed, only that you are so disposed. You can mutter to yourself all you like about how silly the problem is; as long as you wander off with just B under your arm, it will tend to be the case that you end the day a millionaire.
Some time ago I figured out a refutation of this kind of reasoning in Counterfactual Mugging, and it seems to apply in Newcomb’s Problem too. It goes as follows:
Imagine another god, Upsilon, that offers you a similar two-box setup—except to get the $2M in the box B, you must be a one-boxer with regard to Upsilon and a two-boxer with regard to Omega. (Upsilon predicts your counterfactual behavior if you’d met Omega instead.) Now you must choose your dispositions wisely because you can’t win money from both gods. The right disposition depends on your priors for encountering Omega or Upsilon, which is a “bead jar guess” because both gods are very improbable. In other words, to win in such problems, you can’t just look at each problem individually as it arises—you need to have the correct prior/predisposition over all possible predictors of your actions, before you actually meet any of them. Obtaining such a prior is difficult, so I don’t really know what I’m predisposed to do in Newcomb’s Problem if I’m faced with it someday.
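The claim that “the right disposition depends on your priors” can be made concrete with a small sketch. Two assumptions are not in the comment: both predictors are treated as perfect, and Upsilon’s box A is assumed to hold $1000 like Omega’s. A disposition is a pair (choice against Omega, choice against Upsilon); the priors need not sum to 1, since the remaining mass is meeting neither god. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

M, K = 1_000_000, 1_000


def payoff_omega(vs_omega: str) -> int:
    # Perfect predictor: one-boxers find $1M in B, two-boxers get only box A.
    return M if vs_omega == "one" else K


def payoff_upsilon(vs_omega: str, vs_upsilon: str) -> int:
    if vs_upsilon == "two":
        # Box A only (assumed $1000); B is empty since you don't one-box Upsilon.
        return K
    # One-boxing Upsilon: B holds $2M only if you would two-box Omega.
    return 2 * M if vs_omega == "two" else 0


def best_disposition(p_omega: Fraction, p_upsilon: Fraction):
    """Disposition (vs_omega, vs_upsilon) with the highest expected payoff."""
    dispositions = product(["one", "two"], repeat=2)
    return max(
        dispositions,
        key=lambda d: p_omega * payoff_omega(d[0]) + p_upsilon * payoff_upsilon(*d),
    )


print(best_disposition(Fraction(9, 10), Fraction(1, 10)))  # ('one', 'two')
print(best_disposition(Fraction(1, 10), Fraction(9, 10)))  # ('two', 'one')
```

Swapping which god is more likely flips the optimal disposition, so no single answer to Newcomb’s Problem is optimal independent of the prior over predictors.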
Omega lets me decide to take only one box after meeting Omega, when I have already updated on the fact that Omega exists, and so I have much better knowledge about which sort of god I’m likely to encounter. Upsilon treats me on the basis of a guess I would subjunctively make without knowledge of Upsilon. It is therefore not surprising that I tend to do much better with Omega than with Upsilon, because the relevant choices being made by me are being made with much better knowledge. To put it another way, when Omega offers me a Newcomb’s Problem, I will condition my choice on the known existence of Omega, and all the Upsilon-like gods will tend to cancel out into Pascal’s Wagers. If I run into an Upsilon-like god, then, I am not overly worried about my poor performance—it’s like running into the Christian God, you’re screwed, but so what, you won’t actually run into one. Even the best rational agents cannot perform well on this sort of subjunctive hypothesis without much better knowledge while making the relevant choices than you are offering them. For every rational agent who performs well with respect to Upsilon there is one who performs poorly with respect to anti-Upsilon.
On the other hand, beating Newcomb’s Problem is easy, once you let go of the idea that to be “rational” means performing a strange ritual cognition in which you must only choose on the basis of physical consequences and not on the basis of correct predictions that other agents reliably make about you, so that (if you choose using this bizarre ritual) you go around regretting how terribly “rational” you are because of the correct predictions that others make about you. I simply choose on the basis of the correct predictions that others make about me, and so I do not regret being rational.
And these questions are highly relevant and realistic, unlike Upsilon; in the future we can expect there to be lots of rational agents that make good predictions about each other.
In what sense can you update? Updating is about following a plan, not about deciding on a plan. You already know that it’s possible to observe anything, you don’t learn anything new about environment by observing any given thing. There could be a deep connection between updating and logical uncertainty that makes it a good plan to update, but it’s not obvious what it is.
Huh? Updating is just about updating your map. (?) I didn’t understand the reasoning behind the next sentence; could you expand?
Intuitively, the notion of updating a map of fixed reality makes sense, but in the context of decision-making, formalization in full generality proves elusive, even unnecessary, so far.
By making a choice, you control the truth value of certain statements—statements about your decision-making algorithm and about mathematical objects depending on your algorithm. Only some of these mathematical objects are part of the “real world”. Observations affect what choices you make (“updating is about following a plan”), but you must have decided beforehand what consequences you want to establish (“[updating is] not about deciding on a plan”). You could have decided beforehand to care only about mathematical structures that are “real”, but what characterizes those structures apart from the fact that you care about them?
Vladimir talks more about his crazy idea in this comment.
Pascal’s Wagers, huh. So your decision theory requires a specific prior?
This is not a refutation, because what you describe is not about the thought experiment. In the thought experiment, there are no Upsilons, and so nothing to worry about. It is if you face this scenario in real life, where you can’t be given guarantees about the absence of Upsilons, that your reasoning becomes valid. But it doesn’t refute the reasoning about the thought experiment where it’s postulated that there are no Upsilons.
(Original thread, my discussion.)
Thanks for dropping the links here. FWIW, I agree with your objection. But at the very least, the people claiming they’re “one-boxers” should also make the distinction you make.
Also, user Nisan tried to argue that various Upsilons and other fauna must balance themselves out if we use the universal prior. We eventually took this argument to email, but failed to move each other’s positions.
Just didn’t want you confusing people or misrepresenting my opinion, so made everything clear. :-)
OK. I assume the usual (Omega and Upsilon are both reliable and sincere, I can reliably distinguish one from the other, etc.)
Then I can’t see how the game doesn’t reduce to standard Newcomb, modulo a simple probability calculation, mostly based on “when I encounter one of them, what’s my probability of meeting the other during my lifetime?” (plus various “actuarial” calculations).
If I have no information about the probability of encountering either, then my decision may be incorrect—but there’s nothing paradoxical or surprising about this, it’s just a normal, “boring” example of an incomplete information problem.
I can’t see why that is, again assuming that the full problem is explained to you on encountering either Upsilon or Omega, both are truthful, etc. Why can I not perform the appropriate calculations and make an expectation-maximising decision even after Upsilon-Omega has left? Surely Omega-Upsilon can predict that I’m going to do just that and act accordingly, right?
Yes, this is a standard incomplete information problem. Yes, you can do the calculations at any convenient time, not necessarily before meeting Omega. (These calculations can’t use the information that Omega exists, though.) No, it isn’t quite as simple as you state: when you meet Omega, you have to calculate the counterfactual probability of you having met Upsilon instead, and so on.
Something seems off about this, but I’m not sure what.
I’m pretty sure the logic is correct. I do make silly math mistakes sometimes, but I’ve tested this one on Vladimir Nesov and he agrees. No comment from Eliezer yet (this scenario was first posted to decision-theory-workshop).
It reminds me vaguely of Pascal’s Wager, but my cached responses thereunto are not translating informatively.
Then I think the original Newcomb’s Problem should remind you of Pascal’s Wager just as much, and my scenario should be analogous to the refutation thereof. (Thereunto? :-)
No, that’s not what I should do. What I should do is make Omega think that I am disposed to take just Box B. If I can successfully make Omega think that I’ll take only Box B but still take both boxes, then I should. But since Omega is superintelligent, let’s take it as understood that the only way to make Omega think that I’ll take only Box B is to make it so that I’ll actually take Box B. Then that is what I should do.
But I have to do it now! (I don’t do it now only because I don’t believe that this situation will ever happen.) Once Omega has placed the boxes and left, if the known laws of physics apply, then it’s too late!
If you take only Box B and get a million dollars, wouldn’t you regret having not also taken Box A? Not only would you have gotten a thousand dollars more, you’d also have shown up that know-it-all superintelligent intergalactic traveller too! That’s a chance that I’ll never have, since Omega will read my comment here and leave my Box B empty, but you might have that chance, and if so then I hope you’ll take it.
It’s not really too late then. Omega can predict what you’ll do between seeing the boxes, and choosing which to take. If this is going to include a decision to take one box, then Omega will put a million dollars in that box.
I will not regret taking only one box. It strikes me as inconsistent to regret acting as the person I most wish to be, and it seems clear that the person I most wish to be will take only one box; there is no room for approved regret.
If you say this, then you believe in backwards causality (or a breakdown of the very notion of causality, as in Kevin’s comment below). I agree that if causality doesn’t work, then I should take only Box B, but nothing in the problem as I understand it from the original post implies any violation of the known laws of physics.
If known physics applies, then Omega can predict all it likes, but my actions after it has placed the boxes cannot affect that prediction. There is always the chance that it predicts that I will take both boxes but I take only Box B. There is even the chance that it will predict that I will take only Box B but I take both boxes. Nothing in the problem statement rules that out. It would be different if that were actually impossible for some reason.
I knew that you wouldn’t, of course, since you’re a one-boxer. And we two-boxers will not regret taking both boxes, even if we find Box B empty. Better $1000 than nothing, we will think!
Ah, I see what the problem is. You have a confused notion of free will and what it means to make a choice.
Making a choice between two options doesn’t mean there is a real chance that you might take either option (there always is at least an infinitesimal chance, but that is always true even for things that are not usefully described as a choice). It just means that the reason you take whichever option you take is most usefully attributed to you (and not e.g. gravity, government, the person holding a gun to your head etc.). In the end, though, it is (unless the choice is so close that random noise makes the difference) a fact about you that you will make the choice you will make. And it is in principle possible for others to discover this fact about you.
If it is a fact about you that you will one-box it is not possible that you will two-box. If it is a fact about you that you will two-box it is not possible that you will one-box. If it is a fact about you that you will leave the choice up to chance then Omega probably doesn’t offer you to take part in the first place.
Now, when deciding what choice to make it is usually most useful to pretend there is a real possibility of taking either option, since that generally causes facts about you that are more beneficial to you. And that you do that is just another fact about you, and influences the fact about which choice you make. Usually the fact about which choice you will make has no consequences before you make your choice, and so you can model the rest of the world as being the same in either case up to that point when counterfactually considering the consequences of either choice. But the fact about which choice you will make is just another fact like any other, and is allowed, even if it usually doesn’t, to have consequences before that point in time. If it does, it is best, for the very same reason you pretend that either choice is a real possibility in the first place, to also model the rest of the world as different contingent on your choice. That doesn’t mean backwards causality. Modeling the world in this way is just another fact about you that generates good outcomes.
Alicorn:
TobyBartels:
I remember reading an article about someone who sincerely lacked respect for people who were ‘soft’ (not exact quote) on the death penalty … before ending up on the jury of a death penalty case, and ultimately supporting life in prison instead. It is not inconceivable that a sufficiently canny analyst (e.g. Omega) could deduce that the process of being picked would motivate you to reconsider your stance. (Or, perhaps more likely, motivate a professed one-boxer like me to reconsider mine.)
Beware hidden inferences. Taboo causality.
I don’t see what that link has to do with anything in my comment thread. (I haven’t read most of the other threads in reply to this post.)
I should explain what I mean by ‘causality’. I do not mean some metaphysical necessity, whereby every event (called an ‘effect’) is determined (or at least influenced in some asymmetric way) by other events (called its ‘causes’), which must be (or at least so far seem to be) prior to the effect in time, leading to infinite regress (apparently back to the Big Bang, which is somehow an exception). I do not mean anything that Aristotle knew enough physics to understand in any but the vaguest way.
I mean the flow of macroscopic entropy in a physical system.
The best reference that I know on the arrow of time is Huw Price’s 1996 book Time’s Arrow and Archimedes’ Point. But actually I didn’t understand how entropy flow leads to a physical concept of causality until several years after I read that, so that might not actually help, and I’m having no luck finding the Internet conversation that made it click for me.
But basically, I’m saying that, if known physics applies, then P(there is money in Box B|all information available on a macroscopic level when Omega placed the boxes) = P(there is money in Box B|all information … placed the boxes & I pick both boxes), even though P(I pick both boxes|all information … placed the boxes) < 1, because macroscopic entropy strictly increases between the placing of the boxes and the time that I finally pick a box.
So I need to be given evidence that known physics does not apply before I pick only Box B, and a successful record of predictions by Omega will not do that for me.
From Andy Egan.
I would suggest looking at your implicit choice of counterfactuals and their role in your decision theory. Standard causal decision theory involves local violations of the laws of physics (you assign probabilities to the world being such that you’ll one-box, or such that you’ll two-box, and then ask what miracle magically altering your decision, without any connection to your psychological dispositions, etc., would deliver the highest utility). Standard causal decision theory is a normative principle for action, that says to do the action that would deliver the most utility if a certain kind of miracle happened. But you can get different versions of causal decision theory by substituting different sorts of miracles, e.g. you can say: “if I one-box, then I have a psychology that one-boxes, and likewise for two-boxing” so you select the action such that a miracle giving you the disposition to do so earlier on would have been better. Yet another sort of counterfactual that can be hooked up to the causal decision theory framework would go “there’s some mathematical fact about what decision (decisions, given Everett) my brain structure leads to in standard physics, and the predictor has access to this mathematical info, so I’ll select the action that would be best brought about by a miracle changing that mathematical fact”.
Thanks for the replies, everybody!
This is a global response to several replies within my little thread here, so I’ve put it at nearly the top level. Hopefully that works out OK.
I’m glad that FAWS brought up the probabilistic version. That’s because the greater the probability that Omega makes mistakes, the more inclined I am to take two boxes. I once read the claim that 70% of people, when told Newcomb’s Paradox in an experiment, claim to choose to take only one box. If this is accurate, then Omega can achieve a 70% level of accuracy by predicting that everybody is a one-boxer. Even if 70% is not accurate, you can still make the paradox work by adjusting the dollar amounts, as long as the bias is great enough that Omega can be confident that it will show up at all in the records of its past predictions. (To be fair, the proportion of two-boxers will probably rise as Omega’s accuracy falls, and changing the stakes should also affect people’s choices; there may not be a fixed point, although I expect that there is.)
If, in addition to the problem as stated (but with only 70% probability of success), I know that Omega always predicts one-boxing, then (hopefully) everybody agrees that I should take both boxes. There needs to be some correlation between Omega’s predictions and the actual outcomes, not just a high proportion of past successes.
FAWS also writes:
Actually, I don’t really want to make that claim. Although I’ve written things like ‘I would take both boxes’, I really should have written ‘I should take both boxes’. I’m stating a correct decision, not making a prediction about my actual actions. Right now, I predict about a 70% chance of two-boxing given the situation as stated in the original post, although I’ve never tried to calculate my estimates of probabilities, so who knows what that really means. (H’m, 70% again? Nope, I don’t trust that calibration at all!)
FAWS writes elsewhere:
I don’t see what the gun has to do with it; this is a perfectly good problem in decision theory:
Suppose that you have a button that, if pressed, will trigger a bomb that kills two strangers on the other side of the world. I hold a gun to your head and threaten to shoot you if you don’t press the button. Should you press it?
A person who presses the button in that situation can reasonably say afterwards ‘I had no choice! Toby held a gun to my head!’, but that doesn’t invalidate the question. Such a person might even panic and make the question irrelevant, but it’s still a good question.
So that’s how Omega gets such a good record! (^_^)
Understanding the question really is important. I’ve been interpreting it something along these lines: you interrupt your normal thought processes to go through a complete evaluation of the situation before you, then see what you do. (This is exactly what you cannot do if you panic in the gun problem above.) So perhaps we can predict with certain accuracy that an utter bigot will take one course of action, but that is not what the bigot should do, nor is it what they will do if they discard their prejudices and decide afresh.
Now that I think about it, I see some problems with this interpretation, and also some refinements that might fix it. (The first thing to do is to make it less dependent on the specific person making the decision.) But I’ll skip the refinements. It’s enough to notice that Omega might very well predict that a person will not take the time to think things through, so there is poor correlation between what one should do and what Omega will predict, even though the decision is based on what the world would be like if one did take the time.
I still think that (modulo refinements) this is a good interpretation of what most people would mean if they tell a story and then ask ‘What should this person do?’. (I can try to defend that claim if anybody still wants me to after they finish this comment.) In that case, I stand by my decision that one should take both boxes, at least if there is no good evidence of new physics.
However, I now realise that there is another interpretation, which is more practical, however much the ordinary person might not interpret things this way. That is: sit down and think through the whole situation now, long before you are ever faced with it in real life, and decide what to do. One obvious benefit of this is that when I hold a gun to your head, you won’t panic, because you will be prepared. More generally, this is what we are all actually doing right now! So as we make these idle philosophical musings, let’s be practical, and decide what we’ll do if Omega ever offers us this deal.
In this case, I agree that I will be better off (given the extremely unlikely but possible assumption that I am ever in this situation) if I have decided now to take only Box B. As RobinZ points out, I might change my mind later, but that can’t be helped (and to a certain extent shouldn’t be helped, since it’s best if I take two boxes after Omega predicts that I’ll only take one, but we can’t judge that extent if Omega is smarter than us, so really there’s no benefit to holding back at all).
If Omega is fallible, then the value of one-boxing falls drastically, and even adjusting the amount of money doesn’t help in the end; once Omega’s proportion of past success matches the observed proportion in experiments (or whatever our best guess of the actual proportion of real people is), then I’m back to two-boxing, since I expect that Omega simply always predicts one-boxing.
In hindsight, it’s obvious that the original post was about decision in this sense, since Eliezer was talking about an AI that modifies its decision procedures in anticipation of facing Omega in the future. Similarly, we humans modify our decision procedures by making commitments and letting ourselves invent rationalisations for them afterwards (although the problem with this is that it makes it hard to change our minds when we receive new information). So obviously Eliezer wants us to decide now (or at least well ahead of time) and use our leet Methods of Rationality to keep the rationalisations in check.
So I hereby decide that I will pick only one box. (You hear that, Omega!?) Since I am honest (and strongly doubt that Omega exists), I’ll add that I may very well change my mind if this ever really happens, but that’s about what I would do, not what I should do. And in a certain sense, I should change my mind … then. But in another sense, I should (and do!) choose to be a one-boxer now.
(Thanks also to CarlShulman, whom I haven’t quoted, but whose comment was a big help in drawing my attention to the different senses of ‘should’, even though I didn’t really adopt his analysis of them.)
Assume Omega has a probability X of correctly predicting your decision:
If you choose to two-box:
X chance of getting $1000
(1-X) chance of getting $1,001,000
If you choose to take box B only:
X chance of getting $1,000,000
(1-X) chance of getting $0
Your expected utilities for two-boxing and one-boxing are (respectively):
E2 = 1000X + (1-X)1001000
E1 = 1000000X
For E2 > E1, we must have 1000X + 1,001,000 − 1,001,000X − 1,000,000X > 0, or 1,001,000 > 2,000,000X, or
X < 0.5005
So as long as Omega can maintain a greater than 50% accuracy, you should expect to earn more money by one-boxing. Since the solution seems so simple, and since I’m a total novice at decision theory, it’s possible I’m missing something here, so please let me know.
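The calculation above can be checked numerically. A minimal sketch, assuming (as the comment does) linear utility in dollars and a single accuracy X that applies whichever choice you make; the function names are hypothetical. It tabulates E1 and E2 for a few accuracies and solves for the break-even point directly.

```python
def expected_values(x, box_a=1_000, box_b=1_000_000):
    """Expected payoffs (E1 for one-boxing, E2 for two-boxing) when Omega
    predicts correctly with probability x, whichever choice you make."""
    # One-boxing: correct prediction -> B full; wrong prediction -> B empty.
    e1 = x * box_b + (1 - x) * 0
    # Two-boxing: correct prediction -> B empty, so only A;
    # wrong prediction -> B full, so A + B.
    e2 = x * box_a + (1 - x) * (box_a + box_b)
    return e1, e2


def break_even(box_a=1_000, box_b=1_000_000):
    """Accuracy at which E1 == E2; above it, one-boxing has higher E."""
    # x*B = x*A + (1 - x)*(A + B)  simplifies to  x = (A + B) / (2*B)
    return (box_a + box_b) / (2 * box_b)


if __name__ == "__main__":
    for x in (0.5, 0.5005, 0.51, 0.9, 0.99):
        e1, e2 = expected_values(x)
        print(f"X = {x}: one-box ${e1:,.0f}, two-box ${e2:,.0f}")
    print(f"break-even X = {break_even()}")
```

With the default amounts the break-even accuracy is (1,000 + 1,000,000) / 2,000,000 = 0.5005, matching the inequality derived above.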
Your calculation is fine. What you’re missing is that Omega has a record of 70% accuracy because Omega always predicts that a person will one-box and 70% of people are one-boxers. So Omega always puts the million dollars in Box B, and I will always get $1,001,000 if I’m one of the 30% of people who two-box.
At least, that is a possibility, which your calculation doesn’t take into account. I need evidence of a correlation between Omega’s predictions and the participants’ actual behaviour, not just evidence of correct predictions. My prior probability distribution for how often people one-box isn’t even concentrated very tightly around 70% (which is just a number that I remember reading once as the result of one survey), so anything short of a long run of predictions with very high proportion of correct ones will make me suspect that Omega is pulling a trick like this.
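The difference between a good record and a correlated record can be made concrete. A minimal sketch with hypothetical records (the 70/30 split is just the survey figure recalled above, and `summarize` is an illustrative name): it reports overall accuracy alongside the rate at which Omega predicts one-boxing for players who actually one-boxed and for players who actually two-boxed. Only a gap between those two rates makes one-boxing pay.

```python
from collections import Counter


def summarize(records):
    """records: list of (prediction, choice) pairs, each 'one' or 'two'.

    Returns overall accuracy plus the two conditional rates
    P(predict 'one' | chose 'one') and P(predict 'one' | chose 'two').
    """
    counts = Counter(records)
    accuracy = sum(p == c for p, c in records) / len(records)
    chose_one = counts[("one", "one")] + counts[("two", "one")]
    chose_two = counts[("one", "two")] + counts[("two", "two")]
    nan = float("nan")
    p_one_given_one = counts[("one", "one")] / chose_one if chose_one else nan
    p_one_given_two = counts[("one", "two")] / chose_two if chose_two else nan
    return accuracy, p_one_given_one, p_one_given_two


# Hypothetical records for 100 players, 70 of whom one-box.
trick = [("one", "one")] * 70 + [("one", "two")] * 30      # always predicts 'one'
informed = [("one", "one")] * 70 + [("two", "two")] * 30   # tracks each choice

trick_stats = summarize(trick)        # 70% accurate, same rate for both choices
informed_stats = summarize(informed)  # 100% accurate, rates differ sharply
```

For the trick Omega both conditional rates are 1, so box B is full no matter what and the two-boxers walk away with $1,001,000; a 70% record alone cannot distinguish this from a genuinely predictive Omega.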
So the problem is much cleaner as Eliezer states it, with a perfect record. (But if even that record is short, I won’t buy it.)
Oops, I see that RobinZ already replied, and with calculations. This shows that I should still remove the word ‘drastically’ from the bit that nhamann quoted.
Wait—we can’t assume that the probability of being correct is the same for two-boxing and one-boxing. Suppose Omega has a probability X of predicting one when you choose one and Y of predicting one when you choose two.
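The special case you list corresponds to Y = 1 − X, but in the general case, writing U(·) for the utility of a given payout, we can derive that E1 > E2 implies
X·[U($1,000,000) − U($0)] − Y·[U($1,001,000) − U($1000)] > U($1000) − U($0)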
If we assume linear utility in wealth, this corresponds to a difference of 0.001. If, alternately, we choose a median net wealth of $93,100 (the U.S. figure) and use log-wealth as the measure of utility, the required difference increases to 0.004 or so. Either way, unless you’re dead broke (e.g. net wealth $1), you had better be extremely confident that you can fool the interrogator before you two-box.
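Those two figures can be reproduced with a short script. A minimal sketch, assuming the two utility functions named in the comment (linear in dollars, and log of total wealth starting from $93,100); `min_gap`, `linear`, `log_wealth` and `WEALTH` are hypothetical names. It solves E1 > E2 for the smallest X − Y given Y, where X and Y are the two conditional prediction rates defined above.

```python
import math


def min_gap(y, utility, box_a=1_000, box_b=1_000_000):
    """Smallest X - Y at which one-boxing beats two-boxing in expectation.

    X = P(Omega predicts one-box | you one-box)
    Y = P(Omega predicts one-box | you two-box)
    utility(g) = utility of walking away with g dollars.
    """
    a = utility(box_b) - utility(0)              # one-boxing: B full vs B empty
    b = utility(box_a + box_b) - utility(box_a)  # two-boxing: B full vs B empty
    c = utility(box_a) - utility(0)              # what box A alone is worth
    # E1 = X*a + U(0) and E2 = Y*b + U(A), so E1 > E2  <=>  X > (c + Y*b) / a
    x_needed = (c + y * b) / a
    return x_needed - y


def linear(gain):
    return gain


WEALTH = 93_100  # median U.S. net wealth figure quoted in the comment


def log_wealth(gain):
    return math.log(WEALTH + gain)


if __name__ == "__main__":
    for y in (0.0, 0.5, 0.99):
        print(f"Y = {y}: linear {min_gap(y, linear):.4f}, "
              f"log-wealth {min_gap(y, log_wealth):.4f}")
```

Under linear utility the required gap is 0.001 for every Y. Under log-wealth it depends on Y: it comes out at roughly 0.004 when Y is near 0 (an Omega that almost never mistakes a two-boxer for a one-boxer) and shrinks as Y grows.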
You underestimate the meaning of superintelligence. One way of defining a superintelligence that wins at Newcomb without violating causality is to assume that the universe is like a computer simulation, such that it can be defined by a set of physical laws and a very long string of random numbers. If Omega knows the laws and random numbers that define the universe, shouldn’t Omega be able to predict your actions with 100% accuracy? And then wouldn’t you want to choose the action that results in you winning a lot more money?
So part of the definition of a superintelligence is that the universe is like that and Omega knows all that? In other words, if I have convincing evidence that Omega is superintelligent, then I must have convincing evidence that the universe is a computer simulation, etc? Then that changes things; just as the Second Law of Thermodynamics doesn’t apply to Maxwell’s Demon, so the law of forward causality (which is actually a consequence of the Second Law, under the assumption of no time travel) doesn’t apply to a superintelligence. So yes, then I would pick only Box B.
This just goes to show how important it is to understand exactly what the problem states.
The computer simulation assumption isn’t necessary, the only thing that matters is that Omega is transcendentally intelligent, and it has all the technology that you might imagine a post-Singularity intelligence might have (we’re talking Shock Level 4). So Omega scans your brain by using some technology that is effectively indistinguishable from magic, and we’re left to assume that it can predict, to a very high degree of accuracy, whether you’re the type of person who would take one or two boxes.
Omega doesn’t have to actually simulate your underlying physics, it just needs a highly accurate model, which seems reasonably easy to achieve for a superintelligence.
If its model is good enough that it violates the Second Law as we understand it, fine, I’ll pick only Box B, but I don’t see anything in the problem statement that implies this. The only evidence that I’m given is that it’s made a run of perfect predictions (of unknown length!), is smarter than us, and is from very far away. That’s not enough for new physics.
And just having a really good simulation of my brain, of the sort that we could imagine doing using known physics but just don’t have the technical capacity for, is definitely not good enough. That makes the probability that I’ll act as predicted very high, but I’ll still come out worse if, after the boxes have been set, I’m unlucky enough to only pick Box B anyway (or come out better if I’m lucky enough to pick both boxes anyway, if Omega pegs me for a one-boxer).
It doesn’t have to be even remotely that good for the scenario. I’d bet a sufficiently good human psychologist could take Omega’s role and get it 90%+ right if he tests and interviews the people extensively first (without them knowing the purpose) and gets to exclude people he is unsure about. A superintelligent being should be far, far better at this.
You yourself claim to know what you would do in the boxing experiment, and you are an agent limited by conventional physics. There is no physical law that forbids another agent from knowing you as well as (or even better than) you know yourself.
You’ll have to explain why you think 99.99% (or whatever) is not good enough; a 0.01% chance to win $1000 shouldn’t make up for a 99.99% chance of losing $999,000.
There is a good chance I am missing something here, but from an economic perspective this seems trivial:
P(Om) is the probability the person assigns Omega of being able to accurately predict their decision ahead of time.
A. P(Om) x $1m is the expected return from opening one box.
B. (1 − P(Om)) × $1m + $1000 is the expected return of opening both boxes (the probability that Omega was wrong times the million, plus the thousand).
Since P(Om) is dependent on people’s individual belief about Omega’s ability to predict their actions it is not surprising different people make different decisions and think they are being rational—they are!
If A > B they choose one box, if B > A they choose both boxes.
This also shows why people will change their views if the amount in the visible box is changed (to $990,000 or $10).
Basically, in this instance, if you think the probability of Omega being able to determine your future action is greater than 0.5005 then you select a single box, if less than that you select both boxes. At P(Om)=0.5005 the expected return of both strategies is $500,500.
EDIT. I think I oversimplified B, but the point still stands. nhamann—I didn’t see your post before writing mine. I think the only difference between them is that I state that it is a personal view of the probability of Omega being able to predict choices and you seem to want to use the actual probability that he can.
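The claim that views shift when the visible amount changes follows from the same two expressions. A minimal sketch, assuming the simplified A/B model above; `break_even_p` is a hypothetical name. Setting A = B and solving gives the P(Om) threshold as a function of what is in the visible box.

```python
def break_even_p(visible, hidden=1_000_000):
    """P(Om) at which taking one box and taking both boxes have equal
    expected return, using the comment's simplified model:
        one box:    P(Om) * hidden
        both boxes: (1 - P(Om)) * hidden + visible
    """
    # P*H = (1 - P)*H + V  simplifies to  P = (H + V) / (2*H)
    return (hidden + visible) / (2 * hidden)


if __name__ == "__main__":
    for visible in (10, 1_000, 990_000):
        p = break_even_p(visible)
        print(f"visible ${visible:,}: take one box iff P(Om) > {p:.6f}")
```

With $1000 visible the threshold is 0.5005, as stated; with $10 it drops to about 0.500005, and with $990,000 it rises to 0.995, so the same belief about Omega can recommend different choices.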
Re: “Do you take both boxes, or only box B?”
It would sure be nice to get hold of some more data about the “100 observed occasions so far”. If Omega only visits two-boxers—or tries to minimise his outgoings—it would be good to know that. Such information might well be accessible—if we have enough information about Omega to be convinced of his existence in the first place.
What this is really saying is “if something impossible (according to your current theory of the world) actually happens, then rather than insisting it’s impossible and ignoring it, you should revise your theory to say that’s possible”. In this case, the impossible thing is reverse causality; since we are told of evidence that reverse causality has happened in the form of 100 successful previous experiments, we must revise our theory to accept that reverse causality actually can happen. This would lead us to the conclusion that we should take one box. Alternatively, we could decide that our supposed evidence is untrustworthy and that we are being lied to when we are told that Omega made 100 successful predictions – we might think that this problem describes a nonsensical, impossible situation, similarly to if we were told that there was a barber who shaves everyone who does not shave themself.
The link to that thesis doesn’t seem to work for me.
A quick google turned up one that does
For the future, perhaps this once again updated link may help: Updated link
Citation: LEDWIG, Marion, 2000. Newcomb’s problem [Dissertation]. Konstanz: University of Konstanz
You know, I honestly don’t even understand why this is a point of debate. One boxing and taking box B (and being the kind of person who will predictably do that) seem so obviously like the rational strategy that it shouldn’t even require explanation.
And it’s not ‘obvious’ in the way the Monty Hall problem (game show, three doors, goats behind two, a sports car behind one, ya know?) seems ‘obvious’ to most people at first.
In the case of the Monty Hall problem, you play with it, and the cracks start to show up, and you dig down to the surprising truth.
In this case, I don’t see how anyone could see any cracks in the first place.
Am I missing something here?
One factor you may not have considered: the obvious rational metastrategy is causal decision theory, and causal decision theory picks the two-box strategy.
I don’t follow. Isn’t it precisely on the meta-strategy level that CDT becomes obviously irrational?
Key word is “obvious”. If you say, “how should you solve games?”, the historical answer is “using game theory”, and when you say, “what does game theory imply for Newcomb’s dilemma?”, the historical answer is “two-box”. It takes an additional insight to work out that a better metastrategy is possible, and things which take an additional insight are no longer obvious, true or no.
Edit: Alternatively: When I said “metastrategy”, I meant one level higher than “two-boxing”—in other words, the level of decision theory. (I’m not sure which of the two objections you were raising.)
This is basically what I was trying to point out. :)
I think what RobinZ means is that you want to choose a strategy such that having that strategy will causally yield nice things. Given that criterion, object-level CDT fails; but one uses a causal consideration to reject it.
It is the obvious rational strategy… which is why using a decision theory that doesn’t get this wrong is important.
Yup yup, you’re right, of course.
What I was trying to say, then, is that I don’t understand why there’s any debate about the validity of a decision theory that gets this wrong. I’m surprised everyone doesn’t just go, “Oh, obviously any decision theory that says two-boxing is ‘rational’ is an invalid theory.”
I’m surprised that this is a point of debate. I’m surprised, so I’m wondering, what am I missing?
Did I manage to make my question clearer like that?
I can say that for me personally, the hard part—that I did not get past till reading about it here—was noticing that there is actually such a variable as “what decision theory to use”; using a naive CDT sort of thing simply seemed rational /a priori/. Insufficient grasp of the nameless virtue, you could say.
Meaning you’re in the same boat as me? Confused as to why this ever became a point of debate in the first place?
...no? I didn’t realize that the decision theory could be varied, that the obvious decision theory could be invalid, so I hit a point of confusion with little idea what to do about it.
But you’re not saying that you would ever have actually decided to two-box rather than take box B if you found yourself in that situation, are you?
I mean, you would always have decided, if you found yourself in that situation, that you were the kind of person Omega would have predicted to choose box B, right?
I am still so majorly confused here. :P
I have no idea! IIRC I leaned towards one-boxing, but I was honestly confused about it.
Ahah. So do you remember if you were confused in yourself, for reasons generated by your own brain, or just by your knowledge that some experts were saying two-boxing was the ‘rational’ strategy?
It’s a good question. You aren’t missing anything. And “people are crazy, the world is mad” isn’t always sufficient. ;)
Ha! =]
Okay, I DO expect to see lots of ‘people are crazy, the world is mad’ stuff, yeah, I just wouldn’t expect to see it on something like this from the kind of people who work on things like Causal Decision Theory! :P
So I guess what I really want to do first is CHECK which option is really most popular among such people: two-boxing, or predictably choosing box B?
Problem is, I’m not sure how to perform that check. Can anyone help me there?
It is fairly hard to perform such checks. We don’t have many situations which are analogous to Newcomb’s problem. We don’t have perfect predictors, and most situations humans are in can be considered “iterated”. At least, we can consider most people to be using their ‘iterated’ reasoning by mistake when we put them in one-off situations.
The closest analogy that we can get reliable answers out of is the ‘ultimatum game’ with high stakes… in which people really do refuse weeks’ worth of wages.
By the way, have you considered what you would do if the boxes were transparent? Just sitting there. Omega long gone and you can see piles of cash in front of you… It’s tricky. :)
Suppose my decision algorithm for the “both boxes are transparent” case is to take only box B if and only if it is empty, and to take both boxes if and only if box B has a million dollars in it. How does Omega respond? No matter how it handles box B, its implied prediction will be wrong.
Perhaps just as slippery, what if my algorithm is to take only box B if and only if it contains a million dollars, and to take both boxes if and only if box B is empty? In this case, anything Omega predicts will be accurate, so what prediction does it make?
Come to think of it, I could implement the second algorithm (and maybe the first) if a million dollars weighs enough compared to the boxes. Suppose my decision algorithm outputs: “Grab box B and test its weight, and maybe shake it a bit. If it clearly has a million dollars in it, take only box B. Otherwise, take both boxes.” If that’s my algorithm, then I don’t think the problem actually tells us what Omega predicts, and thus what outcome I’m getting.
The naive presentation of the transparent problem is circular, and for that reason ill-defined (what you do depends on what’s in the boxes, which depends on Omega’s prediction, which depends on what you do...). A plausible version of the transparent Newcomb’s problem involves Omega:
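A small sketch of why the two algorithms above trouble the naive transparent setup: treat a decision algorithm as a function from what you see in box B to an action, and check which contents of box B would make Omega’s prediction come true. The names (`consistent_fills`, `contrarian`, `conformist`) are hypothetical, and the rule “box B is full exactly when Omega predicts one-boxing” is the naive reading assumed here.

```python
from typing import Callable, List

# A decision algorithm for the transparent game: given whether box B
# looks full (True) or empty (False), return "one-box" or "two-box".
Policy = Callable[[bool], str]


def consistent_fills(policy: Policy) -> List[bool]:
    """List the contents of box B that would make Omega's prediction come true.

    Naive transparent rule (assumed): box B is full exactly when Omega
    predicts you will take only box B. A fill is self-consistent when the
    policy, on seeing it, does what that fill says Omega predicted.
    """
    fills = []
    for b_full in (True, False):
        one_boxes = policy(b_full) == "one-box"
        if one_boxes == b_full:
            fills.append(b_full)
    return fills


# First algorithm: one-box only if box B is empty.
contrarian: Policy = lambda b_full: "one-box" if not b_full else "two-box"
# Second algorithm (and the weigh-the-box variant): one-box only if box B is full.
conformist: Policy = lambda b_full: "one-box" if b_full else "two-box"

print(consistent_fills(contrarian))  # no consistent fill: every prediction fails
print(consistent_fills(conformist))  # both fills are consistent: nothing picks one
```

The first algorithm has no self-consistent fill and the second has two, which is exactly the pair of difficulties raised in the comments above.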
Predicting what you’d do if you saw box B full (and never mind the case where box B is empty).
Predicting what you’d do if you saw box B empty (and never mind the case where box B is full).
Predicting what you’d do in both cases, and filling box B if and only if you’d one-box in both of them.
Or variations of those. There’s no circularity when he only makes such “conditional” predictions.
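One way to see why conditional predictions remove the loop, sketched under the third variant (fill box B iff you’d one-box in both cases): Omega queries the algorithm on each possible view of box B separately, so no query depends on Omega’s own choice. The function names are hypothetical; the payoffs are the standard $1,000 and $1,000,000.

```python
from typing import Callable, Dict

Policy = Callable[[bool], str]  # sees box B (True = full), returns an action


def conditional_predictions(policy: Policy) -> Dict[bool, str]:
    # Omega only asks "what would you do if you saw B full?" and "...empty?".
    # Each answer comes from running the policy on a fixed input, so no
    # answer depends on what Omega itself ends up doing.
    return {b_full: policy(b_full) for b_full in (True, False)}


def omega_fills_b(policy: Policy) -> bool:
    # The third variant: fill box B iff you would one-box in both cases.
    return all(a == "one-box" for a in conditional_predictions(policy).values())


def payoff(policy: Policy) -> int:
    b_full = omega_fills_b(policy)
    box_b = 1_000_000 if b_full else 0
    # Box A's $1000 is added only if you take both boxes.
    return box_b if policy(b_full) == "one-box" else box_b + 1_000


always_one: Policy = lambda b_full: "one-box"
contrarian: Policy = lambda b_full: "one-box" if not b_full else "two-box"
conformist: Policy = lambda b_full: "one-box" if b_full else "two-box"

for name, policy in [("always_one", always_one), ("contrarian", contrarian), ("conformist", conformist)]:
    print(name, omega_fills_b(policy), payoff(policy))
```

Under this variant the two tricky algorithms no longer break the prediction; they simply forfeit the million (the contrarian ends with $0, the conformist with $1,000).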
He could use the same algorithms in the non-transparent case, and they would usually reduce to the normal Newcomb’s problem, but prevent you from doing any tricky business if you happen to bring an X-ray imager (or kitchen scales) and try to observe the state of box B.
Death by lightning.
I typically include disclaimers such as the above in a footnote or a more precisely targeted problem specification, so as to avoid any avoid-the-question technicalities. The premise is not that Omega is an idiot or a sloppy game designer.
You took box B. Putting it down again doesn’t help you. Finding ways to be cleverer than Omega is not a winning solution to Newcomblike problems.
Box B appears full of money; however, after you take both boxes, you find that the money in Box B is Monopoly money. The money in Box A remains genuine, however.
Box B appears empty, however, on opening it you find, written on the bottom of the box, the full details of a bank account opened by Omega, containing one million dollars, together with written permission for you to access said account.
In short, even with transparent boxes, there are a number of ways for Omega to lie to you about the contents of Box B, and in this manner control your choice. If Omega is constrained to not lie about the contents of Box B, then it gets a bit trickier; Omega can still maintain an over 90% success rate by presenting the same choice to plenty of other people with an empty box B (since most people will likely take both boxes if they know B is empty).
Or, alternatively, Omega can decide to offer you the choice at a time when Omega predicts you won’t live long enough to make it.
That depends; instead of making a prediction here, Omega is controlling your choice. Whether you get the million dollars or not in this case depends on whether Omega wants you to have the million dollars or not, in furtherance of whatever other plans Omega is planning.
Omega doesn’t need to predict your choice; in the transparent-box case, Omega needs to predict your decision algorithm.
“The boxes are transparent” doesn’t literally mean “light waves pass through the boxes” given the description of the problem; it means “you can determine what’s inside the boxes without (and before) opening them”.
Responding by saying “maybe you can see into the boxes but you can’t tell if the money inside is fake” is being hyper-literal and ignoring what people really mean when they specify “suppose the boxes are transparent”.
Fair enough. I am at times overly literal.
In which case, if you are determined to show that Omega’s prediction is incorrect, and Omega can predict that determination, then the only way that Omega can avoid making an incorrect prediction is either to modify you in some manner (until you are no longer determined to make Omega’s prediction incorrect), or to deny you the chance to make the choice entirely.
For example, Omega might modify you by changing your circumstances; e.g. giving a deadly disease to someone close to you; which can be cured, but only at a total cost of all the money you are able to raise plus $1000. If Omega then offers the choice (with box B empty) most people would take both boxes, in order to be able to afford the cure.
Alternatively, given such a contrary precommitment, Omega may simply never offer you the choice at all; or might offer you the choice three seconds before you get struck by lightning.
“Omega puts money inside the boxes, you just never live to get it” is as outside the original problem as “the boxes are transparent, you just don’t understand what you’re seeing when you look in them” is outside the transparent problem. Just because the premise of the problem doesn’t explicitly say “… and you get the contents of the boxes” doesn’t mean the paradox can be resolved by saying you don’t get the contents of the boxes; that’s being hyper-literal again. Likewise, just because the problem doesn’t say “… and Omega can’t modify you to change your choice” doesn’t mean that the paradox can be resolved by saying that Omega can modify you to change your choice. The problem is about decision theory, and Omega doesn’t have capabilities that are irrelevant to what the problem is about.
The problem, as stated, as far as I can tell gives Omega three options:
Fail to correctly predict what the person will choose
Refuse to participate
Cheat
It is likely that Omega will try to correctly predict what the person will choose; that is, Omega will strive to ignore the first option. If Omega offers the choice to this hypothetical person in the first place, then Omega is not taking the second option.
That leaves the third option; to cheat. I expect that this is the choice that Omega will be most likely to take; one of the easiest ways to do this is by ignoring the spirit of the constraints and taking the exact literal meaning. (Another way is to creatively misunderstand the spirit of the rules as given).
So I provided some suggestions with regard to how Omega might cheat; such as arranging that the decision is never made.
If you think that’s outside the problem, then I’m curious; what do you think Omega would do?
The point here is that the question is inconsistent. It is impossible for an Omega that can predict with high accuracy to exist: as you’ve correctly pointed out, it leads to a situation where Omega must either fail to predict correctly, refuse to participate, or cheat, all of which are out of bounds of the problem.
I don’t think it’s ever wise to ignore the possibility of a superintelligent AI cheating, in some manner.
If we ignore that possibility, then yes, the question would be inconsistent; which implies that if the situation were to actually appear to happen, then it would be quite likely that either:
The situation has been misunderstood; or
Someone is cheating
Since it is far easier for Omega, being an insane superintelligence, to cheat than it is for someone to cheat Omega, it seems likeliest that if anyone is cheating, then it is Omega.
After all, Omega had and did not take the option to refuse to participate.
The constraints aren’t constraints on Omega; the constraints are constraints on the reader—they tell the reader what he is supposed to use as the premises of the scenario. Omega cannot cheat unless the reader interprets the description of the problem to mean that Omega is willing to cheat. And if the reader does interpret it that way, it’s the reader, not Omega, who’s violating the spirit of the constraints and being hyper-literal.
I think that depending on the human’s intentions, and assuming the human is a perfect reasoner, the conditions of the problem are contradictory. Omega can’t always predict the human—it’s logically impossible.
In the first case, Omega does not offer you the deal, and you receive $0, proving that it is possible to do worse than a two-boxer.
In the second case, you are placed into a superposition of taking one box and both boxes, receiving the appropriate reward in each.
In the third case, you are counted as ‘selecting’ both boxes, since it’s hard to convince Omega that grabbing a box doesn’t count as selecting it.
The premise is that Omega offers you the deal. If Omega’s predictions are always successful because it won’t offer the deal when it can’t predict the result, you can use me as Omega and I’d do as well as him—I just never offer the deal.
The (non-nitpicked version of the) transparent box case shows what’s wrong with the concept: since your strategy might involve figuring out what Omega would have done, it may be in principle impossible for Omega to predict what you’re going to do, as Omega is indirectly trying to predict itself, leading to an undecidability paradox. The transparent boxes just make this simpler because you can “figure out” what Omega would have done by looking into the transparent boxes.
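A minimal sketch of the self-reference described here, assuming Omega is a deterministic function that is handed the agent and returns a predicted action (all names are hypothetical). An agent that consults the predictor about itself and then does the opposite defeats any such predictor, and a predictor that tries to simulate that agent never terminates. This illustrates the diagonal step only; it is not a proof about every possible Omega.

```python
from typing import Callable

Agent = Callable[[], str]            # returns "one-box" or "two-box"
Predictor = Callable[[Agent], str]   # a deterministic stand-in for Omega


def make_contrarian(predict: Predictor) -> Agent:
    """Build an agent that asks the predictor about itself and does the opposite."""
    def agent() -> str:
        guess = predict(agent)
        return "two-box" if guess == "one-box" else "one-box"
    return agent


# Hypothetical toy predictors.
def says_one_box(agent: Agent) -> str:
    return "one-box"


def says_two_box(agent: Agent) -> str:
    return "two-box"


def simulate(agent: Agent) -> str:
    # Predict by running the agent. Against a contrarian this recurses forever,
    # because the agent runs the predictor, which runs the agent, and so on.
    return agent()


for predict in (says_one_box, says_two_box):
    agent = make_contrarian(predict)
    print(predict.__name__, "predicts", predict(agent), "but the agent does", agent())

try:
    make_contrarian(simulate)()
except RecursionError:
    print("simulate never settles on a prediction")
```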
Of course, if you are not a perfect reasoner, it might be possible that Omega can always predict you, but then the question is no longer “which choice should I make”, it’s “which choice should I make within the limits of my imperfect reasoning”. And answering that requires formalizing exactly how your reasoning is limited, which is rather hard.
Thanks, but I meant not a check on what these CDT-studying-type people would DO if actually in that situation, but a check on whether they actually say that two-boxing would be the “rational” thing to do in that hypothetical situation.
I haven’t considered your transparency question, no. Does that mean Omega did exactly what he would have done if the boxes were opaque, except that they are in fact transparent (a fact that did not figure into the prediction)? Because in that case I’d just see the million in B, and the thousand in A, and of course take ’em both.
Otherwise, Omega should be able to predict as well as I can that, if I knew the rules of this game (that if I predictably chose to take only box B and leave A alone, box B would contain a million, and that both boxes are transparent, with this transparency figured into the prediction), I would expect to see a million in box B, take it, and just walk away from the paltry thousand in A.
Does this make sense?
I think this is the position of classical theorists on self-modifying agents:
From Rationality, Dispositions, and the Newcomb Paradox:
They agree that agents who can self-modify will take one box. But they call that action “irrational”. So, the debate really boils down to the definition of the term “rational”, and is not really concerned with the decision that rational agents who can self-modify will actually take.
If my analysis here is correct, the dispute is really all about terminology.
Mr Eliezer, I think you’ve missed a few points here. However, I’ve probably missed more. I apologise for errors in advance.
To start with, I speculate that any system of decision making consistently gives the wrong results on some specific problem. The whole point of decision theory is finding principles which usually end up with a better result. As such, you can always formulate a situation in which it gives the wrong answer: maybe one of the facts you thought you knew was incorrect, and led you astray. (At the very least, Omega may decide to reward only those who have never heard of a particular brand of decision theory.)
It’s like file compression. In bitmaps, there are frequently large areas of similar colour. Using this fact, we can design a scheme that stores such images in less space. However, if we then try to compress a random bitmap, it will take more space than before compression. The same goes for human minds: they work simply and relatively efficiently, but there’s a whole field dedicated to finding flaws in their methods. If you use causal decision theory, you sacrifice your ability at games against superhuman creatures that can predict the future, in return for better decision making when that isn’t the case. That seems like a reasonably fair trade-off to me. Any theory which gets this one right opens itself to either getting another one wrong, or being more complex and thus harder for a human to use correctly.
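A sketch of the trade-off in the compression analogy, using run-length encoding as one simple scheme that exploits large areas of a single colour (chosen for illustration; real bitmap formats use other methods). A flat row shrinks, while a worst-case row where neighbouring pixels always differ doubles in size. More generally, a counting (pigeonhole) argument shows that no lossless scheme can shrink every possible input.

```python
def rle_encode(pixels: bytes) -> bytes:
    """Run-length encode a row of 8-bit pixels as (count, value) byte pairs.

    Runs longer than 255 are split so each count fits in one byte.
    """
    out = bytearray()
    i = 0
    while i < len(pixels):
        value = pixels[i]
        run = 1
        while i + run < len(pixels) and pixels[i + run] == value and run < 255:
            run += 1
        out += bytes([run, value])
        i += run
    return bytes(out)


def rle_decode(data: bytes) -> bytes:
    """Invert rle_encode by expanding each (count, value) pair."""
    out = bytearray()
    for count, value in zip(data[0::2], data[1::2]):
        out += bytes([value]) * count
    return bytes(out)


flat = bytes([200] * 1000)                 # a large area of one colour
noisy = bytes(i % 2 for i in range(1000))  # neighbouring pixels always differ

print(len(rle_encode(flat)), len(rle_encode(noisy)))
```

For these 1000-pixel rows the flat case encodes to 8 bytes and the alternating case to 2000 bytes.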
The scientific method and what I know of rationality make the initial assumption that your belief does not affect how the world works. “If a phenomenon feels mysterious, that is a fact about our state of knowledge, not a fact about the phenomenon itself.” etc. However, this isn’t something which we can actually know.
Some Christians believe that if you pray over someone with faith, they will be immediately healed. If that is true, rationalists are at a disadvantage, because they aren’t as good at self-delusion or doublethink as the untrained. They might never end up finding out that truth. I know that religion is the mind-killer too; I’m just using the most common example of the supremely effective standard method being unable to deal with an idea. It’s necessarily incomplete.
I don’t agree with you that “reason” means “choosing what ends up with the most reward”. You’re mixing up means and ends. Arguing against a method of decision making because it comes up with the wrong answer to a specific case is like complaining that mp3 compression does a lousy job of compressing silence. I don’t think that reason can be the only tool used, just one of them.
Incidentally, I would totally only take the $1000 box, and claim that Omega told me I had won immortality, to confuse all decision theorists involved.
See chapters 1-9 of this document for a more detailed treatment of the argument.
This link is 404ing. Anyone have a copy of this?
The current version is here. (It’s Eliezer Yudkowsky (2010). Timeless Decision Theory.)
An analogy occurs to me about “regret of rationality.”
Sometimes you hear complaints about the Geneva Convention during wartime. “We have to restrain ourselves, but our enemies fight dirty. They’re at an advantage because they don’t have our scruples!” Now, if you replied, “So are you advocating scrapping the Geneva Convention?” you might get the response “No way. It’s a good set of rules, on balance.” And I don’t think this is an incoherent position: he approves of the rule, but regrets the harm it causes in this particular situation.
Rules, almost by definition, are inconvenient in some situations. Even a rule that’s good on balance, a rule you wouldn’t want to discard, will sometimes have negative consequences. Otherwise there would be no need to make it a rule! “Don’t fool yourself into believing falsehoods” is a good rule. In some situations it may hurt you, when a delusion might have been happier. The hurt is real, even if it’s outbalanced in the long run and in expected value. The regret is real. It’s just local.
“Verbal arguments for one-boxing are easy to come by, what’s hard is developing a good decision theory that one-boxes”
First, the problem needs a couple of ambiguities resolved, so we’ll use three assumptions: A) You are making this decision based on a deterministic, rational philosophy (no randomization, external factors, etc. can be used to make your decision on the box); B) Omega is in fact infallible; C) Getting more money is the goal (i.e. we are excluding decision-makers who would prefer to get less money, and other such absurdities).
Changing any of these results in a different game (either one that depends on how Omega handles random strategies, or one which depends on how often Omega is wrong—and we lack information on either)
Second, I’m going to reframe the problem a bit: Omega comes to you and has you write a decision-making function. He will evaluate the function, and populate Box B according to his conclusions on what the function will result in. The function can be self-modifying, but must complete in finite time. You are bound to the decision made by the actual execution of this function.
I can’t think of any argument as to why this reframing would produce different results, given both Assumptions A and B as true. I feel this is a valid reframing because, if we assume Omega is in fact infallible, I don’t see this as being any different from him evaluating the “actual” decision making function that you would use in the situation. Certainly, you’re making a decision that can be expressed logically, and presumably you have the ability to think about the problem and modify your decision based on that contemplation (i.e. you have a decision-making function, and it can be self-modifying). If your decision function is somehow impossible to render mathematically, then I’d argue that Assumption A has been violated and we are, once again, playing a different game. If your decision function doesn’t halt in finite time, then your payoff is guaranteed to be $0, since you will never actually take either box >.>
Given this situation, the AI simply needs to do two things: Identify that the problem is Newcombian and then identify some function X that produces the maximum expected payoff.
Identifying the problem as Newcombian should be trivial, since “awareness that this is a Newcombian problem” is a requirement of it being a Newcombian problem (if Omega didn’t tell you what was in the boxes, it would be a different game, neh?)
Identifying the function X is well beyond my programming ability, but I will assert definitively that there is no function that produces a higher expected payoff than f(Always One-Box). If I am proven wrong, I dare say the person writing that proof will probably be able to cash in to a rather significant payoff :)
Keep in mind that the decision function can self-modify, but Omega can also predict this. The function “commit to One-Box until Omega leaves, then switch to Two-Box because it’ll produce a higher gain now that Omega has made his prediction” would, obviously, have Omega conclude you’ll be Two-Boxing and leave box B empty, so you would walk away with only the $1000 in box A.
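A minimal sketch of this reframing under Assumptions A and B, with a hypothetical encoding: the decision function is told whether Omega is still present, and an infallible Omega’s prediction is simply the output of the real run, which happens after Omega has left. Under those assumptions the commit-then-switch function ends up with only box A’s $1,000, and enumerating every deterministic function of this single input finds none that beats always one-boxing. That is a check of this toy model, not a proof about all possible decision functions.

```python
from itertools import product
from typing import Callable, List

# Hypothetical model: a decision function is told whether Omega is still present
# and returns "one-box" or "two-box". It is deterministic (Assumption A).
DecisionFn = Callable[[bool], str]


def play(decide: DecisionFn) -> int:
    # Assumption B: Omega is infallible, so its prediction is exactly the output
    # of the real run, which happens after Omega has left (omega_present=False).
    predicted = decide(False)
    box_b = 1_000_000 if predicted == "one-box" else 0
    # You are bound to what the actual execution returns.
    actual = decide(False)
    return box_b if actual == "one-box" else box_b + 1_000


always_one_box: DecisionFn = lambda omega_present: "one-box"
always_two_box: DecisionFn = lambda omega_present: "two-box"
# "Commit to one-boxing until Omega leaves, then switch to two-boxing."
commit_then_switch: DecisionFn = lambda omega_present: "one-box" if omega_present else "two-box"

for f in (always_one_box, always_two_box, commit_then_switch):
    print(play(f))

# Enumerate every deterministic function of this one input and find the best payoff.
ACTIONS = ("one-box", "two-box")
all_fns: List[DecisionFn] = [
    (lambda present, a=when_present, b=when_gone: a if present else b)
    for when_present, when_gone in product(ACTIONS, repeat=2)
]
print(max(play(f) for f in all_fns))
```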
I honestly cannot find anything about this that would be overly difficult to program, assuming you already had an AI that could handle game theory problems (I’m assuming said AI is very, very difficult, and is certainly beyond my ability).
Given this reframing, f(Always One-Box) seems like a fairly trivial solution, and neither paradoxical nor terribly difficult to represent mathematically… I’m going to assume I’m missing something, since this doesn’t seem to be the consensus conclusion at all, but since neither my friend nor I can figure out any faults, I’ll go ahead and make this my first post on LessWrong and hope that it’s not buried in obscurity due to this being a 2-year-old thread :)
Rather than transforming the problem in the way you did, transform it so that you move first—Omega doesn’t put money in the boxes until you say which one(s) you want.
As a decision problem, Newcomb’s problem is rather pointless, IMHO. As a thought experiment helping us to understand the assumptions that are implicit in game theory, it could be rather useful. The thought experiment shows us that when a problem statement specifies a particular order of moves, what is really being specified is a state of knowledge at decision time. When a problem specifies that Omega moves first that is implicitly in contradiction to the claim that he knows what you will do when you move second. The implicit message is that Omega doesn’t know—the explicit message is that he does. If the explicit message is to be believed, then change the move order to make the implicit message match the explicit one.
However, here, many people seem to prefer to pretend that Newcomb problems constitute a decision theory problem which requires clever solution, rather than a bit of deliberate confusion constructed by violating the implicit rules of the problem genre.
A way of thinking of this “paradox” that I’ve found helpful is to see the two-boxer as imagining more outcomes than there actually are. For a payoff matrix of this scenario, the two-boxer would draw four possible outcomes: $0, $1000, $1000000, and $1001000, and would try for $1000 or $1001000. But if Omega is a perfect predictor, then the two that involve it making a mistake ($0 and $1001000) are very unlikely. The one-boxer sees only the two plausible options and goes for $1000000.
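A sketch of the four-outcome table and of how much weight the two “mistake” cells carry, assuming Omega predicts correctly with some probability `accuracy` (a parameter added here; the problem only states a 100-for-100 record). For any fixed contents of box B, two-boxing wins by $1,000, which is the two-boxer’s dominance argument. Once the contents track your choice, though, one-boxing has the higher expected value whenever accuracy exceeds 0.5005.

```python
from typing import Dict, Tuple

# The four outcomes the two-boxer draws: (choice, is box B full?) -> winnings.
PAYOFF: Dict[Tuple[str, bool], int] = {
    ("one-box", True): 1_000_000,
    ("one-box", False): 0,          # Omega mispredicted
    ("two-box", True): 1_001_000,   # Omega mispredicted
    ("two-box", False): 1_000,
}


def expected_payoff(choice: str, accuracy: float) -> float:
    """Expected winnings if Omega predicts `choice` correctly with probability `accuracy`."""
    # A correct prediction leaves box B full exactly when you one-box.
    full_if_correct = choice == "one-box"
    return (accuracy * PAYOFF[(choice, full_if_correct)]
            + (1 - accuracy) * PAYOFF[(choice, not full_if_correct)])


# Break-even: accuracy * 1_000_000 == 1_000 + (1 - accuracy) * 1_000_000,
# i.e. accuracy == 1_001_000 / 2_000_000 == 0.5005.
for accuracy in (0.5, 0.9, 0.99, 1.0):
    print(accuracy, expected_payoff("one-box", accuracy), expected_payoff("two-box", accuracy))
```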
It took me a week to think about it. Then I read all the comments, and thought about it some more. And now I think I have this “problem” well in hand. I also think that, incidentally, I arrived at Eliezer’s answer as well, though since he never spelled it out I can’t be sure.
To be clear—a lot of people have said that the decision depends on the problem parameters, so I’ll explain just what it is I’m solving. See, Eliezer wants our decision theory to WIN. That implies that we have all the relevant information—we can think of a lot of situations where we make the wisest decision possible based on available information and it turns out to be wrong; the universe is not fair, we know this already. So I will assume we have all the relevant information needed to win. We will also assume that Omega does have the capability to accurately predict my actions; and that causality is not violated (rationality cannot be expected to win if causality is violated!).
Assuming this, I can have a conversation with Omega before it leaves. Mind you, it’s not a real conversation, but having sufficient information about the problem means I can simulate its part of the conversation even if Omega itself refuses to participate and/or there isn’t enough time for such a conversation to take place. So it goes like this...
Me: “I do want to gain as much as possible in this problem. For that effect I will want you to put as much money in the box as possible. How do I do that?”
Omega: “I will put $1M in the box if you take only it; and nothing if you take both.”
Me: “Ah, but we’re not violating causality here, are we? That would be cheating!”
Omega: “True, causality is not violated. To rephrase, my decision on how much money to put in the box will depend on my prediction of what you will do. Since I have this capacity, we can consider these synonymous.”
Me: “Suppose I’m not convinced that they are truly synonymous. All right then. I intend to take only the one box”.
Omega: “Remember that I have the capability to predict your actions. As such I know if you are sincere or not.”
Me: “You got me. Alright, I’ll convince myself really hard to take only the one box.”
Omega: “Though you are sincere now, in the future you will reconsider this decision. As such, I will still place nothing in the box.”
Me: “And you are predicting all this from my current state, right? After all, this is one of the parameters in the problem—that after you’ve placed money in the boxes, you are gone and can’t come back to change it”.
Omega: “That is correct; I am predicting a future state from information on your current state”.
Me: “Aha! That means I do have a choice here, even before you have left. If I change my state so that I am unable or unwilling to two-box once you’ve left, then your prediction of my future “decision” will be different. In effect, I will be hardwired to one-box. And since I still want to retain my rationality, I will make sure that this hardwiring is strictly temporary.”
fiddling with my own brain a bit
Omega: “I have now determined that you are unwilling to take both boxes. As such, I will put the $1,000,000 in the box.”
Omega departs
I walk unthinkingly toward the boxes and take just the one
Voila. Victory is achieved.
My main conclusion here is that any decision theory that does not allow for changing strategies is a poor decision theory indeed. This IS essentially the Friendly AI problem: you can rationally one-box, but you need to have access to your own source code in order to do so. Not having that would be so inflexible as to be the equivalent of an Iterated Prisoner’s Dilemma program that can only defect or only cooperate; that is, a very bad one.
The reason this is not obvious is that the way the problem is phrased is misleading. Omega supposedly leaves “before you make your choice”, but in fact there is not a single choice here (one-box or two-box). Rather, there are two decisions to be made, if you can modify your own thinking process:
Whether or not to have the ability and inclination to make decision #2 “rationally” once Omega has left, and
Whether to one-box or two-box.
...Where decision #1 can and should be made prior to Omega’s leaving, and obviously DOES influence what’s in the box. Decision #2 does not influence what’s in the box, but the state in which I approach that decision does. This is very confusing initially.
Now, I don’t really know CDT too well, but it seems to me that presented as these two decisions, even it would be able to correctly one-box on Newcomb’s problem. Am I wrong?
Eliezer—if you are still reading these comments so long after the article was published—I don’t think it’s an inconsistency in the AI’s decision making if the AI’s decision making is influenced by its internal state. In fact I expect that to be the case. What am I missing here?
Let me try my own stab at a little chat with Omega. By the end of the chat I will either have 1001 K, or give up. Right now, I don’t know which.
Act I
Everything happens pretty much as it did in Polymeron’s dialogue, up until…
Omega: Yup, that’ll work. So you’re happy with your 1000 K?
Act II
Whereupon I try to exploit randomness.
Me: Actually, no. I’m not happy. I want the entire 1001 K. Any suggestions for outsmarting you?
Omega: Nope.
Me: Are you omniscient?
Omega: As far as you’re concerned, yes. Your human physicists might disagree in general, but I’ve got you pretty much measured.
Me: Okay, then. Wanna make a bet? I bet I can find a way to get over 1000 K if I make a bet with you. You estimate your probability of being right at 100%, right? Nshepperd had a good suggestion….
Omega: I won’t play this game. Or let you play it with anyone else. I thought we’d moved past that.
Me: How about I flip a fair coin to decide between B and A+B. In fact, I’ll use ’s generator using the principle to generate the outcome of a truly random coin flip. Even you can’t predict the outcome.
Omega: And what do you expect to happen as a result of this (not-as-clever-as-you-think) strategy?
Me: Since you can’t predict what I’ll do, hopefully you’ll fill both boxes. Then there’s a true 50% chance of me getting 1001 K. My expected payoff is 1000.5 K.
Omega: That, of course, is assuming I’ll fill both boxes.
Me: Oh, I’ll make you fill both boxes. I’ll bias the ’s to 50+eps% chance of one-boxing for the expected winnings of 1000.5 K – eps. Then if you want to maximize your omniscience-y-ness, you’ll have to fill both boxes.
Omega: Oh, taking others’ suggestions already? Can’t think for yourself? Making edits to make it look like you’d thought of it in time? Fair enough. Attribute this one to gurgeh. As to the idea itself, I’ll disincentivize you from randomization at all. I won’t fill box B if I predict you cheating.
Me: But then there’s a 50-eps% chance of proving you wrong. I’ll take it. MWAHAHA.
Omega: What an idiot. You’re not trying to prove me wrong. You’re trying to maximize your own profit.
Me: The only reason I don’t insult you back is because I operate under Crackers Rule.
Omega: Crocker’s Rules.
Me: Uh. Right. Whoops.
Omega: Besides. Your ’s random generator idea won’t work even to get you the cheaters’ utility for proving me wrong.
Me: Why not? I thought we’d established that you can’t predict a truly random outcome.
Omega: I don’t need to. I can just mess with your ’s randomness generator so that it gives out pseudo-random numbers instead.
Me: You’re omnipotent now, too?
Omega: Nope. I’ll just give someone a million dollars to do something silly.
Me: No one would ever…! Oh, wait. Anyway, I’ll be able to detect tampering with randomness, the same way it’s possible with a Mersenne twister….
Omega: And I know exactly how soon you’ll give up. Oh, and don’t waste page space suggesting secondary and tertiary levels of ensuring randomness. If, to guide your behavior, you’re using the table of random numbers that I already have, then I already know what you’d do.
Me: Is there any way at all of outsmarting you and getting 1001 K?
Omega: Not one you can find.
Me: Okay then… let me consult smarter people.
This conversation is obviously not going my way. Any suggestions for Act III?
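A sketch of the randomization idea from Act II, comparing two readings of how Omega might treat a randomizer. Both rules are assumptions, not part of the original problem: `"majority"` fills box B iff you one-box with probability above 0.5, and `"punish"` fills box B only for a pure one-boxer, as Omega threatens in the dialogue.

```python
def ev_mixed(q: float, omega_rule: str) -> float:
    """Expected winnings when you one-box with probability q (two-box otherwise).

    omega_rule is a hypothetical label for how Omega treats a randomizer:
      "majority": fill box B iff q > 0.5 (the biased-coin idea in Act II)
      "punish":   fill box B only for a pure one-boxer (q == 1)
    """
    if omega_rule == "majority":
        b_full = q > 0.5
    elif omega_rule == "punish":
        b_full = q == 1.0
    else:
        raise ValueError(f"unknown rule: {omega_rule}")
    box_b = 1_000_000 if b_full else 0
    # One-boxing gets box B; two-boxing gets box B plus box A's $1000.
    return q * box_b + (1 - q) * (box_b + 1_000)


eps = 0.01
for rule in ("majority", "punish"):
    print(rule, [round(ev_mixed(q, rule)) for q in (0.0, 0.5 + eps, 1.0)])
```

Under the majority rule, one-boxing with probability 0.5 + eps yields about $1,000,500 - $1,000*eps, slightly above pure one-boxing; under the punishing rule any randomization leaves you below even pure two-boxing.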
I wanted to consider some truly silly solution. But since taking only box A is out (and I can’t find a good reason for choosing box A, other than a vague argument based in irrationality along the lines that I’d rather not know if omniscience exists…), I came up with this instead. I won’t apologize for all the math-economics, but it might get dense.
Omega has been correct 100 times before, right? Fully intending to take both boxes, I’ll go to each of the 100 other people. There’re 4 categories of people. Let’s assume they aren’t bound by psychology and they’re risk-neutral, but they are bound by their beliefs.
Two-boxers who defend their decision do so on ground of “no backwards causality” (uh, what’s the smart-people term for that?). They don’t believe in Omega’s omniscience. There’s Q1 of these.
Two-boxers who regret their decision also concede to Omega’s near-perfect omniscience. There’re Q2 of these.
One-boxers who’re happy also concede to Omega’s near-perfect omniscience. There’re Q3 of these.
One-boxers who regret foregoing $1000. They don’t believe in Omega’s omniscience. There’re Q4 of these.
I’ll offer groups 2 and 3 (who believe I’ll only get 1000) to split my 1000 between them, in proportion to their bets, if they’re right. If they believe in Omega’s perfect predictive powers, they think there’s a 0% chance of me winning, so it’s a good bet for them. Expected profit = 1000/weight - 0*(all their money) > 0.
Groups 1 and 4 are trickier. They think Omega has a P chance of being wrong about me. I’ll ask them to bet X = 1001000P/((1-P)*weight) - eps, where weight is a positive number >1 that’s a function of how many people donated how much. Explicitly defining weight(Q1, Q4, various money caps) is a medium-difficulty exercise for a beginning calculus student. If you insist, I’ll model it, but it will take me more time than I’d already spent on this. So, for a person in one of these groups, expected profit = -X(1-P) + 1001000P/weight = eps*(1-P) > 0!
So what do I have now? (Should I pray to Bayes that my intuition be confirmed?) There’re two possible outcomes of taking both boxes.
Both are full. I give the 1001000 to groups 1 and 4, and collect Q2*1000 + Q3*1000000 from groups 2 and 3, which is at least 1001000 if Q2>0 and Q3>0, and more than that if Q3>1. This outcome has potential for tremendous profit. Call this number PIE >> 1001000.
Only A is full. I split my 1000 between groups 2 and 3, and collect X1*Q1 + X4*Q4 from groups 1 and 4. What are X1 and X4 again? X, the amount of money group 1 and group 4 bet, is unique for each group. I called group 1’s X X1, and group 4’s X4.
I need to find the conditions under which X1*Q1 + X4*Q4 > 1000. So suppose I don’t maximize my profit and completely ignore poor group 1 (their 1000 won’t make much difference either way). Then X4 becomes much simpler, X4 = 1001000*P/((1-P)*Q4) - eps, and the payoff I get is Q4*X4 = 1001000*P/(1-P) - Q4*eps. P = 0.001 and Q4*eps < $2 guarantee X1*Q1 + X4*Q4 ≥ X4*Q4 > 1000.
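A minimal sketch of the group-4 bet, using exact fractions. The function names and the sample numbers (P = 0.001, Q4 = 10, eps = 0.1) are hypothetical; it assumes, as in the simplified case, that group 1 is ignored, so weight = Q4 and the $1,001,000 is split evenly across group 4.

```python
from fractions import Fraction

PAYOUT = 1001000  # split among group 4 if both boxes turn out to be full


def group4_stake(p, q4, eps):
    """Stake X4 each group-4 member puts up: 1001000*P/((1-P)*Q4) - eps."""
    return PAYOUT * p / ((1 - p) * q4) - eps


def group4_expected_profit(p, q4, eps):
    """One group-4 member's expected profit, judged by their own belief P.

    They lose X4 if only box A is full (probability 1 - P) and get a 1/Q4
    share of the payout if both boxes are full (probability P).
    """
    return -group4_stake(p, q4, eps) * (1 - p) + PAYOUT * p / q4


def my_take_if_only_a(p, q4, eps):
    """What I collect from all of group 4 if only box A is full: Q4 * X4."""
    return q4 * group4_stake(p, q4, eps)


p, q4, eps = Fraction(1, 1000), 10, Fraction(1, 10)  # hypothetical; Q4*eps = 1 < 2
print(float(group4_expected_profit(p, q4, eps)))  # eps*(1-P) = 0.0999
print(float(my_take_if_only_a(p, q4, eps)))       # about 1001.0, above 1000
```

Each bettor's edge works out to exactly eps*(1-P), so a small eps keeps the bet attractive to them while my take stays just above $1000.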
That’s all well and good, but if P is low (under 0.5), I’m getting less than 1001000. What can I do? Hedge again! I would actually go to the people in groups 1 and 4 again, except it’s getting too confusing, so let’s introduce a “bank” that has the same mentality as the people in groups 1 and 4 (that there’s a chance P that Omega will be wrong about me). Remember PIE? The bank estimates my chances of getting PIE at P. Let’s say that if I don’t get PIE, I get 1000 (which is the lowest possible profit for outcome 2; otherwise it’s not worth making that bet). I ask the bank for the following sum: PIE*P + 1000*(1-P) - eps. The bank makes an expected profit of eps > 0. Since PIE is a large number, my profit at the end is approximately PIE*P + 1000*(1-P), which exceeds 1001000 as long as PIE*P is above roughly 1000000.
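A sketch of the bank hedge, again with hypothetical names and numbers. It assumes the bank, by its own belief P, values my outcome-2 winnings at PIE with probability P and $1000 otherwise; `pie_needed` shows how large PIE has to be before the up-front payment reaches $1,001,000.

```python
from fractions import Fraction


def bank_payment(pie, p, eps, floor=1000):
    """What I ask the bank for up front: PIE*P + floor*(1-P) - eps."""
    return pie * p + floor * (1 - p) - eps


def bank_expected_profit(pie, p, eps, floor=1000):
    """The bank's expected profit by its own belief P: it later receives my
    outcome-2 winnings (PIE with probability P, otherwise floor)."""
    return p * pie + (1 - p) * floor - bank_payment(pie, p, eps, floor)


def pie_needed(p, eps=0, floor=1000, target=1001000):
    """Smallest PIE for which bank_payment reaches `target`."""
    return (target - floor * (1 - p) + eps) / p


p = Fraction(1, 1000)  # hypothetical P
print(pie_needed(p))   # 1000001000
```

With P = 0.001 the hedge only beats $1,001,000 once PIE is around a billion, so how much it helps depends heavily on P.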
Note that I’d been trying to find the LOWER bound on this gambit. Actually plugging in numbers for P and Q’s easily yielded profits in the 5 mil to 50 mil range.
You’re essentially engaging in arbitrage, taking advantage of the difference in the probabilities assigned to both boxes being full by different people. Which is one reason rational people never assign 0 probability to anything.
You could just as well go to some one-boxers (who “believe P(both full) = 0”) and offer them a $1 bet 10000000:1 in your favor that both boxes will be full; then offer the two-boxers whatever bet they will take “that only one box is full” that will give you more than $1 profit if you win. Thus, either way, you make a profit, and you can make however much you like just by increasing the stakes.
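A sketch of that two-bet arbitrage. The odds, the $2 win against the two-boxers, and their believed P(both full) = 0.01 are all hypothetical; the two-boxers are assumed to accept any bet with positive expected value by their own estimate.

```python
from fractions import Fraction


def arbitrage_payoffs(odds=10_000_000, stake=1, win_if_only_a=2, p_believed=Fraction(1, 100)):
    """My net payoff (both_full, only_a_full) from the two bets; all defaults hypothetical.

    Bet 1, with one-boxers (who put P(both full) = 0): I risk `stake` and
    win `odds * stake` if both boxes are full.
    Bet 2, with two-boxers (who put P(both full) = p_believed): I win
    `win_if_only_a` if only box A is full and pay `loss_if_both` otherwise.
    """
    # Two-boxers accept when p*loss - (1-p)*win > 0; add $1 so they strictly gain.
    loss_if_both = (1 - p_believed) * win_if_only_a / p_believed + 1
    both_full = odds * stake - loss_if_both
    only_a_full = win_if_only_a - stake
    return both_full, only_a_full


print(arbitrage_payoffs())  # positive in both outcomes
```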
This still doesn’t actually solve Newcomb’s problem, though. I’d call it more of a cautionary tale against being absolutely certain.
(Incidentally, since you’re going into this “fully intending” to take both boxes, I’d expect both one-boxers and two-boxers to agree on the extremely low probability that Omega is going to have filled both boxes.)
Yes, nshepperd, my assumption is that P << 0.5, something in the 0.0001 to 0.01 range.
Besides, arbitrage would still be possible if some people estimated P=0.01 and others P=0.0001; the solution would just be messier than anything I’d want to work out casually. And if I weren’t constrained in the bets I could make (I’d tried to work with a cap before), making a profit would be even easier.
I wasn’t exactly trying to solve the problem, only to find a “naively rational” workaround (using the same naive rationality that leads prisoners to rat each other out in PD).
When you say that this doesn’t solve Newcomb’s problem, what do you expect the solution to actually entail?
Yes, arbitrage is possible pretty much whenever people’s probabilities disagree to any significant degree. Setting P = 0 just lets you take it to absurd levels (e.g., put up no stake at all, and it’s still a “fair bet”).
Maximizing the money found upon opening the box(es) you have selected.
If you like, replace the money with cures for cancer with differing probabilities of working, or machines with differing probabilities of being a halting oracle, or something else you can’t get by exploiting other humans.
I don’t know, I feel pretty confident assigning P(A&!A)=0 :P
Do you assign 0 probability to the hypothesis that there exists something which you believe to be mathematically true which is not?
No, P(I’m wrong about something mathematical) is 1-epsilon. P(I’m wrong about this mathematical thing) is often low, like 2%, and sometimes actually 0, like when discussing the intersection of a set and its complement. It’s defined to be the empty set; there’s no way that it can fail to be the empty set. I may not have complete confidence in the rest of set theory, and I may not expect that the complement of a set (or the set itself) is always well-defined, but when I limit myself to probability measures over reasonable spaces then I’m content.
So, for some particular aspects of math, you have certainty 1-epsilon, where epsilon is exactly zero?
What you are really doing is making the claim “Given that what I know about mathematics is correct, then the intersection of a set and its complement is the empty set.”
I was interpreting “something” as “at least one thing.” Almost surely my understanding of mathematics as a whole is incorrect somewhere, but there are a handful of mathematical statements that I believe with complete metaphysical certitude.
“Correct” is an unclear word, here. Suppose I start off with a handful of axioms. What is the probability that one of the axioms is true / correct? In the context of that system, 1, since it’s the starting point. Now, the axioms might not be useful or relevant to reality, and the axioms may conflict and thus the system isn’t internally consistent (i.e. statements having probability 0 and 1 simultaneously). And so the geometer who is only 1-epsilon sure that Euclid’s axioms describe the real world will be able to update gracefully when presented with evidence that real space is curved, even though they retain the same confidence in their Euclidean proofs (as they apply to abstract concepts).
Basically, I only agree with this post when it comes to statements about which uncertainty is reasonable. If you require 1-epsilon certainty for anything, even P(A|A), then you break the math of probability.
The map is not the territory. “A&!A” would mean some fact about the world being both true and false, rather than anyone’s beliefs about that fact.
Assigning zero or nonzero probability to that assertion is having a belief about it.
Yes, the probability is a belief, but your previous question was about something more like P(!A&P(A)=1), that is to say, an absolute belief being inconsistent with the facts. Vaniver’s assertion was about the facts themselves being inconsistent with the facts, which would have a rather alarming lack of implications.
“Pretty confident” is about as close to “actually 0” as the moon is (which I don’t care to quantify :P).
“Pretty confident” was also a rhetorical understatement. :P
How is there anybody in this group? Considering that all of them have $1,000,000, what convinced them to one-box in the first place such that they later changed their minds about it and regretted the decision? (Like, I guess a one-boxer could say afterwards “I bet that guy wasn’t really omniscient, I should have taken the other box too, then I’d have gotten $1,001,000 instead”, but why wouldn’t a person who thinks that way two-box to begin with?)
True.
I only took that case into account for completeness, to cover my bases against the criticism that “not all one-boxers would be happy with their decisions.”
Naively, when you have a choice between 1000000.01 and 1000000.02, it’s very easy to argue that the latter is the better option. To argue for the former, you would probably cite the insignificance of that cent next to the rest of 1000000.01: that eps doesn’t matter, or that an extra penny in your pocket is inconvenient, or that you already have 1000000.01, so why do you need another 0.01?
1) I would one-box. Here’s where I think the standard two-boxer argument breaks down. It’s the idea of making a decision. The two-boxer idea is that once the boxes have been fixed the course of action that makes the most money is taking both boxes. Unless there is reverse causality going on here, I don’t think that anyone disputes this. If at that moment you could make a choice totally independently of everything leading up to that point you would two-box. Unfortunately, the very existence of Omega implies that such a feat is impossible.
2) A mildly silly argument for one-boxing: Omega plausibly makes his decision by running a simulation of you. If you are the real copy, it might be best to two-box, but if you are the simulation then one-boxing earns real-you $1000000. Since you can’t distinguish whether this is real-you or simulation-you, you should one-box.
3) Would it change things for people if instead of $1000000 vs $1000 it were $1001 vs $1000? Where is the line drawn?
4) Eliezer: just curious about how you deal with paradoxes about infinity in your utility function. If for each n, on day n you are offered the chance to sacrifice one unit of utility that day to gain one unit of utility on day 2n and one unit on day 2n+1, what do you do? Each time you do it you seem to gain a unit of utility, but if you do it every day you end up worse off than you started.
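A small bookkeeping sketch of the day-n offer, assuming it is accepted on every day up to some horizon. `utility_by_day` is a hypothetical helper.

```python
from collections import defaultdict


def utility_by_day(horizon):
    """Net utility change per day if the offer is accepted on every day 1..horizon."""
    delta = defaultdict(int)
    for n in range(1, horizon + 1):
        delta[n] -= 1          # sacrifice one unit on day n
        delta[2 * n] += 1      # gain one unit on day 2n
        delta[2 * n + 1] += 1  # gain one unit on day 2n+1
    return delta


d = utility_by_day(1000)
print(sum(d.values()), d[1])                       # 1000 -1
print(all(d[day] == 0 for day in range(2, 1001)))  # True
```

Every accepted offer adds one unit in total, yet day 1 is down one unit and every day from 2 to the horizon nets zero; all of the gains sit on days past the horizon. Push the horizon to infinity and every day ends at 0 or -1.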
dankane, Eliezer answered your question in this comment, and maybe somewhere else, too, that I don’t yet know of.
If he wasn’t really talking about infinities, how would you parse this comment (the living forever part):
“There is no finite amount of life lived N where I would prefer a 80.0001% probability of living N years to an 0.0001% chance of living a googolplex years and an 80% chance of living forever.”
At the very least this should imply that for every N there is an f(N) such that he would rather have a 50% chance of living f(N) years and a 50% chance of dying instantly than a 100% chance of living for N years. We could then consider the game where, if he is going to live for N years, he is repeatedly offered the chance to instead live f(N) years with 50% probability and 0 years with 50% probability. Taking the bet n+1 times clearly does better in expectation than taking it n times, but the strategy “take the bet until you lose” guarantees him a very short life expectancy.
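A sketch of the repeated bet, assuming for illustration that f(N) = 3N (any f with f(N) > 2N behaves the same way for a chooser who values years linearly). The helper names are hypothetical.

```python
import random
from fractions import Fraction


def triple(years):
    """Hypothetical f(N) = 3N; any f(N) > 2N makes each single bet look favorable."""
    return 3 * years


def expected_years(n_years, k, f=triple):
    """Expected lifespan after accepting the 50/50 bet k times in a row, then stopping."""
    years = n_years
    for _ in range(k):
        years = f(years)
    return Fraction(1, 2 ** k) * years  # survive all k coin flips with probability 1/2^k


def take_until_lose(n_years, rng, f=triple, max_rounds=10_000):
    """Play the strategy 'take the bet until you lose' once and return the years lived."""
    years = n_years
    for _ in range(max_rounds):
        if rng.random() < 0.5:
            return 0  # lost the flip: die instantly
        years = f(years)
    return years  # only reached if every one of max_rounds flips was won


rng = random.Random(0)
print([float(expected_years(10, k)) for k in range(4)])  # [10.0, 15.0, 22.5, 33.75]
print(sum(take_until_lose(10, rng) for _ in range(1000)))  # 0
```

Stopping after k bets gives an expected 1.5^k * N years, which grows with every extra bet, while "take the bet until you lose" ends at 0 years in every simulated run.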
If your utility function is unbounded you can run into paradoxes like this.
Actually, I take it back. I think that what I would do depends on what I know of how Omega functions (exactly what evidence led me to believe that he was good at predicting this).
Omega #1: (and I think this one is the most plausible) You are given a multiple choice personality test (not knowing what’s about to happen). You are then told that you are in a Newcomb situation and that Omega’s prediction is based on your test answers (maybe they’ll even show you Omega’s code after the test is over). Here I’ll two-box. If I am punished I am not being punished for my decision to two-box, I am being punished for my test answers, and in reality am probably being punished for having personality traits that correlate well with being a two-boxer. I can rationally regret having the wrong personality traits.
Omega #2: You are sent through the Newcomb dilemma, given an amnesia pill and then sent through for real. Omega’s prediction is whatever you did the first time (this is similar to the simulation case). If I know this is going on, I clearly one-box because I don’t know whether this is the first time through or the second time through.
Omega #3: Omega makes his prediction by observing me and using a time machine. Clearly I one-box.
Omega #4: It is inscribed somewhere in the laws of physics that Omega cannot make a prediction that comes out wrong. Clearly I one-box.
But I think that the problem as stated is ill posed since I don’t know what my probability distribution over Omegas should be (given that it depends a lot on exactly what evidence convinces me that Omega is actually a good predictor).
The first case directly contradicts the specifications of the problem, since the idea then becomes to imagine you were the sort of person who would one-box and answer like that, then two box. This might not work for everyone, but a sufficiently clever agent should manage it.
If you are imagining a personality test undertaken in secret, or before you knew you were facing Newcomb’s problem, and stating you would two-box, then it seems like you one-box when it is absolutely certain that omega is right, but two-box if you can think of some way (however unlikely) that he might be wrong.
If you don’t see the problem with this then I suggest you read some of the sequence posts about absolute certainty.
In the first case, I imagine the test undertaken in secret. Or, more realistically, Omega measures these personality traits from listening to my conversations or reading things I post online.
I don’t decide based on whether there is a possibility that Omega is wrong. #2 can certainly be wrong (for example if I decide based on coin flip) and even #3 can probably mess up. My point is that in case #1 the argument from the post no longer works. If I two-boxed and didn’t get $1M, I might envy another person for their personality traits (which correlate with one-boxing), but not their decision to one-box.
I think what I am trying to do is split Omega’s decision procedure into cases where either:
His prediction is clearly caused by my decision (so I should one-box)
His prediction is not caused by my decision (and so I can two-box without regretting my choice)
(#2 is a special case where I try to be clever.)
Okay, I misunderstood you.
Even now, I think I would still one-box in case#1. For one thing, it is clearly in my interests, thinking about the problem in advance, to resolve to do so, since the personality test will reveal this fact and I will get the million.
Would you agree with me that far? If so, how do you handle the problem that you seem to be making different decisions at different times without receiving any new information in between?
Do you really think that merely deciding to one-box in such a situation would change your personality in a way that gets picked up by the test? If it does, do you want to modify your personality in a measurable way just so that you can win if you happen to run into a Newcomb problem?
Suppose for example it had been determined empirically that whether or not one was religious correlated well with the number of boxes you took. This could then be one of the things that the personality test measures. Are you saying that a precommitment would change your religious beliefs, or that you would change them in addition to deciding to one-box (in which case, why are you changing the latter at all)?
The point in case 1 is that they are not making a direct measurement of your decision. They are merely measuring external factors so that for 99% of people these factors agree with their decision (I think that this is implausible, but not significantly more implausible than the existence of Omega in the first place). It seems to me very unlikely that just changing your mind on whether you should one-box would also automatically change these other factors. And if it does, do you necessarily want to be messing around with your personality just to win this game that will almost certainly never come up?
If merely deciding to one-box is not picked up by the test, and does not offer even a slight increase in the probability that the money is there (even 51% as opposed to 50% would be enough), then the test is not very good, in which case I would two-box. However, this seems to contradict the stated fact that Omega is in fact a very good predictor of decisions.
As a general principle, I am most definitely interested in modifying my personality to increase the number of situations in which I win. If I weren’t, I probably wouldn’t be on LW. The religion example is a strawman, as it seems clear that applying the modification “believe in God” will cause me to do worse in many other, much more common situations, whereas “one-box in Newcomb-type dilemmas” doesn’t seem likely to have many side effects.
If Omega really is just measuring external factors, then how do you know he won’t pick up on my decision to always one-box? The decision was not made in a vacuum; it was caused by my personality, my style of thinking and my level of intelligence, all of which are things that any reasonably competent predictor should pick up on.
As long as the test is reasonably good, I will still get my million with a higher probability, and that’s all that really matters to me.
How about this version of Omega (and this is one that I think could actually be implemented to be 90% accurate). First off, box A is painted with pictures of snakes and box B with pictures of bananas. Omega’s prediction procedure is (and you are told this by the people running the experiment) that if you are a human he predicts that you two-box and if you are a chimpanzee, he predicts that you one-box.
I don’t think that 10% of people would give up $1000 to prove Omega wrong, and if you think so, why not make it $10^6 and $10^9 instead of $10^3 and $10^6.
I feel like this version satisfies the assumptions of the problem and makes it clear that you should two-box in this situation. Therefore any claims that one-boxing is the correct solution need to at least be qualified by extra assumptions about how Omega operates.
In this version Omega may be predicting decisions in general with some accuracy, but it does not seem like he is predicting mine.
So it appears there are cases where I two-box. I think in general my specification of a Newcomb-type problem has two requirements:
An outside observer who observed me to two-box would predict with high probability that the money is not there. An outside observer who observed me to one-box would predict with high probability that the money is there.
The above version of the problem clearly does not meet the second requirement.
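A sketch of the snake/banana Omega, checking those two requirements on a hypothetical room of 100 humans, 95 of whom two-box. The species rule comes from the example; the population numbers and helper names are assumptions.

```python
from fractions import Fraction


def omega_predicts_one_box(species):
    """The snake/banana Omega: predicts one-boxing only for chimpanzees."""
    return species == "chimpanzee"


def accuracy(players):
    """Fraction of (species, choice) players whose choice Omega predicted correctly."""
    hits = sum(omega_predicts_one_box(s) == (c == "one") for s, c in players)
    return Fraction(hits, len(players))


def fill_rate_by_choice(players):
    """P(box B is full | observed choice): what an outside observer would infer."""
    seen = {}
    for species, choice in players:
        full, total = seen.get(choice, (0, 0))
        seen[choice] = (full + int(omega_predicts_one_box(species)), total + 1)
    return {c: Fraction(full, total) for c, (full, total) in seen.items()}


# Hypothetical room: 95 humans two-box, 5 humans one-box.
humans = [("human", "two")] * 95 + [("human", "one")] * 5
print(accuracy(humans))             # 19/20
print(fill_rate_by_choice(humans))  # box B empty whichever choice is observed
```

Omega scores 95% on this room, yet box B is empty for every human whichever choice they make, so an observer who sees someone one-box has no reason to expect the money: the second requirement fails.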
If this is what you meant by your statement that the problem is ambiguous, then I agree. This is one of the reasons I favour a formulation involving a brain-scanner rather than a nebulous godlike entity, since it seems more useful to focus on the particularly paradoxical cases rather than the easy ones.
I don’t think that changing just that decision would be picked up by a personality test. Your changing that decision is unlikely to change how you answer questions not directly relating to Newcomb’s problem. The test would pick up the style of thinking that led you to this decision, but making the decision differently would not change your style of thinking. Perhaps an example that illustrates my point even better:
Omega #1.1: Bases his prediction on a genetic test.
Now I agree that it is unlikely that this will get 99% accuracy, but I think it could plausibly obtain, say, 60% accuracy, which shouldn’t really change the issue at hand. Remember that Omega does not need to measure things that cause you to decide one way or another, he just needs to measure things that have a positive correlation with it.
As for modifying your personality… Should I really believe that you believe the arguments that you are making here, or are you just worried that you are going to be in this situation and that Omega will base his prediction on your posts?
Good point with the genetic test argument; in that situation I probably would two-box. The same might apply to any sufficiently poor personality test, or to a version of Omega that bases his decision on the posts I make on Less Wrong (although I think if my sole reason for being here was signalling my willingness to make certain choices in certain dilemmas I could probably find better ways to do it).
I usually imagine Omega does better than that, and that his methods are at least as sophisticated as figuring out how I make decisions, then applying that algorithm to the problem at hand (the source of this assumption is that the first time I saw the problem Omega was a supercomputer that scanned people’s brains).
As for the personality modification thing, I really don’t see what you find so implausible about the idea that I’m not attached to my flaws, and would eliminate them if I had the chance.
I agree that the standard interpretation of Omega generally involves brain scans. But there is still a difference between running a simulation (Omega #2) and checking for relevant correlating personality traits. The latter I would claim is at least somewhat analogous to genetic testing, though admittedly the case is somewhat murkier. I guess perhaps the Omega that is most in the spirit of the question is one who does a brain scan and searches for your cached answer of “this is what I do in Newcomb problems”.
As for personality modification, I don’t see why changing my stored values for how to behave in Newcomb situations would change how I behave in non-Newcomb situations. I also don’t see why these changes would necessarily be an improvement.
“I don’t see why changing my stored values for how to behave in Newcomb situations would change how I behave in non-Newcomb situations.”
It wouldn’t, that’s the point. But it would improve your performance in Newcomb situations, so there’s no downside. (For an example of a Newcomb-type paradox which could happen in the real world, see Parfit’s hitch-hiker; given that I am not a perfect liar, I would not consider it too unlikely that I will face a situation of that general type, if not that exact situation, at some point in my life.)
My point was that if it didn’t change your behavior in non-Newcomb situations, no reasonable version of Omega #1 (or really any Omega that does not use either brain scans or lie detection) could tell the difference.
As for changing my actions in the case of Parfit’s hitch-hiker, say that the chance of actually running into this situation (with someone who can actually detect lies, in a situation with no third alternatives, and where my internal sense of fairness wouldn’t just cause me to give him the $100 anyway) is, say, 10^-9. This means that changing my behavior would save me an expected, say, 3 seconds of life. So if you have a way that I can actually precommit myself that takes less than 3 seconds to do, I’m all ears.
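The 3-second figure can be checked, assuming (an assumption, not stated in the comment) that roughly 80 to 100 years of life are at stake if the situation ever arises. The helper name is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # 31,557,600 seconds


def expected_seconds_saved(probability, years_at_stake):
    """Expected life saved by precommitting, in seconds (probability x years at stake)."""
    return probability * years_at_stake * SECONDS_PER_YEAR


print(round(expected_seconds_saved(1e-9, 100), 2))  # 3.16
print(round(expected_seconds_saved(1e-9, 80), 2))   # 2.52
```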
It wouldn’t have to be that exact situation.
In fact, it is applicable in any situation where you need to make a promise to someone who has a reasonable chance of spotting if you lie (I don’t know about you but I often get caught out when I lie), and while you prefer following through on the promise to not making it, you also prefer going back on the promise to following through on it, (technically they need to have a good enough chance of spotting you, with “good enough” determined by your relative preferences).
That’s quite a generic situation, and I would estimate at least 10% probability that you encounter it at some point, although the stakes will hopefully be lower than your life.
Perhaps. Though I believe that in the vast majority of these cases my internal (and perhaps irrational) sense of fairness would cause me to keep my word anyway.
“the dominant consensus in modern decision theory is that one should two-box...there’s a common attitude that verbal arguments for one-boxing are easy to come by, what’s hard is developing a good decision theory that one-boxes”
This may be more a statement about the relevance and utility of decision theory itself as a field (or lack thereof) than the difficulty of the problem, but it is at least philosophically intriguing.
From a physical and computational perspective, there is no paradox, and one need not invoke backwards causality, ‘pre-commitment’, or create a new ‘decision theory’.
The chain of physical causality just has a branch:
M0 -> O(D) -> B
M0 -> M1 -> M2 -> ... -> MN -> D
and O(D) = D
Where M0, M1, M2 .. . MN are the agent’s mind states, D is the agent’s decision, O is Omega’s prediction of the decision, and B is the content of box B.
Your decision does not physically cause the contents of box B to change. Your decision itself, however, is caused by your past state of mind, and this prior state is also the cause of the box’s current contents (via the power of Omega’s predictor). So your decision and the box’s contents are causally linked, entangled if you will.
From your perspective, the box’s contents are unknown. Your final decision is also unknown to you, undecided, until the moment you make that decision by opening the box. Making the decision itself reveals this information about your mind history to you, along with the contents of the box.
One way of thinking about it is that this problem is an illustration of the dictum that any mind or computational system can never fully predict itself from within.
Note that in the context of actual AI in computer science, this type of reflective search (considering a potential decision, then agent B’s consequent decision, your next decision, and so on, exploring a decision tree) is pretty basic stuff. In this case the Omega agent essentially has an infinite branching depth, but the decision at each point is pretty simple—because Omega always gets the ‘last move’.
You may start as a ‘one-boxer’, thinking that after the scan you can outwit Omega by ‘self-modifying’ into a ‘two-boxer’ (which really can be as simple as changing your internal register), but Omega already predicted this move... and your next reactive move of flipping back to a ‘one-boxer’... and the next, on and on to infinity... until you finally run out of time and the register is sampled. You can keep chaining M’s to infinity, but you can’t change the fact that MN -> D and O(D) = D.
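A toy version of that causal chain, assuming a deterministic agent whose "self-modification" is just toggling its box-count register a fixed number of times, and an Omega that predicts by rerunning the same process from M0. All names are hypothetical.

```python
def agent_decision(m0_register, flips):
    """Hypothetical agent: starts in mind state M0 (its box-count register, 1 or 2)
    and 'self-modifies' by toggling the register `flips` times before it is sampled."""
    register = m0_register
    for _ in range(flips):
        register = 2 if register == 1 else 1  # M_i -> M_(i+1)
    return register  # D: the number of boxes actually taken


def omega_prediction(m0_register, flips):
    """Omega reruns the same deterministic process from M0, so O(D) = D."""
    return agent_decision(m0_register, flips)


def payout(m0_register, flips):
    box_b = 1_000_000 if omega_prediction(m0_register, flips) == 1 else 0
    box_a = 1000 if agent_decision(m0_register, flips) == 2 else 0
    return box_a + box_b


print(sorted({payout(r, k) for r in (1, 2) for k in range(10)}))  # [1000, 1000000]
```

However many flips are chained, the payout is $1,000,000 when the sampled register is 1 and $1000 when it is 2; $1,001,000 never appears.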
Part of the confusion experienced by the causal decision camp may stem from the subjectivity of the solution.
The optimal decision for some abstract algorithm, divorced from Omega’s predictive brainscan, will of course be to two-box, simply because its decision is not causally linked to the box’s contents.
But your N-box register is linked to the box’s contents, so you should set it to 1.
Upon reading this, I immediately went,
“Well, General Relativity includes solutions that have closed timelike curves, and I certainly am not in any position to rule out the possibility of communication by such. So I have no actual reason to rule out the possibility that which strategy I choose will, after I make my decision, be communicated to Omega in my past and then the boxes filled accordingly. So I better one-box in order to choose the closed timelike loop where Omega fills the box.”
I understand, looking at Wikipedia, that in Nozick’s formulation he simply declared that the box won’t be filled based on the actual decision. Fine. How would he go about proving that to someone actually faced with the scenario? Rational people do not risk a million dollars based on an unprovable statement by a philosopher. Same with claims that, for example, Omega didn’t set up the boxes so that two-boxing actually results in the annihilation of the contents of box B. Or that Omega doesn’t teleport the money in B somehow after the decider makes the decision to one-box. Those declarations may have a truth value of 1 for purposes of a person outside observing the scenario, but unless empirically testable within the scenario, cannot be valued as approximating 1 by the person making the decision.
Every “given” that the decision-maker can’t verify is a “given” that is not usable for making the decision. The whole argument for two-boxing depends on a boundary violation; that the knowledge known by the reader but which cannot be known to the character in the scenario can somehow be used by the character in the scenario to make a decision.
The “no backwards causality” argument seems like a case of conflation of correlation and causation. Your decision doesn’t retroactively cause Omega to fill the boxes in a certain way; some prior state of the world causes your thought processes and Omega’s prediction, and the correlation is exactly or almost exactly 1.
EDIT: Correlation coefficients don’t work like that, but whatever. You get what I mean.
The original description of the problem doesn’t mention if you know of Omega’s strategy for deciding what to place in box B, or their success history in predicting this outcome—which is obviously a very important factor.
If you know these things, then the only rational choice, obviously and by a huge margin, is to pick only box B.
If you don’t know anything other than box B may or may not contain a million dollars, and you have no reasons to believe that it’s unlikely, like in the lottery, then the only rational decision is to take both. This also seems to be completely obvious and unambiguous.
But since this community has spent a while debating this, I conclude that there’s a good chance I have missed something important. What is it?
It looks like you just restated the “paradox”—using one argument, it is “obvious” to pick B and using another argument, it is “obvious” to pick both.
Also, in general, do try to avoid saying something is “obvious”. It usually throws a lot of complexity and potential faults into a black box and worsens your chances of uncovering those faults by intimidating people.
You are betting a positive extra payout of $1,000 against a net loss of $999,000 that there are no Black Swans[1] at all in this situation.
Given that you already have 100 points of evidence that taking Box A makes Box B empty (added to the evidence that Omega is more intelligent than you), I’d say that’s a Bad Bet to make.
Given the amount of uncertainty in the world, choosing Box B instead of trying to “beat the system” seems like the rational step to me.
Edit: I’ve given the math in a comment below to show how to calculate when to make either decision.
[1] i.e., something you didn’t think of that makes Box B empty even after Omega’s gone away, or an invisible portkey in box B that is activated the moment you pick up Box A, or Omega’s time machine that let him go forward to see your decision before putting the money into the boxes… or a device using some hand-wavey quantum state that lets either Box A be taken or Box B’s contents exist…
So working the math on that
Let P(BS) = probability of a Black Swan being involved
This makes the average payout work out to:
1-Box = $1,000,000
2-Box = $1,001,000 (1 - P(BS)) + $1,000 P(BS)
Now it seems to me that the average 2-boxer is assuming that P(BS) = 0, which would make the 2-Box solution always equal $1,001,000, which would, of course, always beat the 1-Box solution.
And maybe in this toy problem they’re right to assume P(BS) = 0. But IRL that’s almost never the case; after all, 0 is not a probability, yes?
So assume that P(BS) is non-zero. At what point would it be worth it to choose the 1-Box solution, and at what point the 2-Box solution? Let’s run the math:
1,000,000 = 1,001,000(1 - x) + 1,000x = 1,001,000 - 1,001,000x + 1,000x = 1,001,000 - 1,000,000x
=> 1,000,000 - 1,001,000 = -1,000,000x
=> x = -1,000 / -1,000,000
=> x = 0.001
So, the estimated probability of a Black Swan existing only has to be greater than 0.1% for the 1-Box solution to have a greater expected payout, and therefore the 1-Box option is the more rational::Bayesian choice.
OTOH, if you can guarantee that P(BS) is less than 0.1%, then the rational choice is to 2-Box.
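The break-even point can be checked with exact arithmetic under the same model (one-boxing always pays $1,000,000; two-boxing pays $1,001,000 unless a Black Swan leaves only $1000). The function names are illustrative.

```python
from fractions import Fraction


def expected_payouts(p_bs):
    """(1-box, 2-box) expected payouts given the probability of a Black Swan."""
    one_box = Fraction(1_000_000)
    two_box = 1_001_000 * (1 - p_bs) + 1_000 * p_bs
    return one_box, two_box


def break_even():
    """Solve 1,000,000 = 1,001,000(1 - x) + 1,000x for x."""
    # Rearranged: (1,000 - 1,001,000) x = 1,000,000 - 1,001,000
    return Fraction(1_000_000 - 1_001_000, 1_000 - 1_001_000)


x = break_even()
print(x, float(x))  # 1/1000 0.001
```

Above this threshold the 1-Box payout is the larger one; below it, the 2-Box payout is.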
Edit: Never mind, my comment resulted from a confusion.
http://wiki.lesswrong.com/wiki/Least_convenient_possible_world
I’m not sure what you are implying with this link—can you please expand? Are you saying that I’m choosing a least convenient possible world (and if so, how and what) or that 2-boxers are doing so?
Sorry, your comment was confusing and I didn’t properly concentrate on what you meant, so giving the LCPW link was a mistake, it doesn’t seem to apply.
No problem. I’ve expanded with the math explaining what I mean, hopefully that makes it less confusing what I was aiming at.
You are finding technical flaws that are not essential to the intended sense of the thought experiment. Instead of making it uninteresting because of the potential flaws, make the thought experiment stronger by considering the case where these flaws are fixed.
What would Newcomb’s problem look like in the physical world, taking quantum physics into account? Specifically, would Omega need to know quantum physics in order to predict my decision on “to one box or not to one box”?
To simplify the picture, imagine that Omega has a variable with it that can be either in the state A+B or B and which is expected to correlate with my decision and therefore serves to “predict” me. Omega runs some physical process to arrive at the contents of this variable. I’m assuming that “to predict” means “to simulate”—i.e. Omega can predict me by running a simulation of me (say using a universal quantum Turing machine) though that is not necessarily the only way to do so. Given that we’re in a quantum world, would Omega actually need to simulate me in order to ensure a correlation between its variable and my choice, potentially in another galaxy, of whether to pick A+B or B?
Say |Oab> and |Ob> are the two eigenstates of Omega’s variable (w.r.t. some operator it has) and the box system in front of me similarly has two eigenstates |Cab> and |Cb> (“C” for “choice”) and my “action” is simply a choice of measuring the box system in the state |Cab> or in the state |Cb> and not a mixture of them.
If Omega sets up an EPR-like entanglement between its variable and the box system of the form m|Oab>|Cab> + n|Ob>|Cb>, and then chooses to measure a mixed state of its variable, say, |Oab>+|Ob>, it can bifurcate the universe. Then, if I measure |Cab> (i.e. choose A+B), I end up in the same universe as the one in which Omega measured its variable to be |Oab> and if I choose |Cb>, I end up in the same universe as the one in which Omega measured its variable to be |Ob>. Therefore, if our two systems are entangled this way, Omega wouldn’t need to take any trouble to simulate me at all in order to ensure its reputation of being a perfect predictor!
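A classical simulation of the amplitude bookkeeping may make the correlation concrete. This is a sketch under stated assumptions: the state is m|Oab>|Cab> + n|Ob>|Cb> with zero cross terms, both parties perform ordinary projective measurements in the {Oab, Ob} and {Cab, Cb} bases, and outcomes are sampled by the Born rule. It therefore shows that Omega's record always matches the box outcome, but it does not model the person as choosing which outcome occurs. The function names are hypothetical.

```python
import random


def entangled_state(m, n):
    # Normalised amplitudes of m|Oab>|Cab> + n|Ob>|Cb>, keyed by the joint
    # basis state (Omega's variable, box system). Cross terms have amplitude 0.
    norm = (abs(m) ** 2 + abs(n) ** 2) ** 0.5
    return {("Oab", "Cab"): m / norm, ("Ob", "Cb"): n / norm,
            ("Oab", "Cb"): 0.0, ("Ob", "Cab"): 0.0}


def marginal(state, index):
    # Born-rule probabilities for one subsystem (0 = Omega, 1 = box).
    probs = {}
    for basis, amp in state.items():
        probs[basis[index]] = probs.get(basis[index], 0.0) + abs(amp) ** 2
    return probs


def measure(state, index, rng):
    # Projective measurement of one subsystem: sample an outcome, then
    # collapse the joint state onto the terms consistent with it.
    probs = marginal(state, index)
    outcomes = list(probs)
    outcome = rng.choices(outcomes, weights=[probs[o] for o in outcomes])[0]
    norm = probs[outcome] ** 0.5
    collapsed = {basis: (amp / norm if basis[index] == outcome else 0.0)
                 for basis, amp in state.items()}
    return outcome, collapsed


rng = random.Random(0)
results = []
for _ in range(1000):
    state = entangled_state(1, 1)
    omega, state = measure(state, 0, rng)  # Omega measures its variable first
    box, _ = measure(state, 1, rng)        # later, in another galaxy
    results.append((omega, box))

print(all((o, b) in {("Oab", "Cab"), ("Ob", "Cb")} for o, b in results))  # True
```

`random.choices` never selects a zero-weight outcome, so once Omega's variable has collapsed, the box measurement can only return the matching outcome.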
That is only as far as Omega’s reputation for being a perfect predictor is concerned. But hold on for a moment there. In this setup, the box system’s state is not disconnected from that of Omega’s predictor variable even after Omega has left the galaxy, and yet Omega cannot causally influence its “contents”. To my thinking, this is an argument against the stance of the “causal decision theorists” that whatever the contents of the box are, they are “fixed”, and that I therefore maximize my utility by picking A+B. It is also an argument for the one-boxers: having observed that Omega has a solid history of being right (i.e. Omega’s internal variable has always correlated with the choices of all the people before), they can form the simplest (?) explanation, that Omega could be using quantum entanglement (edit: EPR-like entanglement) to effect the correlation, and therefore choose to one-box so that they end up in the universe with a million bucks instead of the one with a thousand.
So, my final question to people here is this—does knowledge of quantum physics resolve Newcomb’s problem in favour of the one boxers? If not, the arguments certainly would be interesting to read :)
edit: To clarify the argument against the causal decision theorists, “B is either empty or has a million bucks” is not true. It could be in a superposition of the two that is entangled with Omega’s variable. Therefore the standard causal argument for picking A+B doesn’t hold any more.
...
...
It seems to me that if all that is true, and you want to build a Friendly AI, then the rational thing to do here is build it and let it solve all problems like these. That way, you win, at least in the time-management sense. Well, you might lose if you encountered Omega before the FAI was up and running, but that seems unlikely. Am I missing something here?
It will also have to precommit to mere humans who can’t read its source code and can’t predict the future, so solving the problem in the case where you meet Omega doesn’t solve the problem in general.
Causal decision theorists don’t self-modify to timeless decision theorists. If you get the decision theory wrong, you can’t rely on it repairing itself.
You said:
but you also said:
I can envision several possibilities:
Perhaps you changed your mind and presently disagree with one of the above two statements.
Perhaps you didn’t mean a causal AI in the second quote. In that case I have no idea what you meant.
Perhaps Newcomb’s problem is the wrong example, and there’s some other example motivating TDT that a self-modifying causal agent would deal with incorrectly.
Perhaps you have a model of causal decision theory that makes self-modification impossible in principle. That would make your first statement above true, in a useless sort of way, so I hope you didn’t mean that.
Would you like to clarify?
Causal decision theorists self-modify to one-box on Newcomb’s Problem with Omegas that looked at their source code after the self-modification took place; i.e., if the causal decision theorist self-modifies at 7am, it will self-modify to one-box with Omegas that looked at the code after 7am and two-box otherwise. This is not only ugly but also has worse implications for e.g. meeting an alien AI who wants to cooperate with you, or worse, an alien AI that is trying to blackmail you.
Bad decision theories don’t necessarily self-repair correctly.
And in general, every time you throw up your hands in the air and say, “I don’t know how to solve this problem, nor do I understand the exact structure of the calculation my computer program will perform in the course of solving this problem, nor can I state a mathematically precise meta-question, but I’m going to rely on the AI solving it for me ’cause it’s supposed to be super-smart,” you may very possibly be about to screw up really damned hard. I mean, that’s what Eliezer-1999 thought you could say about “morality”.
Okay, thanks for confirming that Newcomb’s problem is a relevant motivating example here.
I’m not saying that. I’m saying that self-modification solves the problem, assuming the CDT agent moves first, and that it seems simple enough that we can check that a not-very-smart AI solves it correctly on toy examples. If I get around to attempting that, I’ll post to LessWrong.
Assuming the CDT agent moves first seems reasonable. I have no clue whether or when Omega is going to show up, so I feel no need to second-guess the AI about that schedule.
(Quoting out of order)
As you know, we can define a causal decision theory agent in one line of math. I don’t know a way to do that for TDT. Do you? If TDT could be concisely described, I’d agree that it’s the less ugly alternative.
I’m failing to suspend disbelief here. Do you have motivating examples for TDT that seem likely to happen before Kurzweil’s schedule for the Singularity causes us to either win or lose the game?
If you appreciate simplicity/elegance, I suggest looking into UDT. UDT says that when you’re making a choice, you’re deciding the output of a particular computation, and the consequences of any given choice are just the logical consequences of that computation having that output.
CDT in contrast doesn’t answer the question “what am I actually deciding when I make a decision?” nor does it answer “what are the consequences of any particular choice?” even in principle. CDT can only be described in one line of math because the answer to the latter question has to be provided to it via an external parameter.
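A toy contrast between the two views on Newcomb's problem, as a sketch rather than a formalization of either theory. The CDT-style evaluator holds P(box B full) fixed regardless of the action; the UDT-style evaluator assumes (hypothetically) that Omega's prediction is exactly the output of the agent's own decision computation, so choosing an output also fixes what is in box B. All names are illustrative.

```python
ACTIONS = ("one-box", "two-box")


def payoff(b_full, action):
    # Box B holds $1,000,000 if full; box A's $1,000 only if you two-box.
    return (1_000_000 if b_full else 0) + (1_000 if action == "two-box" else 0)


def cdt_choice(p_full):
    # CDT-style: the contents of box B are fixed, with P(full) = p_full
    # regardless of which action is being evaluated.
    def eu(action):
        return p_full * payoff(True, action) + (1 - p_full) * payoff(False, action)
    return max(ACTIONS, key=eu)


def udt_choice():
    # UDT-style (toy): Omega's prediction is assumed to be the output of the
    # same computation, so choosing an output also fixes box B's contents.
    def eu(action):
        prediction = action
        return payoff(prediction == "one-box", action)
    return max(ACTIONS, key=eu)


print([cdt_choice(p) for p in (0.0, 0.5, 1.0)])  # ['two-box', 'two-box', 'two-box']
print(udt_choice())  # one-box
```

Note that in `cdt_choice` the answer never depends on `p_full`; the consequences of each action come entirely from that externally supplied parameter.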
Thanks, I’ll have a look at UDT.
I certainly agree there.
Maybe this one: “Argmax[A in Actions] in Sum[O in Outcomes](Utility(O)*P(this computation yields A []-> O|rest of universe))”
From this post.
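The structure of that one-liner (argmax over actions of the sum over outcomes of utility times the counterfactual probability that this computation yielding A leads to O) can be sketched directly. This is only the outer shell: the counterfactual probability is passed in as a function, and choosing it is the hard, unsolved part. The Newcomb numbers below (a 99%-accurate Omega) are hypothetical.

```python
def tdt_style_argmax(actions, outcomes, utility, counterfactual_prob):
    # argmax over A of sum over O of utility(O) * P(A []-> O | rest of universe).
    # counterfactual_prob(a, o) is an assumed input supplied by the caller.
    def score(a):
        return sum(utility(o) * counterfactual_prob(a, o) for o in outcomes)
    return max(actions, key=score)


# Hypothetical Newcomb numbers: Omega assumed 99% accurate.
ACCURACY = 0.99
OUTCOMES = (0, 1_000, 1_000_000, 1_001_000)  # dollars received


def newcomb_prob(action, dollars):
    if action == "one-box":
        table = {1_000_000: ACCURACY, 0: 1 - ACCURACY}
    else:
        table = {1_000: ACCURACY, 1_001_000: 1 - ACCURACY}
    return table.get(dollars, 0.0)


choice = tdt_style_argmax(("one-box", "two-box"), OUTCOMES, lambda d: d, newcomb_prob)
print(choice)  # one-box
```

Swapping in a probability function in which box B's contents are independent of the action reproduces the two-boxing answer, which is why the choice of that function carries all the weight.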
I’m reasonably sure Eliezer meant implications for the would-be friendly AI meeting alien AIs. That could happen at any time in the remaining life span of the universe.
Why not? A causal decision theorist can have an accurate abstract understanding of both TDT and CDT and can calculate the expected utility of applying either. If TDT produces a better expected outcome in general, then it seems like self-modifying to become a TDT agent is the correct decision to make. Is there some restriction or injunction assumed to be in place with respect to decision algorithm implementation?
Thinking about it for a few minutes: it would seem that the CDT agent will reliably update away from CDT, but that the new algorithm will be neither CDT nor TDT (nor UDT). It will be able to cooperate with agents when there has been some sort of causal entanglement between the modified source code and the other agent, but not with complete strangers. The resultant decision algorithm is enough of an attractor that it deserves a name of its own. Does it have one?
Doesn’t have a name as far as I know. But I’m not sure it deserves one; would CDT really be a probable output anywhere besides a verbal theory advocated by human philosophers in our own Everett branch? Maybe, now that I think about it, but even so, does it matter?
But it will calculate that expected value using CDT!expectation, meaning that it won’t see how self-modifying to be a timeless decision theorist could possibly affect what’s already in the box, etcetera.
Yes, because there are lemmas you can prove about (some) decision theory problems which imply that CDT and UDT give the same output. For example, CDT works if there exists a total ordering over inputs given to the strategy, common to all execution histories, such that the world program invokes the strategy only with increasing, non-repeating inputs on that ordering. There are (relatively) easy algorithms for these cases. CDT in general is then a matter of applying a theorem when one of its preconditions doesn’t hold, which is one of the most common math mistakes ever.
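The precondition can be phrased as a check, sketched here under an assumption: the caller supplies one candidate ordering as a `key` function, shared across all histories. The lemma only asks whether *some* such ordering exists, and this hypothetical helper does not search for one. A Newcomb-style history fails under every ordering, because Omega's simulation and the real agent invoke the strategy on the same input.

```python
def invoked_with_increasing_inputs(histories, key):
    # True if, in every execution history, the world program calls the strategy
    # only with strictly increasing (hence non-repeating) inputs under the
    # ordering induced by `key`, which is common to all histories.
    for calls in histories:
        keys = [key(x) for x in calls]
        if any(a >= b for a, b in zip(keys, keys[1:])):
            return False
    return True


# Sequential problem: each call sees a later timestep.
sequential = [[(0, "obs-a"), (1, "obs-b"), (2, "obs-c")],
              [(0, "obs-a"), (1, "obs-d")]]

# Newcomb: Omega's simulation and the real agent call the strategy on the
# same input ("facing the boxes"), so the input repeats.
newcomb = [["facing the boxes", "facing the boxes"]]

print(invoked_with_increasing_inputs(sequential, key=lambda x: x[0]))  # True
print(invoked_with_increasing_inputs(newcomb, key=lambda x: x))        # False
```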
Is that really so bad, if it takes the state of the world at the point before it self-modifies as an unchangeable given, and self-modifies to a decision theory that only considers states from that point on as changeable by its decision theory? For one thing, doesn’t that avoid Roko’s basilisk?
If you do that, you’d be vulnerable to extortion from any other AIs that happen to be created earlier in time and can prove their source code.
I’m inclined to think that in most scenarios the first AGI wins anyway. And leaving solving decision theory to the AGI could mean you get to build it earlier.
I was thinking of meeting alien AIs, post-Singularity.
Huh? I thought we were supposed to be the good guys here? ;-)
But seriously, “sacrifice safety for speed” is the “defect” option in the game of “let’s build AGI”. I’m not sure how to get the C/C outcome (or rather C/C/C/...), but it seems too early to start talking about defecting already.
Besides, CDT is not well defined enough that you can implement it even if you wanted to. I think if you were forced to implement a “good enough” decision theory and hope for the best, you’d pick UDT at this point. (UDT is also missing a big chunk from its specifications, namely the “math intuition module”, but I think that problem has to be solved anyway. It’s hard to see how an AGI can get very far without being able to deal with logical/mathematical uncertainty.)
What pre-singularity actions are you worried about them taking?
What I was thinking was that a CDT-seeded AI might actually be safer precisely because it won’t try to change pre-Singularity events, and if it’s first the new decision theory will be in place in time for any post-Singularity events.
That’s surprising to me; what should I read in order to understand this point better? EDIT: strike that, you answered that above.
They could modify themselves so that if they ever encounter a CDT-descended AI they’ll start a war (even if it means mutual destruction) unless the CDT-descended AI gives them 99% of its resources.
They could also modify themselves to make the analogous threat if they encounter a UDT-descended AI, or a descendant of an AI designed by Tim Freeman, or a descendant of an AI designed by Wei Dai, or a descendant of an AI designed using ideas mentioned on LessWrong. I would hope that any of those AIs would hand over 99% of their resources if the extortionist could prove its source code and prove that war would be worse. I assume you’re saying that CDT is special in this regard. How is it special?
(Thanks for the pointer to the James Joyce book, I’ll have a look at it.)
If the alien AI computes the expected utility of “provably modify myself to start a war against CDT-AI unless it gives me 99% of its resources”, it’s certain to get a high value, whereas if it computes the expected utility of “provably modify myself to start a war against UDT-AI unless it gives me 99% of its resources” it might possibly get a low value (not sure because UDT isn’t fully specified), because the UDT-AI, when choosing what to do when faced with this kind of threat, would take into account the logical correlation between its decision and the alien AI’s prediction of its decision.
Well, that’s plausible. I’ll have to work through some UDT examples to understand fully.
What model do you have of how entity X can prove to entity Y that X is running specific source code?
The proof that I can imagine is entity Y gives some secure hardware Z to X, and then X allows Z to observe the process of X self-modifying to run the specified source code, and then X gives the secure hardware back to Y. Both X and Y can observe the creation of Z, so Y can know that it’s secure and X can know that it’s a passive observer rather than a bomb or something.
This model breaks the scenario, since a CDT playing the role of Y could self-modify any time before it hands over Z and play the game competently.
Now, if there’s some way for X to create proofs of X’s source code that will be convincing to Y without giving advance notice to Y, I can imagine a problem for Y here. Does anyone know how to do that?
(I acknowledge that if nobody knows how to do that, that means we don’t know how to do that, not that it can’t be done.)
Hmm, this explains my aversion to knowing the details of what other people are thinking. It can put me at a disadvantage in negotiations unless I am able to lie convincingly and say I do not know.
I think I’ll stop here for now, because you already seem intrigued enough to want to learn about UDT in detail. I’m guessing that once you do, you won’t be so motivated to think up reasons why CDT isn’t really so bad. :) Let me know if that turns out not to be the case though.
On second thought, I should answer this question because it’s of independent interest. If Y is sufficiently powerful, it may be able to deduce the laws of physics and the initial conditions of the universe, and then obtain X’s source code by simulating the universe up to when X is created. Note that Y may do this not because it wants to know X’s source code in some anthropomorphic sense, but simply due to how its decision-making algorithm works.
That will not work unless some specific assumptions have been made about the universe. Simulating the entire universe does not tell Y which part of the universe it inhabits. It will give Y a set of possible parts of the universe which match Y’s observations. While the simulation strategy will allow the best possible prediction of X’s source code given what Y already knows, it does not give Y any evidence that it didn’t already have.
You’re right, the model assumes that we live in a universe such that superintelligent AIs would “naturally” have enough evidence to infer the source code of other AIs. (That seems quite plausible, although by no means certain, to me.) Also, since this is a thread about the relative merits of CDT, I should point out that there are some games in which CDT seems to win relative to TDT or UDT, which is a puzzle that is still open.
It’s an interesting problem, but my impression when reading was somewhat similar to that of Eliezer in the replies. At the core it is the question of “How do you deal with constructs made by other agents?” I don’t think TDT has any particular weakness there.
Quantum mechanics seems to be pretty clear that true random number generators are available, and probably happen naturally. I don’t understand why you consider that scenario probable enough to be worth talking about.
Do you have an intuition as to how it would do this without contradicting itself? I tried to ask a similar question but got it wrong in the first draft and afaict did not receive an answer to the relevant part.
I just want to know if my own intuition fails in the obvious way.
Really? That’s surprising. My assumption had been that CDT would be much simpler to implement—but just give undesirable outcomes in whole classes of circumstance.
CDT uses a “causal probability function” to evaluate the expected utilities of various choices, where this causal probability function is different from the epistemic probability function you use to update beliefs. (In EDT they are one and the same.) There is no agreement amongst CDT theorists how to formulate this function, and I’m not aware of any specific proposal that can be straightforwardly implemented. For more details see James Joyce’s The foundations of causal decision theory.
I understand AIXI reasonably well and had assumed it was a specific implementation of CDT, perhaps with some tweaks so the reward values are generated internally instead of being observed in the environment. Perhaps AIXI isn’t close to an implementation of CDT, perhaps it’s perceived as not specific or straightforward enough, or perhaps it’s not counted as an implementation. Why isn’t AIXI a counterexample?
You may be right that AIXI can be thought of as an instance of CDT. Hutter himself cites “sequential decision theory” from a 1957 paper which certainly predates CDT, but CDT is general enough that SDT could probably fit into its formalism. (Like EDT can be considered an instance of CDT with the causal probability function set to be the same as the epistemic probability function.) I guess I hadn’t considered AIXI as a serious candidate due to its other major problems.
Four problems are listed there.
The first one is the claim that AIXI wouldn’t have a proper understanding of its body because its thoughts are defined mathematically. This is just wrong, IMO; my refutation, for a machine that’s similar enough to AIXI for this issue to work the same, is here. Nobody has engaged me in serious conversation about that, so I don’t know how well it will stand up. (If I’m right on this, then I’ve seen Eliezer, Tim Tyler, and you make the same error. What other false consensuses do we have?)
The second one is fixed if we do the tweak I mentioned in the grandparent of this comment.
If you take the fix described above for the second one, what’s left of the third one is the claim that instantaneous human (or AI) experience is too nuanced to fit in a single cell of a Turing machine. According to the original paper, page 8, the symbols on the reward tape are drawn from an alphabet R of arbitrary but fixed size. All you need is a very large alphabet and this one goes away.
I agree with the facts asserted in Tyler’s fourth problem, but I do not agree that it is a problem. He’s saying that Kolmogorov complexity is ill-defined because the programming language used is undefined. I agree that rational agents might disagree on priors because they’re using different programming languages to represent their explanations. In general, a problem may have multiple solutions. Practical solutions to the problems we’re faced with will require making indefensible arbitrary choices of one potential solution over another. Picking the programming language for priors is going to be one of those choices.
I don’t see how your refutation applies to AIXI. Let me just try to explain in detail why I think AIXI will not properly protect its body. Consider an AIXI that arises in a simple universe, i.e., one computed by a short program P. AIXI has a probability distribution not over universes, but instead over environments where an environment is a TM whose output tape is AIXI’s input tape and whose input tape is AIXI’s output tape. What’s the simplest environment that fits AIXI’s past inputs/outputs? Presumably it’s E = P plus some additional pieces of code that inject E’s inputs into where AIXI’s physical output ports are located in the universe (that is, override the universe’s natural evolution using E’s inputs), and extract E’s outputs from where AIXI’s physical input ports are located.
What happens when AIXI considers an action that destroys its physical body in the universe computed by P? As long as the input/output ports are not also destroyed, AIXI would expect that the environment E (with its “supernatural” injection/extraction code) will continue to receive its outputs and provide it with inputs.
Does that make sense?
(Responding out of order)
Yes, but it makes some unreasonable assumptions.
An implementation of AIXI would be fairly complex. If P is too simple, then AIXI could not really have a body in the universe, so it would be correct in guessing that some irregularity in the laws of physics was causing its behaviors to be spliced into the behavior of the world.
However, if AIXI has observed enough of the inner workings of other similar machines, or enough of the laws of physics in general, or enough of its own inner workings, the simplest model will be that AIXI’s outputs really do emerge from the laws of physics in the real universe, since we are assuming that that is indeed the case and that Kolmogorov induction eventually works. At that point, imagining that AIXI’s behaviors are a consequence of a bunch of exceptions to the laws of physics is just extra complexity and won’t be part of the simplest hypothesis. It will be part of some less likely hypotheses, and the AI would have to take that risk into account when deciding whether to self-improve.
Tim, I think you’re probably not getting my point about the distinction between our concept of a computable universe, and AIXI’s formal concept of a computable environment. AIXI requires that the environment be a TM whose inputs match AIXI’s past outputs and whose outputs match AIXI’s past inputs. A candidate environment must have the additional code to inject/extract those inputs/outputs and place them on the input/output tapes, or AIXI will exclude it from its expected utility calculations.
I agree that the candidate environment will need to have code to handle the inputs. However, if the candidate environment can compute the outputs on its own, without needing to be given the AI’s outputs, the candidate environment does not need code to inject the AI’s outputs into it.
Even if the AI can only partially predict its own behavior based on the behavior of the hardware it observes in the world, it can use that information to more efficiently encode its outputs in the candidate environment, so it can have some understanding of its position in the world even without being able to perfectly predict its own behavior from first principles.
If the AI manages to destroy itself, it will expect its outputs to be disconnected from the world and have no consequences, since anything else would violate its expectations about the laws of physics.
This back-and-forth appears to be useless. I should probably do some Python experiments and we then can change this from a debate to a programming problem, which would be much more pleasant.
If a candidate environment has no special code to inject AIXI’s outputs, then when AIXI computes expected utilities, it will find that all actions have equal utility in that environment, so that environment will play no role in its decisions.
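A one-step toy mixture illustrates the point, with everything in it hypothetical: two made-up environments with made-up program lengths, each weighted 2**-length in a Solomonoff-style prior. Real AIXI sums rewards over whole interaction histories and is incomputable; this sketch only shows that an environment whose rewards ignore the action adds the same amount to every action, so it cannot change which action scores highest.

```python
ACTIONS = ("a", "b")


def mixture_value(action, environments):
    # Toy one-step mixture: each environment is (program_length_bits, reward_fn)
    # and gets prior weight 2**-length, Solomonoff-style.
    return sum(2.0 ** -length * reward(action) for length, reward in environments)


# Hypothetical environments and lengths, chosen only for illustration.
with_injection = (100, lambda act: 1.0 if act == "a" else 0.0)  # reads AIXI's output
without_injection = (90, lambda act: 0.5)  # ignores AIXI's outputs entirely


def gap(environments):
    # How much better the first action scores than the second.
    return mixture_value(ACTIONS[0], environments) - mixture_value(ACTIONS[1], environments)


print(gap([with_injection]) == gap([with_injection, without_injection]))  # True
```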
Ok, but try not to destroy the world while you’re at it. :) Also, please take a closer look at UDT first. Again, I think there’s a strong possibility that you’ll end up thinking “why did I waste my time defending CDT/AIXI?”
FYI, generating reward values internally—instead of them being observed in the environment—makes no difference whatsoever to the wirehead problem.
AIXI digging into its brains with its own mining claws is quite plausible. It won’t reason as you suggest, since it has no idea that it is instantiated in the real world. So, its exploratory mining claws may plunge in. Hopefully it will get suitably negatively reinforced for that, though much will depend on which part of its brain it causes damage to. It could find that ripping out its own inhibition circuits is very rewarding.
A larger set of symbols for rewards makes no difference, since the reward signal is a scalar. Compare an animal, which has millions of pain sensors that operate in parallel. The animal is onto something there, something to do with a-priori knowledge about the common causes of pain. Having lots of pain sensors has positive aspects, e.g. it saves you experimenting to figure out what hurts.
As for the reference machine issue, I do say: “This problem is also not very serious.”
Not very serious unless you are making claims about your agent being “the most intelligent unbiased agent possible”. Then this kind of thing starts to make a difference...
You can encode 16 64-bit integers in a 1024-bit integer. The scalar/parallel distinction is bogus.
(Edit: I originally wrote “5 32 bit integers” when I meant “2**5 32 bit integers”. Changed to “16 64 bit integers” because “32 32 bit integers” looked too much like a typo.)
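A minimal sketch of the encoding being claimed, with illustrative helper names: each 64-bit value occupies its own 64-bit slice of one integer, so 16 of them fit in 1024 bits and unpack losslessly. Python integers are arbitrary-precision, so no fixed-width type is needed.

```python
def pack(values, width=64):
    # Place values[i] in bits [i*width, (i+1)*width) of one integer.
    packed = 0
    for i, v in enumerate(values):
        if not 0 <= v < (1 << width):
            raise ValueError(f"{v} does not fit in {width} bits")
        packed |= v << (i * width)
    return packed


def unpack(packed, count, width=64):
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]


rewards = [2 ** 64 - 1 - i for i in range(16)]  # 16 unsigned 64-bit values
packed = pack(rewards)
print(packed.bit_length() <= 1024)    # True
print(unpack(packed, 16) == rewards)  # True
```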
Strawman argument. The only claim made is that it’s the most intelligent up to a constant factor, and a bunch of other conditions are thrown in. When Hutter’s involved, you can bet that some of the constant factors are large compared to the size of the universe.
Er, not if you are adding the rewards together and maximising the results, you can’t! That is exactly what happens to the rewards used by AIXI.
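The objection can be checked with small fields. In this sketch (8-bit fields chosen only for readability, helper names illustrative), adding two packed rewards carries out of the low field into the high one, so the accumulated scalar no longer unpacks into the per-field totals.

```python
def pack(values, width):
    packed = 0
    for i, v in enumerate(values):
        packed |= v << (i * width)
    return packed


def unpack(packed, count, width):
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]


WIDTH = 8  # small fields so the overflow is easy to see
step1 = pack([255, 0], WIDTH)
step2 = pack([1, 0], WIDTH)

total = step1 + step2  # what a reward-summing agent accumulates
print(unpack(total, 2, WIDTH))  # [0, 1]  (field-wise sums would be [256, 0])
```

Because the sum of packed values equals the sum over fields of 2**(i*WIDTH) times each field's total, maximizing the accumulated scalar is a weighted sum in which the highest field dominates, rather than a vector of separate signals.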
Actually Hutter says this sort of thing all over the place (I was quoting him above) - and it seems pretty irritating and misleading to me. I’m not saying the claims he makes in the fine print are wrong, but rather that the marketing headlines are misleading.
You’re right there, I’m confusing AIXI with another design I’ve been working with in a similar idiom. For AIXI to work, you have to combine together all the environmental stuff and compute a utility, make the code for doing the combining part of the environment (not the AI), and then use that resulting utility as the input to AIXI.
Thank you for the reference, and the explanation.
I am prompted to ask myself a question analogous to the one Eliezer recently asked:
Is it worth my while exploring the details of CDT formalization beyond just the page you linked to? There seems to be some advantage to understanding the details and conventions of how such concepts are described. At the same time revising CDT thinking in too much detail may eliminate some entirely justifiable confusion as to why anyone would think it is a good idea! “Causal Expected Utility”? “Causal Tendencies”? What the? I only care about what will get me the best outcome!
Probably not. I only learned it by accident myself. I had come up with a proto-UDT that was motivated purely by anthropic reasoning paradoxes (as opposed to Newcomb-type problems like CDT and TDT), and wanted to learn how existing decision theories were formalized so I could do something similar. James Joyce’s book was the most prominent such book available at the time.
ETA: Sorry, I think the above is probably not entirely clear or helpful. It’s a bit hard for me to put myself in your position and try to figure out what may or may not be worthwhile for you. The fact is that Joyce’s book is the decision theory book I read, and quite possibly it influenced me more than I realize, or is more useful for understanding the motivation for or the formulation of UDT than I think. It couldn’t hurt to grab a copy of it and read a few chapters to see how useful it is to you.
Thanks for the edit/update. For reference it may be worthwhile to make such additions as a new comment, either as a reply to yourself or the parent. It was only by chance that I spotted the new part!
Yes, for reasons of game theory and of practical singularity strategy.
Game theory, because things in Everett branches that are ‘closest’ to us might be the ones it’s most important to be able to interact with, since they’re easier to simulate and their preferences are more likely to have interesting overlap with ours. Knowing very roughly what to expect from our neighbors is useful.
And singularity strategy, because if you can show that architectures like AIXI-tl have some non-negligible chance of converging to whatever an FAI would have converged to, as far as actual policies go, then that is a very important thing to know; especially if a non-uFAI existential risk starts to look imminent (but the game theory in that case is crazy). It is not probable but there’s a hell of a lot of structural uncertainty and Omohundro’s AI drives are still pretty informal. I am still not absolutely sure I know how a self-modifying superintelligence would interpret or reflect on its utility function or terms therein (or how it would reflect on its implicit policy for interpreting or reflecting on utility functions or terms therein). The apparent rigidity of Goedel machines might constitute a disproof in theory (though I’m not sure about that), but when some of the terms are sequences of letters like “makeHumansHappy” or formally manipulable correlated markers of human happiness, then I don’t know how the syntax gets turned into semantics (or fails entirely to get turned into semantics, as the case may well be).
This implies that the actually-implemented-CDT agent has a single level of abstraction/granularity at like the naive realist physical level at which it’s proving things about causal relationships. Like, it can’t/shouldn’t prove causal relationships at the level of string theory, and yet it’s still confident that its actions are causing things despite that structural uncertainty, and yet despite the symmetry it for some reason cannot possibly see how switching a few transistors or changing its decision policy might affect things via relationships that are ultimately causal but currently unknown for reasons of boundedness and not speculative metaphysics. It’s plausible, but I think letting a universal hypothesis space or maybe even just Goedelian limitations enter the decision calculus at any point is going to make such rigidity unlikely. (This is related to how a non-hypercomputation-driven decision theory in general might reason about the possibility of hypercomputation, or the risk of self-diagonalization, I think.)
The CDT is making a decision about whether to self-modify even before it meets the alien, based on its expectation of meeting the alien. How does CDT!expectation differ from Eliezer!expectation before we meet the alien?
It is useful to keep separate in one’s mind, on one hand, being able to One Box and cooperate in PD with agents that you know well (shared source code) and, on the other hand, not firing on Baby Eaters after they have already chosen not to fire on you. This is especially the case when first grappling with the subject. (Could you confirm, by the way, that Akon’s decision in that particular paragraph or two is approximately what TDT would suggest?)
The above is particularly relevant because the “have access to each other’s source code” is such a useful intuition pump when grappling with or explaining the solutions to many of the relevant decision problems. It is useful to be able to draw a line on just how far the source code metaphor can take you.
There is also something distasteful about making comparisons to a decision theory that isn’t even implicitly stable under self-modification. A CDT agent will change to CDT++ unless there is an additional flaw in the agent beyond the poor decision making strategy. If I create a CDT agent, give it time to think and then give it Newcomb’s problem it will One Box (and also no longer be a CDT agent). It is the errors in the agent that still remain after that time that need TDT or UDT to fix.
*nod* This is just the ‘new rules starting now’ option. What the CDT agent does when it wakes up in an empty, boring room and does some introspection.
Surely the important thing is that it will self-modify to whatever decision theory has the best consequences?
The new algorithm will not exactly be TDT, because it won’t try to change decisions that have already been made the way TDT does. In particular this means that there’s no risk from Roko’s basilisk.
Disclaimer: I’m not very confident of anything I say about decision theory.
Eliezer says elsewhere that current decision theory doesn’t let us prove a self-modifying AI would choose to keep the goals we program into it. He wants to develop a proof before even starting work on the AI.
It’s easy to contrive situations where a self-modifying AI would choose not to keep the goals programmed into it, even without precommitment issues. Just contrive the circumstances so it gets paid to change. Unless there’s something wrong with the argument there, TDT etc. won’t be enough to ensure that the goals are kept.
Newcomb’s Problem is silly. It’s only controversial because it’s dressed up in wooey vagueness. In the end it’s just a simple probability question and I’m surprised it’s even taken seriously here. To see why, keep your eyes on the bolded text:
What can we anticipate from the bolded part? The only actionable belief we have at this point is that 100 out of 100 times, one-boxing made the one-boxer rich. The details that the boxes were placed by Omega and that Omega is a “superintelligence” add nothing. They merely confuse the matter by slipping in the vague connotation that Omega could be omniscient or something.
In fact, this Omega character is superfluous; the belief that the boxes were placed by Omega doesn’t pay rent any differently than the belief that the boxes just appeared at random in 100 locations so far. If we are to anticipate anything different knowing it was Omega’s doing, on what grounds? It could only be because we were distracted by vague notions about what Omega might be able to do or predict.
The following seemingly critical detail is just more misdirection and adds nothing either:
I anticipate nothing differently whether this part is included or not, because nothing concrete is implied about Omega’s predictive powers—only “superintelligence from another galaxy,” which certainly sounds awe-inspiring but doesn’t tell me anything really useful (how hard is predicting my actions, and how super is “super”?).
The only detail that pays any rent is the one above in bold. Eliezer is right that one-boxing wins, but all you need to figure that out is Bayes.
EDIT: Spelling
The problem is, such emphatic declarations of confidence in the right answer can just as easily be followed by one-boxing, two-boxing, or declaring the hypotheses self-contradictory. That is, in fact, what makes it a Problem, even if, to any individual, it is not a problem.
Differing outcomes are a problem in themselves. Either one line of reasoning is right and the others are wrong, or basic logic is broken (and it would follow that all of maths is broken). It could also be that some hypotheses absolutely necessary for one line of reasoning or the other are implicit and unstated.
This is why, even if for me Newcomb is not a problem, it is still critical to find where others’ reasoning is broken or which assumptions are hidden. Failing to exhibit any error in someone else’s reasoning would lead me to conclude either that my own reasoning is broken (and I would have to find out why) or that maths is broken. And I take that very seriously.
That’s also why, when rejecting someone else’s reasoning, stating that we believe another well-known line of reasoning is right (an argument from authority) is never enough. For the sake of rationality we should also find the error (if any) in the other’s reasoning.
So, what is wrong with believing in probabilities?
To ask that question is already to presuppose the one-boxing answer, and to miss the problem that the problem itself may be problematic. I don’t take simple two-boxing any more seriously than Amanojack does, but the third possibility, of disputing that the problem is well-posed, is worth exploring. On LW, self-professed two-boxers are usually taking that alternative. (Elsewhere, I see two-boxing philosophers actually saying that two-boxing loses, but is still the rational thing to do.)
The problem is best disputed not by simply asserting, as some have, that no such Omega can exist, but by thinking in detail about what it would take for someone to predict the decisions of a decision-maker who knows you’re trying to predict their decisions. What that sort of thinking looks like is this. That paper is about Prisoners Dilemma, but similar investigations could be made of Newcomb, Parfit’s Hitchhiker, etc.
That is what fighting the hypothesis looks like, done right.
That is going for the third option while dodging the need to point out exactly why the problem is not well posed. I can write a program that works as Newcomb’s problem is described if I go for the “imperfect predictor” version, where the being is merely right “most of the time”. One way to do it would be to let the player run a number of practice (or calibration) games, then, at a time chosen by the guesser, make a game “real”. The calibration games would simulate the supernatural being’s minute observation of the player’s behavior, which indeed cannot easily be done.
I knew of the Robust Cooperation paper, and it’s really very interesting, but getting the other player’s source code is also a huge change to the initial problem. At least it excludes perfect oracles from the problem; it is also clear that you may be confronted with the halting problem (this is why the current Scheme tournament based on this idea had to add a provision to its rules to avoid non-halting programs). Showing that we can say something useful about another problem does not imply that the initial one had anything wrong with it.
On the other hand, it is obvious that the Dominance Argument is broken in Newcomb’s problem (and also in PD), as the logical proof is only correct when the variables are uncorrelated (lack of correlation should not be confused with causal independence; causal independence is not enough for the Dominance Argument to be correct). In Newcomb’s problem, the perfect correlation is part of the problem statement. How anyone could then apply the Dominance Argument is beyond me; probably it is because it mimics ordinary deductive logic.
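A minimal Python sketch of the point about the Dominance Argument (an illustration, not the commenter’s program). It checks that two-boxing pays $1,000 more in each fixed box state, then compares expected payoffs when P(box B full) does not depend on the choice versus when it is perfectly correlated with it. The 0.5 used for the uncorrelated case is an arbitrary assumption; any value shared by both choices gives the same ordering.

```python
# Payoffs in dollars, indexed by (choice, box_b_full).
PAYOFF = {
    ("one", True): 1_000_000,
    ("one", False): 0,
    ("two", True): 1_001_000,
    ("two", False): 1_000,
}

def expected_payoff(choice, p_full_given_choice):
    """Expected dollars for `choice`, given a table of P(box B full | choice)."""
    p = p_full_given_choice[choice]
    return p * PAYOFF[(choice, True)] + (1 - p) * PAYOFF[(choice, False)]

# Dominance holds state by state: for each fixed box state, two-boxing pays $1,000 more.
for full in (True, False):
    assert PAYOFF[("two", full)] - PAYOFF[("one", full)] == 1_000

# Uncorrelated: P(full) is the same whichever choice is made (0.5 is arbitrary).
independent = {"one": 0.5, "two": 0.5}
# Perfectly correlated, as stipulated in the problem statement.
correlated = {"one": 1.0, "two": 0.0}

for name, probs in [("independent", independent), ("correlated", correlated)]:
    one = expected_payoff("one", probs)
    two = expected_payoff("two", probs)
    print(f"{name}: one-box={one:,.0f}  two-box={two:,.0f}")
```

Under independence two-boxing comes out $1,000 ahead, matching the dominance reasoning; under the stated correlation the ordering reverses.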
I’m not saying that Newcomb’s problem describes any physically possible event, or even that it is a good problem, or that the consequences it leads to are agreeable (at first sight it leads to a lack of free will), but just that you can solve it mathematically using (very) simple probabilistic tools, without changing anything, and that the usual alternative solution is based on a mathematical error.
Are you claiming to be able to write a program that can win, playing Omega in an iterated Newcomb’s Problem contest, basing its predictions only on the record of the outcomes of past training rounds? How high a prediction accuracy do you expect to be able to get?
As proposed, the idea is to run the program in “test mode”. To simulate the super-being Omega, we give it the opportunity to decide when the game stops being a simulation (predictor calibration) and starts being the “real game”. To be fair, this change (or the rules governing it) will be communicated to some external judge before the actual “real play”. But it will not be communicated to the player (or obviously it would break any calibration accuracy). A possible rule could be to start the real game when some fixed accuracy is reached (something like the prediction being right 99% of the time), or it could also be a fixed number of calibration games.
Writing such a predictor is as easy as it gets: just a simple loop waiting until the fixed accuracy is reached, which happens when the share of either one-boxing or two-boxing rises above the expected threshold. Obviously, if the player is random, that could take quite long (but even with a random player it should happen eventually). But with a rational player it should be fast enough. I’m not sure that a random player could qualify as rational, anyway.
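A minimal Python sketch of such a calibration loop, under stated assumptions: the function name `newcomb_with_calibration`, the sliding window of recent choices used to measure accuracy, and the `max_rounds` cap (added so a random player cannot loop forever) are hypothetical choices, not details of the commenter’s program. The real game is simply the first round after the threshold is met, and nothing tells the player which round that is.

```python
from collections import deque

def newcomb_with_calibration(player, threshold=0.9, window=20, max_rounds=100_000):
    """Hypothetical sketch of the calibrate-then-play scheme.

    `player(round_no)` returns "one" or "two"; the player is never told which
    round is real. Returns (prediction, choice, payout, rounds_played), or None
    if the accuracy threshold is never reached within max_rounds.
    """
    history = deque(maxlen=window)  # only the most recent `window` choices
    for round_no in range(max_rounds):
        if len(history) == window:
            ones = history.count("one")
            majority = "one" if ones * 2 >= window else "two"
            share = max(ones, window - ones) / window
            if share >= threshold:
                # Calibration is over: this round is the single real game.
                box_b = 1_000_000 if majority == "one" else 0
                choice = player(round_no)
                payout = box_b if choice == "one" else box_b + 1_000
                return majority, choice, payout, round_no + 1
        # Still calibrating: record the choice; no payout is revealed.
        history.append(player(round_no))
    return None
```

For example, `newcomb_with_calibration(lambda r: "one")` returns `('one', 'one', 1000000, 21)` with the defaults: twenty calibration rounds fill the window, and the twenty-first is the real game. A player who alternates choices never reaches the threshold, and the cap returns `None`.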
Done that way, Omega can be as accurate as desired.
It still is not a perfect predictor: the player could still outguess Omega and predict at which move the desired accuracy will be reached, but it’s good enough for me (and the Omega player could add some randomness on his side to avoid guessers).
I see no reason why the program described above could not be seen as an acceptable Omega following the rules of Newcomb’s problem.
Not announcing which game is the real one is only there to prevent cheating and to ensure that the actual experiment is done in the same environment as the calibration.
I wonder if anyone would seriously choose to two-box any time with the above rules.
But then, the player never knows when they are faced with Omega, the successful predictor, which is an essential part of Newcomb’s problem.
You expect to predict even a random choice with 99% accuracy? Am I misunderstanding something? Rock-scissors-paper programs that try to detect the non-randomness of human choices do succeed against most people, but only a little better than chance, not with 99% accuracy. Against a truly random player they do not succeed at all.
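A quick Python check of this point, assuming a simple majority-vote predictor (a hypothetical stand-in, not any specific rock-paper-scissors program): it predicts each choice as whichever option the player has picked more often so far. Against a habitual one-boxer it is always right; against a seeded pseudo-random player its hit rate stays near 50%.

```python
import random

def frequency_predictor_accuracy(choices):
    """Predict each choice as the majority of all earlier choices; return the hit rate."""
    ones = twos = hits = 0
    for c in choices:
        guess = "one" if ones >= twos else "two"
        hits += guess == c  # True counts as 1
        if c == "one":
            ones += 1
        else:
            twos += 1
    return hits / len(choices)

rng = random.Random(0)  # fixed seed so the run is reproducible
random_player = [rng.choice(["one", "two"]) for _ in range(10_000)]
habitual_player = ["one"] * 10_000

print(frequency_predictor_accuracy(random_player))    # close to 0.5
print(frequency_predictor_accuracy(habitual_player))  # 1.0
```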
But iterated Newcomb is different from original Newcomb, just as iterated PD is different from plain PD. Now, I don’t see anything wrong with studying related problems, but you yourself said that studying a different but related problem does not touch the original.
I don’t know if you have seen it, but I have posted an actual program playing Newcomb’s game. As far as I understand what I have done, this is not an Iterated Newcomb’s problem, but a single-shot one. You should also notice that the calibration phase does not return output to the player (well, I added some display of the accuracy reached, but this is not necessary).
Unless I overlooked some detail, the predictor’s accuracy is currently tuned to above 90%, but any level of accuracy is reachable.
As I explained yesterday, the key point was to run a “calibration” phase before running the actual game. To make the calibration useful, I have to blur the boundary between calibration and the actual game, or the player won’t behave during calibration as he would in the real game. Hence the program needs to run a number of “maybe real” games before playing the true one. For the reason explained above, we also cannot tell the user that he is playing the real and final game (or he would know whether he is playing a calibration game or the real one, and the calibration would be useless).
But it is very clear from reading the source code that if the (human) player were some kind of supernatural being, he could defeat the program by choosing two boxes while the prediction is one-box. It will just be a very unlikely event, at the desired accuracy level.
I claim this is a true, unmodified Newcomb’s problem; the whole calibration process is only there to make the premise of Newcomb’s problem actually true: Omega’s prediction accuracy (and verifiably so for the human player: he can read the source code and convince himself, or even run the program and understand why the prediction will be accurate).
As far as I know, Newcomb’s problem does not impose any particular way of establishing the initial premise of accuracy. In programming terms, I’m merely composing two functions, the first one ensuring that the premise of good prediction accuracy holds.
I see a problem with the proposed method. Your program learns how often, on average, its opponent one-boxes or two-boxes. If I (as Omega) learn that someone is a one-boxer, then I can predict that they will one-box next time, put money in box B, and be proved right. But then, in an iterated game, if the one-boxer learns that I am not predicting his decision in the individual case, but have made a general prediction once and for all and thereafter always filling box B, then he can with impunity take both boxes and prove my prediction wrong.
A true Omega needs to make both P(box B full | take one box) and P(box B empty | take both boxes) high. The proposed scheme ensures that P(box B full | habitual one-boxer) and P(box B empty | habitual two-boxer) are high, which is not quite the same.
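The gap between these two sets of probabilities can be shown with a small Python sketch. The population (50 habitual one-boxers, 30 habitual two-boxers, and 20 players who one-box during practice but two-box in the real game) is an invented assumption, and the hypothetical predictor fills box B purely from calibration history, as in the scheme under discussion.

```python
from fractions import Fraction

# Hypothetical population: (behaviour seen during calibration, choice in the real game).
players = (
    [("one", "one")] * 50    # habitual one-boxers
    + [("two", "two")] * 30  # habitual two-boxers
    + [("one", "two")] * 20  # one-box in practice, defect when it counts
)

def box_b_full(history):
    # History-based predictor: fill box B iff the player habitually one-boxed.
    return history == "one"

def conditional(event, given):
    """Exact P(event | given) over the population."""
    matching = [p for p in players if given(p)]
    return Fraction(sum(event(p) for p in matching), len(matching))

p_full_given_habitual_one = conditional(lambda p: box_b_full(p[0]), lambda p: p[0] == "one")
p_full_given_takes_one = conditional(lambda p: box_b_full(p[0]), lambda p: p[1] == "one")
p_empty_given_takes_two = conditional(lambda p: not box_b_full(p[0]), lambda p: p[1] == "two")

print(p_full_given_habitual_one)  # 1
print(p_full_given_takes_one)     # 1
print(p_empty_given_takes_two)    # 3/5
```

The habit-based probability is perfect, yet P(box B empty | take both boxes) is only 3/5, because the defectors two-box in front of a full box.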
Similarly, suppose I convince Eliezer that I’m Omega. He has publicly avowed one-boxing on Newcomb, so I can skip the learning phase, fill box B, and be proved right. But if, for some reason, he suspects that I’m not a superintelligent superbeing with superpowers of prediction, and in a series of games, experiments with two-boxing, I will be exposed as an impostor.
Iterated Newcomb played between programs given access to each other’s source code would be an interesting challenge. I assume Omega doesn’t care about the money, but plays for the gratification of correctly predicting the other player’s choice. The other player is playing for the money.
A simpler, zero-sum game also suggests itself to me. This is more like Rock-Paper-Scissors than Newcomb, but again the point is to play using knowledge of the other person’s code. Each player chooses 0 or 1. Player A wins if the choices are the same, player B wins if they are different.
(This might look as if A is trying to predict B and B is trying to avoid being predicted, but the game is actually symmetric, both players doing both of these things. Swap the labels on B’s choices and B wins on equality and A on inequality.)
In classical game theory, the optimal strategy is to toss a coin, and the expected payoff is zero. The challenge is to do better against real opponents.
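A short Python check of that classical result, assuming a payoff of +1 to A for a match and −1 for a mismatch (the comment only says who wins; this is the usual zero-sum scoring). A coin-tossing A earns exactly zero whatever B does, while a predictable A can be exploited.

```python
from fractions import Fraction

def payoff_to_a(a, b):
    """A wins (+1) when the choices match; otherwise B wins and A gets -1."""
    return 1 if a == b else -1

def expected_payoff_to_a(p_a_one, p_b_one):
    """Expected payoff to A when A plays 1 w.p. p_a_one and B plays 1 w.p. p_b_one."""
    total = Fraction(0)
    for a, pa in ((1, p_a_one), (0, 1 - p_a_one)):
        for b, pb in ((1, p_b_one), (0, 1 - p_b_one)):
            total += pa * pb * payoff_to_a(a, b)
    return total

half = Fraction(1, 2)
# A coin-tossing A earns exactly zero against every mixed strategy of B tried here...
assert all(expected_payoff_to_a(half, Fraction(k, 10)) == 0 for k in range(11))
# ...while an A that always plays 1 loses every time to a B that always plays 0.
print(expected_payoff_to_a(Fraction(1), Fraction(0)))  # -1
```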
If I correctly understand the distinction you’re making between “habitual one-boxer” and “takes one box”, the first is about the player’s past history and the second about the future. If so, I guess you are right. I’m indeed using the past to make my prediction, as using the future is beyond my reach.
But I believe you’re missing the point. My program is not an Iterated Newcomb’s Problem, because Omega does not make any prediction along the way. It will only make one prediction, for the last game, and the human won’t be warned. It does not care at all about the player’s reputation, only about his acts in situations where he (the human player) can’t know whether he is playing for real or not.
But another point of view is possible, and it is what comes to mind when you run the program: it coerces the player into being either a one-boxer or a two-boxer if he wants to play at all. After any two-boxing, the player will have to spend a very long time one-boxing to get back to being seen as a one-boxer. As it is written, the program is likely (to the chosen accuracy level) to make its prediction while the player is struggling to be a one-boxer.
As a human player, what goes through my mind while running my program is: OK, I want to get a million dollars, therefore I have to become a one-boxer.
If my program keeps running as long as the desired accuracy is not reached, it can reach any accuracy. Truly random numbers are also expected to drift toward extremes occasionally in the long run (if they do not behave like that, they are not random). As these are very rare events, against random players the desired accuracy would almost certainly never be reached within a human lifetime.
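How rare are those events? A rough Python estimate, assuming for illustration that accuracy is judged over a window of the last n choices of a fair-coin player, and that the threshold k is more than half of n (so the mostly-one-box and mostly-two-box cases cannot overlap). The helper name is hypothetical.

```python
from math import comb

def prob_window_at_least(n, k):
    """P(a fair-coin player makes the same choice at least k times out of n).

    Counts both directions (mostly one-box or mostly two-box); requires k > n / 2
    so the two cases are disjoint and can simply be added.
    """
    one_side = sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n
    return 2 * one_side

print(prob_window_at_least(100, 99))  # roughly 1.6e-28
print(prob_window_at_least(20, 18))   # roughly 4e-4
```

For n = 100 and k = 99 (the 99% figure mentioned in this thread), this is 202/2^100 per window. Successive windows overlap, so 1/p is only an order-of-magnitude guide to the waiting time, but at about 6e27 rounds it is far beyond a human lifetime.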
What I claim is that the “calibration phase” described above takes place before Newcomb’s problem. When the actual game starts, the situation described in Newcomb’s problem is exactly what is reached. The description of the calibration phase could even be provided to the player to convince him that Omega’s prediction will be accurate. At least it is convincing for me, and in such a setting I would certainly believe that Omega can predict my behavior. In a way, you could see my calibration phase as Omega waiting for the player to be ready to play honestly instead of trying to cheat, since trying to cheat will only delay the actual play.
OK. It may be another problem, what I did is merely replacing a perfectly accurate being with an infinitely patient one… but this one is easy to program.
I posted a possible program doing what I describe in another comment. The trick, as expected, is that it’s easier to change the human player’s understanding of the nature of Omega to reach the desired predictability. In other words: you just remove the human’s free will (and, running my program, the player learns very quickly that this is in his best interest), then you play. What is interesting is that the only way compatible with the description of Newcomb’s problem to remove his free will is to make him a one-boxer. The incentive to make him a two-boxer would be to exhibit a bad predictor, and that’s not compatible with Newcomb’s problem.
No good, then even CDTers are incentivized to one-box, since one-boxing in the practice rounds causes higher rewards in the real rounds.
I don’t see your reasoning here. What I’m proposing is not revealing when the practice rounds stop and the real round starts. That indeed means that a one-boxer would get higher rewards in both the practice and real rounds, and that’s why I believe it’s an argument for one-boxing.
My proposal for “simulating” Newcomb’s may not be accurate (and it’s certainly not perfect), but you can’t conclude that based on the (projected) outcome of the experiment disagreeing with what you expect.
Because depending on the numbers in the setup your modified experiment doesn’t get at the disagreement between one-boxer and two-boxers.
Here is an actual program (written in Python) implementing the described experiment. It has two stages. The first is just calibration, intended to find out whether the player is one-boxing or two-boxing. The second is a straightforward non-iterated Newcomb problem. Some randomness is used to prevent the player from knowing exactly when calibration stops and the test begins, but the calibration part does not care at all whether it will predict that the player is a one-boxer or a two-boxer; it is just intended to create an actual predictor behaving as described in Newcomb’s.
I’m with you. You have to look at the outcomes, otherwise you end up running into the same logical blinders that make Quantum Mechanics hard to accept.
After reading some of the Quantum Mechanics sequence, I am more willing to believe in Omega’s omniscience. Quantum mechanics allows multiple timelines leading to the same outcome to interfere and simply never happen, even if they would have been probable in classical mechanics. Perhaps all timelines leading to the outcome where one-boxing does not yield money happen to interfere. Even if you take a more literal interpretation of the problem statement, where it is your own mind that determines the box’s contents, your mind is made of particles which could conceivably affect the universe’s configuration.
I have more or less the same point of view, and applied it to the non-iterated Prisoner’s Dilemma (as Newcomb’s is merely half a Prisoner’s Dilemma, as David Lewis suggested in an article; on this I agree with him, but not on his conclusion).
What is at stake here (in Newcomb’s or PD) may not be that easy to accept anyway. It’s probability and Bayes against causality. The doom loop in Newcomb’s (the reasoning loop that leads you to lose $1 million, as I see it) is the statement that the contents of the boxes are already set when you play, so your action won’t change anything. The quantum mechanical reasoning would go the other way: as long as you haven’t observed or interacted with it, it is merely a probability. You may even want to go further than that: imagine that someone else sees the contents of the box, then sees you choose the predicted set of boxes. He will conclude that you have no free will, or something along those lines. I understand that people who take free will as a fact (not merely a belief that could be contradicted by experiment) therefore reject the probabilistic reasoning without a second thought.
My comment about PD is in this post (http://lesswrong.com/lw/hl8/other_prespective_on_resolving_the_prisoners/). I merely apply probability rules. I’m interested to know whether you see any fault in it from a probabilistic point of view.
Do you also choose not to chew gum in Eliezer’s version of Solomon’s Problem?
…You know that paper goes on to assert that the two problems are meaningfully different, such that it’s rational to both one-box in Newcomb’s Problem and chew gum in Solomon’s Problem, right?
Alternately: It’s entirely precise and well formed, far more so than just about every real life decision.
Is there something wrong with my argument above?
I also happen to think that under-specification of this puzzle adds significantly to the controversy.
What the puzzle doesn’t tell us is the properties of the universe in which it is set. Namely, whether the universe permits future to influence the past, which I’ll refer to as “future peeking”.
(alternatively, whether the universe somehow allows someone within the universe to precisely simulate the future faster than it actually comes—a proposition I don’t believe is ever true in any universe defined mathematically).
This is important because if the future can’t influence the past, then it is known with absolute certainty that taking two boxes won’t possibly change what’s in them (this is, after all, a basic given of the universe). Whether Omega has predicted something before is completely irrelevant now that the boxes are placed.
Alas, we aren’t told what the universe is like. If that is intentionally part of the puzzle then the only way to solve it would be to enumerate all possible universes, assigning each one a probability of being ours based on all the available evidence, and essentially come up with a probability that “future peeking” is impossible in our universe. One would then apply simple arithmetic to calculate the expected winnings.
Unfortunately, P(“future peeking allowed”) is one of those probabilities that are completely incalculable for any practical purpose. Thus, if “no future peeking” isn’t a given, the best answer is “I don’t know if taking two boxes is best, because there’s this one probability I can’t actually calculate in practice”.
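The “simple arithmetic” can be sketched in Python under a toy model that is an assumption, not part of the comment: with probability `p_peek` the box contents track your actual choice; otherwise box B is full with some probability `q_full` that does not depend on your choice. In this model the one-box advantage simplifies to 1,000,000 × p_peek − 1,000, so `q_full` drops out and the comparison turns only on whether P(future peeking) is above 1/1000.

```python
from fractions import Fraction

def expected_winnings(p_peek, q_full):
    """Toy model (an assumption): with probability p_peek box B's contents track your
    actual choice; otherwise box B is full with probability q_full, whatever you do."""
    one_box = p_peek * 1_000_000 + (1 - p_peek) * q_full * 1_000_000
    two_box = p_peek * 1_000 + (1 - p_peek) * (q_full * 1_000_000 + 1_000)
    return one_box, two_box

def one_box_advantage(p_peek, q_full):
    one_box, two_box = expected_winnings(p_peek, q_full)
    return one_box - two_box

for q in (Fraction(0), Fraction(1, 2), Fraction(1)):
    # Exactly at P(future peeking) = 1/1000 the two options tie, for any q_full.
    assert one_box_advantage(Fraction(1, 1000), q) == 0
    # Above that value, one-boxing has the higher expected winnings.
    assert one_box_advantage(Fraction(1, 100), q) > 0
```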
As near as I can tell, this depends on dubious assumptions about a mathematical universe. You appear to treat time as fundamental, and yet reject the possibility that reality (or the Matrix) simulates a certain outcome happening at a certain time, not before (as we’d expect if reality calculated the output of a time-dependent wavefunction).
In addition, you seem to assume that reality cares about the same aspects of the situation that interest Omega. Otherwise it seems clear that Omega could get an answer sooner by leaving out all the details which don’t affect the human-level outcome.
Assume no “future peeking” and Omega only correctly predicting people as difficult to predict as you with 99.9% probability. One-boxing still wins.
While I disagree that one-boxing still wins, I’m most interested in seeing the “no future peeking” and the actual Omega success rate being defined as givens. It’s important that I can rely on the 99.9% value, rather than wondering whether it is perhaps inferred from their past 100 correct predictions (which could, with a non-negligible probability, have been a fluke).
That does indeed seem like the standard version of Newcomb’s. (Though I don’t understand your last sentence, assuming “non-negligible” does not mean 1⁄2 to the power of 100.)
Can you spell out what you mean by “if” in this context? Because a lot of us are explicitly talking about the best algorithm to program into an AI.
Why is it important to you that the success rate be a frequentist probability rather than just a Bayesian one?
I’m not sure I understand correctly, but let me phrase the question differently: what sort of confidence do we have in “99.9%” being an accurate value for Omega’s success rate?
From your previous comment I gather the confidence is absolute. This removes one complication while leaving the core of the paradox intact. I’m just pointing out that this isn’t very clear in the original specification of the paradox, and that clearing it up is useful.
To explain why it’s important, let me indeed think of an AI like hairyfigment suggested. Suppose someone says they have let 100 previous AIs flip a fair coin 100 times each and it came out heads every single time, because they have magic powers that make it so. This someone presents me video evidence of this feat.
If faced with this in the real world, an AI coded by me would still bet close to 50% on tails if offered to flip its own fair coin against this person, because I have strong evidence that this someone is a cheat, and their video evidence is fake. Just something I know from a huge amount of background information that was not explicitly part of this scenario.
However, when discussing such scenarios, it is sometimes useful to assume hypothetical scenarios unlike the real world. For example, we could state that this someone has actually performed the feat, and that there is absolutely no doubt about that. That’s impossible in our real world, but it’s useful for the sake of discussing Bayesianism. Surely any Bayesian’s AI would expect heads with high probability in this hypothetical universe.
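The two readings can be contrasted with a minimal Bayesian sketch in Python. It assumes two hypotheses, “magic” (the coin always lands heads) and “cheat” (the coin is fair), and treats the video as equally likely under both, so the video leaves the prior unchanged; the prior of 1e-12 is purely illustrative, and the function names are hypothetical.

```python
def posterior_magic(prior_magic, p_video_if_magic=1.0, p_video_if_cheat=1.0):
    """Bayes' rule for P(magic | video). By default the video is assumed equally
    likely under both hypotheses, so it carries no evidential weight."""
    weighted_magic = prior_magic * p_video_if_magic
    weighted_cheat = (1 - prior_magic) * p_video_if_cheat
    return weighted_magic / (weighted_magic + weighted_cheat)

def p_tails(p_magic):
    """P(tails) on our own fair-coin flip: 0 under magic, 0.5 if the claim is a cheat."""
    return p_magic * 0.0 + (1 - p_magic) * 0.5

# Real world: a tiny prior on magic powers, and the video does not move it.
print(p_tails(posterior_magic(1e-12)))  # very close to 0.5
# Hypothetical world where the feat is stipulated to be certain.
print(p_tails(posterior_magic(1.0)))    # 0.0
```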
So, are we looking at “Omega in the real world where someone I don’t even know tells me they are really damn good at predicting the future”, or “Omega in some hypothetical world where they are actually known with absolute certainty to be really good at predicting the future”?
Seems to me the language of this rules out faked video. And to explain it as a newsletter scam would, I think, require postulating 2^100 civilizations that have contact with Omega but not each other. Note that we already have some reason to believe that a powerful and rational observer could predict our actions early on.
So you tell me what we should expect here.
I’ve reviewed the language of the original statement and it seems that the puzzle is set in essentially the real world with two major givens, i.e. facts in which you have 100% confidence.
Given #1: Omega was correct on the last 100 occurrences.
Given #2: Box B is already empty or already full.
There is no leeway left for quantum effects, or for your choice affecting in any way what’s in box B. You cannot make box B full by consciously choosing to one-box. The puzzle says so, after all.
If you read it like this, then I don’t see why you would possibly one-box. Given #2 already implies the solution. 100 successful predictions must have been achieved through a very low-probability event, or a trick, e.g. by offering the bet only to those people whose answer you can already predict, for instance by reading their LessWrong posts.
If you don’t read it like this, then we’re back to the “gooey vagueness” problem, and I will once again insist that the puzzle needs to be fully defined before it can be attempted. For example, by removing both givens, and instead specifying exactly what you know about those past 100 occurrences. Were they definitely not done on plants? Was there sampling bias? Am I considering this puzzle as an outside observer, or am I imagining myself being part of that universe—in the latter case I have to put some doubt into everything, as I can be hallucinating. These things matter.
With such clarifications, the puzzle becomes a matter of your confidence in the past statistics vs. your confidence about the laws of physics precluding your choice from actually influencing what’s in box B.
Sorry if this has already been addressed. I didn’t take the time to read all 300 comments.
It seems to me that if there were an omniscient Omega, the world would be deterministic, and you wouldn’t have free will. You have the illusion of choice, but your choice is already known by Omega. Hence, try (it’s futile) to make your illusory choice a one-boxer.
Personally, I don’t believe in determinism or the concept of Omega. This is a nice thought experiment though.
See http://wiki.lesswrong.com/wiki/Free_will
How does adding indeterminism help make the problem go away? If Omega only predicts correctly 99% of the time, what gets clarified?
I don’t grasp why this problem seems so hard and convoluted. Of course you have to one-box, if you two-box you’ll lose for sure. From my perspective two-boxing is irrational...
If Omega can flawlessly predict the future, this confirms a deterministic world at the atomic scale. To be a perfect predictor, Omega would also need to have a perfect model of my brain at every stage of making my “decision”; thus Omega can see the future and perfectly predict whether or not I’m gonna two-box.
If my brain is wired up in such a way as to choose two-boxing, then Omega will have predicted that. It doesn’t matter whether or not Omega left already and box 1 already either contains 1M$ or 0$. No matter how long I ruminate back and forth, if I two-box I’ve lost, because Omega is a perfect predictor and would thus have predicted it.
If Omega indeed has all the properties that are claimed, then there are only two possible outcomes: If you take one box, you’ll get 1M$, if you take two, then you get 1000$. It is true, that box 1 either contains 1M$ or nothing by the time Omega left—but what the box contains is still 100% correlated with my upcoming final decision and nothing is going to change that. End of story. Ergo, CDT is wrong and a model that’s at odds with reality.
PS: Interestingly, if opening the lid on these boxes is the trigger moment that counts as a “decision”, you could just put the opaque box into an X-ray and this act alone would instantly transform Omega into a liar, regardless of whether it contained 1M$ or nothing. It couldn’t possibly show an empty box without making Omega a liar, because contrary to what it said I could no longer actually decide to open only box 1 and get the 1M$. Conversely, if the box does contain 1M$, then I could just two-box, making Omega a liar with respect to its prediction.
So Omega would HAVE TO specifically forbid peeping into the opaque box. If it didn’t do that, Omega would risk being a liar one way or another, once I looked into the 1st box without opening it and either found 1M$ or nothing.
To perfectly model your thought processes, it would be enough that your brain activity be deterministic; it doesn’t follow that the universe is deterministic. The fact that my computer can model a Nintendo well enough for me to play video games does not imply that a Nintendo is built out of deterministic elementary particles, and a Nintendo emulator that simulated every elementary particle interaction in the Nintendo it was emulating would be ridiculously inefficient.
I’m kind of surprised at how complicated everyone is making this, because to me the Bayesian answer jumped out as soon as I finished reading your definition of the problem, even before the first “argument” between one and two boxers. And it’s about five sentences long:
Don’t choose an amount of money. Choose an expected amount of money—the dollar value multiplied by its probability. One-box gets you >(1,000,000*.99). Two-box gets you <(1,000*1+1,000,000*.01). One-box has superior expected returns. Probability theory doesn’t usually encounter situations in which your decision can affect the prior probabilities, but it’s no mystery what to do when that situation arises—the same thing as always, maximize that utility function.
Of course, while I can be proud of myself for spotting that right away, I can’t be too proud because I know I was helped a lot by the fact that my mind was in a “thinking about Eliezer Yudkowsky” mode already, a mode it’s not necessarily in by default and might not be when I am presented with a dilemma (unless I make a conscious effort to put it there, which I guess now I stand a better chance of doing). I was expecting for a Bayesian solution to the problem and spotted it even though it wasn’t even the point of the example. I’ve seen this problem before, after all, without the context of being brought up by you, and I certainly didn’t come up with that solution at the time.
I would take box B, because it would be empty.
I see your general point, but it seems like the solution to the Omega example is trivial if Omega is assumed to be able to predict accurately most of the time:
(letting C = Omega predicted correctly; let’s assume for simplicity that Omega’s fallibility is the same for false positives and false negatives)
if you choose just one box, your expected utility is $1M * P(C)
if you choose both boxes, your expected utility is $1K + $1M (1 - P(C))
Setting these equal to find the equilibrium point:
1000000 P(C) = 1000 + 1000000 (1 - P(C))
1000 P(C) = 1001 − 1000 P(C)
2000 P(C) = 1001
P(C) = 1001/2000 = 0.5005 = 50.05%
So as long as you are more than 50.05% sure that Omega’s model of the universe describes you accurately, you should pick the one box. It’s a little confusing because it seems like the effect precedes the cause in this situation, but that’s not the case; your behaviour affects the behaviour of a simulation of you. Assuming Omega is always right: if you take one box, then you are the type of person who would take the one box, and Omega will see that you are, and you will win. So it’s the clear choice.
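The break-even derivation can be checked with exact rational arithmetic. A minimal sketch using Python’s standard `fractions` module, under the comment’s simplifying assumption that Omega’s accuracy P(C) is the same whichever choice is made (the function and constant names are illustrative):

```python
from fractions import Fraction

ONE_BOX = 1_000_000  # contents of box B when Omega predicts one-boxing
BOX_A = 1_000        # contents of the transparent box


def ev_one_box(p: Fraction) -> Fraction:
    # One-boxers get box B, which is full only if Omega predicted correctly.
    return ONE_BOX * p


def ev_two_box(p: Fraction) -> Fraction:
    # Two-boxers always get box A, plus box B only if Omega predicted wrongly.
    return BOX_A + ONE_BOX * (1 - p)


def break_even() -> Fraction:
    # Solve ONE_BOX * p = BOX_A + ONE_BOX * (1 - p) for p.
    return Fraction(BOX_A + ONE_BOX, 2 * ONE_BOX)


p_star = break_even()
print(p_star, float(p_star))  # 1001/2000 0.5005
```

At exactly P(C) = 1001/2000 the two options tie; any accuracy above that favours one-boxing.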
It certainly seems like a simple resolution exists...
As a rationalist, you should only ever make one choice. It should be the ideal choice. If you are a perfectly rational person, you will only ever make the ideal choice. You are, at the very least, deterministic. If you can make the ideal choice, so can someone else. That means that if someone knows your exact situation (trivial in the Newcomb paradox, as the superintelligent agent is causing your situation), then they can predict exactly what you will do, even without being perfectly rational themselves. If you know they are predicting you, and will act in a certain way accordingly, the rational solution is simply to follow through on whichever prediction is most profitable, as if they could actually see the future to make such a prediction correctly. Since you’re deterministic, it is predictable that you will do this, and thus the prediction is self-fulfilling.
Welcome to Less Wrong!
Why do you think so?
I think so too.
Perhaps we’ve all heard a slightly different wording of the paradox (or more), but I don’t see what causation has to do with it.
He knows what your environmental circumstances are because he put you in them. That is, he obviously knows that you are going to be encountering a Newcomblike problem because he just gave it to you. (No deep technical meaning, just the obvious.)
Maybe I’m being dense. Omega needs to know more than just that you are going to encounter the problem, even Omega’s scheduler and publicist know that!
Omega knows the exact situation, including how an identical model of you would act/has acted, because that is stipulated, but it does not follow trivially from Omega’s causing your situation.
Well, for me there are two possible hypotheses for that:
The boxes are not what they seem. For example, box B contains nano-machinery that detects whether you one-box, creates money if you do, and then self-destructs.
Omega is smart enough to be able to predict whether I’ll one-box or two-box (he scanned my brain, ran it in a simulation, and saw what I do… I hope he didn’t turn off the simulation afterwards, or he would have killed “me” then!).
In both cases, I should one-box. So I’ll one-box. I don’t really get the rationale for two-boxing. Whether the reason is of type 1 or type 2, in both cases Omega is able to reward me for one-boxing if that’s what he wants, and with 100 prior cases, he really seems to want that.
It’s strange. I perfectly agree with the argument here about rationality: the rationality I want is the rationality that wins, not the rationality that is more reasonable. This agrees with my privileging truth as that which is useful, not that which necessarily makes the best predictions. But in other places on the site, it always seems that correspondence is privileged over value.
As for Newcomb’s paradox, I suggest writing out all the relevant propositions à la Jaynes, with non-zero probabilities for all propositions. Make it a real problem, not an idealized and contradictory one; the contradiction is basically between the reports of 100 accurate trials by Omega, the assumption that there was no cheating involved, the assumption of no reverse time causality, etc. If you do so, your priors will tell you the right answer.
Ha—although I expect your belief in forward time causality is higher than your confidence in your use of Jaynes formalism.
An amusing n=3 survey of mathematics undergrads at Trinity Cambridge:
1) Refused to answer. 2) It depends on how reliable Omega is/but you can’t (shouldn’t) really quantify ethics anyway/this situation is unreasonable. 3) Obviously 2-box; one-boxing is insane.
Respondent 3 said he would program an AI to one-box. And when I pointed out that his brain was built of quarks just like the AI, he responded that in that case free will didn’t exist and choice was impossible.
Upvoted for this sentence:
“If it ever turns out that Bayes fails—receives systematically lower rewards on some problem, relative to a superior alternative, in virtue of its mere decisions—then Bayes has to go out the window.”
This is such an important concept.
I will say this declaratively: The correct choice is to take only box two. If you disagree, check your premises.
“But it is agreed even among causal decision theorists that if you have the power to precommit yourself to take one box, in Newcomb’s Problem, then you should do so. If you can precommit yourself before Omega examines you; then you are directly causing box B to be filled.”
Is this your objection? The problem is, you don’t know if the superintelligent alien is basing anything on “precommitment.” Maybe the superintelligent alien has some technology or understanding that allows him to actually see the end result of your future contemplation. Maybe he’s solved time travel and has seen what you pick.
Unless you understand not only the alien’s mode of operation but also his method, you really are just guessing at how he’ll decide what to put in box two. And your record on guesses is not as good as his.
There’s nothing mystical about it. You do it because it works. Not because you know how it works.
Yes, but, like falsifiability, it is dangerous. The same goes for ‘rationalists win’.
‘We’ (Bayesians) face the Duhem-Quine thesis with a vengeance: we have often found situations where Bayes failed. And then we rescued it (we think) by either coming up with novel theses (TDT) or carefully analyzing the problem or a related problem and saying that is the real answer and so Bayes works after all (Jaynes again and again). Have we corrected ourselves or just added epicycles and special pleading? Should we just have tossed Bayes out the window at that point except in the limited areas we already proved it to be optimal or useful?
This can’t really be answered.
I liked the quote not because of any notion that Bayes will or should “go out the window,” but because, coming from a devout (can I use that word?) Bayesian, it’s akin to a mathematician saying that if 2+2 ceases to be 4, that equation goes out the window. I just like what this says about one’s epistemology—we don’t claim to know with dogmatic certainty, but in varying degrees of certainty, which, to bring things full circle, is what Bayes seems to be all about (at least to me, a novice).
More concisely, I like the quote because it draws a line. We can rail against the crazy strict Empiricism that denies rationality, but we won’t hold to a rationality so devoutly that it becomes faith.
Duhem-Quine is just as much a problem there; from Ludwig Wittgenstein, Remarks on the Foundations of Mathematics:
Indeed.
To generalize, when we run into skeptical arguments employing the above circularity or fundamental uncertainties, I think of Kripke:
I think it is important to make a distinction between what our choice is now, while we are here, sitting at a computer screen, unconfronted by Omega, and our choice when actually confronted by Omega. When actually confronted by Omega, your choice has been determined. Take both boxes, take all the money. Right now, sitting in your comfy chair? Take the million-dollar box. In the comfy chair, the counterfactual nature of the experiment basically gives you an Outcome Pump. So take the million-dollar box, because if you take the million-dollar box, it’s full of a million dollars. But when it actually happens, the situation is different. You aren’t in your comfy chair anymore.
I’m not in my comfy chair any more, and I still take the million. Why wouldn’t I?
Because the million is already there, along with the thousand. Why not get all of it?
The million isn’t there, because Omega’s simulation was of you confronting Omega, not of you sitting in a comfy chair.
You aren’t doublethinking hard enough, then.
I don’t know if this is a joke—I have a poor sense of humour—but you do know Omega predicts your actual behaviour, right? As in, all things taken into account, what you will actually do.
I am being somewhat … absurd, and on purpose, at that. But I have enough arrogance lying around in my brain to believe that I can trick the super-intelligence.
Sorry—I’m always inclined to take people on the internet literally. I used to mess with my friends using the same kind of ow-my-brain Prisoner’s-dilemma somersaults, and still I couldn’t recognise a joke.
That’s alright. My humor, in real life, is based entirely on the fact that only I know I’m joking at the time, and the other person won’t realize it until three days later, when they spontaneously start laughing for no reason they can safely explain. Is that asinine? Yes. Is it hilarious? Hell, yes. So I apologize. I’ll try not to do that.
Not especially, unfortunately. There is something to be said for appearing not to give a @#%! whether other people get your humor in real time, but it works best if you care a whole lot about making your humor funny to your audience at the time and then just act like you don’t care about the response you get. Even if people get your joke three days later, you still typically end up slightly worse off for the failed transaction.
Ah. Wrong referent. It’s hilarious for me, and it may, at some point, be hilarious for them. But it’s mostly funny for me. That would be why I took time to mention that it was also, in fact, asinine.
Because I’d end up with only a thousand, as opposed to a million. And I want the million.
I guess my cognition just breaks down over the idea of Omega. To me, Newcomb’s problem seems akin to a theological argument. Either we are talking about a purely theoretical idea that is meant to illustrate abstract decision theory, in which case I don’t care how many boxes I take, because it has no bearing on anything tied to reality, or we are actually talking about the real universe, in which case I take both boxes because I don’t believe in alien superintelligences capable of foreseeing my choices any more than I believe in an anthropomorphic deity.
Labeling “I decide to lose” as a snark just seems odd.
You are confused. Using Omega is merely a simplification of real possible situations. That is, any situation in which you and the other player have some degree of mutual knowledge. Since those situations are complicated they will sometimes call for cooperation (one boxing, here) but often other considerations or insufficient mutual knowledge will override and call for defection (two boxing).
If you wish to consider the effect of just, say, the mass of a cow then assuming a spherical cow in a vacuum is useful. If the conclusion you reach about the mass of said cow doesn’t suit you and you say “but there are no spherical cows in vacuums!” then you are using an excuse to avoid biting the bullet, not showing your superior awareness of reality.
Yeah, that’s generally what “I guess my cognition breaks down” means.
I think you can reasonably expect people to behave in real life as if they expect the laws of physics to approximate reasonably closely what Newtonian mechanics predicts about spherical point masses. What I was saying, however, is that you would be wrong to predict that I defect in prisoners’ dilemmas based on my 2-boxing, because for me Newcomb’s problem isn’t connected to those problems, for the reasons already stated. I hypothesize that I am not alone in that.
And I said you are confused regarding this belief and the stated reasons. I don’t doubt that others are confused as well—it’s a rather common response.
If in 35 AD you were told that there were only 100 people who had seen Jesus dead and entombed and then had seen him alive afterwards, and that there were no people who had seen him dead and entombed who had seen his dead body afterwards, would you believe he had been resurrected?
In Newcomb’s problem as stated, we are told 100 people have gotten the predicted answer. Then no matter how unlikely our priors put on a superintelligent alien being able to predict what we would do, we should accept this as proof.
This seems like a pretty symmetric question to me. A one-boxer, if consistent, should say: sure, 100 people saw it, so it is true, no matter what priors we put on the resurrection of Jesus being true.
To me, it is incredibly more likely that either people are lying to me, or at least being wrong. I have seen magicians make things appear and disappear in boxes that were already sealed, after they left. It is WAY more likely that this is some kind of test and/or scam.
Which is not to say I wouldn’t one-box; I would! Whatever scam Omega is running, I’d rather have the million dollars, or prove Omega a fraud by finding an empty box, than have only $1,000, or prove Omega wrong by finding a full box and having $1,001,000.
And this is precisely what I would announce to the people before publicly opening the one box, and this, if it is not a fraud, is what Omega would have known I would do.
As to 100 trials proving something that unlikely? Siegfried and Roy have made thousands of tigers appear and disappear in cages they could not have had sufficient access to. As odd as they are, it is unlikely (IMHO) that they are superintelligent aliens.
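This argument is Bayes’ rule applied to competing explanations of the same reports. A minimal sketch, with made-up priors and a hypothetical 99% per-trial accuracy for a genuine predictor (every number here is an illustrative assumption, not data): 100 correct predictions all but eliminate lucky guessing, yet barely move the odds between a genuine predictor and a staged trick, because both hypotheses predict the same reports.

```python
def posterior(priors: dict[str, float], likelihoods: dict[str, float]) -> dict[str, float]:
    """Bayes' rule over a finite set of mutually exclusive hypotheses."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: w / total for h, w in unnormalized.items()}


N = 100  # reported trials, all predicted correctly

# Illustrative priors (assumptions): a real superintelligent predictor is judged
# far less likely a priori than a staged trick.
priors = {
    "genuine predictor": 1e-9,
    "staged trick": 1e-3,
    "lucky guessing": 1 - 1e-9 - 1e-3,
}

# Probability of seeing N correct predictions under each hypothesis.
likelihoods = {
    "genuine predictor": 0.99 ** N,  # assumed 99% per-trial accuracy
    "staged trick": 1.0,             # a trick is arranged to produce exactly these reports
    "lucky guessing": 0.5 ** N,      # coin-flip predictions
}

for h, p in posterior(priors, likelihoods).items():
    print(f"{h}: {p:.3g}")
```

The ratio between "genuine predictor" and "staged trick" ends up equal to the prior ratio times 0.99**100, so the evidence mostly decides against chance, not against a trick.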
Really? A PhD? Seriously?
If Omega said “You shall only take Box B or I will smite thee” and then proceeded to smite 100 infidels who dared to two-box, the rational choice would be obvious (especially if the smiting happened after Omega left).
Is this really difficult to show mathematically?
This thread has gone a bit cold (are there other ones more active on the same topic?)
My initial thoughts: if you’ve never heard of Newcomb’s problem, and come across it for the first time in real time, then as soon as you start thinking about it, the only thing to do is 2-box. Yes, Omega will have worked out you’ll do that, and you’ll only get $1000, but the contents of the boxes are already set. It’s too late to convince Omega that you’re going to 1-box.
On the other hand, if you have already heard and thought about the problem, the rational thing to do is to condition yourself in advance so that you will take 1 box in Newcomb-type situations, and ideally do so quite reflexively, without even thinking about it. That way, Omega will predict (correctly) that you will 1-box, and you’ll get the $1 million.
This is fairly close to the standard analysis, though what I’d dispute about the standard version is that there is anything “irrational” in so-conditioning oneself. It seems to me that we train ourselves all the time to do things without thinking about them (such as walking, driving to work, typing out letters to spell words etc) and it’s perfectly reasonable for us to do that where it will have higher expected utility for us.
There might even be a significant practical issue here: quite possibly a lot of moral discipline involves conditioning oneself in advance to do things which don’t (at that time) maximise utility. This is so we actually get put in positions of responsibility, where being in such positions has higher utility than not being in them: real-life Newcomb problems. In practice, we seem to be quite good at approximating Omega with each other on a social level; when hiring a security guard, for instance, we seem to be quite good at predicting who will defend our property rather than run off with it. Not perfect, of course.
You seem to be thinking about Omega as if he’s a mind-reader that can only be affected by your thoughts at the time he set the boxes, instead of a predictor/simulator/very good guesser of your future thoughts.
So it’s not “too late”.
What does it matter if you’ll do it reflexively or after a great deal of thought? The problem doesn’t say that reflexive decisions are easier for Omega to guess than ones following long deliberation.
I’m modelling Omega as a predictor whose prediction function is based on the box-chooser’s current mental state (and presumably the current state of the chooser’s environment). Omega can simulate that state forward into the future and see what happens, but this is still a function of current state.
This is different from Omega being a pre-cog who can (somehow) see directly into the future, without any forward simulation etc.
Yes. And what Omega discovers as a result of performing the simulation depends on what decision you’ll make, even if you encounter the problem for the first time, since a physical simulation doesn’t care about cognitive novelty. Assuming you’re digitally encoded, it’s a logically valid statement that if you one-box, then Omega’s simulation says that you one-boxed, and if you two-box, then Omega’s simulation says that you two-boxed. In this sense you control what’s in the box.
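A toy model of this point, assuming (as the comment does) that the chooser is a deterministic procedure that Omega can run in advance. Because the simulation and the real decision are the same computation, box B’s contents always match the eventual choice, whether or not the chooser has seen the problem before. All names are hypothetical.

```python
from typing import Callable

# A deterministic decision procedure that returns "one" or "two".
Agent = Callable[[], str]


def omega_fills_box_b(agent: Agent) -> int:
    # Omega runs an exact copy of the agent's decision procedure in advance.
    predicted = agent()
    return 1_000_000 if predicted == "one" else 0


def play(agent: Agent) -> int:
    box_b = omega_fills_box_b(agent)  # fixed before the real choice is made
    choice = agent()                  # the "real" decision, made afterwards
    return box_b if choice == "one" else 1_000 + box_b


def one_boxer() -> str:
    return "one"


def two_boxer() -> str:
    # Reasons on the spot that box B is already filled or empty, so both boxes dominate.
    return "two"


print(play(one_boxer), play(two_boxer))  # 1000000 1000
```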
I think this is the disconnect… The chooser’s mental state when sampled by Omega causes what goes into the box. The chooser’s subsequent decisions don’t cause what went into the box, so they don’t “control” what goes into the box either. Control is a causal term...
The goal is to get more money, not necessarily to “causally control” money. I agree that a popular sense of “control” probably doesn’t include what I described, but the question of whether that word should include a new sense is a debate about definitions, not about the thought experiment (the disambiguating term around here is “acausal control”, though in the normal situations it includes causal control as a special case).
So long as we understand that I refer to the fact that it’s logically valid that if you one-box, then you get $1,000,000, and if you two-box, then you get only $1,000, there is no need to be concerned with that term. Since it’s true that if you two-box, then you only get $1,000, then by two-boxing you guarantee that it’s true that you two-box, ergo that you get $1000. Correspondingly, if you one-box, that guarantees that it’s true that you get $1,000,000.
(The subtlety is hidden in the fact that it might be false that you one-box, in which case it’s also true that your one-boxing implies that 18 is a prime. But if you actually one-box, that’s not the case! See this post for some discussion of this subtlety and a model that makes the situation somewhat clearer.)
It seems to me that if I’ve never before been exposed to Newcomb’s problem, and Omega presents me with it, there are two possibilities: either I will one-box, or I will two-box. If I one-box (even without having precommitted to doing so, simply by virtue of my thoughts at the moment about the boxes), Omega will have previously worked out that I’m the sort of person who would one-box.
Why do you say that the only thing to do in the absence of precommitment is two-box?
In the case of facing the problem for the first time, in real time, a person can only 1-box by ignoring the concept of a “dominant” strategy. Or by not really understanding the problem (the boxes really are there with either $1 million in or not, and you can’t actually change that: Omega has no time travel or reverse causation powers). Or by having a utility function over something other than money, which is not in itself irrational, but goes against the statement of the problem.
For instance, I think an astute rational thinker could (perhaps) argue in real time: “this looks like a sort of disguised moral problem; Omega seems to be implicitly testing my ethics, i.e. testing my self-restraint versus my greed. So perhaps I should take 1.” However, at that stage the 1-boxer probably values acting ethically more than being $1000 richer. Or there might be other rational preferences for not 2-boxing, such as getting a really strong urge to 1-box at the time and preferring to satisfy the urge rather than be $1000 richer. Or knowing that if you 2-box you’ll worry for the rest of your life whether that was the right thing, and this is just not worth $1000. I think these are well-known “solutions” which all shift the utility function and hence sidestep the problem.
I understand the argument, I just don’t understand what the novelty of the problem has to do with it. That is, it seems the same problem arises whether it’s a new problem or not.
You’re of course right that there’s no timetravel involved. If I’m the sort of person who two-boxes, Omega will put $1000 in. If I’m the sort of person who one-boxes, Omega will put $1000000 in. (If I’m the sort of person whose behavior can’t be predicted ahead of time, then Omega is lying to me.)
So, what sort of person am I? Well, geez, how should I know? Unlike Omega, I’m not a reliable predictor of my behavior. The way I find out what sort of person I am is by seeing what I do in the situation.
You seem to be insisting on there being a reason for my one-boxing beyond that (like “I think Omega is testing my ethics” or “I precommitted to one-boxing” or some such thing). I guess that’s what I don’t understand, here. Either I one-box, or I two-box. My reasons don’t matter.
Indeed. “I like money” seems like a good enough reason to one-box without anything more complicated!
That’s just evidential decision theory, right?
I call it “I take free monies theory!” I don’t need a theoretical framework to do that. At this point in time there isn’t a formal decision theory that results in all the same decisions that I endorse—basically because the guys are still working out the kinks in UDT and formalization is a real bitch sometimes. They haven’t figured out a way to generalize the handling of counterfactuals the way I would see them handled.
(ArisKatsaris nails it in the sibling comment.)
Well, Newcomb’s problem is simple enough that evidential decision theory suffices.
I’m going to track what’s happened on the other threads discussing Newcomb’s paradox, since I suspect there’s quite a lot of repetition or overlap. Before signing off, though, does anyone here have a view on whether it matters whether Omega is a perfect predictor, or just a very good predictor?
Personally, I think it does matter, and matters rather a lot. The Newcomb problem can be stated either way.
Let’s start with the “very good” predictor case, which I think is the most plausible one, since it just requires Omega to be a good judge of character.
Consider Alf, who is the “sort of person who 2-boxes”. Let’s say he has >99% chance of 2-boxing and <1% chance of 1-boxing (but he’s not totally deterministic and has occasional whims, lapses or whatever). If Omega is a good predictor based on a general judgement of character, then Omega won’t have put the $1 million in Alf’s boxes. So in the unlikely event that Alf actually does take just the one box, he’ll win nothing at all. This means that if Alf knows he’s basically a 2-boxer (he assigns something like 99% credence to the event that he 2-boxes) and knows that Omega is a good but imperfect predictor, Alf has a rationale for remaining a 2-boxer. This holds under both causal decision theory and evidential decision theory. The solution of being a 2-boxer is reflectively stable; Alf can know he’s like that and stay like that.
But now consider Beth, who’s the sort of person who 1-boxes. In the unlikely event that she takes both boxes, Omega will still have put the $1 million in, and so Beth will win $1,001,000. But now if Beth knows she’s a 1-boxer (say she assigns 99% credence to taking 1 box), and again knows that Omega is good but imperfect, this puts her in an odd self-assessment position, since it seems she has a clear rationale to take both boxes (again under both evidential and causal decision theory). If she remains a 1-boxer, then she is essentially projecting of herself that she has only a 1% chance of making a $-optimal choice, i.e. she believes of herself either that she is not a rational utility maximiser, or that her utility function is different from $. If Beth truly is a $ utility maximiser, then Beth’s position doesn’t look reflectively stable, though she could perhaps have “trained” herself to act in this way in Newcomb situations and be aware of the pre-conditioning.
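Alf’s and Beth’s positions can be checked in a toy version of this “judge of character” model. A minimal sketch, assuming Omega’s prediction depends only on the chooser’s disposition and that out-of-character flukes happen after the boxes are filled (the 1% fluke rate and all names are illustrative assumptions). Once the prediction is fixed, taking both boxes always adds exactly $1,000, which is Alf’s and Beth’s in-the-moment rationale; yet a one-boxing disposition still earns far more in expectation.

```python
BOX_A = 1_000
BOX_B_FULL = 1_000_000


def payoff(prediction: str, choice: str) -> int:
    """Winnings given Omega's prediction and the chooser's actual choice."""
    box_b = BOX_B_FULL if prediction == "one" else 0
    return box_b if choice == "one" else BOX_A + box_b


def expected_payoff(disposition: str, fluke: float) -> float:
    """Omega predicts from character (the disposition) alone; with probability
    `fluke` the chooser acts out of character after the boxes are filled."""
    other = "two" if disposition == "one" else "one"
    in_character = payoff(disposition, disposition)
    out_of_character = payoff(disposition, other)
    return (1 - fluke) * in_character + fluke * out_of_character


for disposition in ("one", "two"):
    print(disposition, expected_payoff(disposition, fluke=0.01))
```

With a 1% fluke rate, Beth’s disposition is worth about $1,000,010 and Alf’s about $990.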
Finally, consider Charles, who has never heard of the Newcomb problem, and doesn’t know whether he will 1-box or 2-box. However, Charles is sure he is a $-utility maximizer. If he is a causal decision theorist, he will quickly decide to 2-box, and so will model himself like Alf. If he’s an evidential decision theorist, then he will initially assign some probability to either 1- or 2-boxing, calculate that his expected utility is higher by 1-boxing, and then start to model himself like Beth. But then he will realize this self-model is reflectively unstable, since it requires him to model himself as something other than a $ utility maximiser, and he’s sure that’s what he is. After flapping about a bit, he will realize that the only reflectively stable solution is to model himself like Alf, and this makes it better for him to 2-box. Thinking about the problem too much forces him to 2-box.
In the event that Omega is a perfect predictor, and the box-chooser knows this, then things get messier, because now the only reflectively stable solution for the evidential decision theorist is to 1-box. (Beth thinks: “I have a 99% chance of 1-boxing, and in the rare event that I decide to 2-box, Omega will have predicted this, and my expected utility will be lower; so I still have a rationale to 1-box!”) What about the causal decision theorist, though? One difficulty is how the causal theorist can really believe in Omega as a perfect predictor without also believing in some form of retrograde causation or time travel. This seems a strange set of beliefs to hold in combination. If the causal decision theorist squares the circle by assigning some sort of precognitive faculty to Omega, or at least assigning some non-trivial credence to such a faculty, then he can reason that there is after all (with some credence) a genuine (if bizarre) causal relation between what he chooses and what goes in the box, so he should 1-box. If he remains sure that there is no such causal relation, then he should 2-box. But we should note that the 2-box position is distinctly weaker in this case than in the “good but imperfect” case.
Try my analysis and Anna Salamon’s.
It is not clear to me that Alf’s position as described here is stable.
You say Alf knows Omega is a good (but imperfect) predictor. Just for specificity, let’s say Alf has (and believes he has) .95 confidence that Omega can predict Alf’s box-selection behavior with .95 accuracy. (Never mind how he arrived at such a high confidence; perhaps he’s seen several hundred trials.) And let’s say Alf values money.
Given just that belief, Alf ought to be able to reason as follows: “Suppose I open just one box. In that case, I expect with ~.9 confidence that Omega placed $1m+$1k in the box. OTOH, suppose I open both boxes. In that case, I expect with ~.9 confidence that Omega placed $1k in the box.”
For simplicity, let’s assume Alf believes Omega always puts either $1k or $1m+$1k in the boxes (as opposed to, say, putting in an angry bobcat). So if Alf has .9 confidence in there being $1k in the boxes, he has .1 confidence in ($1m+1k) in the boxes.
So, Alf ought to be able to conclude that one-boxing has an expected value of (.9 × $1m + .1 × $0) and two-boxing has an expected value of (.9 × $1k + .1 × ($1m+$1k)). The expected value of one-boxing is greater than that of two-boxing, so Alf ought to one-box.
So far, so good. But you also say that Alf has .99 confidence that Alf two-boxes… that is, he has .99 confidence that he will take the lower-value choice. (Again, never mind how he arrived at such high confidence… although ironically, we are now positing that Alf is a better predictor than Omega is.)
Well, this is a pickle! There do seem to be some contradictions in Alf’s position.
Perhaps I’m missing some key implications of being a causal vs. an evidential decision theorist, here. But I don’t really see why it should matter. That just affects how Alf arrived at those various confidence estimates, doesn’t it? Once we know the estimates themselves, we should no longer care.
Incidentally, if Alf believes Omega is a perfect predictor (that is, Alf has .95 confidence that Omega can predict Alf’s box-selection with 1-epsilon accuracy), the situation doesn’t really change much; the EV calculation is (.95 × $1m + .05 × $0) vs (.95 × $1k + .05 × ($1m+$1k)), which gets you to the same place.
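For reference, the expected values in this exchange can be computed directly. A small sketch, assuming (as the comment does) a single probability that Omega’s prediction matches the actual choice, the same for both choices; the function name and defaults are illustrative. Note that a mispredicted one-boxer gets $0, not $1k, because the $1k sits in box A.

```python
def expected_values(p_match: float, box_a: int = 1_000, box_b: int = 1_000_000) -> tuple[float, float]:
    """Expected payoffs (one-box, two-box) when Omega's prediction matches the
    actual choice with probability p_match."""
    # One-boxing: box B is full if the prediction matched, empty otherwise.
    one_box = p_match * box_b + (1 - p_match) * 0
    # Two-boxing: box A always, plus a full box B only if the prediction missed.
    two_box = p_match * box_a + (1 - p_match) * (box_a + box_b)
    return one_box, two_box


for p in (0.9, 0.95):
    one, two = expected_values(p)
    print(f"p={p}: one-box {one:,.0f}, two-box {two:,.0f}")
```

At p = 0.9 this gives $900,000 versus $101,000; at p = 0.95, $950,000 versus $51,000.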
OK, maybe it wasn’t totally clear. Alf is very confident that he 2-boxes, since he thinks that’s the “right” answer to Newcomb’s problem. Alf is very confident that Omega is a good predictor, because he’s a good judge of character, and will spot that Alf is a 2-boxer.
Alf believes that in the rare, fluky event that he actually 1-boxes, then Omega won’t have predicted that, since it is so out of character for Alf. Alf thinks Omega is a great predictor, but not a perfect predictor, and can’t foresee such rare, fluky, out-of-character events. So there still won’t be the $1 million in Alf’s boxes in the flukey event that he 1-boxes, and he will win nothing at all, not $1 million. Given this belief set, Alf should 2-box, even if he’s an evidential decision theorist rather than a causal decision theorist. The position is consistent and stable.
Is that clearer?
Ah! Yes, this clarifies matters.
Sure, if Alf believes that Omega has a .95 chance of predicting Alf will two-box regardless of whether or not he does, then Alf should two-box. Similarly, if Beth believes Omega has a .95 chance of predicting Beth will one-box regardless of whether or not she does, then she also should two-box. (Though if she does, she should immediately lower her earlier confidence that she’s the sort of person who one-boxes.)
This is importantly different from the standard Newcomb’s problem, though.
You seem to be operating under the principle that if a condition is unlikely (e.g., Alf 1-boxing) then it is also unpredictable. I’m not sure where you’re getting that from.
By way of analogy… my fire alarm is, generally speaking, the sort of thing that remains silent… if I observe it in six-minute intervals for a thousand observations, I’m pretty likely to find it silent in each case. However, if I’m a good predictor of fire alarm behavior, I don’t therefore assume that if there’s a fire, it will still remain silent.
Rather, as a good predictor of fire alarms, what my model of fire alarms tells me is that “when there’s no fire, I’m .99+ confident it will remain silent; when there is a fire, I’m .99+ confident it will make noise.” I can therefore test to see if there’s a fire and, if there is, predict it will make noise. Its noise is rare, but predictable (for a good enough predictor of fire alarm behavior).
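The fire-alarm analogy can be sketched as a comparison between a predictor that uses only the base rate and one that first checks the condition driving the rare behaviour. A toy simulation, with all rates assumed for illustration (a 1% fire rate; the alarm sounds with probability 0.99 during a fire and 0.001 otherwise): both predictors are right the vast majority of the time overall, but only the conditional one predicts the rare cases where the alarm sounds.

```python
import random


def compare_predictors(n: int = 100_000, fire_rate: float = 0.01, seed: int = 1) -> dict[str, float]:
    """Accuracy of a base-rate predictor vs a fire-checking predictor in a toy alarm model."""
    rng = random.Random(seed)
    base_hits = cond_hits = sounded = base_hits_when_sounded = cond_hits_when_sounded = 0
    for _ in range(n):
        fire = rng.random() < fire_rate
        sounds = rng.random() < (0.99 if fire else 0.001)
        base_pred = False  # base-rate predictor: "the alarm is almost always silent"
        cond_pred = fire   # conditional predictor: check for a fire, then predict
        base_hits += base_pred == sounds
        cond_hits += cond_pred == sounds
        if sounds:
            sounded += 1
            base_hits_when_sounded += base_pred == sounds
            cond_hits_when_sounded += cond_pred == sounds
    return {
        "base-rate overall": base_hits / n,
        "conditional overall": cond_hits / n,
        "base-rate when it sounds": base_hits_when_sounded / sounded,
        "conditional when it sounds": cond_hits_when_sounded / sounded,
    }


for name, acc in compare_predictors().items():
    print(f"{name}: {acc:.3f}")
```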
Remember I have two models of how Omega could work.
1) Omega is in essence an excellent judge of character. It can reliably decide which of its candidates is “the sort of person who 1-boxes” and which is “the sort of person who 2-boxes”. However, if the chooser actually does something extremely unlikely and out of character, Omega will get its prediction wrong. This is a model for Omega that I could actually see working, so it is the most natural way for me to interpret Newcomb’s thought experiment.
If Omega behaves like this, then I think causal and evidential decision theory align. Both tell the chooser to 2-box, unless the chooser has already pre-committed to 1-boxing. Both imply the chooser should pre-commit to 1-boxing (if they can).
2) Omega is a perfect predictor, and always gets its predictions right. I can’t actually see how this model would work without reverse causation. If reverse causation is implied by the problem statement, or choosers can reasonably think it is implied, then both causal and evidential decision theory align and tell the chooser to 1-box.
From the sound of things, you are describing a third model in which Omega can not only judge character, but can also reliably decide whether someone will act out of character or not. When faced with “the sort of person who 1-boxes” who then, out of character, 2-boxes after all, Omega will still with high probability guess correctly that the 2-boxing is going to happen, and so withhold the $1 million.
I tend to agree that in this third model causal and evidential decision theory may become decoupled, but again I’m not really sure how this model works, or whether it requires backward causation again. I think it could work if the causal factors leading the chooser to act “out of character” in the particular case are already embedded in the chooser’s brain state when scanned by Omega, so at that stage it is already highly probable that the chooser will act out of character this time. But the model won’t work if the factors causing out of character behaviour arise because of very rare, random, brain events happening after the scanning (say a few stray neurons fire which in 99% of cases wouldn’t fire after the scanned brain state, and these cause a cascade eventually leading to a different choice). Omega can’t predict that type of event without being a pre-cog.
Thanks anyway though; you’ve certainly made me think about the problem a bit further...
So, what does it mean for a brain to do one thing 99% of the time and something else 1% of the time?
If the 1% case is a genuinely random event, or the result of some mysterious sort of unpredictable free will, or otherwise something that isn’t the effect of the causes that precede it, and therefore can’t be predicted short of some mysterious acausal precognition, then I agree that it follows that if Omega is a good-but-not-perfect predictor, then Omega cannot predict the 1% case, and Newcomb’s problem in its standard form can’t be implemented even in principle, with all the consequences previously discussed.
Conversely, if brain events—even rare ones—are instead the effects of causes that precede them, then a good-but-not-perfect predictor can make good-but-not-perfect predictions of the 1% case just as readily as the 99% case, and these problems don’t arise.
Personally, I consider brain events the effects of causes that precede them. So if I’m the sort of person who one-boxes 99% of the time and two-boxes 1% of the time, and Omega has a sufficient understanding of the causes of human behavior to make 95% accurate predictions of what I do, then Omega will predict 95% of my (common) one-boxing as well as 95% of my (rare) two-boxing. Further, if I somehow come to believe that Omega has such an understanding, then I will predict that Omega will predict my (rare) two-boxing, and therefore I will predict that two-boxing loses me money, and therefore I will one-box stably.
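A small sketch of this point, assuming Omega's accuracy is the same conditional on each actual choice (95% here) and utility is linear in dollars. It builds the joint distribution over (actual choice, Omega's prediction) and conditions on the choice; the chooser's own 99%/1% habit then drops out of the expected payoffs entirely. Names are illustrative.

```python
SMALL = 1_000
BIG = 1_000_000


def conditional_evs(p_one_box: float, accuracy: float) -> dict[str, float]:
    """Expected payoff conditional on each actual choice, when Omega predicts
    that choice correctly with probability `accuracy` (rare or common)."""
    joint = {
        ("one", "one"): p_one_box * accuracy,
        ("one", "two"): p_one_box * (1 - accuracy),
        ("two", "two"): (1 - p_one_box) * accuracy,
        ("two", "one"): (1 - p_one_box) * (1 - accuracy),
    }

    def payoff(choice: str, prediction: str) -> int:
        box_b = BIG if prediction == "one" else 0
        return box_b if choice == "one" else SMALL + box_b

    evs = {}
    for choice in ("one", "two"):
        p_choice = sum(p for (c, _), p in joint.items() if c == choice)
        total = sum(p * payoff(c, pred) for (c, pred), p in joint.items() if c == choice)
        evs[choice] = total / p_choice
    return evs


habitual_one_boxer = conditional_evs(p_one_box=0.99, accuracy=0.95)
coin_flipper = conditional_evs(p_one_box=0.5, accuracy=0.95)
```

In both cases one-boxing is worth about $950,000 and two-boxing about $51,000, so the rare two-boxing is not a free $1,000.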
For the sake of the least convenient world, assume that the brain is particularly sensitive to quantum noise. This applies in the actual world too, albeit at a far, far lower rate than 1% (but hey… perfect). That leaves a perfect predictor perfectly predicting that in the branches with most of the quantum goo (pick a word) the brain will make one choice, while in the others it will make the other.
In this case it becomes a matter of how the counterfactual is specified. The most appropriate one seems to be with Omega filling the large box with an amount of money proportional to how much of the brain will be one boxing. A brain that actively flips a quantum coin would then be granted a large box with half the million.
The only other obvious alternative specifications of Omega that wouldn’t break the counterfactual in this context are a hard cutoff and some specific degree of ‘probability’.
As you say, one-boxing remains stable under this uncertainty, even with imperfect predictors.
I’m not sure what the quantum-goo explanation is adding here.
If Omega can’t predict the 1% case (whether because it’s due to unpredictable quantum goo, or for whatever other reason… picking a specific explanation only subjects me to a conjunction fallacy) then Omega’s behavior will not reflect the 1% case, and that completely changes the math. Someone for whom the 1% case is two-boxing is then entirely justified in two-boxing in the 1% case, since they ought to predict that Omega cannot predict their two-boxing. (Assuming that they can recognize that they are in such a case. If not, they are best off one-boxing in all cases. Though it follows from our premises that they will two-box 1% of the time anyway, though they might not have any idea why they did that. That said, compatibilist decision theory makes my teeth ache.)
Anyway, yeah, this is assuming some kind of hard cutoff strategy, where Omega puts a million dollars in a box for someone it has > N% confidence will one-box.
If instead Omega puts N% of $1m in the box if Omega has N% confidence the subject will one-box, the result isn’t terribly different if Omega is a good predictor.
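A sketch contrasting the two filling rules discussed here. The 0.5 threshold for the hard cutoff is a hypothetical choice (the thread leaves N unspecified), and "confidence" can be read either as Omega's credence or, in the Everett framing, as the weight of branches that one-box. Under the proportional rule, a brain that flips a fair quantum coin gets half the million.

```python
BIG = 1_000_000


def hard_cutoff(confidence_one_box: float, threshold: float = 0.5) -> float:
    """Fill box B with the full $1m only if Omega's confidence exceeds the
    (hypothetical) threshold; otherwise leave it empty."""
    return float(BIG) if confidence_one_box > threshold else 0.0


def proportional(confidence_one_box: float) -> float:
    """Put N% of $1m in box B when Omega has N% confidence of one-boxing."""
    return confidence_one_box * BIG


for c in (0.99, 0.5, 0.01):
    print(f"confidence {c}: cutoff ${hard_cutoff(c):,.0f}, proportional ${proportional(c):,.0f}")
```

For a good predictor, confidences cluster near 0 or 1, where the two rules pay almost the same; they differ mainly for genuinely mixed choosers.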
I’m completely lost by the “proportional to how much of the brain will be one boxing” strategy. Can you say more about what you mean by this? It seems likely to me that most of the brain neither one-boxes nor two-boxes (that is, is not involved in this choice at all) and most of the remainder does both (that is, performs the same operations in the two-boxing case as in the one-boxing case).
A perfect predictor will predict, correctly and perfectly, that the brain both one-boxes and two-boxes in different Everett branches (with vastly different weights). This is different in nature from an imperfect predictor that can’t model the brain’s behavior with complete certainty, yet, given preferences that add up to normal, it requires that you use the same math. It means you do not have to abandon the premise “perfect predictor” for the probabilistic reasoning to be necessary.
How much weight the Everett branches in which it one-boxes have relative to the Everett branches in which it two-boxes.
Allow me to emphasise:
(I think we agree?)
Ah, I see what you mean.
Yes, I think we agree. (I had previously been unsure.)
Assume that the person choosing the boxes is a whole brain emulation, and that Omega runs a perfect simulation, which guarantees formal identity of Omega’s prediction and person’s actual decision, even though the computations are performed separately.
So the chooser in this case is a fully deterministic system, not a real-live human brain with some chance of random firings screwing up Omega’s prediction?
Wow, that’s an interesting case, and I hadn’t really thought about it! One interesting point though—suppose I am the chooser in that case; how can I tell which simulation I am? Am I the one which runs after Omega made its prediction? Or am I the one which Omega runs in order to make its prediction, and which does have a genuine causal effect on what goes in the boxes? It seems I have no way of telling, and I might (in some strange sense) be both of them. So causal decision theory might advise me to 1-box after all.
This is more a way of pointing out a special case that shares relevant considerations with the TDT-like approach to decision theory (in this extreme identical-simulation case it’s just Hofstadter’s “superrationality”).
If we start from this case and gradually make the prediction model and the player less and less similar to each other (perhaps by making the model less detailed), at which point do the considerations that make you one-box in this edge case break? Clearly, if you change the prediction model just a little bit, the correct answer shouldn’t immediately flip, but CDT is no longer applicable out of the box (arguably, even if you “control” two identical copies, it’s also not directly applicable). Thus the need for a generalization that admits imperfect acausal “control” over sufficiently similar decision-makers (and sufficiently accurate predictions) in the same sense in which you “control” your identical copies.
That might give you the right answer when Omega is simulating you perfectly, but presumably you’d want to one-box when Omega was simulating a slightly lossy, non-sentient version of you and only predicted correctly 90% of the time. For that (i.e. for all real world Newcomblike problems), you need a more sophisticated decision theory.
Well no, not necessarily. Again, let’s take Alf’s view. (Note I edited this post recently to correct the display of the matrices)
Remember that Alf has a high probability of 2-boxing, and he knows this about himself. Whether he would actually do better by 1-boxing will depend on how well Omega’s “mistaken” simulations are correlated with the (rare, freaky) event that Alf 1-boxes. Basically, Alf knows that Omega is right at least 90% of the time, but this doesn’t require a very sophisticated simulation at all, certainly not in Alf’s own case. Omega can run a very crude simulation, say “a clear 2-boxer”, and not fill box B (so Alf won’t get the $1 million). Basically, the game outcome would have a probability matrix like this:
Notice that Omega has less than 1% chance of a mistaken prediction.
But, I’m sure you’re thinking, won’t Omega run a fuller simulation with 90% accuracy and produce a probability matrix like this?
Well Omega could do that, but now its probability of error has gone up from 1% to 10%, so why would Omega bother?
Let’s compare to a more basic case: weather forecasting. Say I have a simulation model which takes in the current atmospheric state above a land surface, runs it forward a day, and tries to predict rain. It’s pretty good: if there is going to be rain, then the simulation predicts rain 90% of the time; if there is not going to be rain, then it predicts rain only 10% of the time. But now someone shows me a desert, and asks me to predict rain: I’m not going to use a simulation with a 10% error rate, I’m just going to say “no rain”.
So it seems in the case of Alf. Provided Alf’s chance of 1-boxing is low enough (i.e. lower than the underlying error rate of Omega’s simulations) then Omega can always do best by just saying “a clear 2-boxer” and not filling the B box. Omega may also say to himself “what an utter schmuck” but he can’t fault Alf’s application of decision theory. And this applies whether or not Alf is a causal decision theorist or an evidential decision theorist.
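The matrices from this comment did not survive, so here is a sketch of the comparison with a hypothetical figure: Alf one-boxes 0.5% of the time, and Omega's full simulation is right 90% of the time whichever way Alf chooses. Omega picks whichever method makes fewer mistakes on this particular chooser; the function names are illustrative.

```python
def crude_error_rate(p_one_box: float) -> float:
    """Omega always predicts 'a clear 2-boxer', so it errs exactly when Alf one-boxes."""
    return p_one_box


def simulation_error_rate(sim_accuracy: float) -> float:
    """A simulation right with probability sim_accuracy whichever way Alf
    chooses errs at 1 - sim_accuracy, independent of Alf's habits."""
    return 1 - sim_accuracy


def omega_prefers_crude(p_one_box: float, sim_accuracy: float) -> bool:
    """True when the crude 'always 2-boxer' call makes fewer mistakes."""
    return crude_error_rate(p_one_box) < simulation_error_rate(sim_accuracy)


# Hypothetical: Alf one-boxes 0.5% of the time; the simulation is 90% accurate.
print(omega_prefers_crude(0.005, 0.9))
```

This is the "desert" case: once the chooser's rate of one-boxing falls below the simulation's error rate, the crude call wins, and under it Alf never sees the million even when he one-boxes.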
Incidentally, your fire alarm may be practically useless in the circumstances you describe. Depending on the relative probabilities (small probability that the alarm goes off when there is not a fire versus even smaller probability that there genuinely is a fire) then you may find that essentially all the alarms are false alarms. You may get fed up responding to false alarms and ignore them. When predicting very rare events, the prediction system has to be extremely accurate.
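The false-alarm point is Bayes' theorem applied to a low base rate. A minimal sketch with hypothetical figures: the alarm is .99 sensitive and .99 specific, as in the fire-alarm example, and a fire occurs in 1 of every 100,000 six-minute intervals.

```python
def positive_predictive_value(base_rate: float, sensitivity: float,
                              false_alarm_rate: float) -> float:
    """P(fire | alarm) by Bayes' theorem."""
    true_alarms = sensitivity * base_rate
    false_alarms = false_alarm_rate * (1 - base_rate)
    return true_alarms / (true_alarms + false_alarms)


# Hypothetical numbers: a fire in 1 of every 100,000 intervals.
ppv = positive_predictive_value(base_rate=1e-5, sensitivity=0.99, false_alarm_rate=0.01)
print(f"P(fire | alarm) = {ppv:.4%}")
```

With these numbers only about 0.1% of alarms are real, even though the alarm is right 99% of the time in each condition.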
This is related to the analysis below about Omega’s simulation being only 90% accurate versus a really convinced 2-boxer (who has only a 1% chance of 1-boxing). Or of simulating rain in a desert.
See TDT, UDT, ADT.
Thanks for this… I’m looking at them.
If I’m correct, the general thrust seems to be “there is a problem with both causal decision theory and evidential decision theory, since they sometimes recommend different things, and sometimes EDT seems right, whereas at other times CDT seems right. So we need a broader theory”.
I’m not totally convinced of this need, since I think that in many ways of interpreting the Newcomb problem, EDT and CDT lead to essentially the same conclusion. They both say pre-commit to 1-boxing. If you haven’t precommitted, they both say 2-box (in some interpretations) or they both say 1-box (in other interpretations). And the cases where they come apart are metaphysically rather problematic (e.g. Omega’s predictions must be perfect or nearly-so without pre-cognition or reverse causation; Omega’s simulation of the 2-boxer must be accurate enough to catch the rare occasions when he 1-boxes, but without that simulation itself becoming sentient.)
However, again, thanks for the references and for a few new things to think about.
This is an old thread, but I can’t imagine the problem going away anytime soon, so let me throw some chum into the waters:
Omega says: “I predict you’re a one boxer. I can understand that. You’ve got really good reasons for picking that, and I know you would never change your mind. So I’m going to give you a slightly different version of the problem; I’ve decided to make both boxes transparent. Oh, and by the way, my predictions aren’t 100% correct.”
Question: Do you make any different decisions in the transparent box case?
If so, what was there about your original argument that is different in the transparent box case?
If you’re really a one boxer, that means you can look at an empty box and still pick it.
I was surprised that the rec.puzzles FAQ answer to this doesn’t appear in the replies. (Maybe it’s here and I just missed it.)
In other words, we can’t tell if (how much) our actions determine the outcome, so we can’t make a rational decision.
Box B is already empty or already full [and will remain the same after I’ve picked it]
Do I have to believe that statement is completely and utterly true for this to be a meaningful exercise? It seems to me that I should treat that as dubious.
It seems to me that Omega is achieving a high rate of success by some unknown good method. If I believe Omega’s method is a hard-to-detect remote-controlled money vaporisation process then clearly I should one-box.
A super intelligence has many ways to get the results it wants.
I am inclined to think that I don’t know the mechanism with sufficient certainty that I should reason myself into two-boxing against the evidence to date.
Does it matter which undetectable unbelievable process Omega is using for me to pick my strategy? I don’t think it does—I have to acknowledge that I’m out of my depth with this alien and arguments against causality defiance or the impossibility of undetectable money vaporisers are not going to help me take the million.
Another tack: Omega isn’t a superintelligence; he’s got a ship, a plan, and a lot of time on his hands. He turns up on millions of worlds to play this game. His guesses are pretty lousy: he guesses right only x percent of the time. We are the only planet on which he’s consistently guessed right. We don’t know what x is across the full sample. Looking at his results here, it looks good. Does it really seem rational to second-guess the sample we see?
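A sketch of the selection effect in this scenario, with hypothetical numbers: x = 90% per guess, 100 games per world, and a million worlds. `math.log1p` and `math.expm1` keep the result accurate when the per-world probability is tiny.

```python
import math


def p_some_world_perfect(x: float, trials: int, worlds: int) -> float:
    """Probability that at least one of `worlds` independent worlds sees Omega
    guess right in every one of `trials` games, each guess right with probability x."""
    p_one_world = x ** trials
    # 1 - (1 - p)^worlds, computed stably for very small p.
    return -math.expm1(worlds * math.log1p(-p_one_world))


lousy_but_prolific = p_some_world_perfect(0.9, 100, 1_000_000)
coin_flipper = p_some_world_perfect(0.5, 100, 1_000_000)
```

With 90% guesses, a 100-for-100 world almost certainly exists somewhere; with coin-flip guesses it almost certainly does not, so how much the local record tells us depends on the unknown x.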
It seems to me that we have to accept some pretty wild statements and then start reasoning based on them for us to come to a losing strategy. If we doubt the premises to some degree then does it become clear that the most reasonable strategy is one-boxing?
This reminds me eerily of the Calvinist doctrine of predestination. The money is already there, and making fun of me for two-boxing ain’t gonna change anything.
A question—how could Omega be a perfect predictor, if I in fact have a third option—namely leaving without taking either box? This possibility would, in any real-life situation, lead me to two-box. I know this and accept it.
Then there’s always the economic argument: If $1000 is a sum of money that matters a great deal to me, I’m two-boxing. Otherwise, I’d prefer to one-box.
Do you mean that $1,000 matters a great deal, but $1,000,000 doesn’t matter a great deal? If you buy that Omega is a perfect predictor, then it’s impossible to walk away empty-handed. (Whether or not you should buy that in real life is its own issue.)
So, I’m sure this isn’t an original thought but there are a lot of comments and my utility function is rolling its eyes at the thought of going through them all to see whether this comment is redundant, as compared to writing the comment given I want to sort my thoughts out verbally anyway.
I think the standard form of the question should be changed to the one with the asteroid. Total destruction is total destruction, but money is only worth a) what you can buy with it and b) the effort it takes to earn it.
I can earn $1000 in a month. Some people could earn it in a week. What is the difference between $1m and $1m + $1000? Yes, it’s technically a higher number, but in terms of my life that is not a statistically significant difference. Of course I’d rather definitely have $1m than risk having nothing for the possibility of having $1m + $1000.
The causal decision theory versions of this problem don’t look ridiculous because they take the safe option, they look ridiculous because the utility of two-boxing is not significant in comparison with the potential utility of one-boxing. That is, a one-boxer doesn’t lose much if they’re wrong: IF box 2 already contained nothing when they chose it, they only missed their chance at $1000, whereas IF box 2 already contained $1m a two-boxer misses their chance at $1m.
Obviously an advanced decision theory needs a way to rank the potential risks—if you postulate it as the asteroid, the risk is much more concrete.
Shortly after posting this I realised that the value to me of $1000 is only relevant if you assume the odds of Omega predicting your actions correctly are 50/50ish. Need to think about this some more.
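The hunch can be checked directly, assuming utility linear in dollars and a predictor that is right with the same probability p whichever way you choose: one-boxing is worth p × $1,000,000 and two-boxing $1,000 + (1 − p) × $1,000,000, so they break even at p = 1,001,000 / 2,000,000. Function names are illustrative.

```python
SMALL = 1_000
BIG = 1_000_000


def expected_values(accuracy: float) -> tuple[float, float]:
    """(one-box EV, two-box EV) for a predictor right with probability
    `accuracy` about whichever choice is made; utility linear in dollars."""
    return accuracy * BIG, SMALL + (1 - accuracy) * BIG


def break_even_accuracy(small: float = SMALL, big: float = BIG) -> float:
    """Accuracy at which accuracy*big == small + (1 - accuracy)*big."""
    return (big + small) / (2 * big)


p_star = break_even_accuracy()   # 0.5005 with the problem's payoffs
```

So the $1,000 only decides the matter when Omega is barely better than a coin flip.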
I’d just like to note that as with most of the rationality material in Eliezer’s sequences, the position in this post is a pretty common mainstream position among cognitive scientists. E.g. here is Jonathan Baron on page 61 of his popular textbook Thinking and Deciding:
This view is quoted and endorsed in, for example, Stanovich 2010, p. 3.
It seems to me that if you make a basic Bayes net with utilities at the end, the choice with the higher expected utility is to one-box. Say:
P(1,000,000 in box b and 10,000 in box a|I one box) = 99%
P(box b is empty and 10,000 in box a|I two box) = 99%
hence
P(box b is empty and 10,000 in box a|I one box) = 1%
P(1,000,000 in box b and 10,000 in box a|I two box) = 1%
So
If I one box, I should expect 99% × 1,000,000 + 1% × 0 = 990,000
If I two box, I should expect 99% × 10,000 + 1% × 1,010,000 = 20,000
Expected utility(I one box)/Expected utility(I two box) = 49.5, so I should one box by a landslide. This is assuming that Omega has a 99% rate of true positives and of true negatives; it’s more dramatic if we assume that Omega is perfect. If P(1,000,000 in box b and 10,000 in box a|I one box) = P(box b is empty and 10,000 in box a|I two box) = 100%, then Expected utility(I one box)/Expected utility(I two box) = 100. If Omega is perfect, by my calculation we should expect one boxing to be 100 times more profitable than two boxing.
This is the sort of math I usually use to decide. Is this non-standard, did I make a mistake, or does this method produce stupid results elsewhere?
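A short script reproducing the arithmetic above, keeping this comment's $10,000 for box A (the original problem uses $1,000) and assuming utility linear in dollars. `accuracy` is Omega's rate of true positives and true negatives; the function name is illustrative.

```python
BOX_A = 10_000      # this comment's figure for box A
BOX_B = 1_000_000


def expected_utilities(accuracy: float) -> tuple[float, float]:
    """(EU of one-boxing, EU of two-boxing) given Omega's accuracy."""
    one_box = accuracy * BOX_B + (1 - accuracy) * 0
    two_box = accuracy * BOX_A + (1 - accuracy) * (BOX_A + BOX_B)
    return one_box, two_box


one, two = expected_utilities(0.99)
ratio_99 = one / two

perfect_one, perfect_two = expected_utilities(1.0)
ratio_perfect = perfect_one / perfect_two
```

The numbers check out: 990,000 against 20,000 (ratio 49.5), and 100 for a perfect predictor.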
It’s true that one-boxing is the strategy that maximizes expected utility, and that it is a fairly uncontroversial maxim in normative decision theory that one should pick the strategy that maximizes expected utility. However, it is also a fairly uncontroversial maxim in normative decision theory that if a dominant strategy exists, one should adopt it. In this case, two-boxing is dominant (if you suppose there is no backwards causation). Usually, these two maxims do not conflict, but they do in Newcomb’s problem. I guess the question you should ask yourself is why you think the one we should adhere to is expected utility maximization.
Not saying it’s the wrong answer (I don’t think it is), but simply saying “We do this sort of math all the time. Why not here?” is insufficient justification because we also do this other sort of math all the time, so why not do that here?
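A minimal sketch of the dominance reasoning, assuming box B's contents are a fixed state that the choice cannot affect. Within each state taken separately, two-boxing pays $1,000 more. The expected-utility calculation in the previous comment disagrees only because it lets the probability of each state depend on the choice.

```python
SMALL = 1_000
BIG = 1_000_000

# PAYOFF[state][action]; the state (box B's contents) is fixed before the choice.
PAYOFF = {
    "B full": {"one_box": BIG, "two_box": BIG + SMALL},
    "B empty": {"one_box": 0, "two_box": SMALL},
}


def dominates(a: str, b: str) -> bool:
    """True if action a pays at least as much as b in every state and
    strictly more in at least one."""
    at_least = all(row[a] >= row[b] for row in PAYOFF.values())
    strictly = any(row[a] > row[b] for row in PAYOFF.values())
    return at_least and strictly


print(dominates("two_box", "one_box"))
```

The dominance check never asks how likely each row is, which is exactly where the two maxims part ways.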
Great, I’ll work on that. That’s exactly what I should ask myself. And if I find that the rule “do whatever has the highest expected utility” fails on the smoking lesion problem, I’ll ask why I want to go with the dominant strategy (as I predict I will).
The only reason that I have to trust expected utility particularly is that I have a geometric metaphor, which forces me to believe the rule, if I believe certain basic things about utility.
This looks like it loses in the Smoking Lesion problem.
I’ll work on that and edit my result to here. Thanks.
I think you went wrong when you said:
because Omega doesn’t reward people for their choice to pick box B, he rewards them for being implementations of any of the many algorithms that would pick box B.
I think that the causal decision theory algorithm is the winning way for problems where your mind is not read (when you take into account that causal decision theory can be swayed to make choices so as to deceive others about your real algorithm). Problems where your mind is read do not usually show up in real life. I think there is no winning way for conceivable universes in general, so I want to be an implementation of the winning algorithm for this universe, which seems to be causal decision theory.
Sorry, I’m new here. I am having trouble with the idea that anyone would consider taking both boxes in a real-world situation. How would this puzzle be modeled differently, and how would it look different, if it were Penn and Teller flying Omega?
If Penn and Teller were flying Omega, then they would have been able to produce exactly the same results as seen, without violating causality, time travelling, or perfectly predicting people, by just cheating and emptying the box after you choose to take both.
Given that “it’s cheating” is a significantly more rational idea than “it’s smart enough to predict 100 people” in terms of simplicity and results seen, why not go with that as a rational reason to pick just box B? The only reason one would take both is if it proved it was not cheating. How it could do that without also convincing me of its predictive powers I don’t know, and once convinced of its predictive powers I would have to take box B.
So taking both boxes only makes sense if you know it is not cheating, and know it can be wrong. I notice I am confused: how can you both know it is not cheating and not know that it is correct in its prediction?
I think that the reason this puzzle begets irrationality is that one of the fundamental things you must do to parse the puzzle is irrational, that is ‘believe that the machine is not cheating’, given the alternatives and no further info.
Yeah, this comes up a lot.
My usual way of approaching it is to acknowledge that the thought experiment is asking me to imagine being in a particular epistemic state, and then asking me for my intuitions about what I would do, and what it would be right for me to do, given that state. The fact that the specified epistemic state is not one I can imagine reaching is beside the point.
This is common for thought experiments. If I say “suppose you’re on a spaceship traveling at .999999999c, and you get in a trolley inside the ship that runs in the direction the ship is travelling at 10 m/s, how fast are you going?” it isn’t helpful to reply “No such spaceship exists, so that condition can’t arise.” That’s absolutely true, but it is beside the point.
The difficulty I am having here is not so much that the stated nature of the problem is not real, but that it is asking one to assume they are irrational. With a .999999999c spaceship, it is not irrational to assume one is in a trolley on a spaceship if one is in a trolley on a spaceship. There is not enough information in the Omega puzzle: it assumes that you, the person it drops the boxes in front of, know that Omega is predicting, but it does not tell you how you know that. As the mental state ‘knowing it is predicting’ is fundamental to the puzzle, not knowing how one came to that conclusion asks you to be a magical thinker for the purpose of the puzzle. I believe that this may at least partially explain why there seems to be a lack of consensus.
I also am suspicious of the ambiguous nature of the word predict, but am having trouble phrasing the issue. Omega may be using astrology and happen to have been right each of 100 times, or be literally looking forward in time. Without knowing how can one make the best choice?
All that said, taking just B is my plan, as with $1,000,000 I can afford to lose $1,000.
I agree that I can’t imagine any justified way of coming to believe Omega has the properties that I am presumed to believe Omega to have. So, yes, the thought experiment either assumes that I’ve arrived at that state in some unjustified way (as you say, assume I’m irrational, at least sometimes) or that I’ve arrived at it in some justified way I currently have no inkling of (and therefore cannot currently imagine).
Assuming that I’m irrational sometimes, and sometimes therefore arrive at beliefs that aren’t justified, isn’t too difficult for me; I have a lot of experience with doing that. (Far more experience than I have with riding a trolley on a spaceship, come to that.)
But, sure, I can see where people whose experience doesn’t include that, or whose self-image rejects it regardless of their experience, or who otherwise have trouble imagining themselves arriving at beliefs that aren’t rationally justified, might balk at that step.
If by “best choice” we mean the choice that has the best possible results, then in this case we either cannot make the best choice except by accident, or we always make the best choice, depending on whether the things that didn’t in fact happen were possible before they didn’t happen, which there’s no particular reason to believe.
If by “best choice” we mean the choice that has the highest expected value given what we know when we make it, then we make the best choice by evaluating what we know.
Thanks, that does help a little, though I should say that I am pretty sure I hold a number of irrational beliefs that I am yet to excise. Assuming that Omega literally implanted the idea in my head is a different thought experiment from Omega turning out to be predicting, which in turn is different from Omega merely saying that it predicted the result, etc. Until I know how and why I know it is predicting the result, I am not sure how I would act in the real case. How Omega told me that I was only allowed to pick boxes A and B or just B may or may not be helpful, but either way it is not as important as how I know it is predicting.
Edit: there seem to be a number of thought experiments wherein I have an irrational belief that I can more accurately model mentally, like how I might behave if I thought I was the King of England. Now I am wondering what about this specific problem is giving me trouble.
Fair enough.
For my own part, I find that I often act on my beliefs in a situation without stopping to consider what my basis for those beliefs is, so it’s not too difficult for me to imagine acting on my posited beliefs about Omega’s predictive ability while ignoring the question of where those beliefs came from. I simply accept, for the sake of the exercise, that I do believe it and act accordingly.
Another way of looking at it you might find helpful is to leave aside altogether the question of what I would or wouldn’t do, and what I can and can’t believe, and instead ask what the right thing to do would be were this the actual situation.
E.g., if you give me a device that is indistinguishable from a revolver, but is designed in such a way that placing it to my temple and pulling the trigger doesn’t put a bullet in my skull but instead causes Vast Quantities of Really Good Stuff to happen, the right thing to do is put the device to my temple and pull the trigger. I won’t actually do that, because I have no way of knowing what the device actually does, but whether I do it or not it’s the right thing to do.
Thank you. Depersonalising the question makes it easier for me to think about. But even when “do you take one box or two?” becomes “should one take one box or two?”, I am still confused. I’m confident that just box B should be taken, but I think that I need information that is implied to exist but is not presented in the problem to be able to give a correct answer: namely, the nature of the predictions Omega has made.
With the problem as stated I do not see how one could tell if Omega got lucky 100 times with a flawed system, or if it has a deterministic or causality breaking process that it follows.
One thing I would say is that picking B the most you could lose is 1000 dollars if B is empty. Picking A and B the most you could gain over just B is 1000 dollars. Is it worth betting a reasonable chance at $1,000,000 for a $1,000 gain if you beat a computer at a game 100 people failed to beat it at, especially if it is a game you more or less axiomatically do not understand how it is playing?
Mm. I’m not really understanding your thinking here.
Sorry, I am having difficulty explaining as I am not sure what it is I am trying to get across, I lack the words. I am having trouble with the use of the word predict, as it could imply any number of methods of prediction, and some of those methods change the answer you should give.
For example, if it was predicting by the colour of the player’s shoes, it might have a micron over a 50% chance of being right, and just happened to have been correct the 100 times you heard of. In that case one should take A and B. If, on the other hand, it was a visitor from a higher matrix and got its answer by simulating you perfectly and at fast forward, then whatever you want to take is the best option, and in my case that is B. If it is breaking causality by looking through a window into the future, then take box B. My answers are conditional on information I do not have. I am having trouble mentally modelling this situation without assuming one of these cases to be true.
This seems a bizarre way of thinking about it, to me. It’s as though you’d said “suppose there’s someone walking past Sam in the street, and Sam can shoot and kill them, ought Sam do it?” and I’d replied “well, I need to know how reliable a shot Sam is. If Sam’s odds of hitting the person are low enough, then it’s OK. And that depends on the make of gun, and how much training Sam has had, and...”
I mean, sure, in the real world, those are perhaps relevant factors (and perhaps not). But you’ve already told me to suppose that Sam can shoot and kill the passerby. If I assume that (which in the real world I would not be justified in simply assuming without evidence), the make of the gun no longer matters.
Similarly, I agree that if all I know is that Omega was right in 100 trials that I’ve heard of, I should lend greater credence to the hypothesis that there were >>100 trials, the successful 100 were cherrypicked, and Omega is not a particularly reliable predictor. This falls into the same category as assuming Omega is simply lying… sure, it’s the highest-expected-value thing to do in an analogous situation that I might actually find myself in, but that’s different from what the problem assumes.
The problem assumes that I know Omega has an N% prediction rate. If I’m going to engage with the problem, I have to make that assumption. If I am unable to make that assumption, and instead make various other assumptions that are different, then I am unable to engage with the problem.
Which is OK… engaging with Newcomb’s problem is not a particularly important thing to be able to do. If I’m unable to do it, I can still lead a fulfilling life.
It’s simply one of the rules of the thought experiment. If you bring in the hypothesis that Omega is cheating, you are talking about a different thought experiment. That may be an interesting thought experiment in its own right, but it isn’t the thought experiment under discussion, and the solution you are proposing to your thought experiment is not a solution to Newcomb’s problem.
I’ve always been a one-boxer. I think I have a new solution as to why. Try this: Scenario A: you will take a sleep potion and be woken up twice during the middle of the night to be asked to take one box or both boxes. Whatever you do the first time determines whether $1m is placed in the potentially-empty box. Whatever you do the second time determines what you collect. The catch is that the sleep potion will wipe all your memories over the next twelve hours. You’re told this in advance and asked to make up your mind. So you’ll give the same answer each time [or if you employ a mixed strategy, employ the same mixed strategy, because you don’t know if you’ve already been woken up].
If you say “one box” each time, you collect $1,000,000. If you say “both boxes” each time, you collect $1,000.
So you know, given this, that you do better to say “one box”. Do two-boxers agree with this?
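A minimal sketch of Scenario A’s arithmetic, assuming each waking independently answers “one box” with the same probability q (the mixed-strategy case in the bracket above) and that only the first answer decides box B’s contents. The function name is hypothetical. The expected payout is q × $1,000,000 + (1 − q) × $1,000, which rises steadily with q, so no mixture beats always saying “one box”.

```python
def scenario_a_expected_value(q: float) -> float:
    """Expected payout in the sleep-potion game (Scenario A).

    q is the probability of answering "one box" at each waking. The memory
    wipe makes the wakings indistinguishable, so the same q applies to both,
    but the two answers are drawn independently.
    """
    million, thousand = 1_000_000, 1_000
    # First waking: answering "one box" (probability q) puts $1m in box B.
    expected_box_b = q * million
    # Second waking: you always collect box B; answering "both boxes"
    # (probability 1 - q) adds box A's $1,000.
    return expected_box_b + (1 - q) * thousand


for q in (0.0, 0.5, 1.0):
    print(q, scenario_a_expected_value(q))
# 0.0 1000.0
# 0.5 500500.0
# 1.0 1000000.0
```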
Scenario B: Same as scenario A, except that instead of being woken up twice during the night, you will be woken up once and asked which boxes you will take. Your thoughts now are read by an expert mind-reading device. Whatever you plan to say will be used to determine whether there is $1m or $0 in the box you surely take. I think that you still take one box. Do two-boxers agree with this?
Scenario C: Same as scenario B, except that instead of having your thoughts read now, your thoughts are predicted by an expert thought-predicting device. This is then used to determine what will be placed in the box of uncertain contents. I hold that having your thoughts known at the time and having them known before you think them are identical for the purposes of this problem. [Mind-blowing in many respects, I agree, but irrelevant for this problem.] Ergo I take one box. Do two-boxers agree?
As a 1.4999999999999 boxer (i.e. take a quantum randomness source for [0, 1], take both boxes if 0, one box if 1, one box if something else happens), I don’t think scenario C is convincing.
The crucial property of B is that as your thoughts change, the contents of the box change. The causal link goes forward in time. Thus the right decision is to take one box, since by the act of taking one box, you will make it contain the money.
In C, however, there is no such causality. The oracle either put the money in box B or it did not. Your decision now cannot possibly affect that state. So you cannot base your decision in C on its similarity to B.
A good reason to one-box, in my opinion, is that before you encounter the boxes it is clearly preferable to commit to one-boxing. This is of course not compatible with taking two boxes when you find them (because the oracle seems to be perfect). So it is rational to make yourself the kind of person who takes one box (because you know this brings you the best benefit, short of using the randomness trick).
The solution to this problem is simple and, in my eyes, pretty obvious. Your decision isn’t changing the past; it’s simply that Omega’s choice and your decision have the same cause. Assuming Omega emulates your mind under the conditions in which you’re making the choice, the cause of the prediction and the cause of your choice are the same (the original state of your mind is the cause). So choosing B is the rational choice. And anyway, no matter what method of prediction Omega uses, the cause of his prediction will always be the same as the cause of your choice (if it isn’t, then he doesn’t have any basis for his prediction, and he will therefore have a lower success rate than 100% unless he is really, really lucky).
And even if you don’t think of this when confronted by the problem, the probability should be more than enough to convince you that B is the rational choice. If the Universe says one thing and your model says another, follow the Universe, not your model.
uh, are decision theorists really this moronic or is yudkowsky pulling my leg? the only thing even remotely vexing about this is in judging the accuracy of the predictor’s algorithm. dunno if a table is doable, but here goes nothing: (redoing it, since the table doesn’t display)
Accuracy: perfect
Judgement: perfect: B (1,000,000); imperfect: both (1,000)
Accuracy: imperfect
Judgement (provided luck prevails): perfect: B (1,000,000*); imperfect: both (1,001,000)
therefore, for high risk & high profit, pick both; for low risk & low profit, pick B. “problem” solved.
(*but self-righteous pleasure from having Defied Reason. stupid theistic and classical rationalistic thought processes that abstract away underlying complexities in matters of reason and faith. the western proletariat (dunno how else to refer to the class that collectively falls for this shit, really) needs to learn the difference between rationalism and rationalizationism, especially as adopted wholesale from christianity by an alarming number of modern philosophers.)
Eliezer’s claim is accurate, if somewhat overstated. To quote another source, the Stanford Encyclopedia of Philosophy:
but then the claim of the philosophers responsible for working on this problem is simply incorrect. either that, or they have a despicably narrow-minded view of reason, in which case the thing they call “reason” is an uninteresting artifact of 21st-century philosophical categorization. they might as well be using “reason n. dog poo,” for all the help this is in charting out the role cold calculation plays in this scenario, and my definition would at least require an interesting defense to justify the novel usage. this is a gamble, plain and simple, with fairly straightforward odds:
accuracy of algorithm: perfect
judgement of accuracy: perfect: 1,000,000; imperfect: 1,000
accuracy of algorithm: imperfect and generous
judgement of accuracy: perfect: 1,000,000; imperfect: 1,001,000
accuracy of algorithm: imperfect and miserly
judgement of accuracy: perfect: 0; imperfect: 1,000
as there can be no question that the entire problem revolves around the nature of the predictor’s algorithm, decision theorists must be morons if this is an important problem to them. the only reason this may appear interesting is because in the way they framed everything, the whole thing defies their narrow definition of “reason”. this is a definition-oriented tug-of-war which dissolves as soon as you apply what eliezer says about splitting words. unfettered reason, being free to transcend the manner in which the problem has been set up, has no use for boundaries as artificial as conventional definitions. substitute “reason” with, say, “calculation”, and i no longer see what’s so special about it.
this seems so bloody obvious to me! if i’m wrong, i’d really appreciate an explanation.
Well, try using numbers instead of saying something like “provided luck prevails”.
If p is the chance that Omega predicts you correctly, then the expected value of selecting one box is:
1,000,000(p) + 0(1-p)
and the expected value of selecting both is:
1,000(p) + 1,001,000(1-p)
So selecting both has the higher expected value only if Omega guesses wrong about half the time or more (precisely, only if p < 0.5005).
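The break-even point of these two formulas can be checked exactly. A small sketch, using Python’s `fractions.Fraction` so the comparison has no rounding error; the function names are illustrative, not from the original comment.

```python
from fractions import Fraction


def ev_one_box(p):
    """Expected value of taking only box B when Omega is right with probability p."""
    # Right: box B holds $1,000,000. Wrong: box B is empty.
    return 1_000_000 * p + 0 * (1 - p)


def ev_two_box(p):
    """Expected value of taking both boxes when Omega is right with probability p."""
    # Right: box B is empty, so you get $1,000. Wrong: you get $1,001,000.
    return 1_000 * p + 1_001_000 * (1 - p)


# Solve 1,000,000 p = 1,000 p + 1,001,000 (1 - p)  =>  2,000,000 p = 1,001,000
break_even = Fraction(1_001_000, 2_000_000)

print(break_even, float(break_even))  # 1001/2000 0.5005
print(ev_one_box(Fraction(6, 10)) > ev_two_box(Fraction(6, 10)))  # True
```

One-boxing has the higher expected value whenever p exceeds 1001/2000, i.e. whenever Omega is right even slightly more than half the time.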
Well, the more I think about this, the more it seems to me that we’re dealing with a classic case of the unspecified problem.
You are standing on the observation deck of Starship Dog Poo orbiting a newly discovered planet. The captain inquires as to its color. What do you answer him?
Uh, do I get to look at the planet?
No.
… Let me look up the most common color of planets across the universe.
In the given account, the ability attributed to our alien friend is not described in terms that are meaningful in any sense, but is instead ascribed to his “superintelligence”, which is totally irrelevant as far as our search for solutions is concerned. And yet, we’re getting distracted from the problem’s fundamentally unspecified nature by these yarn balls of superintelligence and paradoxical choice, which are automatically Clever Things to bring up in our futurist iconography.
If you think I’m mistaken, then I’d really appreciate criticism. Thanks!
(The above problem is actually a more sensible one, since the relationship of the query to our cache of observed data is at least clear. Newcomb’s Problem, OTOH, leaves the domain of well-understood science completely behind. If, with our current scientific knowledge, we find the alien’s ability utterly baffling at the level of understanding of his methods that the problem gives us, then it would be sheer hubris to label either choice “rational”, because if the very basis for such a judgement exists, then I for one cannot see it. What if you pick B and it turns out to be empty? If that is impossible, then what are the details of the guarantee that that outcome could never occur? The problem’s wrappings, so to speak, make this look like an incomprehensible matter of faith to me. If I have misunderstood something, could someone smart please explain it to me?)
(At the very least, it must be admitted that in our current understanding of the universe, a world of chaotic systems and unidirectional causality, a perfect predictor’s algorithm is a near-impossibility, “superintelligence” or no. All this reminds me of what Eliezer said in his autobiographical sequence: If you want to treat a complete lack of understanding of a subject as an unknown variable and shift it around at your own convenience, then there are definitely limits to that kind of thing.)
(Based on a recommendation, I am now reading Yudkowsky’s paper on Timeless Decision Theory. I’m 7 pages in, but before I come across Yudkowsky’s solution, I’d like to note that choosing evidential decision theory over causal decision theory or vice-versa, in itself, looks like a completely arbitrary decision to me. Based on what objective standards could either side possibly justify its subjective priorities as being more “rational”?)
Well, it’s a thought experiment, involving the assumption of some unlikely conditions. I think the main point of the experiment is the ability to reason about what decisions to make when your decisions have “non-causal effects”—there are conditions that will arise depending on your decisions, but that are not caused in any way by the decisions themselves. It’s related to Kavka’s toxin and Parfit’s hitchhiker.
But even thought experiments ought to make sense, and I’m not yet convinced this one does, for the reasons I’ve been ranting about. If the problem does not make sense to begin with, what is its “answer” worth? For me, this is like seeing the smartest minds in the world divided over whether 5 + Goldfish = Sky or 0. I’m asking what the operator “+” signifies in this context, but the problem is carefully crafted to make that very question seem like an unfair imposition.
Here, the power ascribed to the alien, without further clarification, appears incoherent to me. Which mental modules, or other aspects of reality, does it read to predict my intentions? Without that being specified, this remains a trick question. Because if it directly reads my future decision, and that decision does not yet exist, then causality runs backwards. And if causality runs backwards, then the money already being in box B or not makes no difference, because your actual decision NOW is going to determine whether it will have been placed there in the past. So if you’re defying causality, and then equating reason with causality, then obviously the “irrational”, ie. acausal, decision will be rewarded, because the acausal decision is the calculating one. God I wish I could draw a chart in here.
The power, without further clarification, is not incoherent. People predict the behavior of other people all the time.
Ultimately, in practical terms the point is that the best thing to do is “be the sort of person who picks one box, then pick both boxes,” but that the way to be the sort of person that picks one box is to pick one box, because your future decisions are entangled with your traits, which can leak information and thus become entangled with other people’s decisions.
And they’re proved wrong all the time. So what you’re saying is, the alien predicts my behavior using the same superficial heuristics that others use to guess at my reactions under ordinary circumstances, except he uses a more refined process? How well can that kind of thing handle indecision if my choice is a really close thing? If he’s going with a best guess informed by everyday psychological traits, the inaccuracies of his method would probably be revealed before long, and I’d get to work on the numbers immediately.
I agree, I would pick both boxes if that were the case, hoping I’d lived enough of a one box picking life before.
I beg to differ on this point. Whether or not I knew I would meet Dr. Superintelligence one day, an entire range of more or less likely behaviors that violate this assertion is very much conceivable, from “I had lived a one box picking life when comparatively little was at stake,” to “I just felt like picking differently that day.” You’re taking your reification of selfhood WAY too far if you think it makes sense to Become a One Box Picker by picking one box when the judgement is already over. I’m not even sure I understand what you’re saying here, so please clarify if I’ve misunderstood things. Unlike my (present) traits, my future decisions don’t yet exist, and hence cannot leak anything or become entangled with anyone.
But what this disagreement boils down to is, I don’t believe that either quality is necessarily manifest in every personality with anything resembling steadfastness. For instance, I neither see myself as the kind of person who would pick one box, nor as the kind who would pick both boxes. If the test were administered to me a hundred times, I wouldn’t be surprised to see a 50-50 split. Surely I would be exaggerating if I said you claim that I already belong to one of these two types, and that I’m merely unaware of my true inner box-picking nature? If my traits haven’t specialized into either category, (and I have no rational motive to hasten the process) does the alien place a million dollars or not? I pity the good doctor. His dilemma is incomparably more black and white than mine.
To summarize, even if I have mostly picked one box in similar situations in the past, how concrete is such a trait? This process comes nowhere near the alien’s implied infallibility, it seems to me. Therefore, either this process or the method’s imputed infallibility has got to go if his power is to be coherent.
Not only that, if that’s all there is to the alien’s ability, what does this thought experiment say, except that it’s indeed possible for a rational agent to reward others for their past irrationality? (to grant the most meaningful conclusion I DO perceive) That doesn’t look like a particularly interesting result to me. Such figures are seen in authoritarian governments, religions, etc.
Your future decisions are entangled with your present traits, and thus can leak. If you picture a Bayesian network with the nodes “Current Brain”, “Future Decision”, and “Current Observation”, with arrows from Current Brain to the two other nodes, then knowing the value of Current Observation gives you information about Future Decision.
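The three-node network described here can be made concrete. Below is a minimal sketch with made-up probabilities (the numbers and names are hypothetical, chosen only for illustration): Current Brain has two states, and Future Decision and Current Observation each depend only on Current Brain. Conditioning on the observation shifts the probability of a one-box decision, even though the observation has no causal effect on the decision.

```python
# Hypothetical numbers, for illustration only.
P_BRAIN = {"leans_one": 0.5, "leans_two": 0.5}    # P(Current Brain)
P_ONE_BOX = {"leans_one": 0.9, "leans_two": 0.2}  # P(Future Decision = one box | Current Brain)
P_SIGNAL = {"leans_one": 0.8, "leans_two": 0.3}   # P(Current Observation = signal | Current Brain)


def p_one_box_prior() -> float:
    """P(Future Decision = one box), marginalizing over Current Brain."""
    return sum(P_BRAIN[b] * P_ONE_BOX[b] for b in P_BRAIN)


def p_one_box_given(signal: bool) -> float:
    """P(Future Decision = one box | Current Observation), by enumeration.

    Future Decision and Current Observation are independent given Current
    Brain, so the joint factorizes as P(b) * P(obs | b) * P(decision | b).
    """
    numerator = 0.0
    evidence = 0.0
    for b, prior in P_BRAIN.items():
        p_obs = P_SIGNAL[b] if signal else 1.0 - P_SIGNAL[b]
        evidence += prior * p_obs
        numerator += prior * p_obs * P_ONE_BOX[b]
    return numerator / evidence


print(round(p_one_box_prior(), 3))       # 0.55
print(round(p_one_box_given(True), 3))   # 0.709
print(round(p_one_box_given(False), 3))  # 0.356
```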
Obviously the alien is better than a human at running this game (though, note that a human would only have to be right a little more than 50% of the time to make one-boxing have the higher expected value—in fact, that could be an interesting test to run!). Perhaps it can observe your neurochemistry in detail and in real time. Perhaps it simulates you in this precise situation, and just sees whether you pick one or both boxes. Perhaps land-ape psychology turns out to be really simple if you’re an omnipotent thought-experiment enthusiast.
The reasoning wouldn’t be “this person is a one-boxer” but rather “this person will pick one box in this particular situation”. It’s very difficult to be the sort of person who would pick one box in the situation you are in without actually picking one box in the situation you are in.
One use of the thought experiment, other than the “non-causal effects” thing, is getting at this notion that the “rational” thing to do (as you suggest two-boxing is) might not be the best thing. If it’s worse, just do the other thing—isn’t that more “rational”?
Here I’d just like to note that one must not assume all subsystems of Current Brain remain constant over time. And what if the brain is partly a chaotic system? (AND new information flows in all the time… Sorry, I cannot condone this model as presented.)
I already mentioned this possibility. Fallible models make the situation gameable. I’d get together with my friends, try to figure out when the model predicts correctly, calculate its accuracy, work out a plan for who picks what, and split the profits between ourselves. How’s that for rationality? To get around this, the alien needs to predict our plan and—do what? Our plan treats his mission like total garbage. Should he try to make us collectively lose out? But that would hamper his initial design.
(Whether it cares about such games or not, what input the alien takes, when, how, and what exactly it does with said input—everything counts in charting an optimal solution. You can’t just say it uses Method A and then replace it with Method B when convenient. THAT is the point: Predictive methods are NOT interchangeable in this context. (Reminder: Reading my brain AS I make the decision violates the original conditions.))
We’re veering into uncertain territory again… (Which would be fine if it weren’t for the vagueness of mechanism inherent in magical algorithms.)
Second note: An entity, alien or not, offering me a million dollars, or anything remotely analogous to this, would be a unique event in my life with no precedent whatever. My last post was written entirely under the assumption that the alien would be using simple heuristics based on similar decisions in the past. So yeah, if you’re tweaking the alien’s method, then disregard all that.
From the alien’s point of view, this is epistemologically non-trivial if my box-picking nature is more complicated than a yes-no switch. Even if the final output must take the form of a yes or a no, the decision tree that generated that result can be as endlessly complex as I want, every step of which the alien must predict correctly (or be a Luck Elemental) to maintain its reputation of infallibility.
As long as I know nothing about the alien’s method, the choice is arbitrary. See my second note. This is why the alien’s ultimate goals, algorithms, etc., MATTER.
(If the alien reads my brain chemistry five minutes before The Task, his past history is one of infallibility, and no especially cunning plan comes to mind, then my bet regarding the nature of brain chemistry would be that not going with one box is silly if I want the million dollars. I mean, he’ll read my intentions and place the money (or not) like five minutes before… (At least that’s what I’ll determine to do before the event. Who knows what I’ll end up doing once I actually get there. (Since even I am unsure as to the strength of my determination to keep to this course of action once I’ve been scanned, the conscious minds of me and the alien are freed from culpability. Whatever happens next, only the physical stance is appropriate for the emergent scenario. ((“At what point then, does decision theory apply here?” is what I was getting at.) Anyway, enough navel-gazing and back to Timeless Decision Theory.))))
Well… okay, but the point I was making was milder and pretty uncontroversial. Are you familiar with Bayesian networks?
I never said it used method A? And what is all this about games? It predicts your choice.
You’re not engaging with the thought experiment. How about this—how would you change the thought experiment to make it work properly, in your estimation?
Well, yeah. We’re in uncertain territory as a premise.
I’m not tweaking the method. There is no given method. The closest to a canonical method that I’m aware of is simulation, which you elided in your reply.
What makes you think you’re so special—compared to the people who’ve been predicted ahead of you?
If you know nothing about the alien’s methods, there still is a better choice. You do not have the same expected value for each choice.
If you assume that you are a physical system and that the alien is capable of modeling that system under a variety of circumstances, there is no contradiction. The alien simply has a device that creates an effective enough simulation of you that it is able to reliably predict what will happen when you are presented with the problem. Causality isn’t running backwards then, it’s just that the alien’s model is close enough to reality that it can reliably predict your behavior in advance. So it’s:
(You[t0]) -> (Alien’s Model of You) -> (Set Up Box) -> (You[t1])
If the alien’s model of you is accurate enough, then it will pick out the decision you will make in advance (or at least is likely to, with an extraordinarily high probability), but that doesn’t violate causality any more than my offering to take my girlfriend out for Chinese because I predict that she will say yes. If accurate models broke causality, then causality would have been snuffed out of existence somewhere around the time the first brain formed, maybe earlier.
You don’t seem to understand what I’m getting at. I’ve already addressed this ineptly, but at some length. If causality does not run backwards, then the actual set of rules involved in the alien’s predictive method, the mode of input it requires from reality, its accuracy, etc., become the focus of calculation. If nothing is known about this stuff, then the problem has not been specified in sufficient detail to propose customized solutions, and we can only make general guesses as to the optimal course of action. (lol The hubris of trying to outsmart unimaginably advanced technology as though it were a crude lie detector reminds me of Artemis Fowl. The third book was awesome.) I only mentioned one ungameable system to explain why I ruled it out as being a trivial consideration in the first place. (Sorry, it isn’t Sunday. No incomprehensible ranting today, only tangents involving children’s literature.)
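The chain described here can be written as an ordinary program that runs forward in time. This is a minimal sketch, assuming (hypothetically) that an agent’s decision procedure is a deterministic function and that the alien’s model is an exact copy of it, so running the model first and the real agent second involves no backward causation. The names `play_newcomb`, `one_boxer`, and `two_boxer` are illustrative.

```python
from typing import Callable

Agent = Callable[[], str]  # a decision procedure returning "one" or "both"


def one_boxer() -> str:
    return "one"


def two_boxer() -> str:
    return "both"


def play_newcomb(agent: Agent) -> int:
    # You[t0] -> Alien's Model of You: run a copy of the decision procedure first.
    prediction = agent()
    # Set Up Box: box B is filled (or not) before the real choice is made.
    box_a = 1_000
    box_b = 1_000_000 if prediction == "one" else 0
    # You[t1]: the real agent chooses now; nothing here reaches back in time.
    choice = agent()
    return box_b if choice == "one" else box_a + box_b


print(play_newcomb(one_boxer))  # 1000000
print(play_newcomb(two_boxer))  # 1000
```

A real predictor would use an imperfect model rather than an exact copy; the sketch only illustrates that accurate prediction and ordinary forward causality are compatible.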
It’s more useful to view this as a problem involving source code. The alien is powerful enough to read your code, to know what you would do in any situation. This means that it’s in your own self interest to modify your source-code to one-box.
That raises the question of whether anything analogous to “code” exists, whether anything is modifiable simply by willing it, etc. What if my mind looks like it’s going to opt for B when the alien reads me, and I change my mind by the time it’s my turn to choose? If no such thing ever happens, the problem ought to specify why that is the case, because I don’t buy the premise as it stands.
To whoever keeps downvoting my comments: The faster I get to negative infinity, the happier I’ll be, but care to explain why?
Now I am tempted to downvote your comments just to make you happy. :)
I’m tempted to downvote his comments despite it making him happy. I have no wish to reward self-described anti-social behavior, but the effect of making said behavior invisible may make the alleged ‘reward’ of the desired downvotes worthwhile on net.
Something along those lines, but anyway, how does that NOT bring this decision into the realm of calculation?
Thinking about it soberly, the framing of this problem reveals even more of a lack of serious scrutiny of its premises. A rational thinker’s first question ought to be: How is it even possible to construct a decision tree that predicts my intentions with near-perfect success before I myself am aware of them? The accuracy of such a system would depend on knowledge of human neurology, time travel, and/or who knows what else, that our civilization is nowhere near obtaining, placing the calculation of odds associated with this problem far beyond the purview of present day science. (IOW, I believe the failure to reason along lines that combine statistics with real world scientific understanding is responsible for the problem’s rather mystical overtones at first sight. Pay no attention to the man behind the curtain! And really, rare events are rare, but they do happen, and are no less real on account of their rarity.)
In any case, thanks for the response.
(Actually, I’m not even clear on the direction of causality under the predictor’s hood. Suppose the alien gazes into a crystal ball showing a probable future and notes down my choice. If so, then he can see the course of action he’d probably go with as well! If he changes that choice, does that say anything about my fidelity to the future he saw? Depends on the mechanism of his crystal ball, among many other things. Or does he scan my brain and simply simulate the chemical reactions it will undergo in the next five minutes? How accurate is the model carrying out this simulation? How predictable is the outcome via these techniques in the first place? There are such murky depths here that no matter what method one imagines, the considerations on which he ultimately bases his decision to place the million dollars are of supreme importance.)
(What, total karma doesn’t reach the negatives? Why not?)
It does, but it won’t display that way. Karma of negative two will display as zero until three or more points are added.
Yes, now I have long-term goals within the community! Or will no one read what I say if it gets too low? That’d be lame, but no matter. I could always keep this account for speaking the truth, and another one for posting the stuff I want other people to see.
Apart from the sidebar of top posters on the right, I (and, I suspect, many others) never look at total karma accumulated, just at individual posts.
It seems to me that no rationalist should accept the ‘givens’ in this scenario without a lot of evidence.
So what am I left with? Some being who hands out boxes, and 100 examples of people who open 1 box and get $1M or open both boxes and get $1k. I am unwilling to accept on faith a super-intelligent alien, so I will make the simplifying assumption that the being is in fact Penn & Teller. In which case, the question simplifies to “Am I willing to bet at 1000:1 odds that Penn & Teller aren’t able to make a box which vanishes $1M if I choose both boxes?” To which I respond, no.
No reverse causality required. No superintelligent prediction required. I simply know that I can’t beat Penn & Teller at their own game 999 times out of 1000.
“You shouldn’t find yourself distinguishing the winning choice from the reasonable choice.”
I disagree. Let’s say there’s box A with $1000 in it, and box B with $10,000 in it 1% of the time, and you can only pick one. If I pick A and my friend picks B, and they get the $10,000, they might say to me that I should wish I was like them. But I’ll defend my choice as reasonable, even though it wasn’t the winning choice that time.
I believe it should be read as:
In your example, your friend picked the choice that won once. It was luck, and he’s happy, and all is well for him. However, the expected value of box B was $100, which does not win over $1000. Arguably, the gambling in itself may have nonzero utility value, and the certainty of obtaining $1000 may also have nonzero utility value, but that seems irrelevant in your example from the way it was formulated.
TL;DR: It seems like you’re disagreeing more on the formulation or wording than the actual principle.
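The arithmetic in this reply can be checked directly with the example’s numbers ($10,000 with probability 1%, versus a sure $1,000). A short sketch; the Monte Carlo helper, its trial count, and its seed are assumptions added for illustration. Individual picks of box B usually pay nothing and occasionally pay $10,000, but the long-run average settles near $100.

```python
import random

PRIZE_B = 10_000  # box B's payout when it pays
P_WIN_B = 0.01    # box B pays 1% of the time
SURE_A = 1_000    # box A always pays this


def expected_value_b() -> float:
    return P_WIN_B * PRIZE_B


def simulated_average_b(trials: int = 100_000, seed: int = 0) -> float:
    """Average payout of box B over many independent picks."""
    rng = random.Random(seed)
    total = sum(PRIZE_B for _ in range(trials) if rng.random() < P_WIN_B)
    return total / trials


print(expected_value_b())              # 100.0
print(simulated_average_b() < SURE_A)  # True
```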
Sorry, I can’t accept your assumption of a superintelligence, that is irrational. What can be above intelligence? Although he has sensory limitations, man’s ability to reason, think critically and rationally is without limit.
I’m confused about why this problem is different from other decision problems.
Given the problem statement, this is not an acausal situation. No physics is being disobeyed: Kramers–Kronig still holds, relativity still holds. It’s completely reasonable that my choice could be predicted from my source code. Why isn’t this just another example of prior information being appropriately applied to a decision?
Am I dodging the question? Does EY’s new decision theory account for truly acausal situations? If I based my decision on the result of, say, a radioactive decay experiment performed after Omega left, could I still optimize?
I’ve been fiddling around with this in my head, and arrived at this argument for one-boxing. Let us suppose a Rule, which we shall call W: FAITHFULLY FOLLOW THE RULE THAT, IF FOLLOWED FAITHFULLY, WILL ALWAYS OFFER THE GREATEST CHANCE OF THE GREATEST UTILITY. To prove that W one-boxes, let us list all the logical possibilities, which we’ll call W1, W2, and W3: W1, always one-boxing; W2, always two-boxing; and W3, sometimes one-boxing and sometimes two-boxing. Otherwise, these rules are identical to each other, and to W, in every way.
Imagining that we’re Omega, we’d obviously place nothing in the box of the agent that follows W2, since we’d know that agent would two-box. Since this limits the utility gained, W2 is not W. W3 is a bit trickier, but a variant of W3 that two-boxes most of the time will probably not be favoured by Omega, since favouring it would reduce his chance of being correct in his prediction. This reduces the chance of getting the greatest utility by however much, and thus disqualifies all variants of W3 that are close to W2.
A perfect W1 would guarantee that the box contains 1,000,000 dollars, since Omega would get its prediction wrong if it failed to reward an agent who one-boxes. However, this rule GUARANTEES not getting the 1,001,000 dollars, and is therefore suboptimal. Because of Omega’s optimization, there is no rule under which $1,001,000 is the most likely outcome, but if there is a rule under which it is the second most likely, that would probably be W. In any case, W favours B over A.
I was going to argue that W is more rational than a hypothetical rule Z, which I think is what makes most two-boxers two-box, but maybe I’ll do that later, when I’m more sure I have time.
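How W1, W2, and W3 compare depends on how Omega responds to a mixed rule, which the problem does not specify. A minimal sketch, assuming (hypothetically) that Omega knows an agent’s one-boxing probability q and fills box B only when one-boxing is strictly the more likely action. Under that assumption W2 (q = 0) earns $1,000, W1 (q = 1) earns $1,000,000, variants of W3 close to W2 earn less than W1, and a W3 that one-boxes most of the time earns slightly more than W1 in expectation. A different rule for handling randomizers (for example, always leaving box B empty for them) would change this ranking.

```python
def ev_mixed_rule(q: float) -> float:
    """Expected payout of a rule that one-boxes with probability q.

    Hypothetical Omega: box B gets $1,000,000 only if one-boxing is strictly
    more likely than two-boxing (q > 0.5); otherwise box B is empty.
    """
    box_b = 1_000_000 if q > 0.5 else 0
    # Box B's contents are fixed before the choice; two-boxing
    # (probability 1 - q) adds box A's $1,000 on top.
    return box_b + (1 - q) * 1_000


for name, q in [("W2", 0.0), ("W3, mostly two-box", 0.3),
                ("W3, mostly one-box", 0.9), ("W1", 1.0)]:
    print(f"{name}: {ev_mixed_rule(q):,.0f}")
```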
I hope I’m not being redundant, but… The common argument I’ve seen is that it must be backward causation if one boxing predictably comes out with more money than two boxing.
Why can’t it just be that Omega is really, really good at cognitive psychology, has a complete map of your brain, and is able to use that to predict your decision so well that the odds of Omega’s prediction being wrong are epsilon? This just seemed… well, obvious to me. But most people arguing “backward causation!” seem to be smarter than me.
The possibilities I see are either that I’m seriously missing something here, or even really smart people can’t let go of the idea that our brains are free from physical law on some level.
The entire point of Omega seems to be “Yeah, no, free will isn’t as powerful as you seem to think.” Given 100 people and access to a few megabytes of their conversations, contacts lists, Facebook, T-shirt collection, and radio/television/web-surfing habits, you can probably predict how they’ll vote in the next election better than chance. Omega is implied here to have far better models of people than targeted advertising does. What success rate would it take to convince people that Omega isn’t cheating, but is just really, really clever?
Of course, Omega’s abilities aren’t really specified. Maybe it is using time travel. But the laws of physics as we know them seem to favor “Omega understands the human brain” over “Omega can see into the future”, so if this happened in the real world, backward causation would not be my leading hypothesis.
Of course, the hypothesis “Omega cheats with some remote-controlled mechanism inside box B” is even easier than explaining an alien superintelligence with an amazing understanding of individual brains. If we could examine box B after one-boxing and two-boxing, we could probably adjust the probability on the “Omega cheats” hypothesis. I don’t know how to distinguish the backwards causation and perfect brain model hypotheses, though.
Of course, the point of the original post wasn’t “reverse engineer Omega’s methods”. The point was “Make decisions that predictably succeed, not decisions that predictably fail but are otherwise more reasonable”. Omega’s methods are relevant only if they allow us to make better decisions than we would with the given information.
Now perhaps I am misunderstanding the problem. Are we to assume that all this is foreknowledge?
Given the information in this article, I would just choose to take only B. But that assumes Omega is never wrong. Logic in my own mind dictates that regardless of why I choose B, or whether I might at some earlier point have two-boxed, at this time I choose box B; and if Omega’s prediction is never wrong, then if I choose B, B will contain a million dollars.
Now, in an alternate iteration of this dilemma, regardless of the truth (whether Omega is indeed never wrong or not), if I only know of 100 observed occurrences, that might have a substantial influence on my reasoning. Given a failure rate of (at most) 1 out of 101, I may very well be tempted by all the previously mentioned arguments for taking boxes A and B, while I might still tend to take just box B anyway. After all, $1,000 isn’t life-changing for me, but I could really make use of a million.
When all is said and done, it comes down to a choice between $1,000 and $1,000,000. If Omega is never wrong, then there is never a possibility of taking $1,001,000. In that case, taking A and B results in $1,000 without fail, and choosing only B means B is never empty. The choice seems obvious.
It’s comforting sometimes to read someone else saying that rationality is not the loser’s way, arguably more so for the Prisoner’s Dilemma than for Newcomb’s, if you consider the current state of our planet and the tragedy of the commons.
I’m writing this because I believe I have succeeded in writing a computer program (it is so simple I can’t call it an AI) that can actually simulate Omega in a Newcomb game. What I describe below may look like an iterated Newcomb’s problem, but I claim it is not, and I will explain why.
When using my program, the human player will actually be facing a high-accuracy predictor, and that claim will be true.
Obviously there is a trick. Here is how it goes. The predictor must first be calibrated. This is done in the simplest possible fashion: it just asks the user whether they would one-box or two-box. The problem is that this is like asking someone whether she would enter a burning building to save a child: nobody (except professional firefighters) really knows before being confronted with the actual event.
The program gets around this: it simply doesn’t tell the player whether he is calibrating the predictor or playing the actual, unique game.
Reaching the desired prediction accuracy is then simple enough: just count the total number of trial runs and the numbers of one-boxing and two-boxing answers, and once either one exceeds 99% of the total, the program can go ahead with the prediction.
Obviously the program must not announce which round is the real game, or that would defeat the strategy of not saying whether it’s the real game, which is what makes the prediction accurate. But any reader can check from the source code that the prediction is indeed made before (in the temporal sense) the player is asked whether he will one-box or two-box.
Here is my program. It is written in Python and heavily commented, so it should not take much computer-science literacy to understand it. The only trick is the insertion of some randomness so that the player cannot predict the end of calibration and the start of the game.
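The program itself is not included here. To illustrate the scheme described above, here is a minimal sketch of one way it could work. It is not the author's code. It has an unannounced calibration phase, a 99% consistency threshold, a random number of extra rounds, and a prediction that is fixed before the final answer is read. The names `run_newcomb` and `min_rounds`, and the scripted players in the usage note, are hypothetical.

```python
import random
from typing import Callable

ONE_BOX, TWO_BOX = "one-box", "two-box"


def run_newcomb(ask: Callable[[], str], rng: random.Random,
                threshold: float = 0.99, min_rounds: int = 100) -> int:
    """Hypothetical sketch of a self-calibrating 'Omega'.

    `ask` returns the player's answer for one round; the player is never
    told whether a round is calibration or the real game.
    Returns the payout of the single real round.
    """
    counts = {ONE_BOX: 0, TWO_BOX: 0}
    while True:  # only the player's consistency can end this loop
        counts[ask()] += 1
        total = counts[ONE_BOX] + counts[TWO_BOX]
        if total < min_rounds:
            continue
        majority = max(counts, key=counts.get)
        if counts[majority] / total > threshold:
            break  # the player's disposition is now consistent enough
    # A few extra unannounced rounds so the start of the real game
    # cannot be inferred from the round count.
    for _ in range(rng.randint(0, 10)):
        ask()
    # The prediction and box B's contents are fixed BEFORE the final answer.
    box_b = 1_000_000 if majority == ONE_BOX else 0
    choice = ask()  # indistinguishable from any calibration round
    return box_b if choice == ONE_BOX else box_b + 1_000
```

For example, `run_newcomb(lambda: ONE_BOX, random.Random(0))` pays out 1,000,000. A player whose answers never settle above the threshold keeps the loop running forever. This matches the author's point that only the player can end the game.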
Now, why did I say this is not an iterated Newcomb’s problem?
The point is that, as written, the program does not terminate on its own. The human player is the only one able to stop the game, and to do that he has to commit to one option, one-boxing or two-boxing, thus letting the program reach the desired accuracy level. He also has no possibility of “uncommitting” when the real game comes, since that round is no different from the others.
You could consider that the whole point of this setting is to convince the user that Omega’s claimed accuracy is real. What is fun is that in this setting it becomes true because the human player chooses to make it so.
I believe the above program proves that one-boxing is rational, I should even say obvious, given the right setting.
Now, I can’t stop here. I believe in maths as a neutral tool. That means that if the reasoning leading to one-boxing is right, the reasoning leading to two-boxing must be false. If both lines of reasoning were true, maths would collapse, and that is not to be taken lightly.
In short, the two-boxing reasoning is an immediate consequence of the Dominance Argument.
So what? The Dominance Argument is rock solid. It is so simple, so obvious.
Below is a quote from Ledwig’s review of Newcomb’s problem about the Dominance Argument; you could call it a restrictive clause on when you can or cannot apply it:
There is a subtle error in the above statement. You should replace the words “causally influence” with “are not correlated with”. In probabilistic terms, this means the two decision makers’ actions are independent variables. But lack of correlation isn’t guaranteed by lack of causation.
Think of a Prisoner’s-Dilemma-like situation between traders. A company’s stock is falling. If the traders sell, you get a stock market crash; if they buy, it’s back to business as usual. If one sells while the other buys, only one of them will make big money.
Do you seriously believe that, given access to the same company data (but without communicating with each other), both traders are not likely to make the same choice?
In this setting the two players’ choices are not independent variables, and you can’t directly apply Dominance.
Reasoning backward, you could say that your choice gives you some information about the probability of the other’s choice. Since taking that information into account can change your choice, it may also change the other’s choice, and you enter an infinite recursion. That’s not a problem: you still have tools to solve it, such as fixed-point theorems.
In Newcomb’s problem we are in an extreme case: the hypothesis states the correlation between the players, namely Omega’s prediction accuracy.
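To make the independence point concrete, here is a small sketch that uses exact expected values rather than simulation. The parameter names are illustrative, not from the comment. `q` is the probability that box B is full when it is filled independently of your choice, and `p` is Omega's accuracy. Under independence, two-boxing is always exactly $1,000 ahead, which is the Dominance Argument. Once box B's contents are correlated with the choice, the sign flips for any accuracy above 50.05%.

```python
A, B = 1_000, 1_000_000  # box A, and a full box B, in dollars


def advantage_independent(q: float) -> float:
    """Two-box minus one-box EV when box B is full with probability q,
    whatever the player chooses (no correlation)."""
    one_box = q * B
    two_box = q * (B + A) + (1 - q) * A
    return two_box - one_box  # always equals A


def advantage_correlated(p: float) -> float:
    """Two-box minus one-box EV when Omega predicts the choice with accuracy p."""
    one_box = p * B                      # full only if correctly predicted
    two_box = (1 - p) * (B + A) + p * A  # full only if mispredicted
    return two_box - one_box


for q in (0.0, 0.5, 1.0):
    print(f"independent, q={q}: two-boxing ahead by {advantage_independent(q):,.0f}")
for p in (0.5, 0.99):
    print(f"correlated,  p={p}: two-boxing ahead by {advantage_correlated(p):,.0f}")
```

At p = 0.5 the correlated case collapses into the independent one. This fits the claim that Dominance is safe exactly when the choices carry no information about each other.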
Hence two-boxing is not a rational decision based on causality. It is simple disbelief in the correlation stated in the hypothesis, together with a confusion between correlation and causation.
When you remove that disbelief (which is what my program does), the problem disappears.
It seems to me that the entire discussion is confused. Many people seem to be using the claim that Omega can’t predict your actions to make claims about what actions to take in the hypothetical world where it can. Accepting the assumption that Omega can predict your actions the problem seems to be a trivial calculation of expected utility:
If the opaque box contains b1 utility, the transparent one b2 utility, and Omega has probability e1 of falsely predicting you’ll one-box and probability e2 of falsely predicting you’ll two-box, the expected utilities are:
1 box: (1 - e2)*b1
2 box: e1*b1 + b2
And you should 1 box unless b2 is bigger than (1 - e2 - e1)*b1.
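This decision rule can be checked numerically. The sketch below uses the commenter's symbols (b1, b2, e1, e2). The function names and the 1% error rates in the example are illustrative assumptions, not part of the original comment.

```python
def expected_utilities(b1: float, b2: float, e1: float, e2: float) -> tuple[float, float]:
    """Return (EU of one-boxing, EU of two-boxing).

    e1: probability Omega wrongly predicts one-boxing (box B full although you two-box)
    e2: probability Omega wrongly predicts two-boxing (box B empty although you one-box)
    """
    one_box = (1 - e2) * b1
    two_box = e1 * b1 + b2
    return one_box, two_box


def should_one_box(b1: float, b2: float, e1: float, e2: float) -> bool:
    # Equivalent to b2 < (1 - e1 - e2) * b1
    one_box, two_box = expected_utilities(b1, b2, e1, e2)
    return one_box > two_box


# Newcomb's dollar amounts with an (assumed) 1% error rate in each direction
print(expected_utilities(1_000_000, 1_000, 0.01, 0.01))
print(should_one_box(1_000_000, 1_000, 0.01, 0.01))
```

With a 50% error rate in each direction, Omega carries no information, and the rule correctly says to two-box.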
I choose Box B. I take into account that Omega is a superintelligence with a 100% success rate and no margin of error, and that it is the one offering me the problem. The only logical explanation for that record is an ability to predict variables that I currently don’t understand. That could be an ability to analyze my psyche and see my tendency to trust things with 100% success rates, an ability to foresee my decision, or an ability to affect things backwards in time. Omega has not given any explanation for its 100% success rate, so these are the three logical possibilities I see. You might argue for taking both boxes on the assumption that Omega has no extraordinary powers over time, so the decision is already made. I think that is actually the irrational stance, because reasoning that ignores the past facts is irrational. I would take Box B because, even if Omega has guessed wrong and I take both boxes and get both sets of money, I’m really not much better off than if I had taken only Box B. Suppose the probabilities are these: if I take Box B, I presumably have a 100% chance of $1,000,000; if I take both boxes, I have a 50% chance of $1,000 and a 50% chance of $1,001,000. Then taking both boxes is the irrational decision, because $1,001,000 versus $1,000,000 is not worth a 50% chance of reducing my payout to $1,000. If you put this into the perspective of the Prisoner’s Dilemma in game theory and put the decision in front of me 10 times, the outcomes and my decisions become a lot clearer. Let’s say that Omega has the ability to guess wrong.
Suppose that every time I take both boxes there is a 50% chance of the money being in Box B. Numerically I still lose compared with choosing Box B every time, even if Omega can be wrong and the money is missing one of those times. However, one error in ten would be the most reasonable error rate to assume, if any. Omega has been correct 100/100 times, and if he were wrong with me he would have been correct 100/101 times, so the number of failures in 10 chances can realistically only be 0/10 or 1/10. Therefore, by taking Box B 10 times, the minimum payout I receive is $9 million. If I take both boxes all 10 times, the most I can really hope for is $1,010,000 (one million and ten thousand). If Omega had a failure rate of even 5%, that would definitely affect my decision, but as it stands, the only logical choice is to take only Box B. Furthermore, if I take only Box B and Omega is wrong and it’s empty, I believe Omega would be curious enough about its failure to reward me with the million dollars afterwards. Finally, $1,000 is simply not enough money for me to “need it” in a way that forces me to play safe.
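The arithmetic in this comment can be reproduced in a few lines. This is only a sketch of the commenter's scenario. It assumes, as the comment does, that Omega errs at most once in ten rounds, and it also evaluates the single-round 50/50 case. The function and constant names are illustrative.

```python
ONE_BOX_WIN, TWO_BOX_EMPTY, TWO_BOX_FULL = 1_000_000, 1_000, 1_001_000


def ten_rounds(errors: int, one_box: bool) -> int:
    """Total over 10 rounds when Omega mispredicts `errors` of them."""
    correct = 10 - errors
    if one_box:
        # Correct prediction: box B full. Error: box B empty, payout 0.
        return correct * ONE_BOX_WIN
    # Correct prediction: box B empty. Error: box B full, both boxes collected.
    return correct * TWO_BOX_EMPTY + errors * TWO_BOX_FULL


# Worst case for a one-boxer and best case for a two-boxer, with at most one error
print(min(ten_rounds(e, True) for e in (0, 1)))   # 9000000
print(max(ten_rounds(e, False) for e in (0, 1)))  # 1010000

# Single round, two-boxing under the 50/50 guess from the comment
print(0.5 * TWO_BOX_EMPTY + 0.5 * TWO_BOX_FULL)   # 501000.0
```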
This seems like a simple and reasonable answer to the problem: I would take the box with the million dollars rather than the box with the thousand dollars plus the empty box. The main question seems to be, “But why?” So here is my reasoning. Omega has shown over 99% accuracy in producing results that depend on people’s choices. Box B’s reward is a thousand times Box A’s, so if there is more than a 0.1% chance that also taking Box A will lose that reward, it is irrational to take Box A as well. Since I have seen no evidence that Omega has left, it is not even certain that my actions now will have no effect on the contents of the opaque box. Only a fool would be certain that, just because he “saw Omega fly away”, the superintelligence is not hiding and has not left behind an observing agent. As each of these considerations leads me to take only Box B, it is almost certain that Omega has foreseen the same and put the $1,000,000 in Box B.
I suspect people have a problem with this because they think they are outside the game. But the game description itself says that (a very accurate model of) you is in the game. Your modeled choices, including second thoughts and your desire not to leave that last $1,000 behind, will therefore (if modeled correctly) affect the contents of Box B. No, Omega is not rewarding irrationality. Omega is giving a large reward to those who trust Omega’s judgement, and a smaller reward to those who arrogantly think they can cheat the game he set up.
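The 0.1% figure is a break-even point. Also taking Box A gains $1,000 for certain. It costs $1,000,000 times whatever probability that choice has of leaving Box B empty. At exactly 0.1% the two options tie, so the argument needs a probability above that. Below is a minimal sketch of the comparison, with illustrative names, using exact fractions to avoid rounding at the boundary.

```python
from fractions import Fraction

GAIN_FROM_A = 1_000            # extra dollars for also taking Box A
LOSS_IF_B_EMPTY = 1_000_000    # dollars forfeited if that choice leaves Box B empty


def also_take_a(p_lose_b: Fraction) -> bool:
    """True if the expected gain from Box A exceeds the expected loss."""
    return GAIN_FROM_A > p_lose_b * LOSS_IF_B_EMPTY


break_even = Fraction(GAIN_FROM_A, LOSS_IF_B_EMPTY)
print(break_even, float(break_even))  # 1/1000 0.001
```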
I’m not sure if anyone’s noticed this, but how do you know that you’re not a simulation of yourself inside Omega? If he is superintelligent, he would compute your decision by simulating you, and you and your simulation would be indistinguishable.
This is fairly obviously a PD against said simulation—if you cooperate in PD, you should one-box.
I personally am not sure, although if I had to decide I’d probably one-box.
I suspect that this is very simple. Similar to the tree in the forest problem that Eliezer wrote about, if you ask about concrete variations of this question, the right choice is obvious.
One question is what to do when the boxes are in front of you.
If it is the case that you know with 100% certainty that the contents of box B will not change, then you should two-box.
If it is the case that Omega could change the contents of the box after he presents them to you, then you should one-box.
If it is the case that your present decision impacts the past, then you should one-box, because by one-boxing you’d change your past mind-state, which would change Omega’s decision. However, I don’t think that physics works like this. I’m assuming that there is a point in time where what you thought in the past is fixed, and that those thoughts are what Omega based his decision on. What you think and decide after Omega made his decision doesn’t influence your past mind-states, and thus doesn’t influence the decision that Omega made. But this is really a question about physics, not decision theory. When you ask the question with the condition that physics works a certain way, the decision theory part is easy.
Another question is what to do before Omega makes his decision.
It seems plausible that Omega could read your mind. So you should try to make Omega think that you will one-box. If you’re capable of doing this and it works, then great! If not, you didn’t lose anything by trying, and you gave yourself the chance of succeeding.
That doesn’t follow. The contents of B don’t change in the sense that someone looking at the box ahead of time with X-ray vision would see the same thing. But the contents “change” in the sense that Omega predicts your decision, so different choices result in different box contents. It would be a mistake to think of the contents of the boxes as something that can be held constant while only your choice varies.
(In fact, if Omega can predict your choice, you really aren’t able to choose at all.)
I think my third bullet point addresses your comment. You seem to be saying that by choosing to two-box, you’re influencing the past in such a way that’ll make Omega one-box. I’m saying that there are two possibilities:
1) your choice impacts the past
2) your choice doesn’t impact the past.
If 1) is true, then you should one-box. If 2) is true, then you should two-box. I honestly don’t have a strong opinion about whether 1) or 2) is the way the world works. But I think that whether 1) or 2) is true is a question of physics rather than of decision theory.
You seem to be confusing the effect with the cause; whether you will choose to one-box or two-box depends on your prior state of mind (personality/knowledge of various decision theories/mood/etc), and it is that prior state of mind which also determines where Omega leaves its money.
The choice doesn’t “influence the past” at all; rather, your brain influences both your and Omega’s future choices.
Consider this sequence of events: you had your prior mind-state, then Omega made his choice, and then you make your choice. You seem to be saying that your choice is already made up from your prior mind-state, and there is no decision to be made after Omega presents you with the situation. This is a possibility.
I’m saying that another possibility is that you do have a choice at that point. And if you have a choice, there are two subsequent options: this choice you make will impact the past, or it won’t. If it does, then you should one-box. But if it doesn’t impact the past (and if you indeed can be making a choice at this point), then you should two-box.
Just saw this in the comment box, so I don’t know the context, but isn’t this based on the confused notion of “free will” employed by … amateur theologians mostly, I think?
For example (and please tell me if I’m barking up the wrong tree entirely; it’s quite possible), let’s get rid of Omega and replace him with, say, Hannibal Lecter.
He has gotten to know you quite well, and has specific knowledge of how you behave in situations like this after you’ve considered the fact that you know he knows you know he knows etc etc.
Is it rational to two-box in this situation, because you have “free will” and thus there’s no way he could know what you’re going to do without a time machine?
I very well might be wrong about how reality works. I’m just saying that if it happens to work in the way I describe, the decision would be obvious. And furthermore, if you specify the way in which reality works, the decision in this situation is always obvious. The debate seems to be more about the way reality works.
Regarding the Hannibal Lecter situation you propose, I don’t understand it well enough to say, but I think I address all the variations of this question above.
My point is that humans are eminently nonrandom, to the extent that a smart human-level intelligence could probably fill in for Omega.
I think there’s an article here somewhere about how free will and determinism are compatible … I’ll look around for it now...
EDIT:
If Omega is smart enough, the only way to make it think you will one-box is by being the sort of agent that one-boxes in this situation; regardless of why. So you should one-box because you know that, because that means you’re the sort of agent that one-boxes if they know that. That’s the standard LW position, anyway.
(Free will stuff forthcoming.)
I keep saying that if you specify the physics/reality, the decision to make is obvious. People keep replying by basically saying, “but physics/reality works this way, so this is the answer”. And then I keep replying, “Maybe you’re right. I don’t know how it works. All I know is that the argument is over physics/reality.”
Do you agree with this? If not, where do you disagree?
Their point (which may or may not be based on a misunderstanding of what you’re talking about) is that one of your options (“free will”) does not correspond to a possible set of the laws of physics—it’s self-contradictory.
I think this is the relevant page. Key quote:
And if you are smart enough, you should decide what to do by trying to predict what Omega would do. Omega’s attempt to predict your actions may become undecidable if you’re smart enough to predict Omega.
Or to put it another way, the stipulation that Omega can predict your actions limits how smart you can be and what strategies you can use.
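One way to see why a capable enough agent constrains the predictor is a diagonal construction. This is the same trick behind the halting-problem objection raised further down the thread. In the hypothetical sketch below, an agent that can run the predictor on itself simply does the opposite of whatever is predicted, so no predictor it can consult is ever right about it. The names are illustrative. This shows the idea only and is not a model of Omega.

```python
from typing import Callable

Agent = Callable[[], str]
Predictor = Callable[[Agent], str]


def make_contrarian(predict: Predictor) -> Agent:
    """Build an agent that consults `predict` about itself and does the opposite."""
    def agent() -> str:
        guess = predict(agent)
        return "two-box" if guess == "one-box" else "one-box"
    return agent


def always_one(agent: Agent) -> str:
    return "one-box"


def always_two(agent: Agent) -> str:
    return "two-box"


for predict in (always_one, always_two):
    agent = make_contrarian(predict)
    print(predict.__name__, "predicts", predict(agent), "; agent does", agent())
```

A predictor that tries to answer by simply running the agent never returns. In Python it raises `RecursionError`. Stipulating that Omega always succeeds therefore rules out agents that can run Omega like this.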
Well, I guess that’s true—presumably the reason the less-intuitive “Omega” is used in the official version. Omega is, by definition, smarter than you—regardless of how smart you personally are.
This is true, but generally the question “what should you do” means “what is the optimal thing to do”. It’s odd to have a problem that stipulates that you cannot find the optimal thing to do and then asks what the next-best thing is.
Not exactly; just because Omega knows what you will do beforehand with 1-epsilon certainty doesn’t mean you don’t have a choice, just that you will do what you’ll choose to do.
You still make your decision, and just like every other decision you’ve ever made in your life, it would be based on your goals, values, intuitions, biases, emotions, and memories. The only difference is that someone else has already taken all of those things into account and made a projection beforehand. The decision is still real, and you’re still the one who makes it. It’s just that Omega has a faster clock rate and could work out what that decision would likely be beforehand, using the same initial conditions and laws of physics.
I think I agree with your description of how choice works. Regarding the decision you should make, I can’t think of anything to say that I didn’t say before. If the question specifies how reality/physics works, the decision is obvious.
Is it also your position that I have any way of knowing whether my choice is already made up from my prior mind-state, or not?
I don’t know whether you’ll have any way of knowing if your choice was made up already. I wish I knew more physics and had a better opinion on the way reality works, but with my understanding, I can’t say.
My approach is to say, “If reality works this way, then you should do this. If it works that way, then you should do that.”
Regarding your question, I’m not sure that it matters. If ‘yes’, then you don’t have a decision to make. If ‘no’, then I think it depends on the stuff I talked about in above comments.
If your choice is not made up from your prior mind state, then Omega would not be able to predict your actions from it. However, it is a premise of the scenario that he can. Therefore your choice is made up from your prior mind state.
Not necessarily. We don’t know how Omega makes his predictions.
But regardless, I think my fundamental point still stands: the debate is over physics/reality, not decision theory. If the question specified how physics/reality works, the decision theory part would be easy.
Indeed. To make it clearer, consider a prior mind-state that says “when presented with this, I’ll flip a coin to decide (or look at some other random variable).” In this situation, Omega can at best predict your choice with 50/50 odds. Whether Omega is even a coherent idea depends a great deal on your model of choices.
If given prior mind-state S1 and a blue room I choose A, and given S1 and a pink room I choose B, S1 does not determine whether I choose A or B, but Omega (knowing S1 and the color of the room in which I’ll be offered the choice) can predict whether I choose A or B.
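Here is a toy version of the blue-room/pink-room example, as a hedged sketch. `choose` is a hypothetical decision rule that depends only on the mind-state and the room colour, with the two rooms assumed equally likely. A predictor that knows both inputs is always right. One that knows only S1 has to guess and is right half the time. The same arithmetic covers the coin-flip case in the previous comment if the room is replaced by an unobserved fair coin.

```python
from fractions import Fraction

ROOMS = ("blue", "pink")  # assumed equally likely


def choose(state: str, room: str) -> str:
    """Hypothetical decision rule: S1 picks A in a blue room, B in a pink one."""
    return "A" if (state, room) == ("S1", "blue") else "B"


def accuracy(predict) -> Fraction:
    """Exact accuracy of `predict(state, room)` over equally likely rooms."""
    hits = sum(predict("S1", room) == choose("S1", room) for room in ROOMS)
    return Fraction(hits, len(ROOMS))


def knows_room(state: str, room: str) -> str:
    return choose(state, room)  # Omega: knows S1 and the room colour


def state_only(state: str, room: str) -> str:
    return "A"  # knows S1 but not the room, so it must guess


print(accuracy(knows_room), accuracy(state_only))  # 1 1/2
```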
Thinking about this in terms of AGI, would it be reasonable to suggest that a bias must be created in favor of utilizing inductive reasoning through Bayes’ Theorem rather than deductive reasoning when and if the two conflict?
Maybe I’m missing something (I’m new to Bayes), but I honestly don’t see how any of this is actually a problem. I may just be repeating Yudkowsky’s point, but… Omega is a superintelligence, who is right in every known prediction. This means, essentially, that he looks at you and decides what you’ll do, and he’s right 100 out of 100 times. So far, a perfect rate. He’s probably not going to mess up on you. If you’re not trying to look at this with CDT, the answer is obvious: take box B. Omega knows you’ll do that and you’ll get the million. It’s not about the result changing after the boxes are put down, it’s about predictions about a person.
This should not be taken as an authoritative response. I’m answering as much to get my own understanding checked, as to answer your question:
Omega doesn’t exist. How we respond to the specific case of Omega setting up boxes is pretty irrelevant. The question we actually care about is what general principle we can use to decide Newcomb’s problem, and other decision-theoretically-analogous problems. It’s one thing to say that one-boxing is the correct choice; it is another thing to formulate a coherent principle which outputs that choice in this case, without deranged behavior in some other case.
If we’re looking at the problem without CDT, we want to figure out and formalize what we are looking at the problem with.
Ahh. Thank you, that actually solved my confusion. I was thinking about solving the problem, not how to solve the problem. I shall have to look through my responses to other thought experiments now.
Oddly, this problem seems (to my philosopher/engineer mind) to have an exceedingly non-complex solution, and it depends not upon the chooser but upon Omega.
Here’s the payout schema assumed by the two-boxer, for reference:
1) Both boxes predicted, both boxes picked: +$1,000
2) Both boxes predicted, only B picked: $0
3) Only B predicted, both boxes picked: +$1,001,000
4) Only B predicted, only B picked: +$1,000,000
Omega, being an unknowable superintelligence, qualifies as a force of nature from our current level of human understanding. Since Omega’s ways are inscrutable, we can only evaluate Omega based upon what we know of him so far: he’s 100 for 100 on predicting the predilections of people. While I’d prefer to have a much larger success base before drawing inference, it seems that we can establish a defeasible Law of Omega: whatever decision Omega has predicted is virtually certain to be correct.
So while the two-boxer would hold that choosing both boxes would give them either $1,000 or $1,001,000, this is clearly IRRATIONAL: the (defeasible) Law of Omega outright eliminates outcomes 2 and 3 above, which means that (until such time as new data forces a revision of the Law of Omega) the two-boxer’s anticipated payoff of $1,001,000 DOES NOT EXIST. The only choice is between outcome 1 (two-boxer gets $1,000) and outcome 4 (one-boxer gets $1,000,000). At that point, option 4 is the dominant strategy… AND the rational thing to do.
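The elimination step can be written out directly. The minimal sketch below takes the four-outcome schema above and treats the "Law of Omega" as a filter that keeps only the outcomes where prediction and choice agree. The dictionary layout and the `omega_is_right` flag are illustrative choices.

```python
# (prediction, choice) -> payout in dollars, from the two-boxer's schema
PAYOFFS = {
    ("both", "both"): 1_000,
    ("both", "B"): 0,
    ("B", "both"): 1_001_000,
    ("B", "B"): 1_000_000,
}


def reachable(payoffs: dict, omega_is_right: bool = True) -> dict:
    """Keep only outcomes consistent with Omega predicting correctly."""
    if not omega_is_right:
        return dict(payoffs)
    return {k: v for k, v in payoffs.items() if k[0] == k[1]}


outcomes = reachable(PAYOFFS)
best_choice = max(outcomes, key=outcomes.get)[1]
print(outcomes)     # only outcomes 1 and 4 survive
print(best_choice)  # the choice attached to the largest surviving payout
```

Without the filter, the $1,001,000 outcome reappears, which is exactly the outcome the two-boxer is counting on.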
Does that make sense? Or am I placing unfounded faith in Omega?
If you look through the many subsequent discussions of this, you’ll see that indeed $1,001,000 is not in the outcome domain, but the classical CDT is unable to enumerate this domain correctly.
As I understand it, most types of decision theory (including game theory) assume that all agents have about the same intelligence and that this intelligence is effectively infinite (or at least large enough so everyone has a complete understanding of the mathematical implications of the relevant utility functions).
In Newcomb’s problem, one of the players is explicitly defined as vastly more intelligent than the other.
In any situation where someone might be really good at predicting your thought processes, it’s best to add some randomness to your actions. Therefore, my strategy would be to use a quantum random number generator to choose just box B with 51% probability. I should be able to win an average of $1,000,490.
If there isn’t a problem with this argument and if it hasn’t been thought of before, I’ll call it “variable intelligence decision theory” or maybe “practical decision theory”.
Dustin Soodak
Some variants of the Newcomb problem specify that if Omega isn’t sure what you will do he will assume you’re going to two-box.
(And if Omega is really that smart he will leave box A in a quantum superposition entangled with that of your RNG. :-))
I think generally there’s an addendum to the problem where if Omega sees you using a quantum randomness generator, Omega will put nothing in box B, specifically to prevent this kind of solution. :P
Also, how did you reach your $1,000,490 figure? If Omega just simulates you once, your expected payoff is 0.51 × (0.51 × 1,000,000 + 0.49 × 1,001,000) + 0.49 × (0.51 × 0 + 0.49 × 1,000) = $510,490 < $1,000,000, so you’re better off one-boxing unless Omega simulates you multiple times.
I figured that if Omega is required to try its best to predict you, and you are permitted to do something physically random in your decision-making process, then it will probably be able to work out that I am slightly more likely to choose just one box than both. Therefore, it will gain the most status on average (it MUST be after status, since it obviously has no interest in money) by guessing that I will take one box.
0.51 × 1,000,000 + 0.49 × 1,001,000 = 1,000,490
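The two figures in this exchange rest on different assumptions about what Omega does with a randomizing player, and both can be checked. In the sketch below, `single_simulation` follows the earlier reply: Omega samples the 51/49 randomizer once and fills box B accordingly. `omega_guesses_majority` follows this comment's assumption that Omega bets on the more likely option and fills box B. Both function names are hypothetical.

```python
P_ONE = 0.51  # probability the randomizer picks "only box B"
FULL, A = 1_000_000, 1_000


def single_simulation(p: float = P_ONE) -> float:
    """Omega runs the randomizer once; box B is full iff that run one-boxes."""
    if_full = p * FULL + (1 - p) * (FULL + A)
    if_empty = p * 0 + (1 - p) * A
    return p * if_full + (1 - p) * if_empty


def omega_guesses_majority(p: float = P_ONE) -> float:
    """Omega predicts the more likely option (one-box when p > 0.5), so box B is full."""
    return p * FULL + (1 - p) * (FULL + A)


print(round(single_simulation()), round(omega_guesses_majority()))
```

Under the single-simulation assumption, the randomized strategy does worse than plain one-boxing ($1,000,000). That is the point of the reply.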
I didn’t realize anyone watched the older threads, so I wasn’t expecting such a fast response...
I’ve already heard about the version where “intelligent alien” is replaced with “psychic” or “predictor”, but not the “human is required to be deterministic” version or the quantum version (which I’m pretty sure would require the ability to measure the complete wavefunction of something without affecting it). I didn’t think of the “halting problem” objection, though I’m pretty sure Omega is already expected to do even more difficult things to get such a good success rate with something as complicated as a human CNS. (Does it just passively observe the player for a few days before the event, or is it allowed to do a complete brain scan?)
I still think my solution will work in any realistic case (where the alien isn’t magical, and doesn’t require your thought processes to be both deterministic and computable while not placing any such limits on itself).
What I find particularly interesting, however, is that such a troublesome example explicitly states that the agents have vastly unequal intelligence. Most examples seem to assume “perfectly rational” agents, which seems to mean intelligent and rational enough that further increases in intelligence and rationality would make no difference. Are there any other examples where causal decision theory fails that don’t involve unequal agents? If not, I wonder if you could construct a proof that it DEPENDS on this as an axiom.
Has anyone tried adding “relative ability of one agent to predict another agent” as a parameter in decision theory examples? It seems like this might be applicable in the prisoner’s dilemma as well. For example, a simple tit-for-tat bot modified so it doesn’t defect unless it has received 2 negative feedbacks in a row might do reasonably well against other bots but would do badly against a human player as soon as they figured out how it worked.
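The modified bot described here is usually called "tit for two tats". Here is a hedged sketch of the weakness the comment predicts. It uses the conventional Prisoner's Dilemma payoffs T=5, R=3, P=1, S=0, which is an assumption because the comment gives no numbers. An opponent who has worked out the rule can defect every other round without ever triggering retaliation.

```python
# Conventional PD payoffs (assumed): (my move, their move) -> my score
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def tit_for_two_tats(their_history: list[str]) -> str:
    """Defect only if the opponent defected in both of the last two rounds."""
    return "D" if their_history[-2:] == ["D", "D"] else "C"


def alternator(their_history: list[str]) -> str:
    """Exploit: defect on even-numbered rounds, never twice in a row."""
    return "D" if len(their_history) % 2 == 0 else "C"


def play(bot_a, bot_b, rounds: int = 10) -> tuple[int, int]:
    hist_a, hist_b = [], []  # moves made by bot_a and by bot_b
    score_a = score_b = 0
    for _ in range(rounds):
        move_a, move_b = bot_a(hist_b), bot_b(hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


print(play(tit_for_two_tats, alternator))        # exploited
print(play(tit_for_two_tats, tit_for_two_tats))  # mutual cooperation
```

Over ten rounds the alternator scores 40 against 15, versus 30 each under mutual cooperation. Any opponent who can predict the bot's rule gains this kind of edge.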
You are fighting the hypothetical. It is a common pitfall when faced with a counterintuitive issue like that. Don’t do it, unless you can prove a contradiction in the problem statement. Omega is defined as a perfect predictor of your actions no matter what you do. That includes any quantum tricks. Also see the recent introduction to Newcomblike problems for a detailed analysis.
How does my objection fit into this: that it may not be possible for Omega to predict you in principle, since such an Omega would have to be able to solve the halting problem?
Here is my answer to Newcomb’s problem:
Omega doesn’t exist in reality. Therefore Newcomb’s problem is irrelevant and I don’t waste time thinking about it.
I wonder how many people come up with this answer. Most of them are probably smarter than me and also don’t waste time commenting their opinion.
Am I missing something?
I’ve come up with a related answer in the past, but I don’t think that defense is the best angle to take anymore when it comes to Newcomb’s.
It helps to be very specific with why you’re rejecting a thought experiment. The statement “Omega doesn’t exist in reality” needs to be traced to the axioms that give you an impossibility proof. This both allows you to update your conclusion as soon as those axioms come into question and generalize from those axioms to other situations.
For example, the ‘frailty’ approach to Newcomb’s is to say “given that 1) my prior probability of insanity is higher than my prior probability of Omega and 2) any evidence for Omega’s supernatural ability is at least as strong evidence for my insanity, I can’t reach a state where I think that it’s more likely that Omega has supernatural powers than that I’m insane.” This generalizes to, say, claims from con men; you might think that any evidence they present for their claims is also evidence for their untrustworthiness, and reach a point where you literally can’t believe them. (Is this a good state to be in?) But it’s not clear that 2 is true, and even if the conclusion follows through, it helps to have a decision theory for what to do when you think you’re insane!
Another approach to Newcomb’s problem is to get very specific about what we mean by ‘causality,’ because Newcomb’s is a situation where we have a strong verbal argument that causality shouldn’t exist and a strong verbal argument that causality should exist. In order to resolve the argument, we need to figure out what causality means mathematically, and then we can generalize much more broadly, and the time spent formalizing causality is not at all wasted.
Thanks for your reply. I didn’t expect to get so much feedback.
I tend to assume that I am not insane. Maybe I am overconfident in that regard :-)
I would call my approach to Newcomb’s problem an example of rational ignorance. I think the cost of thinking about this problem (my time) is higher than the possible benefit I could get out of it.
Depends. Do you generally think that thought experiments involving fictional/nonexistent entities are irrelevant (to what?) and not worth thinking about? Or is there something special about Newcomb’s problem?
If the former, yes, I think you’re missing something. If the latter, then you might not be missing anything.
Thanks for this answer.
I think it’s only Newcomb’s problem in particular. I just can’t imagine how 1) knowing the right answer to this problem or 2) thinking about it can improve my life or that of any other person in any way.
I was reading quite recently, but I can’t remember where (LessWrong itself?) (ETA: yes, here and on So8res’ blog), someone saying Newcomb-like problems are the rule in social interactions. Every time you deal with someone who is trying to predict what you are going to do and might be better at it than you, you have a Newcomb-like problem. If you just make what seems to you like the obviously better decision, the other person may have anticipated that and made that choice appear deceptively better for you.
“Hey, check out this great offer I received! Of course, these things are scams, but I just can’t see how this one could be bad!”
“Dude, you’re wondering whether you should do exactly what a con artist has asked you to do?”
Now and then some less technically-minded friend will ask my opinion about a piece of dodgy email they received. My answer always begins, “IT’S A SCAM. IT’S ALWAYS A SCAM.”
Newcomb’s Problem reduces the situation to its bare essentials. A decision theory that two-boxes may not be much use for an AGI, or for a person.
(nods)
And how would you characterize Newcomb’s problem?
For example, I would characterize it as raising questions about how to behave in situations where our own behaviors can reliably (though imperfectly) be predicted by another agent.
Imagine a different set of players. For example, some software which is capable of modifying its own code (that’s nothing out of the ordinary, such things exist) and a programmer capable of examining that code.
Yes, you’re missing something. You’re fighting the hypothetical.
Some hypotheticals are worth fighting. What’s the right accounting policy if 1=2? If 1=2, you have bigger problems.
Not the one in question, though, since Omega can be approximated—and typically is, even if only as a (50+x)% correct predictor. Humans are an approximation of Omega, in some sense. Solving a problem assuming a hypothetical Omega is not unlike assuming cows are spheres in a vacuum, i.e. a solution of the idealized thought experiment can still be relevant.
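To see how good an approximate Omega needs to be, here is a rough sketch of the expected payoffs against a predictor that is right with probability p. It assumes, as a simplification not stated above, that the predictor is equally accurate whichever choice you make, and it uses the evidential-style calculation (treating your choice as evidence about the prediction), which is exactly the step a causal decision theorist disputes. Under those assumptions one-boxing pulls ahead once p exceeds 1,001,000 / 2,000,000 = 50.05%.

```python
def expected_payoffs(p, small=1_000, big=1_000_000):
    """Expected dollars for (one-box, two-box) against a predictor with accuracy p.

    One-boxing: box B is full iff the predictor was right (probability p).
    Two-boxing: box B is full iff the predictor was wrong (probability 1 - p),
    and you always collect the small amount from box A.
    """
    one_box = p * big
    two_box = (1 - p) * big + small
    return one_box, two_box


def break_even_accuracy(small=1_000, big=1_000_000):
    """Accuracy at which both choices have equal expected payoff.

    Solves p * big == (1 - p) * big + small for p.
    """
    return (big + small) / (2 * big)


print(break_even_accuracy())  # 0.5005
for p in (0.5, 0.6, 0.99):
    one, two = expected_payoffs(p)
    print(p, "one-box" if one > two else "two-box")
```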
The way I see it, causal decision theory simply ignores a part of the problem: that the Predictor is able to “predict”.
Evidence should get inside the equation, but not the same way as evidential decision theory: the evidence is what should fuel the hypothesis “The Predictor predicts our choices”.
It does not matter if we “think” that our “choice” shouldn’t change what’s inside the boxes—as the main thing about a prediction is that we aren’t actually making any “choice”, that “choice” is already predicted. It’s the whole “free will” illusion all over again, that we think our choices are ours, when the presence of such a Predictor would simply invalidate that hypothesis.
Causal decision theory should still work, but not with a reasoning that forgets about the Predictor. Since the Predictor is gone, our choice shouldn’t (and won’t) affect what’s in the boxes—but as our choice was predicted, accurately, and as we have supposedly enough evidence to infer this prediction, we should one box—and this won’t be a “choice”, it will simply have been predicted, and we’ll get the money.
I’m probably not being clear, and will try to say it another way. “Choosing” to one box will simply mean that the Predictor had predicted that choice. “Choosing” to two box will also mean the same. It’s not a “choice” at all—our behavior will simply be deterministic. Therefore we should one box, even though that is not a real “choice”.
The features of the Predictor should appear in causal decision theory.
It seems like the ‘rational’ two-boxers are falling prey to the concept of belief in belief. They think that because they believe they are people who would choose both boxes, it doesn’t matter what they choose: box B is already empty, so they may as well take both. If you have all the information (except for what is in box B), then choosing both is the irrational option and the ‘rational’ people are rationalizing. You’ve just seen someone (or something) materialize two boxes from thin air, tell you they know which option you’ll choose (with evidence that they haven’t been wrong yet), and leave. That person (or thing) has two pieces of information you don’t: what’s in box B and which option will be chosen. If you ignore the evidence provided in favor of the belief that you know yourself better than reality does, and then call it being rational, I don’t know what to tell you.
Now let’s say you don’t know everything. A regular person comes up and tells you that one box has 1k and the other has 1000k, that you can either take A and B or just B, and that there is a high chance that taking A and B will result in B being empty while taking just B will result in B holding the 1000k. The person offering you the boxes has, essentially, zero credibility; you may not even believe either box has money. It doesn’t matter to you whether that person already knows what you’ll pick. You don’t know that they know, and it doesn’t matter if they do. The question becomes: do you run away from this crazy and possibly dangerous person, do you believe them and take both, or do you believe them and take B? Rationally speaking, you don’t lose anything by taking any of those options except the opportunity to learn what following the other options would entail. It becomes a question of whether you will regret taking both and getting only 1k, or taking only B and losing the possible 1k (or running away, and later regretting not calling the cops about a dangerous lunatic).
I had better phraseology and order in my head half an hour ago, but I’m typing this up on my phone and I’m losing track of my points, so I’ll leave things as they are.
This reminds me of these great new US Army ads: https://youtu.be/jz3e2_CyOi8
It feels like decision theory is subject to the halting problem. Sketching some rough thoughts.
Consider your particular decision theory as a black-box function, or set of rules, F, which takes the description of a situation P and outputs yes or no; one of those answers wins and the other loses.
F(P)
You want a decision theory: some set of rules F to follow that wins in all situations.
But for any F it’s possible to construct a situation P defined as “the winning answer is !F(P)”, feeding F into itself (or a simplified equivalent).
No matter what set of rules you include in your decision theory it cannot win in all cases. Ever.
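The construction can be sketched in a few lines of Python. This is a hypothetical illustration, not anyone’s actual decision theory: a “decision theory” is modeled as any deterministic function from a situation description to True/False, and the adversarial situation defines the winning answer as the negation of whatever F answers about it, just as the halting-problem proof feeds the halting-tester its own description.

```python
from typing import Callable

# Hypothetical model: a decision theory maps a situation description to yes/no.
DecisionTheory = Callable[[str], bool]


def adversarial_situation(F: DecisionTheory):
    """Build a situation P whose winning answer is, by definition, not F(P)."""
    name = getattr(F, "__name__", "F")
    situation = f"the winning answer here is the opposite of what {name} says"

    def winning_answer() -> bool:
        # Feed F its own situation; F must be deterministic for this to bite.
        return not F(situation)

    return situation, winning_answer


def wins(F: DecisionTheory, situation: str, winning_answer) -> bool:
    return F(situation) == winning_answer()


def always_yes(p: str) -> bool:
    return True


def always_no(p: str) -> bool:
    return False


def parity_rule(p: str) -> bool:
    # Some arbitrary "cleverer" rule; it loses just the same.
    return len(p) % 2 == 0


for F in (always_yes, always_no, parity_rule):
    P, win = adversarial_situation(F)
    print(F.__name__, wins(F, P, win))  # prints False for every F
```

Nothing about the trick depends on what F does internally; it only needs the situation to be allowed to refer to F.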
That doesn’t have anything to do with the halting problem, it looks like a close relative of the Barber paradox.
It has something to do with the halting problem. The usual way of demonstrating that no program can solve the halting problem is to suppose you’ve got one that does and use it to carry out a construction a bit like the one HungryHobo is gesturing towards, where F arranges to halt iff the halting-tester says it doesn’t.
It’s the same pattern as the simple proof of the halting problem: feed your program into itself as part of the parameters, replacing an infinite loop with “lose” and halting with “win”.
The barber paradox is just a simple “set of all sets that do not contain themselves” thing, which has nothing to do with what I wrote.
My point was that your set of rules is equivalent to a program that you follow to try to reach the “winning” outcome, hence it’s easy to see that no matter what rules you choose for your version of decision theory, it’s simple to construct a scenario where your rules cannot provide the “winning” answer.
Hm, maybe not the barber. I was thinking of how and when you define what is a “win”.
Let’s do a toy example where P is a limited discrete set, say { door1, door2, door3 }. If we know what the doors lead to, and we know what a “win” is, we can make the rules be a simple lookup table. It works perfectly fine.
You can break it in two ways. One way is to redefine a “win” (whatever you pick for door1 we declare to be !win). Another is to change the set P.
Say, we add door4 to the set. The lookup table says “I don’t know” and that is, actually, a viable answer. If you want to disallow that, we have to move into the realm of models and generalizations. And in that realm asking that your function F(P) produces the optimum (“win”) for whatever P could be is, I think, too much to ask for. It can work for mathematical abstractions, but if we are talking about a decision theory that is applicable to the real world, sorry, I don’t think “optimal all the time, no exceptions” is a realistic goal or criterion.
The issue is, basically, what you allow to be in set P. If it’s sufficiently restricted, F(P) can guarantee wins; if it is not, it cannot.
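The door example can be made concrete. In this hypothetical sketch (the door contents are invented), a lookup table built from full knowledge of an enumerated world wins on every door in it, answers None (“I don’t know”) for an unlisted door4, and loses everywhere if “win” is redefined as the opposite of what the table says.

```python
from typing import Dict, Optional

# Hypothetical world: for each door, whether "yes, open it" is the winning answer.
WORLD: Dict[str, bool] = {"door1": False, "door2": True, "door3": False}

# With the situations enumerated and "win" fixed, the rules are just a copy.
TABLE: Dict[str, bool] = dict(WORLD)


def F(door: str) -> Optional[bool]:
    """Lookup-table decision rule; None means "I don't know"."""
    return TABLE.get(door)


def wins(door: str, world: Dict[str, bool]) -> bool:
    return door in world and F(door) == world[door]


# 1) Fixed, enumerable world: the table always wins.
print(all(wins(d, WORLD) for d in WORLD))       # True

# 2) Change the set P: door4 is not in the table.
extended = {**WORLD, "door4": True}
print(F("door4"), wins("door4", extended))      # None False

# 3) Redefine "win" as the opposite of whatever the table answers.
flipped = {d: not w for d, w in WORLD.items()}
print(any(wins(d, flipped) for d in flipped))   # False
```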
I agree with you that “optimal all the time, no exceptions” is not a realistic goal or criterion.
Indeed I believe it’s provably impossible even without needing to add the fuzziness and confusion of real life into the mix. Even if we limit ourselves to simple bounded systems.
Which kind of puts a hole in EY’s thesis that it should be possible to have a decision theory which always wins.
Eliezer has conceded that it is impossible in principle to have a decision theory which always wins. He says he wants one that will always win except when an adversary is deliberately making it lose. In other words, he hopes that your scenario is sufficiently complicated that it wouldn’t happen in reality unless someone arranges things to cause the decision theory to lose.
If the “simple bounded systems” are, basically, enumerable and the definition of “win” is fixed, F(P) can be a simple lookup table which does always win.
It’s the same thing as saying that given a dataset I can always construct a model with zero error for members of this dataset. That does not mean that the model will perform well on out-of-sample data.
I am also not sure to which degree EY intended this statement to be a “hard”, literal claim.
I two-box.
Three days later, “Omega” appears in the sky and makes an announcement. “Greetings, earthlings. I am sorry to say that I have lied to you. I am actually Alpha, a galactic superintelligence who hates that Omega asshole. I came to predict your species’ reaction to my arch-nemesis Omega and I must say that I am disappointed. So many of you chose the obviously irrational single-box strategy that I must decree your species unworthy of this universe. Goodbye.”
A giant laser beam then obliterates Earth. I die wishing I’d done more to warn the world of this highly improbable threat.
TLDR: I don’t buy this post’s argument that I should become the type of agent that sees one-boxing on Newcomb-like problems as rational. It is trivial to construct any number of no less plausible scenarios where a superintelligence descends from the heavens and puts a few thousand people through Newcomb’s problem before suddenly annihilating those who one-box. The presented argument for becoming the type of agent that Omega predicts will one-box can equally be used to argue for becoming the type of agent that Alpha predicts will two-box. Why then should it sway me in either direction?
I would play the lottery: if I win more than 10M$, I take the black box and leave. Otherwise I’d look in the black box: if it is full, I also take the small one. If not, I leave with just the empty black box. As this should be inconsistent, assuming a time-traveling Omega, it would either make him not choose me for his experiment or let me win for sure (assuming time works the way it does in HPMOR). If I get nothing, it would prove Omega wrong (and tell me quite a bit about how Omega, and time, work). If his prediction was correct, though, I win 11.000.000$, which is way better than either ‘standard’ variant.
While that sounds clever at first glance:
We’re not actually assuming a time-traveling Omega.
Even if we were, he would just not choose you for the game. You’d get $0, which is worse than causal decision theory.
I’d change that to 95%, because if B contains a 100% deflector, A adds nothing and there’s no dilemma.
“without limit or upper bound”: the link is a 404 (page not found).
There’s an archived copy here.
Thanks. I bookmarked http://archive.fo/ for these kinds of things.
Do write the PhD thesis, and get the PhD whose lack you complain about a bit too often :)
On a more serious note: the same thing Musashi says is all too often said about chess (always think about how to deliver checkmate). In both cases it seems to be a heuristic at best. We do not have the chess programming that the best chess-playing computers have (nor the fencing equivalent). And we do seem to be able to think about the next steps better than about the steps after them. So it seems plausible that sometimes we should forget the enemy king/body and defend our own, for we, being imperfect, will otherwise lose ours well before getting to the enemy’s.
This post should be updated to link to Functional Decision Theory, now that it has been written up.
I think a major determinant of the choice here depends on whether or not the “chooser” knows about the previous results. If you know that in the previous scenarios, people who choose only one box win, then by all means, choosing only one box is the rational decision.
If you don’t have this prior information, then choosing both boxes seems more rational.
Slightly modified version:
Instead of choosing at once whether you want to take one box or both boxes, you first take box 1 (and see whether it contains $0 or $1.000.000), and then you decide whether you also want to take box 2.
Assume that you only care about the money, you don’t care about doing the opposite of what Omega predicted.
I can’t claim to be particularly versed in the debates about Newcomb’s paradox, so I might be wrong here, but it seems to me like you got Joyce’s argument precisely backwards. His entire point seems to be that Rachel and Irene are in fact not facing the same options.
Irene has the options
One-box and most likely leave with $1.000.000, but possibly leave empty-handed
Two-box and most likely leave with $1.001.000, but possibly leave with $1.000.
Rachel has the options
One-box and most likely leave empty-handed, but possibly leave with $1.000.000
Two-box and most likely leave with $1.000, but possibly leave with $1.001.000.
From Rachel’s perspective, the two statements “Irene’s options are enviable” and “Irene should have chosen option ii” don’t seem to contradict each other. They seem like the logical equivalent of envying your poker opponent’s hand while simultaneously insisting that you played your inferior hand better (even though your opponent did in fact end up winning).
This is the way I think about it:
Given how good Omega is at predicting people’s decisions, I should assume that a world where I choose to take both boxes cannot coincide with a world where Omega predicted I would only take one box. In other words, the payoff matrix that creates this paradox in the first place is an illusion, because the scenario in which you two-box and get $1,001,000 simply doesn’t exist. Or at the very least, it is so unlikely to exist that you should behave as though it doesn’t.
Apologies if this argument has been made before; I’ve had a quick scan through the comments and can’t see it, so here goes. The rational choice is to one-box. The two-boxers are throwing away a critical piece of evidence: in 100 cases out of 100 so far, one-boxing has been the right strategy. Therefore, based on the observable evidence, there’s a less than 1% chance of two-boxing being the correct strategy. It’s irrational to argue that you should two-box.
This argument maps onto the real world. In the real world you are never certain about the mechanism behind the outcomes of your choices, you don’t know what the real probabilities are, and the sample size of evidence you have is too small to make a judgement. To make wise decisions you have to be humble about what you know. To do otherwise is irrational.
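One standard way to put a number on “less than 1%” is Laplace’s rule of succession. This sketch rests on an assumption the comment doesn’t state: a uniform prior over the unknown rate at which two-boxing comes out ahead. After 0 such cases in 100 trials, the predicted chance that the next case favors two-boxing is 1/102, just under 1%.

```python
from fractions import Fraction


def rule_of_succession(successes: int, trials: int) -> Fraction:
    """Laplace's rule of succession: P(next trial is a success) given
    `successes` out of `trials`, assuming a uniform prior on the unknown rate."""
    return Fraction(successes + 1, trials + 2)


# Observed: in 0 of 100 games did two-boxing come out ahead.
p_two_box_next = rule_of_succession(0, 100)
print(p_two_box_next)                     # 1/102
print(p_two_box_next < Fraction(1, 100))  # True
```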
I’m a big fan of the work of John Vervaeke, particularly the role of Relevance Realisation in helping (and hindering) us make good decisions. In this case, the prescient alien is just a distraction from the salient facts, which are that in 100 trials, 100% of the time, the best choice was to take the opaque box.
In fact, let’s simplify the thought experiment:
I show you a coin. I tell you that it’s a normal coin. I toss it 100 times. Every time it lands heads. The next time I toss it, what is the chance of it landing tails?
For those of you who said 50%, let me phrase the question another way: Given I have tossed a coin 100 times and it’s landed heads every time, what is the probability that the coin is unbiased?
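The second phrasing is a Bayesian update, and a short sketch shows why the answer is not 50%. To keep it to two hypotheses, it assumes (hypothetically) that the only alternative to a fair coin is a two-headed one. Even if you start out 99.9999% sure the coin is normal, 100 heads in a row leaves almost no probability on “fair”, and the chance of tails next is correspondingly tiny.

```python
def posterior_fair(prior_fair: float, heads_in_a_row: int) -> float:
    """P(fair coin | that many heads in a row), where the only alternative
    hypothesis is a (hypothetical) two-headed coin."""
    like_fair = 0.5 ** heads_in_a_row  # fair coin: each head has prob 1/2
    like_two_headed = 1.0              # two-headed coin: heads every time
    num = prior_fair * like_fair
    return num / (num + (1 - prior_fair) * like_two_headed)


def p_tails_next(prior_fair: float, heads_in_a_row: int) -> float:
    # Only the fair coin can land tails, and it does so half the time.
    return 0.5 * posterior_fair(prior_fair, heads_in_a_row)


for prior in (0.5, 0.99, 0.999999):
    print(prior, posterior_fair(prior, 100), p_tails_next(prior, 100))
```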