What’s going on is that Eliezer Yudkowsky has argued forcefully for one-boxing in terms of his “way of winning” thing, which, for anyone who has also read the other stuff he wrote about that (like the “nameless virtue”), probably created a “why aren’t you winning?” alarm bell in their heads.
Most philosophers haven’t been introduced to the problem by Eliezer Yudkowsky.
To me, Newcomb’s problem seemed like a contrived trick to punish CDT, and it seemed that any other decision theory was just as likely to run into some other strange scenario to punish it, until I started thinking about AIs that could simulate you accurately, something else that differentiates LessWrong from professional philosophers.
When I realized the only criterion by which a “best decision theory” could be crowned was winning in as many realistic scenarios as possible, and stopped caring that “acausal control” sounded like an oxymoron, and realized that there could potentially be Newcomb-like problems to face in real life, and that there were decision theories that could win on Newcomb’s problem without bungling the smoker’s lesion problem, and read this:
What if your daughter had a 90% fatal disease, and box A contained a serum with a 20% chance of curing her, and box B might contain a serum with a 95% chance of curing her?
that convinced me to one-box.

Addendum:

About atheists vs. theists and undergrads vs. philosophers, I think two-boxing is a position that preys on your self-image as a rationalist. It feels like you are getting punished for being rational, like you are losing not because of your choice but because of who you are (I would say your choice is embedded in who you are, so there is no difference). One-boxing feels like magical thinking. Atheists and philosophers have stronger self-images as rationalists. Most haven’t grokked this:
How can you improve your conception of rationality? Not by saying to yourself, “It is my duty to be rational.” By this you only enshrine your mistaken conception. Perhaps your conception of rationality is that it is rational to believe the words of the Great Teacher, and the Great Teacher says, “The sky is green,” and you look up at the sky and see blue. If you think: “It may look like the sky is blue, but rationality is to believe the words of the Great Teacher,” you lose a chance to discover your mistake.
Will’s link has an Asimov quote that supports the “self-image vs right answer” idea, at least for Asimov:
I would, without hesitation, take both boxes . . . I am myself a determinist, but it is perfectly clear to me that any human being worthy of being considered a human being (including most certainly myself) would prefer free will, if such a thing could exist . . . Now, then, suppose you take both boxes and it turns out (as it almost certainly will) that God has foreseen this and placed nothing in the second box. You will then, at least, have expressed your willingness to gamble on his nonomniscience and on your own free will and will have willingly given up a million dollars for the sake of that willingness, itself a snap of the finger in the face of the Almighty and a vote, however futile, for free will . . . And, of course, if God has muffed and left a million dollars in the box, then not only will you have gained that million, but far more important you will have demonstrated God’s nonomniscience.

Seems like Asimov isn’t taking the stakes seriously enough. Maybe we should replace “a million dollars” with “your daughter here gets to live.”

And only coincidentally signalling that his status is worth more than a million dollars.
But losing the million dollars also shoves in your face your ultimate predictability.
Voluntarily taking a loss in order to insult yourself doesn’t seem rational to me.
Plus, that’s not a form of free will I even care about. I like that my insides obey laws. I’m not fond of the massive privacy violation, but that’d be there or not regardless of my choice.
Two boxing is relying on your own power to think and act. One boxing is basically knuckling under to the supposed power of the predictor. I will trust that I am helpless before him. He is an agent and I am his object.
Theists one box and atheists two box. Not surprising.
The twist here at LW is that lots of people seem terribly enamored of these recursive ill posed problems, and think they have solutions to them. That’s where the one boxing comes from here, IMO.
I’d one box because there’s no way I’d risk losing a million dollars to get an extra thousand based on arguments about a problem which bores me so much I have trouble paying attention to it.

What if Box B contains $1,500 instead of $1,000,000 but Omega has still been right 999 times out of 1000?

You did get me to pay a little more attention to the problem. I’d two box in that case. I’m not sure where my crossover is.

Edited to add: I think I got it backwards. I’d still one box. Committing to one-box seems advantageous if Omega is reasonably reliable.

I suppose that then you could put numbers on whether the person will reliably keep commitments.
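For concreteness, here is a quick expected-value check of where that crossover sits; treating the 999-out-of-1000 record as the predictor’s accuracy is a simplifying assumption of mine, not part of the problem statement:

    # Sketch: evidential expected value of each choice as a function of the prize in box B.
    Q = 0.999      # assumed P(prediction is correct)
    SMALL = 1_000  # guaranteed amount in box A

    def expected_value(action: str, box_b: float) -> float:
        p_filled = Q if action == "one-box" else 1 - Q
        return (SMALL if action == "two-box" else 0) + p_filled * box_b

    for prize in (1_500, 1_000_000):
        print(prize, expected_value("one-box", prize), expected_value("two-box", prize))

    # With $1,500 in box B: one-boxing ~= $1,498.50 vs two-boxing ~= $1,001.50.
    # The crossover is at box_b = SMALL / (2*Q - 1), roughly $1,002, so on this
    # reading even the reduced prize still favors one-boxing.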
Best analysis of Newcomb’s Paradox I’ve seen so far—boring. There’s nothing to see here. It all comes down to how you model the situation and what your priors are.
I find it hard to imagine a situation where I have more belief in the Predictor’s ability to predict my choice than in the Predictor’s ability to give false evidence whose trick I can’t figure out.
I’d two box because I see no reason to risk losing anything. In the face of perceived trickery, I’m all the more betting on causality.
Adding to your story, it’s not just Eliezer Yudkowsky’s introduction to Newcomb’s problem. It’s the entire Bayesian / Less Wrong mindset. Here, Eliezer wrote:
That was when I discovered that I was of the type called ‘Bayesian’. As far as I can tell, I was born that way.
I felt something similar when I was reading through the sequences. Everything “clicked” for me—it just made sense. I couldn’t imagine thinking another way.
Same with Newcomb’s problem. I wasn’t introduced to it by Eliezer, but I still thought one-boxing was obvious; it works.
Many Less Wrongers who have stuck around have probably had a similar experience; the Bayesian standpoint seems intuitive. Eliezer’s support certainly helps to propagate one-boxing, but LessWrongers seem to be a self-selecting group.
It also helps that most Bayesian decision algorithms actually take on the arg max_a E[U | a] reasoning of Evidential Decision Theory, which means that whenever you invoke your self-image as a capital-B Bayesian you are semi-consciously invoking Evidential Decision Theory, which does actually get the right answer on Newcomb’s problem, even if it messes up on other problems.
(Commenting because I got here while looking for citations for my WIP post about another way to handle Newcomb-like problems.)
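To make that concrete, here is a minimal sketch (my own illustration, not anyone’s official formalism) of how the evidential and causal rules come apart on Newcomb’s problem, assuming the standard stakes and a 99.9% accurate predictor:

    ACCURACY = 0.999          # assumed P(prediction matches the actual choice)
    SMALL, BIG = 1_000, 1_000_000

    def edt_value(action: str) -> float:
        """Evidential rule: condition the contents of box B on your own action."""
        p_big = ACCURACY if action == "one-box" else 1 - ACCURACY
        return (SMALL if action == "two-box" else 0) + p_big * BIG

    def cdt_value(action: str, p_big_already: float) -> float:
        """Causal rule: the boxes are already filled, so p_big_already is fixed."""
        return (SMALL if action == "two-box" else 0) + p_big_already * BIG

    print(edt_value("one-box"), edt_value("two-box"))   # 999000.0 vs 2000.0
    # Under the causal rule, two-boxing beats one-boxing by exactly SMALL for
    # any fixed p_big_already, which is where the two rules part ways here.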
It may well be the strength of the argument. It could also be the lead of a very influential/respected figure and the power of groupthink. In my experience, two forums with similar mission statements (‘political debating sites’, say, or ‘atheist sites’) often end up having distinct positions on all sorts of things that most of their posters converge around. The same is true of any group, although if ‘theists’ were a genuine survey of theists the world over it’s at least a far more representative group.
It would be very interesting to add a control group in some way: confront someone with the issue who was of a typical LessWrong demographic but hadn’t read anything about Newcomb on Less Wrong for instance.
If it’s not just a quality of this sort of group-think, my best guess is that it’s to do with the greater practical focus (or at least theoretical belief in practical focus!) on LessWrong. I suspect most people automatically parse this sort of philosophical question as ‘what is more abstractly logical’ whereas people on here probably parse it more as ‘what should I do to win’. But I think these sorts of ‘our inherent group qualities’ explanations are almost always locatable but often unnecessary in light of group-think.
To me, Newcomb’s problem seemed like a contrived trick to punish CDT, and it seemed that any other decision theory was just as likely to run into some other strange scenario to punish it,
David Wolpert of the “No Free Lunch Theorem” was one of my favorite researchers back in the 90s. If I remember it right, part of the No Free Lunch Theorem for generalizers was that for any world where your generalizer worked, there would be another world where it didn’t. The issue was the fit of your generalizer to the universe you were in.
Has anyone actually written out the Bayesian updating for Newcomb? It should take quite a lot of evidence for me to give up on causality as is.

As it turns out, looking at the Newcomb’s Paradox Wikipedia page, Wolpert was on the job for this problem, pointing out “It is straightforward to prove that the two strategies for which boxes to choose make mutually inconsistent assumptions for the underlying Bayes net”. Yes, that’s about my feeling. A hypothetical is constructed which contradicts something for which we have great evidence. Choosing to overturn old conclusions on the basis of new evidence is a matter of the probabilities you’ve assigned to the different and contradictory theories.
Really nothing to see here. Hypothesizing strong evidence that contradicts something you’ve assigned high probability to naturally feels confusing. Of course it does.
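Here is a toy version of the explicit analysis being asked for; the two-hypothesis setup, the 0.999 accuracy figure, and the 50% “trick” fill rate are assumptions of mine rather than anything from Wolpert’s treatment:

    # H: the predictor's track record reflects a real correlation with my choice.
    # not-H: it's a trick, and box B's contents are independent of my choice.
    Q = 0.999       # assumed accuracy under H
    R = 0.5         # assumed P(box B filled) under not-H
    SMALL, BIG = 1_000, 1_000_000

    def expected_value(action: str, p_h: float) -> float:
        p_big_under_h = Q if action == "one-box" else 1 - Q
        p_big = p_h * p_big_under_h + (1 - p_h) * R
        return (SMALL if action == "two-box" else 0) + p_big * BIG

    # One-boxing pulls ahead once the credence in H exceeds SMALL / (BIG * (2*Q - 1)).
    threshold = SMALL / (BIG * (2 * Q - 1))
    print(round(threshold, 5))   # ~0.001

On this toy model, “how much evidence before I give up on causality” cashes out as a prior over H, which is what the comment above is gesturing at.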
Your past, Omega-observed self can cause both Omega’s prediction and your future choice without violating causality.

What you’re objecting to is your being predictable.

My past self is not the cause of my future choices; it is one of many distal causes of my future choices. Similarly, it is not the cause of Omega’s prediction. The direct cause of my future choice is my future self and his future situation, where Omega is going to rig the future situation so that my future self is screwed if he makes the usual causal analysis.
Predictable is fine. People predict my behavior all the time, and in general, it’s a good thing for both of us.
As far as Omega goes, I object to his toying with inferior beings.
We could probably rig up something to the same effect with dogs, using their biases and limitations against them so that we can predict their choices, and arrange it so that if they did the normally right thing to do, they always get screwed. I think that would be a rather malicious and sadistic thing to do to a dog, as I consider the same done to me.
As far as this “paradox” goes, I object to the smuggled recursion, which is just another game of “everything I say is a lie”. I similarly object to other “super rationality” ploys. I also object to the lack of explicit Bayesian update analysis. Talky talky is what keeps a paradox going. Serious analysis makes one’s assumptions explicit.
The obvious difference between these hypotheticals is that you’re smart enough to figure out the right thing to do in this novel situation.

It’s also worth mentioning that in the initial wording of the problem, unlike Eliezer’s wording, it was just stated that the predictor is “almost certain” to have predicted correctly which boxes you are going to take, and it was also specifically stated that there is no reverse causality going on (that what you actually decide to do now has no effect on what is in the boxes).

For some reason this expression makes me think of the Princess Bride (she’s only Mostly Dead).
Newcomb’s problem seemed like a contrived trick to punish CDT, and it seemed that any other decision theory was just as likely to run into some other strange scenario to punish it, until I started thinking about AIs that could simulate you accurately
Are you implying there exist decision theories that are less likely to run into strange scenarios that punish them? I would think that Omegas could choose to give any agent prejudicial treatment.
We have to determine what counts as “unfair”. Newcomb’s problem looks unfair because your decision seems to change the past. I have seen another Newcomb-like problem that was (I believe) genuinely unfair, because depending on their decision theory, the agents were not in the same epistemic state.
Here is what I think a “fair” problem is. It’s one where:

1. the initial epistemic state of the agent is independent of its source code;
2. given the decisions of the agent, the end result is independent of its source code;
3. if there are intermediary steps, then given the decisions of the agent up to any given point, its epistemic state and any intermediate result accessible to the agent at that point are independent of its source code.
If we think of the agent as a program, I think we can equate “decision” with the agent’s output. It’s harder, however, to equate “epistemic state” with its input: recall Omega saying “Here are the two usual boxes. I have submitted this very problem in a simulation to TDT. If it one-boxed, box B has the million. If it two-boxed, box B is empty”. So, if you’re TDT, this problem is equivalent to the old Newcomb problem, where oneBox ⇔ $$$. But any other agent could two-box, and get the million and the bonus. (Also, “TDT” could be replaced by a source code listing that the agent would recognize as its own.)

Anyway, I believe there’s a good chance a decision theory exists such that it gets the best results out of any “fair” problem. Though now that I think of it, condition 2 may be a sufficient criterion for “fairness”, for the problem above violates it: if TDT two-boxes, it does not get the million. Well, except it does not two-box, so my counter-factual doesn’t really mean anything…
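To see why that game singles out one program, here is a toy rendering of the “Omega simulated TDT” setup described above; the function names and the assumption that TDT one-boxes in the simulation are mine:

    SMALL, BIG = 1_000, 1_000_000

    def simulated_tdt_choice() -> str:
        # Stand-in for Omega's simulation of the reference agent; assume it one-boxes.
        return "one-box"

    def payoff(player_choice: str) -> int:
        box_b = BIG if simulated_tdt_choice() == "one-box" else 0   # fixed by the simulation
        box_a = SMALL if player_choice == "two-box" else 0          # the only part the player controls
        return box_a + box_b

    print(payoff("one-box"), payoff("two-box"))   # 1000000 1001000

An agent that is not TDT can simply two-box for the larger total, while TDT’s own choice is tied to the simulated choice and so it one-boxes for the million: the same outputs are available to both, but not the same achievable payoffs, which is the sense of “unfair” at issue.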
It still seems to me that you can’t have a BestDecisionAgent. Suppose agents are black boxes—Omegas can simulate agents at will, but not view their source code. An Omega goes around offering agents a choice between:
$1, or
$100 if the Omega thinks the agent acts differently than BestDecisionAgent in a simulated rationality test, otherwise $2 if the agent acts like BestDecisionAgent in the rationality test.
Does this test meet your criteria for a fair test? If not, why not?
I think I have left a loophole. In your example, Omega is analysing the agent by analysing its outputs in unrelated, and most of all, unspecified problems. I think the end result should only depend on the output of the agent on the problem at hand.
Here’s a possible real-life variation. Instead of simulating the agent, you throw a number of problems at it beforehand, without telling it that they will be related to a future problem. Like, throw an exam at a human student (with a real stake at the end, such as grades). Then, later you submit the student to the following problem:
Welcome to my dungeon. Sorry for the headache, but I figured you wouldn’t have followed someone like me into a place like this. Anyway. I was studying Decision Theory, and wanted to perform an experiment. So, I will give you a choice:

Option 1: you die a most painful death. See those sharp, shimmering tools? Lots of fun.

Option 2: if I think you’re not the kind of person who makes good life decisions, I’ll let you go unharmed. Hopefully you will harm yourself later. On the other hand, if I think you are the kind of person who makes good life decisions, well, too bad for you: I’ll let most of you go, but you’ll have to give me your hand.

Option 2? Well that doesn’t surprise me, though it does disappoint me a little. I would have hoped, after 17 times already… well, no matter. So, do you make good decisions? Sorry, I’m afraid “no” isn’t enough. Let’s see… oh, you’re applying for college, if I recall correctly. Yes, I did my homework. I’m studying, remember? So, let’s see your SAT scores. Oh, impressive. That should explain why you never left home those past three weeks. Looks like you know how to trade off short-term well-being for long-term projects. Looks like a good life decision.
So. I’m not exactly omniscient, but this should be enough. I’ll let you go. But first, I believe you’ll have to put up with a little surgery job.
Sounds like something like that could “reasonably” happen in real life. But I don’t think it’s “fair” either, if only because being discriminated against for being capable of making good decisions is so unexpected.
Omega gives you a choice of either $1 or $X, where X is either 2 or 100?

It seems like you must have meant something else, but I can’t figure it out.
Yes, that’s what I mean. I’d like to know what, if anything, is wrong with this argument that no decision theory can be optimal.
Suppose that there were a computable decision theory T that was at least as good as all other theories. In any fair problem, no other decision theory could recommend actions with better expected outcomes than the expected outcomes of T’s recommended actions.
1. We can construct a computable agent, BestDecisionAgent, using theory T.
2. For any fair problem, no computable agent can perform better (on average) than BestDecisionAgent.
3. Call the problem presented in the grandfather post the Prejudiced Omega Problem. In the Prejudiced Omega Problem, BestDecisionAgent will almost assuredly collect $2.
4. In the Prejudiced Omega Problem, another agent can almost assuredly collect $100.
5. The Prejudiced Omega Problem does not involve an Omega inspecting the source code of the agent.
6. The Prejudiced Omega Problem, like Newcomb’s problem, is fair.
7. Contradiction.
I’m not asserting this argument is correct—I just want to know where people disagree with it.

Qiaochu_Yuan’s post is related.
Let BestDecisionAgent choose the $1 with probability p. Then the various outcomes are:
Simulation's choice | Our choice  | Payoff
$1                  | $1          | $1
$1                  | $2 or $100  | $100
$2 or $100          | $1          | $1
$2 or $100          | $2 or $100  | $2
And so p should be chosen to maximise p^2 + 100p(1-p) + p(1-p) + 2(1-p)^2. This is equal to the quadratic −98p^2 + 97p + 2, which Wolfram Alpha says is maximised by p = 97/196, for an expected payoff of ~$26.

If we are not BestDecisionAgent, and so are allowed to choose separately, we aim to maximise pq + 100p(1-q) + q(1-p) + 2(1-p)(1-q), which simplifies to −98pq + 98p − q + 2, which is maximised by q = 0, for a payoff of ~$50.5. This surprises me; I was expecting to get p = q.
So (3) and (4) are not quite right, but the result is similar. I suspect BestDecisionAgent should be able to pick p such that p = q is the best option for any agent, at the cost of reducing the value it gets.
ETA: Of course you can do this just by setting p = 0, which is what you assume. Which, actually, means that (3) and (4) contradict each other: if BestDecisionAgent always picks the $2 over the $1, then the best any agent can do is $2.
(Incidentally, how do you format tables properly in comments?)
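For anyone who wants to check the algebra without Wolfram Alpha, here is a small numeric verification of the two expected-payoff expressions above (a sketch; the function names are mine):

    def best_agent_ev(p: float) -> float:
        # BestDecisionAgent and its simulation both take the $1 with probability p.
        return p*p*1 + p*(1-p)*100 + (1-p)*p*1 + (1-p)*(1-p)*2   # = -98p^2 + 97p + 2

    def other_agent_ev(p: float, q: float) -> float:
        # The simulation takes the $1 with probability p; we take it with probability q.
        return p*q*1 + p*(1-q)*100 + (1-p)*q*1 + (1-p)*(1-q)*2   # = -98pq + 98p - q + 2

    best_p = max((i / 1000 for i in range(1001)), key=best_agent_ev)
    print(best_p, round(best_agent_ev(best_p), 2))    # ~0.495 (i.e. 97/196) and ~26.0
    print(round(other_agent_ev(97 / 196, 0.0), 2))    # 50.5, with q = 0 optimal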
$100 if the Omega thinks the agent acts differently than BestDecisionAgent in a simulated rationality test, otherwise $2 if the agent acts like BestDecisionAgent in the rationality test.
The Omega chooses a payoff of $2 vs. $100 based on a separate test that can differentiate between BestDecisionAgent and some other agent. If we are BestDecisionAgent, the Omega will know this and we will be offered at most a $2 payoff. But some other agent will be different from BestDecisionAgent in a way that the Omega detects and cares about. That agent can decide between $1 and $100. Since another agent can perform better than BestDecisionAgent, BestDecisionAgent cannot be optimal.
Ah, ok. In that case, though, the other agent wins at this game at the expense of failing at some other game. Depending on what types of games the agent is likely to encounter, this agent’s effectiveness may or may not actually be better than BestDecisionAgent’s. So we could possibly have an optimal decision agent in the sense that no change to its algorithm could increase its expected lifetime utility, but not to the extent of never failing in any game.
The problem of cooperation and simulation is one that happens in reality, right now, even though simulation accuracy is far, far lower. The problem of Omega is a reductio of this, but I think it’s plausible that entities can approach Omega-level abilities of prediction even if they’ll never actually get that accurate.