A full explanation of Newcomb’s paradox.
Ever since I read about Newcomb’s paradox in the Sequences on LessWrong, I’ve thought there is something fundamentally wrong with timeless decision theory. I eventually came up with an alternative explanation that seemed correct to me. I then looked it up on Wikipedia and found that, of course, what I had come up with had already been said. But I’m still curious what the community thinks about the topic.
Newcomb’s paradox according to Wikipedia is as follows.
There is an infallible predictor, a player, and two boxes designated A and B. The player is given a choice between taking only box B, or taking both boxes A and B. The player knows the following:[4]
Box A is clear, and always contains a visible $1,000.
Box B is opaque, and its content has already been set by the predictor:
If the predictor has predicted the player will take both boxes A and B, then box B contains nothing.
If the predictor has predicted that the player will take only box B, then box B contains $1,000,000.
The player does not know what the predictor predicted or what box B contains while making the choice.
To be clear, when we say the predictor is infallible, we can mean either that the predictor has not made a single mistake over hundreds of witnessed occurrences, or that we are dealing with an actually infallible predictor. Either way, what I am going to say is valid.
There are four possible ways in which the predictor is making his predictions.
1. He’s not actually making any predictions and is cheating. The cheating could take the form of changing what’s in the box after you choose, using mind control on you, or any other way of cheating the problem. Either way, if this is the case you should obviously choose just box B, since your decision changes what’s in the box.
2. The predictor can in some way see the future. In this case you should obviously choose just box B, because your choice affects the past: the predictor’s knowledge of the future reverses the causality of time, so what you do now actually changes what was there in the past, without any paradox.
3. The predictor is really good at figuring out personalities. This seems highly unlikely, because all it would take is one person coming in with a die or a coin whose personality is to want to flip a coin to decide. However, the predictor could be using tricks like only selecting people he is extremely sure of. Either way, since the money is already in the box, you should obviously choose both boxes, since your choice in no way affects the past.
4. The predictor is running a full or partial simulation of the situation. In this case it depends on whether you are able to differentiate between real life and the simulation. If you know that you are the real-life person, then you should obviously choose both boxes, since your choice affects nothing: it doesn’t affect what your simulation chooses, and it doesn’t affect what is in the boxes. If you’re the you in the simulation, then you should choose just box B, since your choice affects what is going to be in the box. The catch is that you have absolutely no idea whether you are the you in the simulation or the real you. For the simulation to be an exact simulation of you, it must also think it exists in the real world; otherwise it would make a different choice than you, since your choice depends on the knowledge that you’re the real you and the simulation would not have that. Therefore, you should choose just box B, since you might be the you in the simulation.
In the end I think the correct choice is to choose just box B, since I think 1 is the most likely, followed by 4, with both 2 and 3 being extremely unlikely.
A huge difference between this and Roko’s basilisk is that the AI has no reason to think it’s in a perfectly accurate simulation, and therefore has no incentive to torture people, since what it does in the future cannot affect the past. While it seems like in Newcomb’s problem your decision now is affecting the simulation’s decision, it can be looked at as the reverse: your simulation’s decision is deciding your decision.
An interesting corollary to Newcomb’s problem is the Psychological Twin Prisoner’s Dilemma. An agent and her twin must both choose to either “cooperate” or “defect.” If both cooperate, they each receive $1,000,000. If both defect, they each receive $1,000. If one cooperates and the other defects, the defector gets $1,001,000 and the cooperator gets nothing. The agent and the twin know that they reason the same way, using the same considerations to come to their conclusions. However, their decisions are causally independent, made in separate rooms without communication. Should the agent cooperate with her twin?
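To make the payoffs concrete, here is a minimal Python sketch of the table described above. The names `PAYOFFS`, `payoff`, `best_choice_if_mirrored`, and `best_choice_if_fixed` are made up for illustration, and "the twin always ends up making the same choice" is an assumption about how to read "they reason the same way", not something the problem statement guarantees.

```python
# Payoffs from the problem statement, in dollars: (agent_payoff, twin_payoff).
PAYOFFS = {
    ("cooperate", "cooperate"): (1_000_000, 1_000_000),
    ("defect", "defect"): (1_000, 1_000),
    ("defect", "cooperate"): (1_001_000, 0),
    ("cooperate", "defect"): (0, 1_001_000),
}

CHOICES = ("cooperate", "defect")


def payoff(my_choice, twin_choice):
    """Return the agent's own payoff for a pair of choices."""
    return PAYOFFS[(my_choice, twin_choice)][0]


def best_choice_if_mirrored():
    """Assumption: the twin always ends up making the same choice as the agent."""
    return max(CHOICES, key=lambda c: payoff(c, c))


def best_choice_if_fixed(twin_choice):
    """Treat the twin's choice as fixed and unaffected by the agent's choice."""
    return max(CHOICES, key=lambda c: payoff(c, twin_choice))
```

Under the mirrored assumption only the diagonal of the table is reachable, so cooperating comes out ahead. If the twin's choice is treated as fixed, defecting is better by $1,000 whichever way the twin went.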
This problem at first glance seems to evade the solution I previously proposed, since each agent is trying to maximise their own utility, whereas in the previous case the person in the simulation is trying to maximise the real person’s utility. This, to me, gets at the heart of the problem of choice: it would seem that the choice you make affects your twin, since whatever you choose changes what your twin does, yet you and your twin are causally independent. This also hints at an even more annoying problem: what does it mean to choose if all choices are predestined?
The main place where I disagree with you here is that you make such a big distinction between “really good at figuring out personalities” (3) and “running a full or partial simulation of the situation” (4). As your phrase “full or partial” suggests, simulations have a large range of fidelity. To illustrate a bit:
Fully precise physical sim, in a physics which allows such a thing. (Quantum mechanics poses some problems for this in our universe.)
Simulation which is highly accurate down to the atomic level.
Simulation which is highly accurate down to the cell level.
Simulation which (like many modern fluid dynamic sims) identifies regions which need to be run at higher/lower levels of fidelity, meaning that in principle, some areas can be simulated down to the atomic level, while other areas may be simulated at a miles-large scale.
Like the above, but also accelerated by using a neural network trained on the likely outcome of all the possible configurations, allowing the simulation to skip forward several steps at a time instead of going one step at a time.
...
Somewhere on this continuum sits “really good at figuring out personalities”. So where do you draw the line? If it’s a matter of degree (as seems likely), how do you handle the shift from taking both boxes (as you claim is right for scenario 3) to one box (as you claim is right for scenario 4)?
In scenario 3, you would get 1k, while a 1-boxer would get 1m. You can comfort yourself in the “correctness” of your decision, but you’re still losing out. More to the point, a rational agent would prefer to self-modify to be the sort of agent that gets 1m in this kind of situation, if possible. So, your decision theory is not stable under self-reflection.
(All of this is just based on my understanding, no guarantees.)
MIRI is studying decision theory in the context of embedded agency. Embedded agency is all about what happens if you stop having a clear boundary between the agent and the environment (and you instead have the agent as part of the environment, hence embedded). Decision problems where the outcome depends on your behavior in counterfactual situations are just one of several symptoms that come from being an embedded agent.
In this context, we care about things like “if an agent is made of parts, how can she ensure her parts are aligned” or “if an agent creates copies of herself, how can we make sure nothing goes wrong” or “if the agent creates a successor agent, how can we make sure the successor agent does what the original agent wants”.
I say this because (3) and (4) suddenly sound a lot more plausible when you’re talking about something like an embedded agent playing a Newcomb-like game (or a counterfactual-mugging-type game or a prisoner’s-dilemma-type game) with a copy of itself.
Also, I believe Timeless Decision Theory is outdated. The important decision theories are Updateless Decision Theory and Functional Decision Theory. Afaik, UDT is both better and better formalized than TDT.
I started reading the FDT paper and it seems to make a lot more sense than TDT. Most importantly, it does not fail the way TDT did with regard to Roko’s basilisk.
I think the idea is that the 4th scenario is the case, and you can’t discern whether you’re the real you or the simulated version, as the simulation is (near-) perfect. In that scenario, you should act in the same way that you’d want the simulated version to. Either (1) you’re a simulation and the real you just won $1,000,000; or (2) you’re the real you and the simulated version of you thought the same way that you did and one-boxed (meaning that you get $1,000,000 if you one-box.)
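As a rough illustration of that reasoning, here is a small Python sketch. It assumes the simulation is accurate enough that it always makes exactly the same choice as the real player; the function name `real_payoff` and the policy labels are hypothetical.

```python
BOX_A = 1_000            # always visible in box A
BOX_B_FULL = 1_000_000   # placed in box B only if one-boxing was predicted


def real_payoff(policy):
    """Payoff to the real player when the simulation runs the same policy.

    Assumption: the simulated player and the real player cannot tell
    which one they are, so both end up executing `policy`.
    """
    simulated_choice = policy  # what the predictor observes in the simulation
    box_b = BOX_B_FULL if simulated_choice == "one-box" else 0
    real_choice = policy       # the real player makes the same choice
    return box_b if real_choice == "one-box" else box_b + BOX_A
```

With these assumptions, `real_payoff("one-box")` is $1,000,000 and `real_payoff("two-box")` is $1,000, because the "two-box after a one-boxing simulation" outcome is never reachable.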
I agree with you. I was just trying to emphasize that if you’re the real you, your decision doesn’t change anything. At most, if the simulation is extremely accurate, your decision can reveal what was already chosen, since you know that you will make the same decision you previously made in the simulation. The big difference between me and timeless decision theory is that I contend that the only reason to choose just box B is that you might be in the simulation. This completely gets rid of ridiculous problems like Roko’s basilisk: since we are not currently simulating an AI, a future AI cannot affect us. If the AI had the suspicion that it was in a simulation, then it might have an incentive to torture people, but given that it has no reason to think that, torture is a waste of time and effort.
Under some circumstances, it seems that option 4 would result in the predictor trying to solve the Halting Problem since figuring out your best option may in effect involve simulating the predictor.
(Of course, you wouldn’t be simulating the entire predictor, but you may be simulating enough of the predictor’s chain of reasoning that the predictor essentially has to predict itself in order to predict you.)
That’s an interesting point.
Our estimate of which of the four possibilities is correct is conditional on us living in a universe where we observe that the predictor always guesses correctly. If we put aside cheating (which should almost always be our guess if we observe something happening that seems to defy our understanding of how the universe operates) we should have massive uncertainty concerning how randomness and/or causation operates and thus not assign too low a probability to either (2) or (3).
That’s a valid point. Still, I think 4 would be the most likely, and since the payoff is significantly bigger it’s still worth it to choose just B.
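One way to put a number on “the payoff is significantly bigger” is a quick Python sketch. It assumes the post’s own framing: in scenarios 1, 2 and 4 your choice effectively determines what is in box B, while in scenario 3 the contents are fixed and one-boxing just forgoes the $1,000. The parameter `q` (the probability of being in one of the first kind of scenarios) and the function names are hypothetical.

```python
def one_box_advantage(q, small=1_000, big=1_000_000):
    """Expected gain in dollars from one-boxing instead of two-boxing.

    q: assumed probability that the choice determines box B's contents.
    With probability q, one-boxing yields `big` where two-boxing yields `small`.
    With probability 1 - q, the contents are fixed and one-boxing forgoes `small`.
    """
    return q * (big - small) - (1 - q) * small


def break_even_q(small=1_000, big=1_000_000):
    """Value of q at which one-boxing and two-boxing have equal expected value."""
    # q * (big - small) - (1 - q) * small simplifies to q * big - small.
    return small / big
```

Under these assumptions the break-even point is q = 0.001, so one-boxing comes out ahead as long as you give more than a 0.1% chance to the scenarios where your choice matters.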
I’ve been confused by the traditional description of CDT failing at the Newcomb problem. I understand CDT to be something like “pick the choice with the highest expected value”. This is how I imagine such an agent reasoning about the problem:
“If I one-box, then I get $0 with epsilon probability and $1m with one-minus-epsilon probability for an expected value of ~$1m. If I two-box, then I get $1k + $1m with epsilon probability and $1k with one-minus-epsilon probability for an expected value of ~$1k. One-boxing has higher expected value, so I should one-box.”
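Written out as a short Python sketch, that calculation looks like the following. Here `epsilon` is the probability that the predictor is wrong, and the function name `expected_values` is made up for illustration.

```python
def expected_values(epsilon, small=1_000, big=1_000_000):
    """Return (ev_one_box, ev_two_box) for a predictor that errs with probability epsilon."""
    # One-boxing: box B is empty only if the predictor wrongly predicted two-boxing.
    ev_one_box = epsilon * 0 + (1 - epsilon) * big
    # Two-boxing: box B is full only if the predictor wrongly predicted one-boxing.
    ev_two_box = epsilon * (small + big) + (1 - epsilon) * small
    return ev_one_box, ev_two_box
```

For example, `expected_values(0.01)` gives roughly $990,000 for one-boxing against roughly $11,000 for two-boxing. With these payoffs, one-boxing keeps the higher expected value for any epsilon below about 0.4995.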
How does the above differ from actual CDT? Most descriptions I’ve heard have the agent considering Omega’s prediction as a single unknown variable with some distribution and then showing that this cancels out of the EV comparison, but what’s wrong with considering the two EV calculations independently of each other and only comparing the final numbers?
I struggled with this for a long time. I forget which of the LW regulars finally explained it simply enough for me. CDT is not “classical decision theory”, as I previously believed, and it does not summarize to “pick highest expected value”. It’s “causal decision theory”, and it optimizes on results based on a (limited) causal model, which does not allow the box contents to be influenced by the (later in time) choice the agent makes.
“Naive, expectation-based decision theory” one-boxes based on probability assignments, regardless of causality: it shuts up and multiplies (sum of probability times outcome). But it’s not a formal prediction model (which causality is), so it doesn’t help much in designing and exploring artificial agents.
IOW, causal decision theory is only as good as its causal model, which is pretty bad for situations like this.
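A minimal Python sketch of that causal-model calculation, as I understand it: the agent holds some credence `p_full` that box B already contains the million, and that credence is the same whichever action is being evaluated. The names `cdt_expected_values` and `p_full` are hypothetical.

```python
def cdt_expected_values(p_full, small=1_000, big=1_000_000):
    """Return (ev_one_box, ev_two_box) when the box contents are causally fixed.

    p_full: the agent's credence that box B already holds `big`.
    Assumption: this credence does not depend on the action being evaluated,
    because the later choice cannot cause the earlier prediction.
    """
    ev_one_box = p_full * big
    ev_two_box = p_full * big + small
    return ev_one_box, ev_two_box
```

Whatever `p_full` is, the `p_full * big` term cancels in the comparison and two-boxing comes out exactly $1,000 ahead, which is why an agent using this model two-boxes.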
Regarding the topic of your last paragraph (how can we have choice in a deterministic universe): this is something Gary Drescher discusses extensively in his book.
Firstly, he points out that determinism does not imply that choice is necessarily futile. Our ‘choices’ only happen because we engage in some kind of decision or choice making process. Even though the choice may be fixed in advance, it is still only taken because we engage in this process.
Additionally, Gary proposes the notion of a subjunctive means-end link (a means-end link is a method of identifying what is a means to a particular end), wherein one can act for the sake of what would have to be the case if one takes a particular action. For example, in Newcomb’s problem one can pick just a single box because it would then have to be the case that the big box contains a million.
Putting these two things together might help make sense of how our actions affect these kinds of thought experiments.