Simulation-aware causal decision theory: A case for one-boxing in CDT
Disclaimer: I am a math student new to LW and the rationalist community. I could not find this idea anywhere else after a modest amount of research; I apologize if this has been brought up before.
Summary: CDT agents may behave differently than expected in situations involving reliable predictors due to the possibility that they are themselves simulations being run by the predictor. This may lead to decisions such as one-boxing in Newcomb’s problem and acquiescing in counterfactual mugging.
The traditional conclusion of CDT in Newcomb’s problem is that the decision of which boxes to take cannot causally influence the predictor’s prediction, so in the absence of any evidence as to what the prediction is, a CDT agent should take both boxes, since that choice leaves it better off than taking only the opaque box regardless of what the prediction is. The challenge I pose to this is that in such a situation, a rational agent ought to have significant credence in the possibility that it is a simulation being run by the predictor, and that in the case that the agent is a simulation, its choice really does have a causal influence on the “real” prediction.
Fundamental to this argument is the claim that a reliable predictor is almost certainly a reliable simulator. To see why this is plausible, consider that if a predictor were unable to predict some particular internal process of the agent, that process could be exploited to generate decisions that thwart the predictor. For a rough example, think of a human-like agent deciding “at random” whether to one-box or two-box. Human pseudorandom bit generation is far from a coin flip, but it depends on enough internal state that a predictor unable to simulate most, if not all, of the brain would likely not perform well enough to warrant being called reliable.
We must also make the assumption that an agent in this situation would, if it knew it was a simulation, strive to maximize expected utility for its real-world counterpart instead of its simulated self. I’m not entirely sure if this is a reasonable assumption, and I admit this may be a failure point of the theory, but I’ll continue anyway. At the very least, it seems like the kind of behavior you would want to build into a utility-maximizing agent if you were to design one.
Now, for a concrete example of a decision problem where this observation is consequential, consider Newcomb’s problem with $1,000 in the transparent box and $1,000,000 possibly in the opaque box. Assume that the agent has 50% credence that it is a simulation. In the case that the agent is a simulation, it has no causal influence over its real-world counterpart’s decision, and so it assumes a uniform prior over the two possibilities (its real-world counterpart one-boxing or two-boxing). In the case that the agent is in the real world, it assumes a uniform prior over the presence of the $1,000,000 in the opaque box. Given these assumptions, the expected value of two-boxing is
$$\frac{1}{2}\left(\frac{1}{2}\cdot\$1{,}000+\frac{1}{2}\cdot\$0\right)+\frac{1}{2}\left(\frac{1}{2}\cdot\$1{,}000+\frac{1}{2}\cdot\$1{,}001{,}000\right)=\$250{,}750,$$
(where the first top-level term corresponds to the case in which the agent is a simulation and the second to the case in which it is real, and within each pair of parentheses the first term corresponds to the other agent (real or simulated) two-boxing and the second to it one-boxing), whereas the expected value of one-boxing is
$$\frac{1}{2}\left(\frac{1}{2}\cdot\$1{,}001{,}000+\frac{1}{2}\cdot\$1{,}000{,}000\right)+\frac{1}{2}\left(\frac{1}{2}\cdot\$0+\frac{1}{2}\cdot\$1{,}000{,}000\right)=\$750{,}250,$$
and hence a CDT agent with these assumptions would choose to one-box.
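For concreteness, here is a minimal Python sketch of the calculation above. It encodes only the assumptions already stated (50% credence in being a simulation, uniform priors over the other agent’s choice, and a simulated agent that values its real-world counterpart’s payoff); the function and variable names are just my own labels.

```python
# Minimal sketch of the Newcomb expected-value calculation under the post's assumptions.

BOX_TRANSPARENT = 1_000
BOX_OPAQUE = 1_000_000

def payoff(choice, box_full):
    """Real-world payoff for a given choice and opaque-box contents."""
    total = BOX_OPAQUE if box_full else 0
    if choice == "two-box":
        total += BOX_TRANSPARENT
    return total

def expected_value(my_choice):
    """Expected value of my_choice, averaging over whether I am the simulation."""
    ev = 0.0
    # Case 1 (probability 1/2): I am the simulation. My choice fixes Omega's
    # prediction, hence the opaque box's contents; the real agent's choice is
    # unknown to me (uniform prior), and I care about the real agent's payoff.
    box_full = (my_choice == "one-box")
    for real_choice in ("two-box", "one-box"):
        ev += 0.5 * 0.5 * payoff(real_choice, box_full)
    # Case 2 (probability 1/2): I am real. The simulation's choice, and hence
    # the box's contents, is unknown to me (uniform prior).
    for sim_choice in ("two-box", "one-box"):
        ev += 0.5 * 0.5 * payoff(my_choice, sim_choice == "one-box")
    return ev

print(expected_value("two-box"))  # 250750.0
print(expected_value("one-box"))  # 750250.0
```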
For the second example, consider a counterfactual mugging of $100 in exchange for a counterfactual $1,000,000 (to simplify the calculations, assume that the predictor only goes through the effort of making a prediction, i.e. running a simulation, if the coin lands heads). Then an agent that finds itself being mugged is either a simulation whose real counterpart’s coin landed heads, or a real agent whose coin landed tails. Under the same assumptions as above, the CDT expected value of acquiescing to the mugging is
$$\frac{1}{2}\cdot\$1{,}000{,}000+\frac{1}{2}\cdot(-\$100)=\$499{,}950,$$
and the expected value of refusing the mugging is
$$\frac{1}{2}\cdot\$0+\frac{1}{2}\cdot\$0=\$0,$$
hence a CDT agent would acquiesce.
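The same kind of sketch works for the mugging, again assuming 50% credence in being a simulation and the simplification that a prediction (i.e. a simulation) is only made when the real coin lands heads; the names are again my own.

```python
# Minimal sketch of the counterfactual-mugging expected values under the post's assumptions.

PAYMENT = 100
REWARD = 1_000_000

def expected_value(pay):
    """Expected value of paying (pay=True) or refusing (pay=False) when mugged."""
    # Case 1 (probability 1/2): I am the simulation, so the real coin landed heads
    # and my choice determines whether my real counterpart receives the reward.
    ev = 0.5 * (REWARD if pay else 0)
    # Case 2 (probability 1/2): I am real and the coin landed tails; paying simply
    # costs me $100, and refusing costs nothing.
    ev += 0.5 * (-PAYMENT if pay else 0)
    return ev

print(expected_value(True))   # 499950.0
print(expected_value(False))  # 0.0
```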
This idea relies on several assumptions, some of which are more reasonable than others. I don’t claim that this is how a pure CDT agent would actually behave in these scenarios, but it’s at least an interesting alternative to think about to explain why the common interpretation of CDT appears irrational (or at least leaves the agent destitute in practice) when it comes to decision problems involving predictors, despite giving reasonable verdicts in most others. In the future, it would be interesting to investigate to what extent this version of CDT agrees with other decision theories, and on which problems it diverges.
I discuss this view of Newcomb’s Problem in my paper on “Puzzles of Anthropic Reasoning Resolved Using Full Non-indexical Conditioning”, available (in original and partially-revised versions) at https://glizen.com/radfordneal/anth.abstract.html
See Section 2.5, “Dangers of fantastic assumptions”, after the bit about the Chinese Room.
As noted in a footnote there, this view has also been discussed at these places:
https://scottaaronson.blog/?p=30
http://countiblis.blogspot.com/2005/12/newcombs-paradox-and-conscious.html
Thank you so much, this is exactly what I was looking for. It’s reassuring to know I’m not crazy and other people have thought of this before.
On my understanding of CDT, this is not a conclusion, it’s an axiom. CDT that acknowledges decision-backward-causality isn’t called CDT anymore.
Yes, perhaps that sentence wasn’t the best way to convey my point. This version of CDT does not in any way acknowledge backward-causality or subjunctive dependence; the idea is that since the predictor runs a simulation of the agent before deciding on its prediction, there is a forward causal influence on the predictor’s action in the case that the agent is a simulation.
Simulation of decision is just one way of having backward-causality. Omega is acting on the decision BEFORE the agent makes the decision. The whole problem with CDT is that it does not accept that a choice can affect previous choices.
No, in this view, you may be acting before Omega makes his decision, because you may be a simulation run by Omega in order to determine whether to put the $1 million in the box. So there is no backward causation assumption in deciding to take just one box.
Nozick in his original paper on Newcomb’s Problem explicitly disallows backwards causation (e.g., time travel). If it were allowed, there would be the usual paradoxes to deal with.
Oh, as long as Omega acts last (you choose, THEN Omega fills or empties the box, THEN the results are revealed), there’s no conundrum. Mixing up “you might be in a simulation, or you might be real, and the simulation results constrain your real choice” is where CDT fails.
I think you’re misunderstanding something, but I can’t quite pin down what it is. For clarity, here is my analysis of the events in the thought experiment in chronological order:
1. Omega decides to host a Newcomb’s problem, and chooses an agent (Agent A) to participate in it.
2. Omega scans Agent A and simulates their consciousness (call the simulation Agent B), placing it in a “fake” Newcomb’s problem situation (e.g. Omega has made no prediction about Agent B, but says that it has in the simulation in order to get a result).
3. Agent B makes its decision, and Omega makes its prediction based on that.
4. Omega shows itself to Agent A and initiates Newcomb’s problem in the real world, having committed to its prediction in step 3.
5. Agent A makes its decision and Newcomb’s problem is done.
From a third-party perspective, there is no backward causality. The decision of Agent B influences Omega’s prediction, but the decision of Agent A does not. Likewise, the decision of Agent B does not influence the decision of Agent A, as it is hidden by Omega (this is why the simulation part of the EV calculations assumes a uniform prior over the decision of Agent A). There is no communication or causal influence between the simulation and reality other than the simulation influencing Omega’s prediction. The sole thing that makes it appear as though there is some kind of backward causality is that, subjectively, neither agent knows whether it is Agent A or Agent B, and so each acts as though there is a 50% chance that it has forward causal influence over Omega’s prediction: not the prediction that Omega purports to have already made, since there is no way to influence that, but the prediction that Omega will make in the real world based on Agent B’s decision. That is, the only sense in which the agent in my post has causal influence over Omega’s decision is this: if they are Agent B, they will make their choice, find that the whole thing was fake and the boxes are full of static or something, cease to exist as the simulation is terminated, and their decision will then influence the prediction that Omega claims (to Agent A) to have already made.
I suspect the misunderstanding here is that I was too vague with the wording of the claim that “in the case that the agent is a simulation, its choice actually does have a causal influence on the ‘real’ prediction”. I hope that the distinction between Agents A and B clears up what I’m saying.
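To make the ordering concrete, here is a toy Python sketch of steps 1–5. The 0.9 one-boxing rate in agent_policy is an arbitrary placeholder, not part of the setup; the point is only that the prediction is fixed by Agent B’s choice before Agent A ever chooses, so there is no backward causation anywhere.

```python
# Toy sketch of the chronology: simulate Agent B, fix the prediction, then let Agent A choose.
import random

def agent_policy():
    """Decision procedure shared by Agent A and its simulation, Agent B.
    The 0.9 one-boxing rate is purely illustrative."""
    return "one-box" if random.random() < 0.9 else "two-box"

def run_newcomb():
    # Steps 2-3: Omega simulates Agent B and bases its prediction on B's choice.
    agent_b_choice = agent_policy()
    prediction = agent_b_choice
    box_full = (prediction == "one-box")  # the prediction is fixed at this point
    # Steps 4-5: only now does Agent A choose; nothing A does can change box_full.
    agent_a_choice = agent_policy()
    payoff = (1_000_000 if box_full else 0) + (1_000 if agent_a_choice == "two-box" else 0)
    return agent_a_choice, box_full, payoff

print(run_newcomb())
```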
This is quite likely. I suspect that my understanding of CDT, from a technical perspective, remains incorrect, even after a fair bit of reading and discussion. From what I understand, CDT does not include the possibility that it can be simulated well enough that Omega’s prediction is binding. That is the backward-causality (the REAL, CURRENT decision is entangled with a PREVIOUS observation) which breaks it.
The point of the view expressed in this post is that you DON’T have to see the decisions of the real and simulated people as being “entangled”. If you just treat them as two different people, making two decisions (which if Omega is good at simulation are likely to be the same), then Causal Decision Theory works just fine, recommending taking only one box.
The somewhat strange aspect of the problem is that when making a decision in the Newcomb scenario, you don’t know whether you are the real or the simulated person. But less drastic ignorance of your place in the world is a normal occurrence. For instance, you might know (from family lore) that you are descended from some famous person, but be uncertain whether you are the famous person’s grandchild or great grandchild. Such uncertainty about “who you are” doesn’t undermine Causal Decision Theory.
One can easily think of mundane situations in which A has to decide on some action without knowing whether B has already made some decision, and in which how A acts will affect what B decides, if B has not already made their decision. I don’t think such mundane problems pose any sort of problem for causal decision theory. So why would Newcomb’s Problem be different?