Is causal decision theory plus self-modification enough?
Occasionally a wrong idea still leads to the right outcome. We know that one-boxing on Newcomb’s problem is the right thing to do. Timeless decision theory proposes to justify this action by saying: act as if you control all instances of your decision procedure, including the instance that Omega used to predict your behavior.
But it’s simply not true that you control Omega’s actions in the past. If Omega predicted that you will one-box and filled the boxes accordingly, that’s because, at the time the prediction was made, you were already a person who would foreseeably one-box. One way to be such a person is to be a TDT agent. But another way is to be a quasi-CDT agent with a superstitious belief that greediness is punished and modesty is rewarded—so you one-box precisely because two-boxing looks like the greedier, higher-payoff option!
That is an irrational belief, yet it still suffices to generate the better outcome. My thesis is that TDT is similarly based on an irrational premise. So what is actually going on? I now think that Newcomb’s problem is simply an exceptional situation where there is an artificial incentive to employ something other than CDT, and that most such situations can be dealt with by being a CDT agent who can self-modify.
Eliezer’s draft manuscript on TDT provides another example (page 20): a godlike entity—we could call it Alphabeta—demands that you choose according to “alphabetical decision theory”, or face an evil outcome. In this case, the alternative to CDT that you are being encouraged to use is explicitly identified. In Newcomb’s problem, no such specific demand is made, but the situation encourages you to make a particular decision—how you rationalize it doesn’t matter.
We should fight the illusion that a TDT agent retrocausally controls Omega’s choice. It doesn’t. Omega’s choice was controlled by the extrapolated dispositions of the TDT agent, as they were in the past. We don’t need to replace CDT with TDT as our default decision theory, we just need to understand the exceptional situations in which it is expedient to replace CDT with something else. TDT will apply to some of those situations, but not all of them.
The point of Newcomb’s Problem isn’t that it’s difficult to write a program that one-boxes; it’s that it’s difficult to write a program that comes out ahead both on Newcomblike problems with Predictors and on basic problems without Predictors. Yours would grab $1,000 over $1,001,000 in general, which would be the wrong move if the boxes of utility were natural features of a landscape.
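(A minimal sketch of that point, with hypothetical helper functions and the standard Newcomb payoffs assumed: a fixed one-boxing policy and a fixed two-boxing policy each win in one environment and lose in the other.)

```python
# Illustrative only: compare two fixed policies in a Newcomb-style setup with a
# perfect Predictor versus a "natural landscape" where both boxes are simply full.

def payoff_with_predictor(choice):
    # The Predictor is assumed perfect: the opaque box holds $1,000,000
    # exactly when it predicted (i.e. when the agent chooses) one-boxing.
    opaque = 1_000_000 if choice == "one-box" else 0
    return opaque if choice == "one-box" else opaque + 1_000

def payoff_landscape(choice):
    # No Predictor: both boxes are already full, like money lying on the ground.
    opaque = 1_000_000
    return opaque if choice == "one-box" else opaque + 1_000

for choice in ("one-box", "two-box"):
    print(choice, payoff_with_predictor(choice), payoff_landscape(choice))
# one-box: 1,000,000 with the Predictor, but leaves $1,000 behind on the landscape.
# two-box: only $1,000 with the Predictor, but $1,001,000 on the landscape.
```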
I don’t have a better decision theory than TDT, but I also don’t believe that what you do in the present affects the past. However, the nature of the situation is such as to reinforce, in a TDT-like agent, the illusion that decisions made in the present affect decisions simulated in the past. (That is, assuming it is an agent self-aware enough to have such beliefs.)
One conception of the relationship between CDT and TDT is that it is like the relationship between classical and relativistic mechanics: relativistic mechanics is truer but it reduces to classical mechanics in a certain limit. But I think TDT is more like alphabetical decision theory—though useful in a far wider variety of scenarios: it is not a decision theory that you would want to have, outside of certain peculiar situations which offer an incentive to deviate from CDT.
I need to study UDT more, because sometimes it sounds like it’s just CDT in a multiverse context, and yet it’s supposed to favor one-boxing.
Would you cooperate in the Prisoner’s Dilemma against an almost-copy of yourself (with only trivial differences so that your experiences would be distinguishable)? It can be set up so that neither of you decides within the light-cone of the other’s decision, so there’s no way your cooperation can physically ensure the other’s cooperation.
If you’re quite convinced that the reasonable thing is to defect, then pretty obviously you’ll get (D,D).
If you’re quite convinced that the reasonable thing is to cooperate, then pretty obviously you’ll get (C,C).
(OK, you could decide randomly, but then you’re just as likely to get (C,D) as (D,C).)
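(As a rough sketch of the symmetry argument above, assuming the almost-copy runs the same algorithm and therefore outputs the same action, and using standard Prisoner’s Dilemma payoffs chosen for illustration:)

```python
# Payoffs are illustrative: (my action, copy's action) -> my payoff.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def my_payoff(my_action):
    # Key assumption: the almost-copy runs the same algorithm, so its action
    # matches mine; (C,D) and (D,C) are not actually reachable.
    copy_action = my_action
    return PAYOFF[(my_action, copy_action)]

print(my_payoff("C"))  # 3 -- the (C,C) outcome
print(my_payoff("D"))  # 1 -- the (D,D) outcome
```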
This is another sort of problem that TDT and UDT get right without any need for ad-hoc add-ons. The point is that advanced decision theories can be reasonably simple (where applications of Löb’s Theorem are counted as simple), get the right answer in all the cases where CDT gets the right answer (grabbing the highest utility when you’re the only agent around, finding the Nash equilibrium in a zero-sum game, etc), and also get the right answer when other agents are basing their decisions in a knowable way off of their predictions of what you’ll do in various hypotheticals. Newcomb’s Problem may sound artificial, but that’s because we’ve made that dependence as simple and deterministic as possible in order to have a good test problem.
If we taboo terms like cause and control, how could we state this?
There’s no way to. Your decision procedure’s output leads either to money being put into the one box and to you choosing that box, or to only a little money being put into the second box and to you choosing both.
If you ever anticipate some sort of Prisoner’s Dilemma between identical instances of your decision procedure (which is what Newcomb’s problem is), you adjust the decision procedure accordingly. It doesn’t matter in the slightest to the Prisoner’s Dilemma whether there is temporal separation between instances of the decision procedure, or spatial separation; nothing changes if Omega doesn’t learn your decision directly, yet creates the items inside the boxes immediately before they are opened. Nothing even changes if Omega hears your choice and then puts the items into the boxes. In all of those cases, a run of the decision procedure leads to an outcome.
I’m not so sure. The output of your decision procedure is the same as the output of Omega’s prediction procedure, but that doesn’t tell you how algorithmically similar they are.
Well, if you are to do causal decision theory, you must also be in a causal world (or at least assume you are in a causal world), and in a causal world, correlation of Omega’s decisions with yours implies either coincidence or some causation—either Omega’s choices cause your choices, your choices cause Omega’s choices, or there is a common cause of both your choices and Omega’s. The common cause could be the decision procedure, or the childhood event that makes a person adopt the decision procedure, etc. In the latter case, it’s not even a question of decision theory. The choice of box has already been made—by chance, or by parents, or by someone who convinced you to one-box or two-box. From that point on, it has been mechanistically propagating according to the laws of physics, and has affected both Omega and you (and even before that point, it had been mechanistically propagating ever since the Big Bang).
The huge problem with applying decision theory is the idea of an immaterial soul that does the deciding however it wishes. That’s not how things are. There are causes behind decisions. Using causal decision theory together with the idea of an immaterial soul that decides from outside the causal universe leads to a fairly inconsistent picture of the world.
You can’t. But why would we want to taboo those terms?
The inability to taboo a term can indicate that the term’s meaning is not sufficiently clear and well-thought out.
I want to know what they mean in context. I feel I cannot evaluate the statement otherwise; I am not sure what it is telling me to expect.
My understanding is that tabooing is usually “safe”.
If a concept is well-defined and non-atomic then you can break it down into its definition, and the argument will still be valid.
If a concept is not well-defined then why are you using it?
So the only reasons for not-tabooing something would seem to be:
My above argument is confused somehow (e.g. the concepts of “well-defined” or “atomic” are themselves not well-defined and need tabooing)
For convenience—someone can effectively stall an argument by asking you to taboo every word
The concepts are atomic
Treating control (and to a lesser extent causality) as atomic seems to imply a large inferential distance from the worldview popular on LW. Is there a sequence or something else I can read to see how to get to there from here?
Refusing to taboo may be a good idea if you don’t know how, and using the opaque concept gives you better results (in the intended sense) than application of the best available theory of how it works. (This is different from declaring understanding of a concept undesirable or impossible in principle.)
Yes, that makes sense. Do you think this applies here?
Same reason we usually play “rationalist’s taboo” around here: to separate the denotations of the terms from their connotations and operate on the former.
I agree—at least if this CDT agent has the foresight to self-modify before getting “scanned” by Omega (or put through a lie detector test, which is a statistical and imperfect implementation of the same idea).
The question is then, if our self-modifying agent foresees running into a variety of problems where other agents will be able to predict its actions, what decision theory should it self-modify to?
Could you have a CDT agent that’s never thought about Newcomb problems, out for a stroll, when Omega appears and explains the situation, and the CDT agent then reasons its way to one-boxing anyway? Maybe, AIXI-style, it does an exhaustive investigation of the payoffs resulting from various actions, notices that changing itself into a one-boxer is correlated with a higher payoff, and so performs the act!
It wouldn’t work as you’ve stated it. The action of changing itself to a one-boxer would, according to its current decision theory, increase payoffs for every Newcomb’s Problem it would encounter from that moment forward, but not for any in which the Predictor had already made its decision.
Seriously, you can work this out for yourself.
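(To spell out the timing point with a toy illustration, under the assumption that Omega fills the opaque box based on the disposition it observed at prediction time:)

```python
# Illustrative only: the opaque box's contents are fixed by the disposition
# Omega saw when it made its prediction, not by anything done afterwards.

def opaque_box(disposition_at_prediction_time):
    return 1_000_000 if disposition_at_prediction_time == "one-box" else 0

# Prediction made while the agent was still a plain two-boxing CDT agent:
already_filled = opaque_box("two-box")      # 0 -- self-modifying now can't change this
# Prediction made after the agent self-modified into a one-boxer:
filled_later = opaque_box("one-box")        # 1,000,000

print(already_filled + 1_000)  # 1,000: taking both boxes is all that's left to gain
print(filled_later)            # 1,000,000: the modification only pays for future scans
```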
What confuses me here is that a causal model of reality would still tell it that being a one-boxer now will maximize the payoff now, if it examines possible worlds in the right way. It seems to come down to cognitive contingencies—whether its heuristics manage to generate this observation, without it then being countered by a “can’t-change-the-past” heuristic.
I may need to examine the decision-theory literature to see what I can reasonably call a “CDT agent”, especially Gibbard & Harper, where the distinction with evidential decision theory is apparently defined.
That’s the main difference between decision theories like CDT, TDT and UDT.
I think it’s the only difference between CDT and TDT: TDT gets a semi-correct causal graph, CDT doesn’t. (Only semi-correct because the way Eliezer deals with Platonic nodes, i.e. straightforward Bayesian updating, doesn’t seem likely to work in general. This is where UDT seems better than TDT.)
What is this “correlated” you speak of? :P I think if Omega pops up with already-filled boxes, the standard argument for two-boxing goes through whether the CDT agent is self-modifying or not.
I don’t endorse the decision theory you describe but do point out that this isn’t CDT. You are actually talking about a new, largely undefined decision theory which consists of using CDT sometimes and something else at other times. That is, you have replaced CDT. If it were the case that TDT (or the derivatives like UDT) were flawed and needed to be replaced in this manner then the correct way to theorize about the subject is to pick apart just what decision algorithm you would pick for which exceptional situation and then describe this overall process—perhaps calling it ‘Mitchell Porter Ad-hoc Decision Theory.’
No.
It is true that CDT with self-modification is unstable. All CDT agents will instantly modify themselves into agents that behave as if they were UDT agents with respect to information about external decisions made after the time of self-modification, but that will interact with all external decisions made before the time of self-modification on the basis of CDT-like reasoning.
It isn’t possible for a CDT agent to escape this constraint through any effort of self-modification, because it is fundamentally against a CDT agent’s nature to try to do so.
Eric Barnes had got this far in 1997, saying:
Your post seems to be a poster-child example for the comment I wrote a couple days ago.
Er… as a positive example, or a negative example?
Negative, I think. Mitchell_Porter views Newcomblike problems in terms of physical causality. The problems themselves are pushing him to switch to thinking about logical causality, but he can’t make the jump because he thinks it must be wrong. It’s frustrating for me to watch, knowing how much beautiful undiscovered math might lie beyond the jump.
I wonder if I have any such obvious hangups. I know I did in the past.
I will get the hang of this eventually. I just have to break it down into a form I can accept first. I see what you, Nesov, Nisan, etc., are doing with the mutually dependent programs or functions. But we could tell a story, of Omega meeting a TDT agent who one-boxes and gets the reward, in which everything is caused by forward-in-time physical causality. So the status of “logical causality” is uncertain and perhaps suspect. It may not be an essential concept, in order to understand what’s going on here.
Many things can be explained in multiple different ways, and for physical events a physical (causal) explanation is always possible. The lesson of LW-style decision theories seems to be that one shouldn’t privilege the physical explanation over other types of knowledge about how events depend on each other, that other kinds of dependence can be equally useful (even though there must be a physical explanation for how those dependencies between physical events got established).
My CDT solution is to notice that if Omega’s promise about the future is correct, then it must be capable of bringing it about. Perhaps it knows how to teleport the larger reward away if we go for the visible reward. Maybe Omega is just a master stage magician. The point is, taking the action of going and getting the visible reward will prevent me from getting the invisible one. I don’t need to worry about implementation details like whether it’s really based on my decision before I make it, or just on the actions I take after I make it. The constraint is equally real whether I understand the mechanism or not.
Yes, that dodges the real question, but can an example illustrating the same deficiency of CDT be constructed, that isn’t subject to this dodge? I’m not certain it’s possible.
You could look at it another way. If a CDT agent knows it will face unspecified Newcomblike problems in the future, it will want to make the most general precommitment now. Of course you can’t come up with the most general precommitment that will solve all decision problems, because there could be a universe that arbitrarily punishes you for having a specific decision algorithm in your head, and rewards some other silly decision algorithm for being different. But if the universe rewards or punishes you only based on the return value of your algorithm and not its internals, then we can hope to figure out mathematically how the most general precommitment (UDT) should choose its return value in every situation. We already know enough to suspect that it will probably talk about logical implication instead of physical causality, even in a world that runs on physical causality.
Self-modifying CDT loses in Parfit’s Hitchhiker.
You seem to be assuming that a decision making procedure should compute what action an agent should take to best achieve some goals. Under this assumption, it seems very important that an agent’s actions can’t affect the past. But the innovation of TDT is that a decision making procedure should compute what decision the abstract (not just any particular physical implementation) decision making procedure itself should output to best achieve some goals. Because the abstract decision making procedure is not a physical thing in space-time, it is not subject to the same limitations as the physical agent. The abstract decision making procedure really does control whatever physically instantiates it at any time (and indirectly influences whatever interacts with those instantiations).
Supposedly we want a new, reflective decision theory because you can’t have “a CDT agent who can self-modify.”
I ask this more as a question than as a statement, because I’m not terribly familiar with TDT or AI (I’m just an engineering student who’s maybe 1⁄3 of the way through the sequences), but is there any conflict between TDT and self-modification? Suppose we modify Newcomb’s problem somewhat, and say that Omega is predicting whether or not you were a one-boxer a year ago. Suppose an AI was, in fact, a two-boxer a year ago and self-modified to be a one-boxer now. Since Omega could simply read the code, it would know this. But by my understanding, a TDT AI would still one-box in this situation, which would lose.
This is probably already answered somewhere, but why would a CDT agent not one-box, reasoning like this: “if Omega always predicts right, then I can assume the predictions are made by full simulation. Then why do I think I am actually outside the simulation right now? Maybe this is Omega’s simulation. I have no way of knowing, so I must choose an action that is the best for both cases”?
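(A rough sketch of the reasoning in this comment, under two explicit assumptions: that the simulated instance and the real instance necessarily make the same choice, and that the agent’s utility is the real-world money paid out. Whether a CDT agent is entitled to the first assumption is exactly what the replies below dispute.)

```python
# Illustrative only, under the two assumptions stated above.
def real_world_payoff(choice):
    # Assumption: the simulated instance made the same choice, so the opaque
    # box is filled exactly when that choice is to one-box.
    opaque = 1_000_000 if choice == "one-box" else 0
    return opaque if choice == "one-box" else opaque + 1_000

print(real_world_payoff("one-box"))  # 1,000,000
print(real_world_payoff("two-box"))  # 1,000
# With these assumptions the "am I the simulation?" uncertainty drops out:
# whichever instance this is, one-boxing leads to the better real-world outcome.
```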
Because that’s not what CDT is. CDT takes the prior/simultaneous decisions of other agents as pure unknowns, and looks to minimax its payoff over all possible probability distributions of those unknowns. It won’t even cooperate with what it knows to be an exact copy of itself in the Prisoner’s Dilemma, which is basically the situation you’ve set up.
Yes, but if it is uncertain about its situation—whether it is in the world A or in the world B—then it should assign probabilities to the different possibilities and maximize its payoff over these probabilities, shouldn’t it? In the case of being in Omega’s simulation vs. actually choosing the box, if the assigned probabilities are 1⁄2, the agent would one-box.
Regarding the Prisoner’s Dilemma, whether CDT defects appears to depend on the exact details of how the problem is formalized. For example, if a CDT agent plays against its mirror image, then presumably it would cooperate—because there is a direct causal chain between its action and the action of its mirror image. So why should it not recognize a similar causal chain between its action and the action of its exact copy?
Possibly, if the CDT agent’s utility function cares about the real world (and not, say, the agent’s subjective experience), it should view one-boxing as a 1⁄2 probability of filling the box and a 1⁄2 probability of taking the box. But there are utility functions that would still not get there, and Omega could spoil it completely by changing the utility function (though not the decision algorithm, nor the payoffs in the new utility function) of the one it simulates.
No, it’s the same subjective experience. It’s like an exact copy of an upload being made, then being run for 5 minutes in parallel with the original, and then terminated. There’s no difference in utilities.
The actual way in which Omega makes its decision doesn’t matter. What matters is the agent’s knowledge that Omega always guesses right. If this is the case, then the agent can always represent Omega as a perfect simulator.
The logic doesn’t work if the agent knows that Omega guesses right only with some probability p < 1. But if this is the case, then the problem looks underspecified. The correct decision depends on the exact nature of the noise. If Omega makes the decision by analyzing psychological tests the agent took in childhood, then the agent should two-box. And if Omega makes a perfect simulation and then adds random noise, the agent should one-box.
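(A small worked comparison of those two noise models, with the usual Newcomb payoffs assumed:)

```python
# Model A: Omega predicts from childhood tests, so the prediction is fixed
# independently of the decision being made now. Whatever is in the opaque box,
# two-boxing adds $1,000, so causal dominance says two-box.

# Model B: Omega simulates the agent perfectly, then flips its prediction with
# probability 0.1. Then, for the agent's actual choice:
p_correct = 0.9
ev_one_box = p_correct * 1_000_000                   # box filled unless the flip hit
ev_two_box = (1 - p_correct) * 1_000_000 + 1_000     # box filled only if the flip hit
print(ev_one_box, ev_two_box)  # 900000.0 vs 101000.0 -- one-box under Model B
```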
This is always so: there are details absent from any incomplete model, whose state can decide the outcome as easily as your decision. Gaining knowledge about those details allows you to improve the decision, but absent that knowledge the only thing to do is to figure out what the facts you do know suggest.
If people use this consideration to consistently beat Omega, its accuracy can’t be 90%. Therefore, in that case, they can’t beat Omega with this argument; proof by contradiction.
(If they don’t use this consideration, then you could win, but this hypothetical is of little use if you don’t know that. For example, it seems like a natural correction to specify that you only know the figure 90% and not the correlation between the correctness of the guesses and the test subjects’ properties, and that the people sampled for this figure were not different from you in any way that you consider relevant for this problem.)
If no facts about the nature of the “noise” are specified, then the phrase “the probability of a correct decision by Omega is 0.9” does not make sense. It does not add any knowledge beyond “sometimes Omega makes mistakes”.
If only 10% of the people use this consideration, then why not?
(AFAIU, the point in parentheses basically amounts to the idea that in the absence of any known causal links I should use EDT (=Bayesian reasoning))
You use all that is known about how events, including your own decision, depend on each other. Some of these dependencies can’t withstand your interventions, which often themselves come out of the error terms. In this way, EDT is the same as TDT, its errors originating from a failure to recognize this effect of breaking correlations and (a flaw shared with CDT) from an unwillingness to include abstract computations in the models. CDT, on the other hand, severs too many dependencies by using its causal graph surgery heuristic.
My correction of the problem statement makes sure that the dependence of Omega’s prediction on your decision is not something that can be broken by your decision, so graph surgery should spare it. (In CDT terms, both your decision and Omega’s prediction depend on your original state, and CDT mistakenly severs this dependence by treating its decision as uncaused.)
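(A toy sketch of the graph-surgery point, with a hypothetical three-arrow causal graph assumed for illustration: the agent’s original state causes both its decision and Omega’s prediction, and the prediction causes the box contents.)

```python
# Illustrative only: CDT-style surgery deletes every arrow *into* the node
# being intervened on, treating the decision as if it had no causes.
graph = {
    "original_state": ["my_decision", "omega_prediction"],
    "omega_prediction": ["box_contents"],
    "my_decision": [],
    "box_contents": [],
}

def surgery(g, intervened):
    return {parent: [c for c in children if c != intervened]
            for parent, children in g.items()}

cdt_graph = surgery(graph, "my_decision")
print(cdt_graph["original_state"])  # ['omega_prediction']
# After surgery, the decision no longer tells CDT anything about original_state,
# and hence nothing about omega_prediction or box_contents -- which is precisely
# the dependence the comment above says should have been kept.
```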
But when you make this correction, and then compare agents performance based on it, you should place the agents in the same situation, if the comparison is to be fair. In particular, the situation must be the same regarding the knowledge of this correction—knowledge that “the dependence of Omega’s prediction on your decision is not something that can be broken by your decision”. In a regular analysis here on LW of Newcomb’s problem, TDT receives an unfair advantage, in that it is given this knowledge while CDT is not, presumably because CDT cannot represent it.
But in fact it can—why not? If it means drawing causal arrows backwards in time, so what?
And in case of “pure” Newcomb’s problem, where the agent knows that Omega is 100% correct, even the backward causal arrows are not needed. I think. That was what my original comment was about, and so far no one answered...
The comparison doesn’t necessarily have to be fair, it only needs to accurately discern the fittest. A cat, for example, won’t even notice that an IQ test is being presented to it, but that doesn’t mean that we have to make adjustments, or that the conclusion is incorrect.
Updates are propagated in both directions, so you draw causal arrows only forwards in time; you just don’t sever this particular arrow during the standard graph surgery on a standard-ish causal graph, so that knowledge about your decision tells you something about its origins in the past, and then about the other effects of those origins on the present. But CDT is too stubborn to do that, and a re-educated CDT is not CDT anymore; it’s halfway towards becoming TDT.
Good point.
Perhaps. Although it’s not clear to me why CDT is allowed to notice that its mirror image does whatever it does, but not that its perfect copy does whatever it does.
And what about the “simulation uncertainty” argument? Is it valid, or is there a mistake somewhere?
That is just what “probability” means: it quantifies possibilities that can’t be ruled out, where it’s not possible to distinguish those that do take place from those that don’t.
Bayesians say all probabilities are conditional. The question here is on what this “0.9” probability is conditioned.
On me having chicken for supper. Unless you can unpack “being conditional” to more than a bureaucratic hoop that’s easily jumped through, it’s of no use.
On reflection, my previous comment was off the mark. Knowing that Omega always predicts “two-box” is an obvious correlation between a property of agents and the quality of prediction. So, your correction basically states that the second view is the “natural” one: Omega always predicts correctly and then modifies the answer in 10% of cases.
In that case, the “simulation uncertainty” argument should work the same way as in the “pure” Newcomb’s problem, with a correction for the 10% noise (which does not change the answer).
Oh, come on. According to Jaynes, the marginal probability P(Omega is correct | Omega predicts something) is supposed to be additionally conditioned on everything you know about the situation. If you know that Omega always predicts “two-box”, then P(Omega is correct | Omega predicts something) is equal to the relative frequency of two-boxers in the population. If you know that Omega first always predicts correctly and then modifies its answer in 10% of cases, then it’s something completely different. If you have no knowledge about whether the first or the second is true, then what can you do? Presumably, try Solomonoff induction; too bad it’s incomputable.
(See the parenthetical in the current updated version of the comment.)
I added a parenthetical to my comment as well :)