Decision theory and “winning”
With much help from crazy88, I’m still developing my Decision Theory FAQ. Here’s the current section on Decision Theory and “Winning”. I feel pretty uncertain about it, so I’m posting it here for feedback. (In the FAQ, CDT and EDT and TDT and Newcomblike problems have already been explained.)
One of the primary motivations for developing TDT is a sense that both CDT and EDT fail to reason in a desirable manner in some decision scenarios. However, despite acknowledging that CDT agents end up worse off in Newcomb’s Problem, many (and perhaps the majority of) decision theorists are proponents of CDT. On the face of it, this may seem to suggest that these decision theorists aren’t interested in developing a decision algorithm that “wins” but rather have some other aim in mind. If so then this might lead us to question the value of developing one-boxing decision algorithms.
However, the claim that most decision theorists don’t care about finding an algorithm that “wins” mischaracterizes their position. After all, proponents of CDT tend to take the challenge posed by the fact that CDT agents “lose” in Newcomb’s problem seriously (in the philosophical literature, it’s often referred to as the Why ain’cha rich? problem). A common reaction to this challenge is neatly summarized in Joyce (1999, pp. 153-154) as a response to a hypothetical question about why, if two-boxing is rational, the CDT agent does not end up as rich as an agent that one-boxes:
Rachel has a perfectly good answer to the “Why ain’t you rich?” question. “I am not rich,” she will say, “because I am not the kind of person [Omega] thinks will refuse the money. I’m just not like you, Irene [the one-boxer]. Given that I know that I am the type who takes the money, and given that [Omega] knows that I am this type, it was reasonable of me to think that the $1,000,000 was not in [the box]. The $1,000 was the most I was going to get no matter what I did. So the only reasonable thing for me to do was to take it.”
Irene may want to press the point here by asking, “But don’t you wish you were like me, Rachel?”… Rachel can and should admit that she does wish she were more like Irene… At this point, Irene will exclaim, “You’ve admitted it! It wasn’t so smart to take the money after all.” Unfortunately for Irene, her conclusion does not follow from Rachel’s premise. Rachel will patiently explain that wishing to be a [one-boxer] in a Newcomb problem is not inconsistent with thinking that one should take the $1,000 whatever type one is. When Rachel wishes she was Irene’s type she is wishing for Irene’s options, not sanctioning her choice… While a person who knows she will face (has faced) a Newcomb problem might wish that she were (had been) the type that [Omega] labels a [one-boxer], this wish does not provide a reason for being a [one-boxer]. It might provide a reason to try (before [the boxes are filled]) to change her type if she thinks this might affect [Omega’s] prediction, but it gives her no reason for doing anything other than taking the money once she comes to believe that she will be unable to influence what [Omega] does.
In other words, this response distinguishes between the winning decision and the winning type of agent and claims that two-boxing is the winning decision in Newcomb’s problem (even if one-boxers are the winning type of agent). Consequently, insofar as decision theory is about determining which decision is rational, on this account CDT reasons correctly in Newcomb’s problem.
For those who find this response perplexing, an analogy could be drawn to the chewing gum problem. In this scenario, there is near-unanimous agreement that the rational decision is to chew gum. However, statistically, non-chewers will be better off than chewers. As such, the non-chewer could ask, “If you’re so smart, why aren’t you healthy?” In this case, the above response seems particularly appropriate. The chewers are less healthy not because of their decision but because they’re more likely to have an undesirable gene. Having good genes doesn’t make the non-chewer more rational, just luckier. The proponent of CDT simply extends this response to Newcomb’s problem.
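To make the structure of this reply concrete, here is a rough toy simulation (my own illustration; the probabilities are made up rather than taken from any published version of the problem). The gene raises both the chance of chewing and the chance of illness, while chewing itself has no causal effect on health in this model:

```python
import random

random.seed(0)
N = 200_000

# Made-up illustrative numbers: the gene raises both the chance of chewing
# and the chance of illness; chewing itself has no effect on health here.
def person():
    gene = random.random() < 0.2
    chews = random.random() < (0.8 if gene else 0.2)
    ill = random.random() < (0.5 if gene else 0.05)
    return gene, chews, ill

people = [person() for _ in range(N)]

def illness_rate(group):
    return sum(ill for _, _, ill in group) / len(group)

chewers = [p for p in people if p[1]]
non_chewers = [p for p in people if not p[1]]
print("illness rate among chewers:    ", round(illness_rate(chewers), 3))      # noticeably higher
print("illness rate among non-chewers:", round(illness_rate(non_chewers), 3))  # lower

# Holding gene status fixed, chewing makes no difference to health:
for gene in (True, False):
    chew = illness_rate([p for p in people if p[0] == gene and p[1]])
    no_chew = illness_rate([p for p in people if p[0] == gene and not p[1]])
    print(f"gene={gene}: ill given chew={chew:.3f}, ill given no chew={no_chew:.3f}")
```

Chewers come out less healthy overall, but conditional on gene status chewing makes no difference, which is the point being made here: the non-chewers’ better outcomes flow from the gene, not from their decision.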
One final point about this response is worth noting. A proponent of CDT can accept the above argument but still acknowledge that, if given the choice before the boxes are filled, they would be rational to choose to modify themselves to be a one-boxing type of agent (as Joyce acknowledged in the above passage and as argued for in Burgess, 2004). To the proponent of CDT, this is unproblematic: if we are sometimes rewarded not for the rationality of our decisions in the moment but for the type of agent we were at some past moment then it should be unsurprising that changing to a different type of agent might be beneficial.
Reactions to this defense of two-boxing in Newcomb’s problem have been divided. Many find it compelling, but others, like Ahmed and Price (2012), think it does not adequately address the challenge:
It is no use the causalist’s whining that foreseeably, Newcomb problems do in fact reward irrationality, or rather CDT-irrationality. The point of the argument is that if everyone knows that the CDT-irrational strategy will in fact do better on average than the CDT-rational strategy, then it’s rational to play the CDT-irrational strategy.
Given this, there seem to be two positions one could take on these issues. If the response given by the proponent of CDT is compelling, then we should be attempting to develop a decision theory that two-boxes on Newcomb’s problem. Perhaps the best theory for this role is CDT, but perhaps it is instead BT, which many people think reasons better in the Psychopath Button scenario. On the other hand, if the response given by the proponent of CDT is not compelling, then we should be developing a theory that one-boxes in Newcomb’s problem. In this case, TDT, or something like it, seems like the most promising theory currently on offer.
I should probably give some context about how I see this section playing a role in the wider FAQ. This might help clarify things, and it will also give people (including perhaps Luke) a chance to correct me if I’ve misunderstood the purpose of the section.
In the rest of the FAQ, it has basically been presumed that the aim of decision theory is to develop a decision theory that wins. While dissenting views have been mentioned briefly, the FAQ is structured around this way of judging decision theories.
This raises a question of whether it correctly represents the views of decision theorists. After all, this FAQ is supposed to be an introduction to the area (rather than a substantial original contribution) and the hope is that the FAQ will be of use to people studying decision theory in academia. Given this, it’s important that standard positions are represented accurately.
So an additional section (this one) is being added to clarify how these standard views should be interpreted. As such, it clarifies the views that many decision theorists have about the issue of decision theory and winning. It is not meant to be a defence of these views, nor is it meant to be a detailed analysis; the document is a FAQ and is meant to provide a clear introduction to the field, including standard views on how decision theories should be judged.
Hopefully that provides some context for reading it. For those who know the desired context: if I’ve misunderstood it, please do clarify with me. For those who don’t know the desired context: feel free to comment if you think this is a flawed thing to be aiming for anyway.
possible typo: “But don’t you wish you were like me, Rachel?”… Rachael can and should admit that she does wish she were more like Irene… ”
“Rachael”
Thanks. Will fix.
Is this a typo?
Oops. A typo but an unintentionally coherent one.
(I can’t edit the original post but I’ll make sure to change it in the FAQ itself).
Do they think that two-boxing is genetic and cannot be unlearned?
On a different note, I don’t understand how one can self-consistently discuss Newcomb’s without going into the issues of free will, determinism, and the inside vs. outside view.
Perhaps the precise sentence makes things unclear. The argument is that in both cases it is not the rationality of the decision that leads to the payoff but some other factor.
In the chewing gum problem, that other factor is genetic. In Newcomb’s problem, that other factor is about agent type or decision theory.
Does the proponent of CDT think this cannot be unlearned? Not necessarily. However, they think that by the time you’re faced with the box, it’s too late to usefully do so because it’s not your decision that’s being considered but rather your prior agent type. If you’re not yet faced with the box, then the proponent of CDT would say that you should unlearn two-boxing. But, as noted in the section, this is not to say that they think two-boxing is an irrational decision: if people are sometimes rewarded not for their decision but for their agent type then it shouldn’t be surprising that it might be rational to follow a decision theory that sometimes endorses irrational decisions (because the reward for the type outweighs the reward for the decision). But this says something about the rationality of agent types, not decisions (according to the proponent of CDT, which is the view that is being represented here).
Interesting. I didn’t fully realize that people tend to identify with a way of thinking enough to consciously go into losing even when a winning move is obvious to them.
I think the argument is a little more technical than that. This argument asserts that decision theory is about decisions, and the thing that determines whether you win at Newcomb’s is something other than a decision. It still might be good to have a way to win at Newcomb’s, but on this view that thing will not be a decision theory.
The question being asked to the decision theory is “what is the best decision?” and CDT says it’s taking both boxes. Leaving $1,000 on the table is not helpful. Being the sort of person who would leave $1,000 on the table happens to be helpful in this case, but nobody asked CDT “What sort of person should I be?”. If you frame that question as a decision, like “There’s a Newcomb’s tournament coming up; which decision should you precommit to?” then CDT will precommit to 1-boxing, even though if you ask it at the time of the decision, it will still say that absent the precommitment it would prefer to also have the $1,000.
Your comment captures the point well but I think this line is a little misleading. The proponent of CDT can say that a decision theory (held timelessly, including before the boxes were filled) might make you win at NP but they will say that decision theory is about making optimal decisions not about choosing optimal decision theories (though of course, the decision about which decision theory to follow is one possible decision and, the proponent of CDT will say, it is a decision that CDT handles correctly).
I guess the point I don’t understand is the difference between precommitting and deciding on the spot in this case. The calculation seems exactly the same and not time-consuming, so why bother precommitting?
I’m not sure whether you’re saying that the proponent of CDT has a silly view or whether you’re saying you don’t understand their view. If the second:
The proponent of CDT would say that it’s not the same calculation in both cases.
They would say that NP rewards you for your agent type (broadly construed) at t=0 and not your decision at t=1.
Precommitment is about changing your agent type at t=0 so the relevant calculation here (according to the proponent of CDT) is “what are the benefits of having each agent type at t=0?”. One-boxing agent type will come out on top.
Deciding in the moment is about your decision at t=1, so the relevant calculation here (according to the proponent of CDT) is “What are the benefits of making each decision at t=1, given that this can’t change my agent type at t=0?”
Perhaps it could be argued that these calculations reduce to one another but, if so, that’s a substantive argument that needs to be had. At least on the face of it, the calculations are different.
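As a rough sketch of the contrast (my own framing and numbers; the payoffs are the standard Newcomb ones, and the “accuracy” parameter is a hypothetical knob, not something from the CDT literature), the two calculations might look like this:

```python
M, K = 1_000_000, 1_000  # $1M in the opaque box, $1,000 in the transparent box

# Calculation at t=0 (precommitting / choosing an agent type): the predictor
# responds to your type, so the type you adopt determines the box contents.
# 'accuracy' is the hypothetical probability the predictor reads your type correctly.
def value_of_type(one_boxing_type, accuracy=1.0):
    if one_boxing_type:
        return accuracy * M                              # predicted to one-box, opaque box full
    return accuracy * K + (1 - accuracy) * (M + K)       # predicted to two-box, opaque box usually empty

# Calculation at t=1 (deciding with the boxes already filled): the contents are
# fixed with some probability p of holding $1M, and the decision can't change p.
def value_of_decision(take_both, p_million_already_there):
    return p_million_already_there * M + (K if take_both else 0)

print(value_of_type(True), value_of_type(False))   # 1000000.0 vs 1000.0
for p in (0.0, 0.5, 1.0):
    gap = value_of_decision(True, p) - value_of_decision(False, p)
    print(f"p={p}: two-boxing beats one-boxing by {gap}")  # always +1000
```

On this sketch the type calculation favours being a one-boxing type, while the in-the-moment calculation favours two-boxing by $1,000 no matter how likely the $1M is to be there already, which is the sense in which the two calculations come apart.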
The second… well, probably a bit of both. Anyway, I think that I understand my reservation about the classic presentation of CDT. From Wikipedia:
It’s the first statement that is false in the perfect predictor version, because it fights the counterfactual (the predictor is perfect). So the naive CDT in this case is not even self-consistent, as it assigns non-zero odds (100% in fact) to the predictor being imperfect.
It seems more reasonable to say that your choice of one or two boxes causally affects your self-assignment to one of the two groups, winners and losers.
I’m not convinced that this is a fair portrayal of what the proponent of CDT says. That’s not to weigh in on whether they’re right but I don’t think they fail to be self-consistent in the way you have outlined.
The proponent of CDT doesn’t assign non-zero odds to the predictor being imperfect, they just say that it doesn’t matter if the predictor is perfect or not as, given that the boxes are already filled, it is too late to influence the thing which would lead you to get the $M (your agent type at t=0 rather than your decision at t=1).
The CDT agent will agree that the predictor is perfect but just deny that this is relevant because it doesn’t change the fact that NP rewards people based on agent type (at t=0) and not decision, nor does it change the fact that the decision now can’t causally influence the agent type at t=0.
Whether this is the right question to ask seems to me to be open to debate but I don’t think that the proponent of CDT has an internal inconsistency in their consideration of whether the predictor is perfect.
That’s where they lose me. By definition of a perfect predictor, there is no option of “two-box and get $1000 and $1,000,000” in the problem setup, so why would they even consider it?
From their perspective, they don’t need to consider it.
The CDT agent can have a credence of 0 in the proposition that they will get $M + $1000. After all, if they have a credence of 1 that the predictor is perfect and a credence of 1 that they were a two-boxing sort of agent at t=0 then they should have a credence of 0 that they will get $M + $1000. The CDT agent won’t deny this.
They then say, however, that they have a credence of 1 in the world state where there is $0 in the second box. Given this credence, the smart decision is to two-box (and get $1000) rather than 1-box (and get $0).
So the short answer is: they don’t even consider this possibility but this doesn’t change the fact that, on their view, the best decision is to two-box.
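Here is a minimal sketch of the arithmetic being attributed to the CDT agent (standard Newcomb payoffs; the credences are just the ones described in the paragraphs above):

```python
M, K = 1_000_000, 1_000

# Credences the CDT agent is described as having at the moment of decision:
# certain the predictor is perfect and certain it was a two-boxing type at t=0,
# so all the probability sits on the opaque box being empty.
credence = {"opaque box empty": 1.0, "opaque box holds $1M": 0.0}

payoff = {
    ("one-box", "opaque box empty"): 0,
    ("one-box", "opaque box holds $1M"): M,
    ("two-box", "opaque box empty"): K,
    ("two-box", "opaque box holds $1M"): M + K,  # the credence-0 cell: never expected to occur
}

for act in ("one-box", "two-box"):
    eu = sum(credence[state] * payoff[(act, state)] for state in credence)
    print(act, eu)   # one-box: 0.0, two-box: 1000.0
```

Nothing in this table is inconsistent: the agent assigns credence 0 to ever seeing the $M + $1000 outcome, and two-boxing still comes out $1,000 ahead under its credences.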
I’m not entirely sure what we’re discussing here but what I’m saying is that the view isn’t internally inconsistent: they don’t have contradictory credences in world states and they don’t think that there is an option of two-boxing and getting $M and $1000 (they assign credence 0 to this possibility—of course, presuming they have credence 1 that they were a two-boxing type of agent at t=0 then they also assign credence 0 to the possibility of one-boxing and getting $M because they hold that what matters is the decision type at t=0 and not the decision at t=1).
So if you’re saying that their view is internally inconsistent in one of the above ways then one or the other of us is confused. On the other hand, if you’re just saying that this way of thinking seems alien to you then what I’ve said in this comment is pretty much irrelevant...
Ah, right, they never expect anything to be in the opaque box, so for them taking the opaque box is basically redundant (“might as well, no harm can come from it”). So they correctly assign the probability of zero to the event “I’m a two-boxer and there is $1M to be had”.
However, this is supplemented by “CDTer must two-box” because “predictor’s choice has already been made”, as if this choice is independent of what they decide. This strange loop can only be unwound by considering how the predictor might know what they will decide before they think that they decided something. And this requires taking the outside view and going into the free-will analysis.
Yeah, that’s right—so I think the proponent of CDT can be criticised for all sorts of reasons but I don’t think they’re (straight-forwardly) inconsistent.
As a note, the decision in NP is whether to take the opaque and the transparent box or whether to just take the opaque box—so the CDT agent doesn’t just think they “may as well” two-box, they think they’re actively better to do so because doing so will gain them the $1000 in the transparent box.
And yes, I agree that considerations of free will are relevant to NP. People have all sorts of opinions about what conclusion we should draw from these considerations and how important they are.
OK, thanks for clearing this CDT self-consistency stuff up for me.
That’s cool, glad I had something useful to say (and it’s nice to know we weren’t just talking at cross purposes but were actually getting somewhere!)
It is, however, quite frustrating to realize, in retrospect, that I had already gone through this chain of reasoning at least once, and then forgot it completely :(
The same way I could self-consistently discuss Newcomb’s without going into the issues of Pirates vs Ninjas. Even if those issues are all particularly relevant to Newcomb’s problem it isn’t self-inconsistent to just not bother talking about all possible tangential issues. Heck, even if it were outright erroneous to not talk about your list of issues when discussing Newcomb’s (this is decidedly counterfactual) then it still wouldn’t be self-inconsistent to not do so. It’d merely be wrong.
Edited to make clearer. Now says:
Hmm… What is BT, and what’s the psychopath button? The terms don’t appear in the Sequences or the LessWrong Wiki. Searching the whole site, I found a few references to “Benchmark Theory” which I presume is what you mean, but no definition.
Do you define them elsewhere in your FAQ, or give references to where they are defined… ?
Yes, these are both explained earlier in the FAQ.
If you are independently interested then the Psychopath Button is described here (paywall): http://philreview.dukejournals.org/content/116/1/93.citation (ETA: http://fitelson.org/few/few_05/egan.pdf)
And Benchmark Theory is described here: http://www-personal.umich.edu/~ericsw/2fef/gandalf.ltr.pdf (I’m not sure if this is a draft or a pre-print)
It is also discussed here (another paywall, unfortunately): http://philreview.dukejournals.org/content/119/1/1.abstract
ETA: While these may not have received much discussion on LW, they’ve attracted a fair bit of attention in academia which is why they’re being mentioned in an introductory FAQ.
Thanks for this. I am a bit surprised they haven’t cropped up more on Less Wrong, if they are indeed standard in the literature. I thought I’d come across pretty much every variant of chewing gum, smoking lesion, and Newcomb by now… but clearly not.
Incidentally, having very quickly glanced at “Psychopath button”, I wonder if the decider should first imagine a “safe psychopath button” which would kill every psychopath in the world apart from the presser. Consider whether you would push that button. If you are sure you would push it (and under the preferences described in the problem, the decider should be sure) then you get strong evidence that you are a psychopath yourself, so CDT says you shouldn’t push the original button. So I can’t see a very convincing counter-example to CDT here.
Yes, you might be interested in http://www-personal.umich.edu/~jjoyce/papers/rscdt.pdf
I think this is only correct if you accept the CDT view of what a decision actually is—i.e. that decisions are made at a particular point in space-time and can be made one way or the other independently of what happens in the rest of the world.
If you instead define decisions as occurring at some point in algorithms-we-haven’t-yet-computed space, I think you’ll end up with something TDT-like in either case—whether you focus on making the rational decision or being the rational agent.
Yes, though this is just a FAQ section dealing with the “standard” view on how decision theory and winning interact and the reason that proponents of CDT hold their position even though they think winning is important. Perhaps that’s all there is and there’s no lesson to be had from it other than “this is what lots of philosophers think”.
Personally, however, I think it reveals more than this. People who are new to the debate sometimes have the following view about CDT:
(Of particular note, some people take this claim as a basic, obvious fact, as opposed to others who reach this conclusion at the end of a long argument for the position).
However, I think the proponent of CDT actually holds a more subtle position than this (which is partly outlined in the FAQ section above). As your comment highlights, the question then becomes a complex one about which view of decisions we should accept. The answer to this debate is likely to be motivated in part by the result of a further debate about which technical definition of winning we should accept (technical because we can’t simply count the average utility received by agents: if we did, not chewing in the chewing gum problem would come out as rational). The above section shows that the proponent of CDT has their own view about what definition of winning matters in decision theory, just as the proponent of TDT does (it’s not that one side simply doesn’t care about winning), and so it seems to me that the debate requires more steps to reach the above view than simply accepting it at face value.
I don’t actually disagree with this statement (except its tone), but in order to have rational debates we need to [construct the strongest possible version of the opposite view](http://lesswrong.com/lw/85h/better_disagreement/) before we have a go at demolishing it. So with that in mind, I definitely like the way CDT is being framed here.
I just brought this up because I wasn’t sure whether the original sentence I quoted was painted with “this sentence has lukeprog/crazy88’s unconditional support” or “this sentence belongs as part of the CDT philosophy”.
Is it worth mentioning (in a different section) the problems of reconciling the CDT model of a “decision” with reductionism? i.e. no matter how small you grind up the physical universe, you won’t find anything that looks like a “decision”, but you can grind up algorithm space until you find something that looks like “you”. Or is this too advanced (or nonsensical) for the FAQ?
I agree. In terms of my statement regarding those who hold that:
My claim wasn’t that this wasn’t a suitable conclusion but rather that it wasn’t a suitable starting point. As you note, it’s good to construct a steel man of an opponent’s argument before attacking it but, even more crucially, it’s important that we don’t attack straw men. The view of CDT in the FAQ isn’t even a steel man, it’s just the position advocated by many proponents of CDT. Attacking anything weaker than this is attacking a straw man. So proponents of CDT may whine that NP rewards irrational agents, but they at least have an argument as to why NP rewards irrational agents. Ignoring this argument while attacking its conclusion is undesirable.
(That’s all just a statement of what we seem to be agreeing about).
Definitely not—this section of the FAQ is summarising a popular view, not endorsing it. I took the words “on this account” to mean that the statement wasn’t being endorsed but rather a particular account being summarised. However, perhaps this could be made more clear (as follows?)
In terms of this:
The FAQ is meant to be more of a basic level introduction to decision theory so I’m not sure if this is the place for it. But Luke’s the one who knows the master plan.
Is there missing context in the FAQ? It seems to assume that people already know what Newcomb’s Problem and the chewing gum problem are. The former is easy to google, but searching for the latter suggests that the chewing gum problem is that it’s hard to clean up when people stick chewing gum to public surfaces.
Yep, context is in the rest of the FAQ: both of those cases are covered in their own earlier sections.