My guess is that a large part of the divergence relates to the fact that LWers and philosophers are focused on different questions. Philosophers (two-boxing philosophers, at least) are focused on the question of which decision “wins” whereas LWers are focused on the question of which theory “wins” (or, at least, this is what it seems to me that a large group of LWers is doing, more on which soon).
So philosophical proponents of CDT will almost all (all, in my experience) agree that, if you are choosing a decision theory to follow, it is rational to choose a one-boxing decision theory; but they will say that, if you are choosing a decision, it is rational to two-box.
A second part of the divergence seems to me to relate to the toolsets available to the divergent groups. LWers have TDT and UDT, philosophers have Parfit on rational irrationality (and a whole massive literature on this sort of issue).
I actually think that LWers need to be divided into two distinct groups: those who have done a lot of research into decision theory and those that haven’t.
For those that haven’t, I suspect that the “disagreement” with philosophers is mostly apparent and not actual: these people don’t distinguish the two questions above, don’t realise that philosophers are answering the winning-decision question and not the winning-theory question, and don’t realise that philosophers don’t just ignorantly set aside these issues but have a whole literature on rational irrationality, winning decisions vs winning theories, and so on.
This impression is especially powerful because the story is often told on LW as if LW takes rationality to be about “winning” whereas philosophers are merely interested in analysing our concept of rationality (and surely it’s easy to pick between those). More accurately, though, the disagreement is between two divergent views of “winning”.
For those that have studied more decision theory, the story is a different one and I don’t know that I know the views of these people in enough depth to comment on them.
From the standpoint of reflective consistency, there should not be a divergence between rational decisions and rational algorithms; the rational algorithm should search for and output the rational decision, and the rational decision should be to adopt the rational algorithm. Suppose you regard Newcomb’s Problem as rewarding an agent with a certain decision-type, namely the sort of agent who one-boxes. TDT can be viewed as an algorithm which searches a space of decision-types and always decides to have the decision-type such that this decision-type has the maximal payoff. (UDT and other extensions of TDT can be viewed as maximizing over spaces broader than decision-types, such as sensory-info-dependent strategies or (in blackmail) maximization vantage points). Once you have an elegant theory which does this, and once you realize that a rational algorithm can just as easily maximize over its own decision-type as the physical consequences of its acts, there is just no reason to regard two-boxing as a winning decision or winning action in any sense, nor regard yourself as needing to occupy a meta-level vantage point in which you maximize over theories. This seems akin to precommitment, and precommitment means dynamic inconsistency means reflective inconsistency. Trying to maximize over theories means you have not found the single theory which directly maximizes without any recursion or metaness, and that means your theory is not maximizing the right thing.
Claiming that TDTers are maximizing over decision theories, then, is very much a CDT standpoint which is not at all how someone who sees logical decision theories as natural would describe it. From our perspective we are just picking the winning algorithm output (be the sort of agent who picks one box) in one shot, and without any retreat to a meta-level. The output of the winning algorithm is the winning decision, that’s what makes the winning algorithm winning.
Yes. Which is to say, clearly you fall into the second class of people (those who have studied decision theory a lot) and hence my explanation was not meant to apply to you.
Which isn’t to say I agree with everything you say.
From the standpoint of reflective consistency, there should not be a divergence between rational decisions and rational algorithms; the rational algorithm should search for and output the rational decision,
Decisions can have different causal impacts from decision theories, so there seems to be no reason to accept this claim. Insofar as the rational decision is the decision which wins (which depends on the causal effects of the decision) and the rational algorithm is the algorithm which wins (which depends on the causal effects of the algorithm), there seems to be no reason to think these should coincide. Plus, I like being able to draw distinctions that can’t be drawn using your terminology.
and the rational decision should be to adopt the rational algorithm.
Agreed (if you are faced with a decision of which algorithm to follow). Of course, this is not the decision that you’re faced with in NP (and adding more options is just to deny the hypothetical).
Suppose you regard Newcomb’s Problem as rewarding an agent with a certain decision-type, namely the sort of agent who one-boxes. TDT can be viewed as an algorithm which searches a space of decision-types and always decides to have the decision-type such that this decision-type has the maximal payoff. (UDT and other extensions of TDT can be viewed as maximizing over spaces broader than decision-types, such as sensory-info-dependent strategies or (in blackmail) maximization vantage points).
Yes, and I think this is an impressive achievement and I find TDT/UDT to be elegant, useful theories. The fact that I make the distinction between rational theories and rational decisions does not mean I cannot value the answers to both questions.
once you realize that a rational algorithm can just as easily maximize over its own decision-type as the physical consequences of its acts, there is just no reason to regard two-boxing as a winning decision or winning action in any sense, nor regard yourself as needing to occupy a meta-level vantage point in which you maximize over theories.
Well...perhaps. Obviously just because you can maximise over algorithms, it doesn’t follow that you can’t still talk about maximising over causal consequences. So either we have a (boring) semantic debate about what we mean by “decisions” or a debate about practicality: that is, the argument would be that talk about maximising over algorithms is clearly more useful than talk about maximising over causal consequences so why care about the second of these. For the most part, I buy this argument about practicality (but it doesn’t mean that two-boxing philosophers are wrong, just that they’re playing a game that both you and I feel little concern for).
This seems akin to precommitment, and precommitment means dynamic inconsistency means reflective inconsistency. Trying to maximize over theories means you have not found the single theory which directly maximizes without any recursion or metaness, and that means your theory is not maximizing the right thing.
I know what all these phrases mean but don’t know why it follows that your theory is not maximising the “right” thing. Perhaps it is not maximising a thing that you find to be useful or interesting (particularly for self-modifying AIs). If this is what you mean then fine. If you mean, however, that two-boxers are wrong on their own terms then I would need a more compelling argument (I’ve read your TDT paper btw, so reference to that won’t resolve things here).
Claiming that TDTers are maximizing over decision theories, then, is very much a CDT standpoint which is not at all how someone who sees logical decision theories as natural would describe it. From our perspective we are just picking the winning algorithm output (be the sort of agent who picks one box) in one shot, and without any retreat to a meta-level. The output of the winning algorithm is the winning decision, that’s what makes the winning algorithm winning.
Sure, the distinction is from the CDT perspective. You use words differently to the proponents of CDT (at which point, the whole difference between LWer views and philosopher’s views should be unsurprising). I’m not really interested in getting into a semantic debate though. I think that LWers are too quick to think that philosophers are playing the game wrong whereas I think the view should actually be that they’re playing the wrong game.
Well...perhaps. Obviously just because you can maximise over algorithms, it doesn’t follow that you can’t still talk about maximising over causal consequences. So either we have a (boring) semantic debate about what we mean by “decisions” or a debate about practicality: that is, the argument would be that talk about maximising over algorithms is clearly more useful than talk about maximising over causal consequences so why care about the second of these.
No, my point is that TDT, as a theory, maximizes over a space of decisions, not a space of algorithms, and in holding TDT to be rational, I am not merely holding it to occupy the most rational point in the space of algorithms, but saying that on its target problem class, TDT’s output is indeed always the most rational decision within the space of decisions. I simply don’t believe that it’s particularly rational to maximize over only the physical consequences of an act in a problem where the payoff is determined significantly by logical consequences of your algorithm’s output, such as Omega’s prediction of your output, or cohorts who will decide similarly to you. Your algorithm can choose to have any sort of decision-type it likes, so it should choose the decision-type with the best payoff. There is just nothing rational about blindly shutting your eyes to logical consequences and caring only about physical consequences, any more than there’s something rational about caring only about causes that work through blue things instead of red things. None of this discussion is taking place inside a space of algorithms rather than a space of decisions or object-level outputs.
The fact that you would obviously choose a non-CDT algorithm at the meta level, on a fair problem in which payoffs have no lexical dependence on algorithms’ exact code apart from their outputs, is, on my view, quite an indictment of the rationality of CDT. But the justification for TDT is not that it is what CDT would choose at the meta-level. At the meta-level, if CDT self-modifies at 7pm, it will modify to a new algorithm which one-boxes whenever Omega has glimpsed its source code after 7pm but two-boxes if Omega saw its code at 6:59pm, even though Omega is taking the self-modification into account in its prediction. Since the original CDT is not reflectively consistent on a fair problem, it must be wrong. Insofar as TDT chooses TDT, when offered a chance to self-modify on problems within its problem space, it is possible that TDT is right for that problem space.
But the main idea is just that TDT is directly outputting the rational action because we want the giant heap of money and not to stick to this strange, ritual concept of the ‘rational’ decision being the one that cares only about causal consequences and not logical consequences. We need not be forced into increasingly contorted redefinitions of winning in order to say that the two-boxer is winning despite not being rich, or that the problem is unfair to rationalists despite Omega not caring why you chose what you chose and your algorithm being allowed to choose any decision-type it likes. In my mental world, the object-level output of TDT just is the right thing to do, and so there is no need to go meta.
I would also expect this to be much the same reasoning, at a lower level of sophistication, for more casual LW readers; I don’t think they’d be going meta to justify TDT from within CDT, especially since the local writeups of TDT also justified themselves at the object and not meta level.
As you say, academic decision theorists do indeed have Parfit on rational irrationality and a massive literature thereupon, which we locally throw completely out the window. It’s not a viewpoint very suited to self-modifying AI, where the notion of programmers working from a theory which says their AI will end up ‘rationally irrational’ ought to give you the screaming willies. Even notions like, “I will nuke us both if you don’t give me all your lunch money!”, if it works on the blackmailee, can be considered as maximizing over a matrix of strategic responses when the opponent is only maximizing over their action without considering how you’re considering their reactions. We can do anything a Parfitian rational irrationalist can do from within a single theory, and we have no prejudice to call that theory’s advice irrational, nor reason to resort to precommitment.
Interesting. I have a better grasp of what you’re saying now (or maybe not what you’re saying, but why someone might think that what you are saying is true). Rapid responses to information that needs digesting are unhelpful so I have nothing further to say for now (though I still think my original post goes some way to explaining the opinions of those on LW that haven’t thought in detail about decision theory: a focus on algorithm rather than decisions means that people think one-boxing is rational even if they don’t agree with your claims about focusing on logical rather than causal consequences [and for these people, the disagreement with CDT is only apparent]).
ETA: On the CDT bit, which I can comment on, I think you overstate how “increasingly contorted” the CDTers’ “redefinitions of winning” are. They focus on whether the decision has the best causal consequences. This is hardly contorted (it’s fairly straightforward) and doesn’t seem to be much of a redefinition: if you’re focusing on “winning decisions” as the CDTer does (rather than “winning agents”), it seems to me that the causal consequences are the most natural way of separating out the part of the agent’s winning that relates to the decision from the parts that relate to the agent more generally. As a definition of a winning decision, I think the definition used on LW is more revisionary than the CDTers’ definition (as a definition of a winning algorithm or agent, the definition on LW seems natural, but as a way of separating out the part of the agent’s winning that relates to the decision, logical consequences seem far more revisionary). In other words, everyone agrees on what winning means. What people disagree about is when we can attribute the winningness to the decision rather than to some other factor, and I think the CDTer takes the natural line here (which isn’t to say they’re right, but I think the accusations of “contorted” definitions are unreasonable).
If agents whose decision-type is always the decision with the best physical consequences (ignoring logical consequences) don’t end up rich, then it seems to me to require a good deal of contortion to redefine the “winning decision” as “the decision with the best physical consequences”, and in particular you must suppose that Omega is unfairly punishing rationalists even though Omega has no care for your algorithm apart from the decision it outputs, etc. I think that to believe that the Prisoner’s Dilemma against your clone or Parfit’s Hitchhiker or voting are ‘unfair’ situations requires explicit philosophical training, and most naive respondents would just think that the winning decision was the one corresponding to the giant heap of money on a problem where the scenario doesn’t care about your algorithm apart from its output.
To clarify: everyone should agree that the winning agent is the one with the giant heap of money on the table. The question is how we attribute parts of that winning to the decision rather than other aspects of the agent (because this is the game the CDTers are playing and you said you think they are playing the game wrong, not just playing the wrong game). CDTers use the following means to attribute winning to the decision: they attribute the winning that is caused by the decision. This may be wrong and there may be room to demonstrate that this is the case but it seems unreasonable to me to describe it as “contorted” (it’s actually quite a straightforward way to attribute the winning to the decision) and I think that using such descriptions skews the debate in an unreasonable way. This is basically just a repetition of my previous point so perhaps further reiteration is not of any use to either of us...
In terms of NP being “unfair”, we need to be clear about what the CDTer means by this (using the word “unfair” makes it sound like the CDTer is just closing their eyes and crying). On the basic level, though, the CDTer simply means that the agent’s winning in this case isn’t entirely determined by the winning that can be attributed to the decision, and hence that the agent’s winning is not a good guide to what decision wins. More specifically, the claim is that the agent’s winning is determined in part by things that are correlated with the agent’s decision but which aren’t attributable to the agent’s decision, and so the agent’s overall winning in this case is a bad guide to determining which decision wins. Obviously you would disagree with the claims they’re making, but this is different to claiming that CDTers think NP is unfair in some more everyday sense (where it seems absurd to think that Omega is being unfair because Omega cares only about what decision you are going to make).
I don’t necessarily think the CDTers are right but I don’t think the way you outline their views does justice to them.
So to summarise. On LW the story is often told as follows: CDTers don’t care about winning (at least not in any natural sense) and they avoid the problems raised by NP by saying the scenario is unfair. This makes the CDTer sound not just wrong but also so foolish it’s hard to understand why the CDTer exists.
But expanded to show what the CDTer actually means, this becomes: CDTers agree that winning is what matters to rationality, but because they’re interested in rational decisions they are interested in what winning can be attributed to decisions. Specifically, they say that winning can be attributed to a decision if it was caused by that decision. In response to NP, the CDTer notes that the agent’s overall winning is not a good guide to the winning decision, as in this case the agent’s winning is also determined by factors other than their decisions (that is, some of the winning cannot be attributed to the agent’s decision). Further, because the agent’s winnings correlate with their decisions, even though they can’t be attributed to their decisions, the case can be particularly misleading when trying to determine the winning decision.
Now this second view may be both false and may be playing the wrong game but it at least gives the CDTer a fair hearing in a way that the first view doesn’t.
In Newcomb the outcome “pick two boxes, get $1.001M” is not in the outcome space, unless you fight the hypothetical, so the properly restricted CDT one-boxes. In the payoff matrix [1000, 0; 1001000, 1000000] the off-diagonal cases are inconsistent with the statement that Omega is a perfect predictor, so if you take them into account, you are not solving Newcomb, but some other problem where Omega is imperfect with unknown probability. Once the off-diagonal outcomes are removed, CDT trivially agrees with EDT.
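To make that restriction concrete, here is a minimal sketch, assuming the standard payoffs and reading “perfect predictor” as zero probability on the mismatched (off-diagonal) outcomes; the names are purely illustrative:

```python
# Newcomb payoffs indexed by (prediction, action).
payoffs = {
    ("predict_two_box", "two_box"): 1_000,
    ("predict_two_box", "one_box"): 0,          # ruled out if Omega is perfect
    ("predict_one_box", "two_box"): 1_001_000,  # ruled out if Omega is perfect
    ("predict_one_box", "one_box"): 1_000_000,
}

# Restrict the outcome space: with a perfect predictor the prediction
# always matches the action, so only the diagonal outcomes remain.
restricted = {action: payoffs[(f"predict_{action}", action)]
              for action in ("one_box", "two_box")}

print(restricted)                           # {'one_box': 1000000, 'two_box': 1000}
print(max(restricted, key=restricted.get))  # one_box
```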
First, removal of those scenarios is inconsistent with CDT as it is normally interpreted: CDT evaluates the utility of an act by the expected outcome of an exogenous choice being set without dependence on past causes, i.e. what would happen if a force from some unanticipated outside context came in and forced you to one-box or two-box, regardless of what you would otherwise have done.
It doesn’t matter if the counterfactual computed in this way is unphysical, at least without changing the theory.
Second, to avoid wrangling over this, many presentations add small or epsilon error rates (e.g. the Predictor flips a weighted coin to determine whether to predict accurately or inaccurately, and is accurate 99% of the time, or 99.999999% of the time). What’s your take with that adjustment?
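To put rough numbers on that adjustment, a quick sketch assuming the standard payoffs and 99% accuracy (the accuracy q is just an illustrative parameter):

```python
# Newcomb with an imperfect predictor that is accurate with probability q.
q = 0.99  # try 0.999999 as well

# Expected payoffs conditional on the action taken:
# one-boxing means the box is full with probability q,
# two-boxing means the box is full only with probability 1 - q.
ev_one_box = q * 1_000_000
ev_two_box = (1 - q) * 1_001_000 + q * 1_000

print(ev_one_box)  # 990000.0
print(ev_two_box)  # 11000.0 -- one-boxing comes out ahead for any q above ~0.5005
```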
Are you saying that the “CDT as it is normally interpreted” cannot help but fight the hypothetical? Then the Newcomb problem with a perfect predictor is not one where such CDT can be applied at all, it’s simply not in the CDT domain. Or you can interpret CDT as dealing with the possible outcomes only, and happily use it to one-box.
In the second case, first, you assume the existence of the limit if you extrapolate from imperfect to perfect predictor, which is a non-trivial mathematical assumption of continuity and is not guaranteed to hold in general (for example, a circle, no matter how large, is never topologically equivalent to a line).
That notwithstanding, CDT does take probabilities into account, at least the CDT as described in Wikipedia, so the question is, what is the counterfactual probability that if I were to two-box, then I get $1.001M, as opposed to the conditional probability of the same thing. The latter is very low, the former has to be evaluated on some grounds.
The standard two-boxer reasoning is that
if the prediction is for both A and B to be taken, then the player’s decision becomes a matter of choosing between $1,000 (by taking A and B) and $0 (by taking just B), in which case taking both boxes is obviously preferable. But, even if the prediction is for the player to take only B, then taking both boxes yields $1,001,000, and taking only B yields only $1,000,000—taking both boxes is still better, regardless of which prediction has been made.
Unpacking this logic, I conclude that “even if the prediction is for the player to take only B, then taking both boxes yields $1,001,000, and taking only B yields only $1,000,000—taking both boxes is still better” means assigning equal counterfactual probability to both outcomes, which goes against the problem setup, as it discards the available information (“it does not matter what Omega did, the past is past, let’s pick the dominant strategy”). This also highlights the discontinuity preventing one from taking this “information-discarding CDT” limit. This is similar to the information-discarding EDT deciding not to smoke in the smoking lesion problem.
Are you saying that the “CDT as it is normally interpreted” cannot help but fight the hypothetical?
It doesn’t have to fight the hypothetical. CDT counterfactuals don’t have to be possible.
The standard CDT algorithm computes the value of each action by computing the expected utility conditional on a miraculous intervention changing one’s decision to that action, separately from early deterministic causes, and computing the causal consequences of that. See Anna’s discussion here, including modifications in which the miraculous intervention changes other things, like one’s earlier dispositions (perhaps before the Predictor scanned you) or the output of one’s algorithm (instantiated in you and the Predictor’s model).
Say before the contents of the boxes are revealed our CDTer assigns some probability p to the state of the world where box B is full and his internal makeup will deterministically lead him to one-box, and probability (1-p) to the state of the world where box B is empty and that his internal makeup will deterministically lead him to two-box.
That notwithstanding, CDT does take probabilities into account, at least the CDT as described in Wikipedia, so the question is, what is the counterfactual probability that if I were to two-box, then I get $1.001M, as opposed to the conditional probability of the same thing. The latter is very low, the former has to be evaluated on some grounds.
Altering your action miraculously and exogenously would not change the box contents causally. So the CDTer uses the old probabilities for the box contents: the utility of one-boxing is computed to be $1,000,000 times p, and the utility of two-boxing is calculated to be $1,001,000 times p + $1,000 times (1-p).
If she is confident that she will apply CDT based on past experience, or introspection, she will have previously updated to thinking that p is very low.
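Spelling that calculation out as a sketch (p is the CDTer’s prior probability that box B is full, held fixed across the counterfactual actions, as described above):

```python
# CDT-style expected utilities: p, the probability that box B is full, is a
# fact about the past, so it is held fixed whichever action is miraculously
# inserted.
def cdt_utilities(p):
    one_box = p * 1_000_000
    two_box = p * 1_001_000 + (1 - p) * 1_000
    return one_box, two_box

for p in (0.0, 0.5, 0.99, 1.0):
    one_box, two_box = cdt_utilities(p)
    # Two-boxing exceeds one-boxing by exactly $1,000 for every value of p.
    print(p, one_box, two_box, two_box - one_box)
```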
utility of one-boxing is computed to be $1,000,000 times p,
utility of two-boxing is calculated to be $1,001,000 times p + $1,000 times (1-p).
If she is confident that she will apply CDT based on past experience, or introspection, she will have previously updated to thinking that p is very low.
Right, I forgot. The reasoning is “I’m a two-boxer because I follow a loser’s logic and Omega knows it, so I may as well two-box.” There is no anticipation of winning $1,001,000. No, that does not sound quite right...
The last bit about p going low with introspection isn’t necessary. The conclusion (two-boxing preferred, or at best indifference between one-boxing and two-boxing if one is certain one will two-box) follows under CDT with the usual counterfactuals for any value of p.
The reasoning is “well, if the world is such that I am going to two-box, then I should two-box, and if the world is such that I am going to one-box, then I should two-box”. Optional extension: “hmm, sounds like I’ll be two-boxing then, alas! No million dollars for me...” (Unless I wind up changing my mind or the like, which keeps p above 0).
CDT doesn’t assign credences to outcomes in the way you are suggesting.
One way to think about it is as follows: basically, CDT says that you should use your prior probability in a state (not an outcome) and update this probability only in those cases where the decision being considered causally influences the state. So whatever prior credence you had in the “box contains $M” state, given that the decision doesn’t causally influence the box contents, you should have that same credence regardless of your decision, and the same for the other state.
There are so many different ways of outlining CDT that I don’t intend to discuss why the above account doesn’t describe each of these versions of CDT, but some answer equivalent to the above will apply to all such accounts.
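In symbols, the contrast being described is roughly the following (with s ranging over the two box-content states and u the payoffs above; this is just a restatement of the verbal point, not any particular author’s formulation):

```latex
U_{\mathrm{CDT}}(a) = \sum_{s \in \{\text{full},\,\text{empty}\}} P(s)\, u(s, a)
\qquad\text{vs.}\qquad
U_{\mathrm{EDT}}(a) = \sum_{s \in \{\text{full},\,\text{empty}\}} P(s \mid a)\, u(s, a)
```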
So philosophical proponents of CDT will almost all (all, in my experience) agree that, if you are choosing a decision theory to follow, it is rational to choose a one-boxing decision theory; but they will say that, if you are choosing a decision, it is rational to two-box.
How can one simultaneously
consider it rational, when choosing a decision theory, to pick one that tells you to one-box; and
be a proponent of CDT, a decision theory that tells you to two-box?
It seems to me that this is possible only for those who (1) actually think one can’t or shouldn’t choose a decision theory (cf. some responses to Pascal’s wager) and/or (2) think it reasonable to be a proponent of a theory it would be irrational to choose. Those both seem a bit odd.
[EDITED to replace some “you”s with “one”s and similar locutions, to clarify that I’m not accusing PhilosophyStudent of being in that position.]
We need to distinguish two meanings of “being a proponent of CDT”. If by “be a proponent of CDT” we mean, “think CDT describes the rational decision” then the answer is simply that the CDTer thinks that rational decisions relate to the causal impact of decisions and rational algorithms relate to the causal impact of algorithms and so there’s no reason to think that the rational decision must be endorsed by the rational algorithm (as we are considering different causal impacts in the two cases).
If by “be a proponent of CDT” we mean “think we should decide according to CDT in all scenarios including NP” then we definitely have a problem, but no smart person should be a proponent of CDT in this way (all CDTers should have decided to become one-boxers if they have the capacity to do so because CDT itself entails that this is the best decision).
there’s no reason to think that the rational decision must be endorsed by the rational algorithm (as we are considering different causal impacts in the two cases).
I think this elides distinctions too quickly.
You can describe things this way. This description in hand, what does one do if dropped into NP (the scan has already been made, the boxes filled or not)? Go with the action dictated by algorithm and collect the million, or the lone action and collect the thousand?
(all CDTers should have decided to become one-boxers if they have the capacity to do so because CDT itself entails that this is the best decision)
Are you thinking of something like hiring a hitman to shoot you unless you one-box, so that the payoffs don’t match NP? Or of changing your beliefs about what you should do in NP?
For the former, convenient ways of avoiding the problem aren’t necessarily available, and one can ask why the paraphernalia are needed when no one is stopping you from just one-boxing. For the latter, I’d need a bit more clarification.
This comment was only meant to suggest how it was internally consistent for a CDTer to:
consider it rational, when choosing a decision theory, to pick one that tells you to one-box; and
be a proponent of CDT, a decision theory that tells you to two-box.
In other words, I was not trying here to offer a defence of a view (or even an outline of my view) but merely to show why it is that the CDTer can hold both of these things without inconsistency.
Are you thinking of something like hiring a hitman to shoot you unless you one-box, so that the payoffs don’t match NP? Or of changing your beliefs about what you should do in NP?
I’m thinking about changing your dispositions to decide. How one might do that will depend on one’s capabilities (for myself, I have some capacity to resolutely commit to later actions without changing my beliefs about the rationality of that decision). For some agents, this may well not be possible.
This comment was only meant to suggest how it was internally consistent for a CDTer to: consider it rational, when choosing a decision theory, to pick one that tells you to one-box; and be a proponent of CDT, a decision theory that tells you to two-box.
You didn’t, quite. CDT favors modifying to one-box on all problems where there is causal influence from your physical decision to make the change. So it favors one-boxing on Newcomb with a Predictor who predicts by scanning you after the change, but two-boxing with respect to earlier causal entanglements, or logical/algorithmic similarities. In the terminology of this post CDT (counterfactuals over acts) attempts to replace itself with counterfactuals over earlier innards at the time of replacement, not counterfactuals over algorithms.
Yes. So it is consistent for a CDTer to believe that:
(1) When picking a decision theory, you should pick one that tells you to one-box in instances of NP where the prediction has not yet occurred; and
(2) CDT correctly describes two-boxing as the rational decision in NP.
I committed the sin of brevity in order to save time (LW is kind of a guilty pleasure rather than something I actually have the time to be doing).
OK, that’s all good, but already part of the standard picture and leaves almost all the arguments intact over cases one didn’t get to precommit for, which is the standard presentation in any case. So I’d say it doesn’t much support the earlier claim:
For those that haven’t, I suspect that the “disagreement” with philosophers is mostly apparent and not actual
Also:
No pressure.
Perhaps my earlier claim was too strong.
Nevertheless, I do think that people on LW who haven’t thought about the issues a lot might well not have a solid enough opinion to be either agreeing or disagreeing with the LW one-boxing view or the two-boxing philosopher’s view. I suspect some of these people just note that one-boxing is the best algorithm and think that this means that they’re agreeing with LW when in fact this leaves them neutral on the issue until they make their claim more precise.
I also think one of the reasons for the lack of two-boxers on LW is that LW often presents two-boxing arguments in a slogan form which fails to do justice to these arguments (see my comments here and here). Which isn’t to say that the two-boxers are right but is to say I think the debate gets skewed unreasonably in one-boxers’ favour on LW (not always, but often enough to influence people’s opinions).