From the standpoint of reflective consistency, there should not be a divergence between rational decisions and rational algorithms; the rational algorithm should search for and output the rational decision, and the rational decision should be to adopt the rational algorithm. Suppose you regard Newcomb’s Problem as rewarding an agent with a certain decision-type, namely the sort of agent who one-boxes. TDT can be viewed as an algorithm which searches a space of decision-types and always decides to have the decision-type such that this decision-type has the maximal payoff. (UDT and other extensions of TDT can be viewed as maximizing over spaces broader than decision-types, such as sensory-info-dependent strategies or (in blackmail) maximization vantage points). Once you have an elegant theory which does this, and once you realize that a rational algorithm can just as easily maximize over its own decision-type as the physical consequences of its acts, there is just no reason to regard two-boxing as a winning decision or winning action in any sense, nor regard yourself as needing to occupy a meta-level vantage point in which you maximize over theories. This seems akin to precommitment, and precommitment means dynamic inconsistency means reflective inconsistency. Trying to maximize over theories means you have not found the single theory which directly maximizes without any recursion or metaness, and that means your theory is not maximizing the right thing.
Claiming that TDTers are maximizing over decision theories, then, is very much a CDT standpoint which is not at all how someone who sees logical decision theories as natural would describe it. From our perspective we are just picking the winning algorithm output (be the sort of agent who picks one box) in one shot, and without any retreat to a meta-level. The output of the winning algorithm is the winning decision, that’s what makes the winning algorithm winning.
Yes. Which is to say, clearly you fall into the second class of people (those who have studied decision theory a lot) and hence my explanation was not meant to apply to you.
Which isn’t to say I agree with everything you say.
From the standpoint of reflective consistency, there should not be a divergence between rational decisions and rational algorithms; the rational algorithm should search for and output the rational decision,
Decisions can have different causal impacts to decision theories and so there seems to be no reason to accept this claim. Insofar as the rational decision is the decision which wins, which depends on the causal effects of the decision, and the rational algorithm is the algorithm which wins, which depends on the causal effects of the algorithm, there seems to be no reason to think these should coincide. Plus, I like being able to draw distinctions that can’t be drawn using your terminology.
and the rational decision should be to adopt the rational algorithm.
Agreed (if you are faced with a decision of which algorithm to follow). Of course, this is not the decision that you’re faced with in NP (and adding more options is just to deny the hypothetical).
Suppose you regard Newcomb’s Problem as rewarding an agent with a certain decision-type, namely the sort of agent who one-boxes. TDT can be viewed as an algorithm which searches a space of decision-types and always decides to have the decision-type such that this decision-type has the maximal payoff. (UDT and other extensions of TDT can be viewed as maximizing over spaces broader than decision-types, such as sensory-info-dependent strategies or (in blackmail) maximization vantage points).
Yes, and I think this is an impressive achievement and I find TDT/UDT to be elegant, useful theories. The fact that I make the distinction between rational theories and rational decisions does not mean I cannot value the answers to both questions.
once you realize that a rational algorithm can just as easily maximize over its own decision-type as the physical consequences of its acts, there is just no reason to regard two-boxing as a winning decision or winning action in any sense, nor regard yourself as needing to occupy a meta-level vantage point in which you maximize over theories.
Well...perhaps. Obviously just because you can maximise over algorithms, it doesn’t follow that you can’t still talk about maximising over causal consequences. So either we have a (boring) semantic debate about what we mean by “decisions” or a debate about practicality: that is, the argument would be that talk about maximising over algorithms is clearly more useful than talk about maximising over causal consequences so why care about the second of these. For the most part, I buy this argument about practicality (but it doesn’t mean that two-boxing philosophers are wrong, just that they’re playing a game that both you and I feel little concern for).
This seems akin to precommitment, and precommitment means dynamic inconsistency means reflective inconsistency. Trying to maximize over theories means you have not found the single theory which directly maximizes without any recursion or metaness, and that means your theory is not maximizing the right thing.
I know what all these phrases mean but don’t know why it follows that your theory is not maximising the “right” thing. Perhaps it is not maximising a thing that you find to be useful or interesting (particularly for self-modifying AIs). If this is what you mean then fine. If you mean, however, that two-boxers are wrong on their own terms then I would need a more compelling argument (I’ve read your TDT paper btw, so reference to that won’t resolve things here).
Claiming that TDTers are maximizing over decision theories, then, is very much a CDT standpoint which is not at all how someone who sees logical decision theories as natural would describe it. From our perspective we are just picking the winning algorithm output (be the sort of agent who picks one box) in one shot, and without any retreat to a meta-level. The output of the winning algorithm is the winning decision, that’s what makes the winning algorithm winning.
Sure, the distinction is from the CDT perspective. You use words differently to the proponents of CDT (at which point, the whole difference between LWers’ views and philosophers’ views should be unsurprising). I’m not really interested in getting into a semantic debate though. I think that LWers are too quick to think that philosophers are playing the game wrong whereas I think the view should actually be that they’re playing the wrong game.
Well...perhaps. Obviously just because you can maximise over algorithms, it doesn’t follow that you can’t still talk about maximising over causal consequences. So either we have a (boring) semantic debate about what we mean by “decisions” or a debate about practicality: that is, the argument would be that talk about maximising over algorithms is clearly more useful than talk about maximising over causal consequences so why care about the second of these.
No, my point is that TDT, as a theory, maximizes over a space of decisions, not a space of algorithms, and in holding TDT to be rational, I am not merely holding it to occupy the most rational point in the space of algorithms, but saying that on its target problem class, TDT’s output is indeed always the most rational decision within the space of decisions. I simply don’t believe that it’s particularly rational to maximize over only the physical consequences of an act in a problem where the payoff is determined significantly by logical consequences of your algorithm’s output, such as Omega’s prediction of your output, or cohorts who will decide similarly to you. Your algorithm can choose to have any sort of decision-type it likes, so it should choose the decision-type with the best payoff. There is just nothing rational about blindly shutting your eyes to logical consequences and caring only about physical consequences, any more than there’s something rational about caring only about causes that work through blue things instead of red things. None of this discussion is taking place inside a space of algorithms rather than a space of decisions or object-level outputs.
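To make the contrast concrete, here is a minimal sketch of the two maximizations being argued about. It is only an illustration under stated assumptions: the $1,000 and $1,000,000 figures are the usual textbook amounts (the thread does not state any), the function names are made up for this example, and `accuracy` is a hypothetical parameter standing in for how reliably Omega’s prediction tracks the decision-type.

```python
# Illustrative payoffs only: the standard $1,000 / $1,000,000 figures are
# assumed here; the discussion itself does not state amounts.
SMALL = 1_000
BIG = 1_000_000
ACTIONS = ("one-box", "two-box")


def payoff(action: str, predicted: str) -> int:
    """Money received given the action taken and what Omega predicted."""
    opaque = BIG if predicted == "one-box" else 0
    transparent = SMALL if action == "two-box" else 0
    return opaque + transparent


def best_by_physical_consequences(predicted: str) -> str:
    """Hold Omega's prediction fixed and compare only what the act causes."""
    return max(ACTIONS, key=lambda a: payoff(a, predicted))


def best_by_decision_type(accuracy: float = 1.0) -> str:
    """Let the prediction track the decision-type, then compare payoffs."""
    def expected(action: str) -> float:
        other = "two-box" if action == "one-box" else "one-box"
        return (accuracy * payoff(action, action)
                + (1 - accuracy) * payoff(action, other))
    return max(ACTIONS, key=expected)


if __name__ == "__main__":
    # Whatever prediction is held fixed, the first rule picks two-boxing.
    print(best_by_physical_consequences("one-box"))
    print(best_by_physical_consequences("two-box"))
    # When the prediction tracks the decision-type, one-boxing comes out ahead.
    print(best_by_decision_type())
```

Both functions maximize over the same space of object-level outputs (`ACTIONS`); they differ only in whether the prediction is treated as fixed or as covarying with the output, which is the point in dispute.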
The fact that you would obviously choose a non-CDT algorithm at meta level, on a fair problem in which payoffs have no lexical dependence on algorithms’ exact code apart from their outputs, is, on my view, very much an indictment of the rationality of CDT. But the justification for TDT is not that it is what CDT would choose at the meta-level. At the meta-level, if CDT self-modifies at 7pm, it will modify to a new algorithm which one-boxes whenever Omega has glimpsed its source code after 7pm but two-boxes if Omega saw its code at 6:59pm, even though Omega is taking the self-modification into account in its prediction. Since the original CDT is not reflectively consistent on a fair problem, it must be wrong. Insofar as TDT chooses TDT, when offered a chance to self-modify on problems within its problem space, it is possible that TDT is right for that problem space.
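The 7pm example can be sketched as follows. This is a toy model, not anyone’s formal construction: the payoff figures are the usual assumed amounts, times are hours on a 24-hour clock, Omega is assumed to predict each policy’s actual output perfectly, and all names (`cdt_successor`, `uniform_one_boxer`, `winnings`) are hypothetical.

```python
# Toy model of the 7pm self-modification example.
# Assumptions: standard payoff figures, a perfectly accurate Omega,
# and times expressed as hours on a 24-hour clock (19.0 == 7pm).
SMALL, BIG = 1_000, 1_000_000
MODIFY_AT = 19.0


def cdt_successor(scan_time: float) -> str:
    """The algorithm CDT is described as self-modifying into at 7pm.

    A scan after the modification is causally downstream of it, so the
    successor one-boxes; an earlier scan is not, so it still two-boxes.
    """
    return "one-box" if scan_time > MODIFY_AT else "two-box"


def uniform_one_boxer(scan_time: float) -> str:
    """A policy that one-boxes regardless of when Omega looked."""
    return "one-box"


def winnings(policy, scan_time: float) -> int:
    """Payoff when Omega's prediction equals the policy's actual output."""
    action = policy(scan_time)
    predicted = action  # perfect prediction, self-modification included
    opaque = BIG if predicted == "one-box" else 0
    transparent = SMALL if action == "two-box" else 0
    return opaque + transparent


if __name__ == "__main__":
    six_fifty_nine = 18 + 59 / 60
    print(winnings(cdt_successor, six_fifty_nine))
    print(winnings(cdt_successor, 19.5))
    print(winnings(uniform_one_boxer, six_fifty_nine))
```

Under these assumptions the successor does as well as the uniform one-boxer only when the scan comes after 7pm; the dependence on scan time is the feature the example is pointing at.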
But the main idea is just that TDT is directly outputting the rational action because we want the giant heap of money and not to stick to this strange, ritual concept of the ‘rational’ decision being the one that cares only about causal consequences and not logical consequences. We need not be forced to increasingly contorted redefinitions of winning in order to say that the two-boxer is winning despite not being rich, or that the problem is unfair to rationalists despite Omega not caring why you choose what you chose and your algorithm being allowed to choose any decision-type it likes. In my mental world, the object-level output of TDT just is the right thing to do, and so there is no need to go meta.
I would also expect this to be much the same reasoning, at a lower level of sophistication, for more casual LW readers; I don’t think they’d be going meta to justify TDT from within CDT, especially since the local writeups of TDT also justified themselves at the object and not meta level.
As you say, academic decision theorists do indeed have Parfit on rational irrationality and a massive literature thereupon, which we locally throw completely out the window. It’s not a viewpoint very suited to self-modifying AI, where the notion of programmers working from a theory which says their AI will end up ‘rationally irrational’ ought to give you the screaming willies. Even notions like, “I will nuke us both if you don’t give me all your lunch money!”, if it works on the blackmailee, can be considered as maximizing over a matrix of strategic responses when the opponent is only maximizing over their action without considering how you’re considering their reactions. We can do anything a Parfitian rational irrationalist can do from within a single theory, and we have no prejudice to call that theory’s advice irrational, nor reason to resort to precommitment.
Interesting. I have a better grasp of what you’re saying now (or maybe not what you’re saying, but why someone might think that what you are saying is true). Rapid responses to information that needs digesting are unhelpful so I have nothing further to say for now (though I still think my original post goes some way to explaining the opinions of those on LW that haven’t thought in detail about decision theory: a focus on algorithms rather than decisions means that people think one-boxing is rational even if they don’t agree with your claims about focusing on logical rather than causal consequences [and for these people, the disagreement with CDT is only apparent]).
ETA: On the CDT bit, which I can comment on, I think you overstate how “increasingly contorted” the CDTers’ “redefinitions of winning” are. They focus on whether the decision has the best causal consequences. This is hardly contorted (it’s fairly straightforward) and doesn’t seem to be much of a redefinition: if you’re focusing on “winning decisions” as the CDTer does (rather than “winning agents”) it seems to me that the causal consequences are the most natural way of separating out the part of the agent’s winning that relates to the decision from the parts that relate to the agent more generally. As a definition of a winning decision, I think the definition used on LW is more revisionary than the CDTers’ definition (as a definition of winning algorithm or agent, the definition on LW seems natural but as a way of separating out the part of the agent’s winning that relates to the decision, logical consequences seem far more revisionary). In other words, everyone agrees what winning means. What people disagree about is when we can attribute the winningness to the decision rather than to some other factor and I think the CDTer takes the natural line here (which isn’t to say they’re right but I think the accusations of “contorted” definitions are unreasonable).
If agents whose decision-type is always the decision with the best physical consequences, ignoring logical consequences, don’t end up rich, then it seems to me to require a good deal of contortion to redefine the “winning decision” as “the decision with the best physical consequences”, and in particular you must suppose that Omega is unfairly punishing rationalists even though Omega has no care for your algorithm apart from the decision it outputs, etc. I think that to believe that the Prisoner’s Dilemma against your clone or Parfit’s Hitchhiker or voting are ‘unfair’ situations requires explicit philosophical training, and most naive respondents would just think that the winning decision was the one corresponding to the giant heap of money on a problem where the scenario doesn’t care about your algorithm apart from its output.
To clarify: everyone should agree that the winning agent is the one with the giant heap of money on the table. The question is how we attribute parts of that winning to the decision rather than other aspects of the agent (because this is the game the CDTers are playing and you said you think they are playing the game wrong, not just playing the wrong game). CDTers use the following means to attribute winning to the decision: they attribute the winning that is caused by the decision. This may be wrong and there may be room to demonstrate that this is the case but it seems unreasonable to me to describe it as “contorted” (it’s actually quite a straightforward way to attribute the winning to the decision) and I think that using such descriptions skews the debate in an unreasonable way. This is basically just a repetition of my previous point so perhaps further reiteration is not of any use to either of us...
In terms of NP being “unfair”, we need to be clear about what the CDTer means by this (using the word “unfair” makes it sound like the CDTer is just closing their eyes and crying). On the basic level, though, the CDTer simply means that the agent’s winning in this case isn’t entirely determined by the winning that can be attributed to the decision and hence that the agent’s winning is not a good guide to what decision wins. More specifically, the claim is that the agent’s winning is determined in part by things that are correlated with the agent’s decision but which aren’t attributable to the agent’s decision and so the agent’s overall winning in this case is a bad guide to determining which decision wins. Obviously you would disagree with the claims they’re making but this is different to claiming that CDTers think NP is unfair in some more everyday sense (where it seems absurd to think that Omega is being unfair because Omega cares only about what decision you are going to make).
I don’t necessarily think the CDTers are right but I don’t think the way you outline their views does justice to them.
So to summarise. On LW the story is often told as follows: CDTers don’t care about winning (at least not in any natural sense) and they avoid the problems raised by NP by saying the scenario is unfair. This makes the CDTer sound not just wrong but also so foolish it’s hard to understand why the CDTer exists.
But expanded to show what the CDTer actually means, this becomes: CDTers agree that winning is what matters to rationality but because they’re interested in rational decisions they are interested in what winning can be attributed to decisions. Specifically, they say that winning can be attributed to a decision if it was caused by that decision. In response to NP, the CDTer notes that the agent’s overall winning is not a good guide to the winning decision as in this case, the agent’s winning is also determined by factors other than their decisions (that is, the winning cannot be attributed to the agent’s decision). Further, because the agent’s winnings correlate with their decisions, even though they can’t be attributed to their decisions, the case can be particularly misleading when trying to determine the winning decisions.
Now this second view may be both false and may be playing the wrong game but it at least gives the CDTer a fair hearing in a way that the first view doesn’t.
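One way to picture the CDTer’s accounting in the summary above is to split the agent’s total winnings into the part caused by the decision and the part fixed by something merely correlated with it. The sketch below is one reading of that attribution rule, not a formalization anyone in the thread has endorsed; the payoff figures are the usual assumed amounts and the name `decompose` is hypothetical.

```python
# One possible reading of "winning attributed to the decision":
# count only the money the act itself causes. Payoff figures are assumed.
SMALL, BIG = 1_000, 1_000_000


def decompose(action: str, predicted: str) -> tuple[int, int]:
    """Return (winnings caused by the decision, winnings fixed beforehand)."""
    caused_by_decision = SMALL if action == "two-box" else 0
    fixed_by_prediction = BIG if predicted == "one-box" else 0
    return caused_by_decision, fixed_by_prediction


if __name__ == "__main__":
    for action in ("one-box", "two-box"):
        # Assume an accurate predictor, so the prediction matches the action.
        caused, fixed = decompose(action, predicted=action)
        print(action, "caused:", caused, "fixed:", fixed, "total:", caused + fixed)
```

On this accounting the two-boxer’s decision is credited with more winnings even though the one-boxer’s total is larger, which is exactly the gap between “winning decision” and “winning agent” that the two sides describe differently.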