One-boxers end up with 1 000 000 utility
Two-boxers end up with 1 000 utility
So everyone agrees that one-boxers are the winning agents (1 000 000 > 1 000)
The question is: how much of this utility can be attributed to the agent’s decision rather than to their agent type? The two-boxer says that to answer this question we ask what utility the agent’s decision caused them to gain. So they say that we can attribute the following utility to the decisions:
One-boxing: 0
Two-boxing: 1000
And the following utility to the agent’s type (there will be some double counting because of overlapping causal effects):
One-boxing type: 1 000 000
Two-boxing type: 1 000
So the proponent of two-boxing says that the winning decision is two-boxing and the winning agent type is a one-boxing type.
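To make this bookkeeping concrete, here is a minimal Python sketch of the two-boxer’s attribution. It assumes the standard payoffs and a perfectly accurate predictor, and the function names (payoff, caused_by_decision) are just mine for illustration; the point is only that the decision-level ledger and the type-level ledger come apart.

```python
# Toy Newcomb bookkeeping: standard payoffs, perfectly accurate predictor assumed.
BOX_A = 1_000                            # the transparent box always holds $1,000
BOX_B_IF_PREDICTED_ONE_BOX = 1_000_000   # the opaque box is filled iff one-boxing is predicted

def payoff(agent_type: str) -> int:
    """Total money received, given that Omega's prediction matches the agent type."""
    prediction = agent_type                                  # perfect prediction
    box_b = BOX_B_IF_PREDICTED_ONE_BOX if prediction == "one-box" else 0
    return box_b if agent_type == "one-box" else box_b + BOX_A

def caused_by_decision(agent_type: str) -> int:
    """The two-boxer's 'caused gain': money the decision itself adds,
    holding the already-fixed contents of the boxes constant."""
    return BOX_A if agent_type == "two-box" else 0

for t in ("one-box", "two-box"):
    print(t, "| total:", payoff(t),
          "| caused by the decision:", caused_by_decision(t),
          "| attributable to the agent type:", payoff(t))
# one-box | total: 1000000 | caused by the decision: 0 | attributable to the agent type: 1000000
# two-box | total: 1000 | caused by the decision: 1000 | attributable to the agent type: 1000
```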
I’m not interpreting it so that it comes out looking good (for a start, I’m not necessarily a proponent of this view; I’m just outlining it). All I’m discussing is the two-boxer’s response to the accusation that they don’t win. They say they are interested not in winning agents but in winning decisions, and that two-boxing is the winning decision (because 1000 > 0).
The LW approach has focused on finding agent types that win on decision problems. Lots of the work has been in trying to formalize TDT/UDT, providing sketches of computer programs that implement these informal ideas. Having read a fair amount of the philosophy literature (including some of the recent work by Egan, Hare/Hedden and others), I think that this agent/program approach has been extremely fruitful. It has not only given compelling solutions to a large number of problems in the literature (Newcomb’s problem, trivial coordination problems like Stag Hunt that CDT fails on, playing the PD against a selfish copy of yourself), but it has also elucidated the deep philosophical issues that the Newcomb problem dramatizes (concerning pre-commitment, free will/determinism, and uncertainty about purely a priori/logical questions). The focus on agents as programs has brought to light the intricate connection between decision making, computability and logic (especially Gödelian issues), something merely touched on in the philosophy literature.
These successes provide sufficient reason to pursue the agent-centered approach (even if there were no compelling foundational argument that the ‘decision’-centered approach was incoherent). Similarly, I think there is no overwhelming foundational argument for Bayesian probability theory, but philosophers should study it because of its fruitfulness in illuminating many particular issues in the philosophy of science and the foundations of statistics (not to mention its success in practical machine learning and statistics).
This response may not be very satisfying, but I can only recommend the UDT posts (http://wiki.lesswrong.com/wiki/Updateless_decision_theory) and the recent MIRI paper (http://intelligence.org/files/RobustCooperation.pdf).
Rough arguments against the decision-centered approach:
Point 1
Suppose I win the lottery after playing 10 times. My decision of which numbers to pick in the last lottery was the cause of winning money (whereas my previous choices of numbers produced only disutility). But it’s not clear there’s anything interesting about this distinction. If I lost money on average, the important lesson is the failing of my agent type (i.e. the way my decision algorithm makes decisions on lottery problems).
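As a toy illustration of that point (the numbers are made up: a $1 ticket and a one-in-a-million chance at a $500,000 jackpot), the winning pick causes a large gain even though the policy that generated all ten picks loses in expectation:

```python
# Hypothetical lottery; all numbers are illustrative assumptions.
TICKET_PRICE = 1.0
JACKPOT = 500_000.0
P_WIN = 1e-6

ev_per_ticket = P_WIN * JACKPOT - TICKET_PRICE
print(f"Expected value of each decision to play: {ev_per_ticket:+.2f}")  # -0.50

# Suppose the tenth ticket happens to win: that single decision caused a huge gain...
outcomes = [-TICKET_PRICE] * 9 + [JACKPOT - TICKET_PRICE]
print("Utility caused by the winning pick:", outcomes[-1])

# ...but the agent type ('buy a ticket whenever offered') is still a losing one.
print("Expected total over the 10 plays:", 10 * ev_per_ticket)
```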
And yet in many practical cases that humans face, it is very useful to look back at which decisions led to high utility. If we compare different algorithms playing casino games, or compare following the advice of a poker expert vs. a newbie, we’ll get useful information by looking at the utility caused by each decision. But this investigation of decisions that cause high utility is completely explainable from the agent-centered approach. When simulation and logical correlations between agents are not part of the problem, the optimal agent will make decisions that cause the most utility. UDT/TDT and variants all (afaik) act like CDT in these simple decision problems. If we came upon a Newcomb problem without being told the setup (and without any familiarity with these decision theory puzzles), we would see that the CDTer’s decisions were causing utility and the EDTer’s decisions were not causing any utility. The EDTer would look like a lunatic with bizarrely good luck. Here we are following a local causal criterion in comparing actions. While usually fine, this would clearly miss an important part of the story in the Newcomb problem.
Point 2
In AI, we want to build decision-making agents that win. In life, we want to improve our own decision making so that we win. Thinking about the utility caused by individual decisions may be a useful subgoal in coming up with winning agents, but it seems hard to see it as the central issue. The Newcomb problem (along with the counterfactual mugging and Parfit’s Hitchhiker) makes clear that a local, Markovian criterion (e.g. choose the action that will cause the highest utility, ignoring all previous actions and commitments) is inadequate for winning.
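A toy version of Parfit’s Hitchhiker makes the inadequacy vivid (the payoff numbers are my own assumptions, and the driver is assumed to predict accurately): the agent who would refuse to pay once rescued never gets rescued at all.

```python
# Toy Parfit's Hitchhiker; payoff numbers are illustrative assumptions.
PAY = -100             # cost of paying the driver once you are safely in town
STRANDED = -1_000_000  # disutility of being left in the desert

def pays_once_in_town(policy: str) -> bool:
    if policy == "local":   # 'choose whatever causes the most utility right now':
        return False        # once rescued, paying only causes a loss, so refuse
    return True             # the commitment-keeping agent pays as promised

def outcome(policy: str) -> int:
    predicted_to_pay = pays_once_in_town(policy)  # the driver predicts accurately
    if not predicted_to_pay:
        return STRANDED                           # no ride is offered at all
    return PAY

for p in ("local", "committed"):
    print(p, outcome(p))  # local: -1000000, committed: -100
```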
Point 3
The UDT one-boxer’s agent type does not cause utility in the NP. However, it does logically determine the utility. (More specifically, we could examine the one-boxing program as a formal system and try to isolate which rules/axioms lead to its one-boxing in this type of problem.) Similarly, if two people were using different sets of axioms (where one set is inconsistent), we might point to one of the axioms and say that its inclusion is what determines the inconsistency of the system. This is a mere sketch, but it might be possible to develop a local criterion by which “responsibility” for utility gains can be assigned to particular aspects of an agent.
It’s clear that we can learn about good agent types by examining particular decisions. We don’t have to always work with a fully specified program. (And we don’t have the code of any AI that can solve decision problems the way humans can). So the more local approach may have some value.
Generally agree. I think there are good arguments for focusing on decision types rather than decisions. A few comments:
Point 1: That’s why rationality of decisions is evaluated in terms of expected outcome, not actual outcome. So actually, it wasn’t just your agent type that was flawed here but also your decisions. But yes, I agree with the general point that agent type is important.
Point 2: Agree
Point 3: Yes. I agree that there could be ways other than causation to attribute utility to decisions, and that these ways might be superior. However, I also think that the causal approach is one natural way to do this, and so I think claims that the proponent of two-boxing doesn’t care about winning are false. I also think it’s false to say they have a twisted definition of winning. Their view may turn out to be wrong, but I think it takes work to show that (I don’t think they are just obviously coming up with absurd definitions of winning).
That’s the wrong question, because it presupposes that the agent’s decision and type are separable.
By decision, the two-boxer means something like a proposition that the agent can make true or false at will (decisions don’t need to be analysed in terms of propositions but it makes the point fairly clearly). In other words, a decision is a thing that an agent can bring about with certainty.
By agent type, in the case of Newcomb’s problem, the two-boxer is just going to mean “the thing that Omega based their prediction on”. Let’s say the agent’s brain state at the time of prediction.
Why think these are the same thing?
If these are the same thing, CDT will one-box. Given that, is there any reason to think that the LW view is best presented as requiring a new decision theory rather than as requiring a new theory of what constitutes a decision?
They are not the same thing, but they aren’t independent. And the dependence is not only causal but logical, which is why a CDT intervention at the action node, leaving the agent-type node untouched, makes no sense. CDT behaves as if it were possible to be one agent type for the purposes of Omega’s prediction and then take an action corresponding to another agent type, even though that is logically impossible. CDT is unable to view its own action as predetermined, but its action is predetermined by the algorithm that is the agent. TDT can take this into account and reason with it, which is why it’s such a beautiful idea.
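One way to see the point is to model the agent as a program and let Omega predict by running that very program. This is my own toy formalization, not TDT itself, but in it ‘the action’ is simply not a variable that can be set independently of ‘the agent type’:

```python
# Toy model: the action is computed by the agent's program, and Omega's
# prediction comes from the same program, so action and type cannot come apart.
def one_boxer() -> str:
    return "one-box"

def two_boxer() -> str:
    return "two-box"

def newcomb(agent_program) -> int:
    prediction = agent_program()    # Omega 'predicts' by running the program
    action = agent_program()        # ...and the agent's action is that same output
    box_b = 1_000_000 if prediction == "one-box" else 0
    return box_b if action == "one-box" else box_b + 1_000

print(newcomb(one_boxer), newcomb(two_boxer))  # 1000000 1000

# The world CDT imagines when it intervenes on the action alone (prediction
# 'one-box', action 'two-box') corresponds to no agent_program at all: no input
# to newcomb() produces that combination.
```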
In that case: the two-boxer isn’t just wrong, they’re double-wrong. You can’t just come up with some related-but-different function (“caused gain”) to maximize. The problem is about maximizing the money you receive, not “caused gain”.
For example, I’ve seen some two-boxers justify two-boxing as a moral thing: they’re willing to pay $999,000 for the benefit of somehow throwing the prediction back in the predictor’s face. Fundamentally, they’re making the same mistake: fighting the hypothetical by saying the payoffs are different from what was stated in the problem.
The two-boxer is trying to maximise money (utility). They are interested in the additional question of which bits of that money (utility) can be attributed to which things (decisions/agent types). “Caused gain” is a view about how we should attribute the gaining of money (utility) to different things.
So they agree that the problem is about maximising money (utility) and not “caused gain”. But they are interested not just in which agents end up with the most money (utility) but also in which aspects of those agents are responsible for their receiving the money. Specifically, they are interested in whether the decisions the agent makes are responsible for the money they receive. This does not mean they are trying to maximise something other than money (utility). It means they are interested in maximising money, and also in how money can be maximised via different mechanisms.
An additional point (discussed in intelligence.org/files/TDT.pdf) is that CDT seems to recommend modifying oneself into a non-CDT-based decision theory. (For instance, imagine that the CDTer contemplates for a moment the mere possibility of encountering NPs and can cheaply self-modify.) After modification, the interest in whether decisions are causally responsible for utility will have been eliminated. So this interest seems extremely brittle: agents able to modify and informed of the NP scenario will immediately lose the interest. (If the NP seems implausible, consider the ubiquity of some kind of logical correlation between agents in almost any multi-agent decision problem, like the PD or Stag Hunt.)
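A back-of-the-envelope version of that self-modification argument (the probability and cost figures are assumptions chosen purely for illustration): since the modification happens before Omega’s prediction, it causally influences the boxes, so even by CDT’s own lights it pays to become a one-boxing type.

```python
# Illustrative numbers only: the decision to self-modify is made before the
# prediction, so a CDTer evaluates it by its causal consequences.
p_newcomb = 0.01           # assumed chance of ever facing a Newcomb problem
cost_of_modifying = 1.0    # assumed (cheap) cost of self-modification

ev_stay_cdt = p_newcomb * 1_000                          # a CDT agent two-boxes: $1,000
ev_modify = p_newcomb * 1_000_000 - cost_of_modifying    # a one-boxing type gets $1,000,000

print(f"stay CDT: {ev_stay_cdt:.2f}   self-modify: {ev_modify:.2f}")
# stay CDT: 10.00   self-modify: 9999.00, so CDT recommends becoming a non-CDT agent.
```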
Now you may have in mind a two-boxer notion distinct from that of a CDTer. It might be fundamental to this agent to not forgo local causal gains. Thus a proposed self-modification that would preclude acting for local causal gains would always be rejected. This seems like a shift out of decision theory into value theory. (I think it’s very plausible that absent typical mechanisms of maintaining commitments, many humans would find it extremely hard to resist taking a large ‘free’ cash prize from the transparent box. Even prior schooling in one-boxing philosophy might be hard to stick to when face to face with the prize. Another factor that clashes with human intuitions is the predictor’s infallibility. Generally, I think grasping verbal arguments doesn’t “modify” humans in the relevant sense and that we have strong intuitions that may (at least in the right presentation of the NP) push us in the direction of local causal efficacy.)
The question is, how much of this utility can be attributed to the agent’s decision rather than type.
To many two-boxers, this isn’t the question. At least some two-boxing proponents in the philosophical literature seem to distinguish between winning decisions and rational decisions, the contention being that winning decisions can be contingent on something stupid about the universe. For example, you could live in a universe that specifically rewards agents who use a particular decision theory, and that says nothing about the rationality of that decision theory.
I’m not convinced this is actually the appropriate way to interpret most two-boxers. I’ve read papers that say things that sound like this claim, but I think the distinction that is generally being gestured at is the distinction I’m making here (with different terminology). I even think we get hints of that in the last sentence of your post, where you start to talk about agents being rewarded for their decision theory rather than their decisions.