A Critique of Functional Decision Theory
NB: My writing this note was prompted by Carl Shulman, who suggested we could try a low-time-commitment way of attempting to understand the disagreement between some folks in the rationality community and academic decision theorists (including myself, though I’m not much of a decision theorist). Apologies that it’s sloppier than I’d usually aim for in a philosophy paper, and lacking in appropriate references. And, even though the paper is pretty negative about FDT, I want to emphasise that my writing this should be taken as a sign of respect for those involved in developing FDT. I’ll also caveat that I’m unlikely to have time to engage in the comments; I thought it was better to get this out there all the same, rather than delay publication further.
Introduction
There’s a long-running issue where many in the rationality community take functional decision theory (and its variants) very seriously, but the academic decision theory community does not. But there’s been little public discussion of FDT from academic decision theorists (one exception is here); this note attempts to partly address this gap.
So that there’s a clear object of discussion, I’m going to focus on Yudkowsky and Soares’ ‘Functional Decision Theory’ (which I’ll refer to as Y&S), though I also read a revised version of Soares and Levinstein’s Cheating Death in Damascus.
This note is structured as follows. Section II describes causal decision theory (CDT), evidential decision theory (EDT) and functional decision theory (FDT). Sections III-VI describe problems for FDT: (i) that it sometimes makes bizarre recommendations, recommending an option that is certainly lower-utility than another option; (ii) that it fails to one-box in most instances of Newcomb’s problem, even though the correctness of one-boxing is supposed to be one of the guiding motivations for the theory; (iii) that it results in implausible discontinuities, where what is rational to do can depend on arbitrarily small changes to the world; and (iv) that, because there’s no real fact of the matter about whether a particular physical process implements a particular algorithm, it’s deeply indeterminate what FDT’s implications are. In section VII I discuss the idea that FDT ‘does better at getting utility’ than EDT or CDT; I argue that Y&S’s claims to this effect are unhelpfully vague, and on any more precise way of understanding their claim, aren’t plausible. In section VIII I briefly describe a view that captures some of the motivation behind FDT, and in my view is more plausible. I conclude that FDT faces a number of deep problems and has little to say in its favour.
In what follows, I’m going to assume a reasonable amount of familiarity with the debate around Newcomb’s problem.
II. CDT, EDT and FDT
Informally: CDT, EDT and FDT differ in what non-causal correlations they care about when evaluating a decision. For CDT, what you cause to happen is all that matters; if your action correlates with some good outcome, that’s nice to know, but it’s not relevant to what you ought to do. For EDT, all correlations matter: you should pick whichever action gives you the highest expected utility conditional on your having chosen it. For FDT, only some non-causal correlations matter, namely those correlations between your action and events elsewhere in time and space that would be different in the (logically impossible) worlds in which the output of the algorithm you’re running is different. Other than for those correlations, FDT behaves in the same way as CDT.
Formally, where the Sᵢ represent states of nature, A, B, etc. represent acts, P is a probability function, U(A, Sᵢ) represents the utility the agent gains from the outcome of choosing A given state Sᵢ, and ‘≽’ represents the ‘at least as choiceworthy as’ relation:
On EDT:
A ≽ B iff ∑ᵢ P(Sᵢ | A) U(A, Sᵢ) ≥ ∑ᵢ P(Sᵢ | B) U(B, Sᵢ)
Where ‘|’ represents conditional probability.
On CDT:
A ≽ B iff ∑ᵢ P(Sᵢ ∖ A) U(A, Sᵢ) ≥ ∑ᵢ P(Sᵢ ∖ B) U(B, Sᵢ)
Where ‘∖’ is a ‘causal probability function’ that represents the decision-maker’s judgments about her ability to causally influence the events in the world by doing a particular action. Most often, this is interpreted in counterfactual terms (so P(Sᵢ ∖ A) represents something like the probability of Sᵢ coming about were I to choose A) but it needn’t be.
On FDT:
A ≽ B iff ∑ᵢ P(Sᵢ † A) U(A, Sᵢ) ≥ ∑ᵢ P(Sᵢ † B) U(B, Sᵢ)
Where I introduce the operator ‘†’ to represent the special sort of function that Yudkowsky and Soares propose, where P(Sᵢ † A) represents the probability of Sᵢ occurring were the output of the algorithm that the decision-maker is running, in this decision situation, to be A. (I’m not claiming that it’s clear what this means. E.g. see here, second bullet point, arguing there can be no such probability function, because any probability function requires certainty in logical facts and all their entailments. I also note that, strictly speaking, FDT doesn’t assess acts in the same sense that CDT assesses acts; rather, it assesses algorithmic outputs, and Y&S have a slightly different formal set-up than the one I describe above. I don’t think this will matter for the purposes of this note, though.)
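To make the formal contrast concrete, here is a minimal sketch in Python (my own illustration, not Y&S’s formalism). The payoffs and probabilities are stipulated for a standard Newcomb problem with a 99%-reliable predictor, and the ‘subjunctive’ probabilities standing in for P(S † A) are simply asserted rather than derived, since there is no general recipe for computing them. The three theories then differ only in which probability function they feed into the same expected-utility comparison.

```python
# Minimal sketch (illustrative only): EDT, CDT and FDT plug different
# probability functions into the same expected-utility comparison.
# Numbers are stipulated for a Newcomb problem with a 99%-reliable predictor.

ACTS = ["one-box", "two-box"]
STATES = ["box full", "box empty"]

PAYOFF = {("one-box", "box full"): 1_000_000, ("one-box", "box empty"): 0,
          ("two-box", "box full"): 1_001_000, ("two-box", "box empty"): 1_000}

# P(S | A): probability of each state conditional on my act (EDT).
conditional = {("box full", "one-box"): 0.99, ("box empty", "one-box"): 0.01,
               ("box full", "two-box"): 0.01, ("box empty", "two-box"): 0.99}

# P(S \ A): my act can't causally affect the already-made prediction, so CDT
# uses the same state probabilities whichever act I perform (here, 50/50).
causal = {(s, a): 0.5 for s in STATES for a in ACTS}

# P(S † A): if the predictor runs (a representation of) my algorithm, FDT
# treats the prediction as co-varying with my algorithm's output; stipulated
# here to coincide with the conditional probabilities.
subjunctive = conditional

def expected_utility(prob, act):
    return sum(prob[(s, act)] * PAYOFF[(act, s)] for s in STATES)

for name, prob in [("EDT", conditional), ("CDT", causal), ("FDT", subjunctive)]:
    best = max(ACTS, key=lambda a: expected_utility(prob, a))
    print(name, "recommends", best)   # EDT: one-box, CDT: two-box, FDT: one-box
```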
With these definitions on board, we can turn to objections to FDT.
III. FDT sometimes makes bizarre recommendations
The criterion that Y&S regard as most important in assessing a decision theory is ‘amount of utility achieved’. I think that this idea is importantly underspecified (which I discuss more in section VII), but I agree with the spirit of it. But FDT does very poorly by that criterion, on any precisification of it.
In particular, consider the following principle:
Guaranteed Payoffs: In conditions of certainty — that is, when the decision-maker has no uncertainty about what state of nature she is in, and no uncertainty about what the utility payoff of each action would be — the decision-maker should choose the action that maximises utility.
That is: for situations where there’s no uncertainty, we don’t need to appeal to expected utility theory in any form to work out what we ought to do. You just ought to do whatever will give you the highest utility payoff. This should be a constraint on any plausible decision theory. But FDT violates that principle.
Consider the following case:
Bomb.
You face two open boxes, Left and Right, and you must take one of them. In the Left box, there is a live bomb; taking this box will set off the bomb, setting you ablaze, and you certainly will burn slowly to death. The Right box is empty, but you have to pay $100 in order to be able to take it.
A long-dead predictor predicted whether you would choose Left or Right, by running a simulation of you and seeing what that simulation did. If the predictor predicted that you would choose Right, then she put a bomb in Left. If the predictor predicted that you would choose Left, then she did not put a bomb in Left, and the box is empty.
The predictor has a failure rate of only 1 in a trillion trillion. Helpfully, she left a note, explaining that she predicted that you would take Right, and therefore she put the bomb in Left.
You are the only person left in the universe. You have a happy life, but you know that you will never meet another agent again, nor face another situation where any of your actions will have been predicted by another agent. What box should you choose?
The right action, according to FDT, is to take Left, in the full knowledge that as a result you will slowly burn to death. Why? Because, using Y&S’s counterfactuals, if your algorithm were to output ‘Left’, then it would also have outputted ‘Left’ when the predictor made the simulation of you, and there would be no bomb in the box, and you could save yourself $100 by taking Left. In contrast, the right action on CDT or EDT is to take Right.
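To spell out the calculation behind each verdict, here is a rough sketch with made-up numbers; the dollar figure attached to burning to death is an arbitrary stand-in, and exactly how FDT should treat the predictor’s note is itself contested (see the comments below).

```python
# Rough sketch (illustrative numbers only) of the Bomb calculation.
# FDT evaluates the options via Y&S-style counterfactuals on the output of the
# agent's algorithm, which brings in the predictor's tiny failure rate; CDT and
# EDT evaluate them given the note, i.e. given that the bomb is in Left.

DEATH = -1_000_000_000   # stand-in disutility of slowly burning to death
FEE = -100               # cost of taking Right
ERROR = 1e-24            # predictor's failure rate (1 in a trillion trillion)

# FDT-style evaluation: if my algorithm outputs 'Left', the simulation (almost
# certainly) also output 'Left', so (almost certainly) there is no bomb.
fdt_left = (1 - ERROR) * 0 + ERROR * DEATH   # roughly -1e-15
fdt_right = FEE                              # -100 either way

# CDT/EDT-style evaluation, given the note: the bomb is known to be in Left.
cdt_left, cdt_right = DEATH, FEE

print("FDT recommends:", "Left" if fdt_left > fdt_right else "Right")      # Left
print("CDT/EDT recommend:", "Left" if cdt_left > cdt_right else "Right")   # Right
```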
The recommendation is implausible enough. But if we stipulate that in this decision-situation the decision-maker is certain about the outcome that her actions would bring about, we see that FDT violates Guaranteed Payoffs.
(One might protest that no good Bayesian would ever have credence 1 in an empirical proposition. But, first, that depends on what we count as ‘evidence’ — if a proposition is part of your evidence base, you have credence 1 in it. And, second, we could construct very similar principles to Guaranteed Payoffs that don’t rely on the idea of certainty, but on approximations to certainty.)
Note that FDT’s recommendation in this case is much more implausible than even the worst of the prima facie implausible recommendations of EDT or CDT. So, if we’re going by appeal to cases, or by ‘who gets more utility’, FDT is looking very unmotivated.
IV. FDT fails to get the answer Y&S want in most instances of the core example that’s supposed to motivate it
On FDT, you consider what things would look like in the closest (logically impossible) world in which the algorithm you are running were to produce a different output than what it in fact does. Because, so the argument goes, in Newcomb problems the predictor is also running your algorithm, or a ‘sufficiently similar’ algorithm, or a representation of your algorithm, you consider the correlation between your action and the predictor’s prediction (even though you don’t consider other sorts of correlations.)
However, the predictor needn’t be running your algorithm, or have anything like a representation of that algorithm, in order to predict whether you’ll one-box or two-box. Perhaps the Scots tend to one-box, whereas the English tend to two-box. Perhaps the predictor knows how you’ve acted prior to that decision. Perhaps the Predictor painted the transparent box green, and knows that’s your favourite colour and you’ll struggle not to pick it up. In none of these instances is the Predictor plausibly doing anything like running the algorithm that you’re running when you make your decision. But they are still able to predict what you’ll do. (And bear in mind that the Predictor doesn’t even need to be very reliable. As long as the Predictor is better than chance, a Newcomb problem can be created.)
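To illustrate the parenthetical point with made-up numbers: even a predictor who is only slightly better than chance generates a Newcomb problem, provided the payoffs are scaled suitably.

```python
# Quick sketch (made-up numbers): a predictor only slightly better than chance
# still yields a Newcomb problem once the payoffs are scaled appropriately.

p = 0.51                       # predictor accuracy: barely better than chance
BIG, SMALL = 1_000_000, 100    # opaque-box prize and transparent-box prize

eu_one_box = p * BIG                   # 510,000
eu_two_box = (1 - p) * BIG + SMALL     # 490,100

# Conditional on your act, one-boxing has the higher expected payoff, so the
# usual one-boxing intuition can be elicited even though the predictor is
# plainly not running your decision algorithm.
print(eu_one_box > eu_two_box)   # True
```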
In fact, on the vast majority of ways that the Predictor could predict your behaviour, she isn’t running the algorithm that you are running, or representing it. But if the Predictor isn’t running the algorithm that you are running, or representing it, then, on the most natural interpretation, FDT will treat this as ‘mere statistical correlation’, and therefore act like CDT. So, in the vast majority of Newcomb cases, FDT would recommend two-boxing. But the intuition in favour of one-boxing in Newcomb cases was exactly what was supposed to motivate FDT in the first place.
Could we instead interpret FDT such that it doesn’t require the Predictor to be running your exact algorithm, so that some sufficiently similar algorithm would do? I’m not sure how that would help: in the examples given above, the Predictor’s predictions aren’t based on anything like running your algorithm. In fact, the predictor may know very little about you, perhaps only whether you’re English or Scottish.
One could suggest that, even though the Predictor is not running a sufficiently similar algorithm to you, nonetheless the Predictor’s prediction is subjunctively dependent on your decision (in the Y&S sense of ‘subjunctive’). But, without any account of Y&S’s notion of subjunctive counterfactuals, we just have no way of assessing whether that’s true or not. Y&S note that specifying an account of their notion of counterfactuals is an ‘open problem,’ but the problem is much deeper than that. Without such an account, it becomes completely indeterminate what follows from FDT, even in the core examples that are supposed to motivate it — and that makes FDT not a new decision theory so much as a promissory note.
Indeed, on the most plausible ways of cashing this out, it doesn’t give the conclusions that Y&S would want. If I imagine the closest world in which 6288 + 1048 = 7336 is false (Y&S’s example), I imagine a world with laws of nature radically unlike ours — because the laws of nature rely, fundamentally, on the truths of mathematics, and if one mathematical truth is false then either (i) mathematics as a whole must be radically different, or (ii) all mathematical propositions are true because it is simple to prove a contradiction and every proposition follows from a contradiction. Either way, when I imagine worlds in which FDT outputs something different than it in fact does, then I imagine valueless worlds (no atoms or electrons, etc) — and this isn’t what Y&S are wanting us to imagine.
Alternatively (as Abram Demski suggested to me in a comment), Y&S could accept that the decision-maker should two-box in the cases given above. But then, it seems to me, FDT has lost much of its initial motivation: the case for one-boxing in Newcomb’s problem never seemed to depend on whether the Predictor was running a simulation of me, or was instead using some other way to predict what I’d do.
V. Implausible discontinuities
A related problem is as follows: FDT treats ‘mere statistical regularities’ very differently from predictions. But there’s no sharp line between the two. So it will result in implausible discontinuities. There are two ways we can see this.
First, take some physical process S (like the lesion from the Smoking Lesion) that causes a ‘mere statistical regularity’ (it’s not a Predictor). And suppose that the existence of S tends to cause both (i) one-boxing tendencies and (ii) there being money in the opaque box when decision-makers face Newcomb problems. If it’s S alone that results in the Newcomb set-up, then FDT will recommend two-boxing.
But now suppose that the pathway by which S causes there to be money in the opaque box or not is that another agent looks at S and, if the agent sees that S will cause decision-maker X to be a one-boxer, then the agent puts money in X’s opaque box. Now, because there’s an agent making predictions, the FDT adherent will presumably want to say that the right action is one-boxing. But this seems arbitrary — why should the fact that S’s causal influence on whether there’s money in the opaque box goes via another agent make such a big difference? And we can think of all sorts of spectrum cases in between the ‘mere statistical regularity’ and the full-blooded Predictor: What if the ‘predictor’ is a very unsophisticated agent that doesn’t even understand the implications of what they’re doing? What if they only partially understand the implications of what they’re doing? For FDT, there will be some point of sophistication at which the agent moves from simply being a conduit for a causal process to instantiating the right sort of algorithm, and suddenly FDT will switch from recommending two-boxing to recommending one-boxing.
Second, consider that same physical process S, and consider a sequence of Newcomb cases, each of which gradually makes S more and more complicated and agent-y, making it progressively more similar to a Predictor making predictions. On FDT, there will be some point in this sequence at which there’s a sharp jump: prior to that point, FDT would recommend that the decision-maker two-boxes; after that point, FDT would recommend that the decision-maker one-boxes. But it’s very implausible that there’s some S such that a tiny change in its physical makeup should affect whether one ought to one-box or two-box.
VI. FDT is deeply indeterminate
Even putting the previous issues aside, there’s a fundamental way in which FDT is indeterminate, which is that there’s no objective fact of the matter about whether two physical processes A and B are running the same algorithm or not, and therefore no objective fact of the matter of which correlations represent implementations of the same algorithm or are ‘mere correlations’ of the form that FDT wants to ignore. (Though I’ll focus on ‘same algorithm’ cases, I believe that the same problem would affect accounts of when two physical processes are running similar algorithms, or any way of explaining when the output of some physical process, which instantiates a particular algorithm, is Y&S-subjunctively dependent on the output of another physical process, which instantiates a different algorithm.)
To see this, consider two calculators. The first calculator is like calculators we are used to. The second calculator is from a foreign land: it’s identical except that the numbers it outputs always come with a negative sign (‘–’) in front of them when you’d expect there to be none, and no negative sign when you expect there to be one. Are these calculators running the same algorithm or not? Well, perhaps on this foreign calculator the ‘–’ symbol means what we usually take it to mean — namely, that the ensuing number is negative — and therefore every time we hit the ‘=’ button on the second calculator we are asking it to run the algorithm ‘compute the sum entered, then output the negative of the answer’. If so, then the calculators are systematically running different algorithms.
But perhaps, in this foreign land, the ‘–’ symbol, in this context, means that the ensuing number is positive and the lack of a ‘–’ symbol means that the number is negative. If so, then the calculators are running exactly the same algorithms; their differences are merely notational.
Ultimately, in my view, all we have, in these two calculators, are just two physical processes. The further question of whether they are running the same algorithm or not depends on how we interpret the physical outputs of the calculator. There is no deeper fact about whether they’re ‘really’ running the same algorithm or not. And in general, it seems to me, there’s no fact of the matter about which algorithm a physical process is implementing in the absence of a particular interpretation of the inputs and outputs of that physical process.
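The point can be put in terms of code: whether the two devices count as computing the same function depends entirely on the interpretation we impose on their outputs. A toy sketch (everything here is my own illustration):

```python
# Toy sketch of the calculator example: whether two physical processes "run
# the same algorithm" depends on how we interpret their outputs.

def home_calculator(a, b):
    return a + b             # displays e.g. "7336"

def foreign_calculator(a, b):
    return -(a + b)          # displays e.g. "-7336"

# Interpretation 1: the foreign '-' means what it usually means, so the two
# devices compute different functions (one is the negation of the other).
# Interpretation 2: the foreign '-' marks a *positive* number and its absence
# marks a negative one, so the displays are merely notational variants.
def interpret_foreign_display(x):
    return -x

a, b = 6288, 1048
print(home_calculator(a, b) == interpret_foreign_display(foreign_calculator(a, b)))
# True: under interpretation 2 the two devices agree on every input.
```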
But if that’s true, then, even in the Newcomb cases where a Predictor is simulating you, it’s a matter of choice of symbol-interpretation whether the predictor ran the same algorithm that you are now running (or a representation of that same algorithm). And the way you choose that symbol-interpretation is fundamentally arbitrary. So there’s no real fact of the matter about whether the predictor is running the same algorithm as you. It’s indeterminate how you should act, given FDT: you should one-box, given one way of interpreting the inputs and outputs of the physical process the Predictor is running, but two-box given an alternative interpretation.
Now, there’s a bunch of interesting work on concrete computation, trying to give an account of when two physical processes are performing the same computation. The best response that Y&S could make to this problem is to provide a compelling account of when two physical processes are running the same algorithm that gives them the answers they want. But almost all accounts of computation in physical processes have the issue that very many physical processes are running very many different algorithms, all at the same time. (Because most accounts rely on there being some mapping from physical states to computational states, and there can be multiple mappings.) So you might well end up with the problem that in the closest (logically impossible) world in which FDT outputs something other than what it does output, not only do the actions of the Predictor change, but so do many other aspects of the world. For example, if the physical process underlying some aspect of the US economy just happened to be isomorphic with FDT’s algorithm, then in the logically impossible world where FDT produces a different output, not only does the predictor act differently, but so does the US economy. And that will probably change the value of the world under consideration, in a way that’s clearly irrelevant to the choice at hand.
VII. But FDT gets the most utility!
Y&S regard the most important criterion to be ‘utility achieved’, and think that FDT does better than all its rivals in this regard. Though I agree with something like the spirit of this criterion, its use by Y&S is unhelpfully ambiguous. To help explain this, I’ll go on a little detour to present some distinctions that are commonly used by academic moral philosophers and, to a lesser extent, decision theorists. (For more on these distinctions, see Toby Ord’s DPhil thesis.)
Evaluative focal points
An evaluative focal point is an object of axiological or normative evaluation. (‘Axiological’ means ‘about goodness/badness’; ‘normative’ means ‘about rightness/wrongness’. If you’re a consequentialist, x is best iff it’s right, but if you’re a non-consequentialist the two can come apart.) When doing moral philosophy or decision theory, the most common evaluative focal points are acts, but we can evaluate other things too: characters, motives, dispositions, sets of rules, beliefs, and so on.
Any axiological or normative theory needs to specify which focal point it is evaluating. The theory can evaluate a single focal point (e.g. act utilitarianism, which only evaluates acts) or many (e.g. global utilitarianism, which evaluates everything).
The theory can also differ on whether it is direct or indirect with respect to a given evaluative focal point. For example, Hooker’s rule-consequentialism is a direct theory with respect to sets of rules, and an indirect theory with respect to acts: it evaluates sets of rules on the basis of their consequences, but evaluates acts with respect to how they conform to those sets of rules. Because of this, on Hooker’s view, the right act need not maximize good consequences.
Criterion of rightness vs decision procedure
In chess, there’s a standard by which it is judged who has won the game, namely, the winner is whoever first puts their opponent’s king into checkmate. But relying solely on that standard of evaluation isn’t going to go very well if you actually want to win at chess. Instead, you should act according to some other set of rules and heuristics, such as: “if white, play e4 on the first move,” “don’t get your Queen out too early,” “rooks are worth more than bishops” etc.
A similar distinction can be made for axiological or normative theories. The criterion of rightness, for act utilitarianism, is, “The right actions are those actions which maximize the sum total of wellbeing.” But that’s not the decision procedure one ought to follow. Instead, perhaps, you should rely on rules like ‘almost never lie’, ‘be kind to your friends and family’, ‘figure out how much you can sustainably donate to effective charities, and do that,’ and so on.
For some people, in fact, learning that utilitarianism is true will cause one to be a worse utilitarian by the utilitarian’s criterion of rightness! (Perhaps you start to come across as someone who uses others as means to an end, and that hinders your ability to do good.) By the utilitarian criterion of rightness, someone could in principle act rightly in every decision, even though they have never heard of utilitarianism, and therefore never explicitly tried to follow utilitarianism.
These distinctions and FDT
From Y&S, it wasn’t clear to me whether FDT is really meant to assess acts, agents, characters, decision procedures, or outputs of decision procedures, and it wasn’t clear to me whether it is meant to be a direct or an indirect theory with respect to acts, or with respect to outputs of decision procedures. This is crucial, because it’s relevant to which decision theory ‘does best at getting utility’.
With these distinctions in hand, we can see that Y&S employ multiple distinct interpretations of their key criterion. Sometimes, for example, Y&S talk about how “FDT agents” (which I interpret as ‘agents who follow FDT to make decisions’) get more utility, e.g.:
“Using one simple and coherent decision rule, functional decision theorists (for example) achieve more utility than CDT on Newcomb’s problem, more utility than EDT on the smoking lesion problem, and more utility than both in Parfit’s hitchhiker problem.”
“We propose an entirely new decision theory, functional decision theory (FDT), that maximizes agents’ utility more reliably than CDT or EDT.”
“FDT agents attain high utility in a host of decision problems that have historically proven challenging to CDT and EDT: FDT outperforms CDT in Newcomb’s problem; EDT in the smoking lesion problem; and both in Parfit’s hitchhiker problem.”
“It should come as no surprise that an agent can outperform both CDT and EDT as measured by utility achieved; this has been known for some time (Gibbard and Harper 1978).”
“Expanding on the final argument, proponents of EDT, CDT, and FDT can all agree that it would be great news to hear that a beloved daughter adheres to FDT, because FDT agents get more of what they want out of life. Would it not then be strange if the correct theory of rationality were some alternative to the theory that produces the best outcomes, as measured in utility? (Imagine hiding decision theory textbooks from loved ones, lest they be persuaded to adopt the “correct” theory and do worse thereby!) We consider this last argument—the argument from utility—to be the one that gives the precommitment and value-of-information arguments their teeth. If self-binding or self-blinding were important for getting more utility in certain scenarios, then we would plausibly endorse those practices. Utility has primacy, and FDT’s success on that front is the reason we believe that FDT is a more useful and general theory of rational choice.”
Sometimes Y&S talk about how different decision theories produce more utility on average if they were to face a specific dilemma repeatedly:
“Measuring by utility achieved on average over time, CDT outperforms EDT in some well-known dilemmas (Gibbard and Harper 1978), and EDT outperforms CDT in others (Ahmed 2014b).”
“Imagine an agent that is going to face first Newcomb’s problem, and then the smoking lesion problem. Imagine measuring them in terms of utility achieved, by which we mean measuring them by how much utility we expect them to attain, on average, if they face the dilemma repeatedly. The sort of agent that we’d expect to do best, measured in terms of utility achieved, is the sort who one-boxes in Newcomb’s problem, and smokes in the smoking lesion problem.”
Sometimes Y&S talk about which agent will achieve more utility ‘in expectation’, though they don’t define the point at which they gain more expected utility (or what notion of ‘expected utility’ is being used):
“One-boxing in the transparent Newcomb problem may look strange, but it works. Any predictor smart enough to carry out the arguments above can see that CDT and EDT agents two-box, while FDT agents one-box. Followers of CDT and EDT will therefore almost always see an empty box, while followers of FDT will almost always see a full one. Thus, FDT agents achieve more utility in expectation.”
Sometimes they talk about how much utility ‘decision theories tend to achieve in practice’:
“It is for this reason that we turn to Newcomblike problems to distinguish between the three theories, and demonstrate FDT’s superiority, when measuring in terms of utility achieved.”
“we much prefer to evaluate decision theories based on how much utility they tend to achieve in practice.”
Sometimes they talk about how well the decision theory does in a circumscribed class of cases (though they note in footnote 15 that they can’t define what this class of cases is):
“FDT does appear to be superior to CDT and EDT in all dilemmas where the agent’s beliefs are accurate and the outcome depends only on the agent’s behavior in the dilemma at hand. Informally, we call these sorts of problems “fair problems.””
“FDT, we claim, gets the balance right. An agent who weighs her options by imagining worlds where her decision function has a different output, but where logical, mathematical, nomic, causal, etc. constraints are otherwise respected, is an agent with the optimal predisposition for whatever fair dilemma she encounters.”
And sometimes they talk about how much utility the agent would receive in different possible worlds than the one she finds herself in:
“When weighing actions, Fiona simply imagines hypotheticals corresponding to those actions, and takes the action that corresponds to the hypothetical with higher expected utility—even if that means imagining worlds in which her observations were different, and even if that means achieving low utility in the world corresponding to her actual observations.”
As we can see, the most common formulation of this criterion is that they are looking for the decision theory that, if run by an agent, will produce the most utility over their lifetime. That is, they’re asking what the best decision procedure is, rather than what the best criterion of rightness is, and are providing an indirect account of the rightness of acts, assessing acts in terms of how well they conform with the best decision procedure.
But, if that’s what’s going on, there are a whole bunch of issues to dissect. First, it means that FDT is not playing the same game as CDT or EDT, which are proposed as criteria of rightness, directly assessing acts. So it’s odd to have a whole paper comparing them side-by-side as if they are rivals.
Second, what decision theory does best, if run by an agent, depends crucially on what the world is like. To see this, let’s go back to the question that Y&S ask of what decision theory I’d want my child to have. This depends on a whole bunch of empirical facts: if she might have a gene that causes cancer, I’d hope that she adopts EDT; though if, for some reason, I knew whether or not she had that gene and she didn’t, I’d hope that she adopts CDT. Similarly, if there were long-dead predictors who can no longer influence the way the world is today, then, if I didn’t know what was in the opaque boxes, I’d hope that she adopts EDT (or FDT); if I did know what was in the opaque boxes (and she didn’t) I’d hope that she adopts CDT. Or, if I’m in a world where FDT-ers are burned at the stake, I’d hope that she adopts anything other than FDT.
Third, the best decision theory to run is not going to look like any of the standard decision theories. I don’t run CDT, or EDT, or FDT, and I’m very glad of it; it would be impossible for my brain to handle the calculations of any of these decision theories every moment. Instead I almost always follow a whole bunch of rough-and-ready and much more computationally tractable heuristics; and even on the rare occasions where I do try to work out the expected value of something explicitly, I don’t consider the space of all possible actions and all states of nature that I have some credence in — doing so would take years.
So the main formulation of Y&S’s most important principle doesn’t support FDT. And I don’t think that the other formulations help much, either. Criteria of how well ‘a decision theory does on average and over time’, or ‘when a dilemma is issued repeatedly’ run into similar problems as the primary formulation of the criterion. Assessing by how well the decision-maker does in possible worlds that she isn’t in fact in doesn’t seem a compelling criterion (and EDT and CDT could both do well by that criterion, too, depending on which possible worlds one is allowed to pick).
Fourth, arguing that FDT does best in a class of ‘fair’ problems, without being able to define what that class is or why it’s interesting, is a pretty weak argument. And, even if we could define such a class of cases, claiming that FDT ‘appears to be superior’ to EDT and CDT in the classic cases in the literature is simply begging the question: CDT adherents claim that two-boxing is the right action (which gets you more expected utility!) in Newcomb’s problem; EDT adherents claim that smoking is the right action (which gets you more expected utility!) in the smoking lesion. The question is which of these accounts is the right way to understand ‘expected utility’; they’ll therefore all differ on which of them does better in terms of getting expected utility in these classic cases.
Finally, in a comment on a draft of this note, Abram Demski said that: “The notion of expected utility for which FDT is supposed to do well (at least, according to me) is expected utility with respect to the prior for the decision problem under consideration.” If that’s correct, it’s striking that this criterion isn’t mentioned in the paper. But it also doesn’t seem compelling as a principle by which to evaluate between decision theories, nor does it seem FDT even does well by it. To see both points: suppose I’m choosing between an avocado sandwich and a hummus sandwich, and my prior was that I prefer avocado, but I’ve since tasted them both and gotten evidence that I prefer hummus. The choice that does best in terms of expected utility with respect to my prior for the decision problem under consideration is the avocado sandwich (and FDT, as I understood it in the paper, would agree). But, uncontroversially, I should choose the hummus sandwich, because I prefer hummus to avocado.
VIII. An alternative approach that captures the spirit of FDT’s aims
Academic decision theorists tend to focus on what actions are rational, but don’t talk very much about what sort of agent to become. Something that’s distinctive and good about the rationalist community’s discussion of decision theory is that there’s more of an emphasis on what sort of agent to be, and what sorts of rules to follow.
But this is an area where we can eat our cake and have it. There’s nothing to stop us assessing agents, acts and anything else we like in terms of our favourite decision theory.
Let’s define: Global expected utility theory =df for any x that is an evaluative focal point, the right x is that which maximises expected utility.
I think that Global CDT can get everything we want, without the problems that face FDT. Consider, for example, the Prisoner’s Dilemma. On the global version of CDT, we can say both that (i) the act of defecting is the right action (assuming that the other agent will use their money poorly); and that (ii) the right sort of person to be is one who cooperates in prisoner’s dilemmas.
(ii) would be true, even though (i) is true, if you will face repeated prisoner’s dilemmas, if whether or not you find yourself with opportunities to cooperate depends on whether or not you’ve cooperated in the past, if other agents can tell what sort of person you are even independently of your actions in Prisoner’s Dilemmas, and so on. Similar things can be said about blackmail cases and about Parfit’s Hitchhiker. And similar things can be said more broadly about what sort of person to be given consequentialism — if you become someone who keeps promises, doesn’t tell lies, sticks up for their friends (etc), and who doesn’t analyse these decisions in consequentialist terms, you’ll do more good than someone who tries to apply the consequentialist criterion of rightness for every decision.
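As a toy illustration of how (i) and (ii) come apart, with made-up payoffs: suppose your disposition is visible to other agents, so that visible cooperators are met with cooperation and visible defectors with defection.

```python
# Toy illustration (made-up payoffs) of act-level vs disposition-level verdicts
# when other agents can see what sort of person you are.

# Standard Prisoner's Dilemma payoffs to *you*, given (your move, their move):
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# Act-level (CDT) point: holding the other player's move fixed, defecting
# always gains you more.
assert PAYOFF[("D", "C")] > PAYOFF[("C", "C")]
assert PAYOFF[("D", "D")] > PAYOFF[("C", "D")]

# Disposition-level point: if others cooperate with visible cooperators and
# defect against visible defectors, the cooperative disposition does better
# over a lifetime of such interactions.
rounds = 10
lifetime_cooperator = rounds * PAYOFF[("C", "C")]   # 30
lifetime_defector = rounds * PAYOFF[("D", "D")]     # 10
print(lifetime_cooperator > lifetime_defector)      # True
```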
(Sometimes behaviour like this is described as ‘rational irrationality’. But I don’t think that’s an accurate description. It’s not that one and the same thing (the act) is both rational and irrational. Instead, we continue to acknowledge that the act is the irrational one; we just also acknowledge that it results from the disposition that it is rational to have.)
There are other possible ways of capturing some of the spirit of FDT, such as a sort of rule-consequentialism, where the right set of rules to follow are those that would produce the best outcome if all agents followed those rules, and the right act is that which conforms to that set of rules. But I think that global causal decision theory is the most promising idea in this space.
IX. Conclusion
In this note, I argued that FDT faces multiple major problems. In my view, these are fatal to FDT in its current form. I think it’s possible that, with very major work, a version of FDT could be developed that could overcome some of these problems (in particular, the problems described in sections IV, V and VI, that are based, in one way or another, on the issue of when two processes are Y&S-subjunctively dependent on one another). But it’s hard to see what the motivation for doing so is: FDT in any form will violate Guaranteed Payoffs, which should be one of the most basic constraints on a decision theory; and if, instead, we want to seriously undertake the project of what decision-procedure is the best for an agent to run (or ‘what should we code into an AI?’), the answer will be far messier, and far more dependent on particular facts about the world and the computational resources of the agent in question, than any of EDT, CDT or FDT.
I saw an earlier draft of this, and hope to write an extensive response at some point. For now, the short version:
As I understand it, FDT was intended as an umbrella term for MIRI-style decision theories, which illustrated the critical points without making too many commitments. So, the vagueness of FDT was partly by design.
I think UDT is a more concrete illustration of the most important points relevant to this discussion.
The optimality notion of UDT is clear. “UDT gets the most utility” means “UDT gets the highest expected value with respect to its own prior”. This seems quite well-defined, hopefully addressing your (VII).
There are problems applying UDT to realistic situations, but UDT makes perfect sense and is optimal in a straightforward sense for the case of single-player extensive form games. That doesn’t address multi-player games or logical uncertainty, but it is enough for much of Will’s discussion.
FDT focused on the weird logical cases, which is in fact a major part of the motivation for MIRI-style decision theory. However, UDT for single-player extensive-form games actually gets at a lot of what MIRI-style decision theory wants, without broaching the topic of logical counterfactuals or proving-your-own-action directly.
The problems which create a deep indeterminacy seem, to me, to be problems for other decision theories than FDT as well. FDT is trying to face them head-on. But there are big problems for applying EDT to agents who are physically instantiated as computer programs and can prove too much about their own actions.
This also hopefully clarifies the sense in which I don’t think the decisions pointed out in (III) are bizarre. The decisions are optimal according to the very probability distribution used to define the decision problem.
There’s a subtle point here, though, since Will describes the decision problem from an updated perspective—you already know the bomb is in front of you. So UDT “changes the problem” by evaluating “according to the prior”. From my perspective, because the very statement of the Bomb problem suggests that there were also other possible outcomes, we can rightly insist on evaluating expected utility in terms of those chances.
Perhaps this sounds like an unprincipled rejection of the Bomb problem as you state it. My principle is as follows: you should not state a decision problem without having in mind a well-specified way to predictably put agents into that scenario. Let’s call the way-you-put-agents-into-the-scenario the “construction”. We then evaluate agents on how well they deal with the construction.
For examples like Bomb, the construction gives us the overall probability distribution—this is then used for the expected value which UDT’s optimality notion is stated in terms of.
For other examples, as discussed in Decisions are for making bad outcomes inconsistent, the construction simply breaks when you try to put certain decision theories into it. This can also be a good thing; it means the decision theory makes certain scenarios altogether impossible.
The point about “constructions” is possibly a bit subtle (and hastily made); maybe a lot of the disagreement will turn out to be there. But I do hope that the basic idea of UDT’s optimality criterion is actually clear—“evaluate expected utility of policies according to the prior”—and clarifies the situation with FDT as well.
Replying to one of Will’s edits on account of my comments to the earlier draft:
Yeah, the thing is, the FDT paper focused on examples where “expected utility according to the prior” becomes an unclear notion due to logical uncertainty issues. It wouldn’t have made sense for the FDT paper to focus on that notion, given the desire to put the most difficult issues into focus. However, FDT is supposed to accomplish similar things to UDT, and UDT provides the more concrete illustration.
The policy that does best in expected utility according to the prior is the policy of taking whatever you like. In games of partial information, decisions are defined as functions of information states; and in the situation as described, there are separate information states for liking hummus and liking avocado. Choosing the one you like achieves a higher expected utility according to the prior, in comparison to just choosing avocado no matter what. In this situation, optimizing the decision in this way is equivalent to updating on the information; but, not always (as in transparent newcomb, Bomb, and other such problems).
To re-state that a different way: in a given information state, UDT is choosing what to do as a function of the information available, and judging the utility of that choice according to the prior. So, in this scenario, we judge the expected utility of selecting avocado in response to liking hummus. This is worse (according to the prior!) than selecting hummus in response to liking hummus.
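(A minimal sketch of the comparison just described, with a made-up prior and utilities: policies are functions from the observed information state to an act, and are scored by expected utility under the prior.)

```python
# Minimal sketch (made-up prior and utilities): UDT scores *policies*, i.e.
# functions from information states to acts, by expected utility under the prior.

prior = {"prefers avocado": 0.5, "prefers hummus": 0.5}

def utility(state, act):
    # 1 util for eating the sandwich you actually prefer, 0 otherwise
    return 1 if act == state.split()[-1] else 0

policies = {
    "always take avocado": lambda state: "avocado",
    "take what you observe you prefer": lambda state: state.split()[-1],
}

for name, policy in policies.items():
    eu = sum(p * utility(s, policy(s)) for s, p in prior.items())
    print(name, eu)
# "take what you observe you prefer" scores 1.0 under the prior;
# "always take avocado" scores only 0.5.
```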
So, this is an interesting one. I could make the argument that UDT would actually suggest taking the opposite of the one you like currently.
It depends on how far you think the future (and yourself) will extend. You can reason that if you were to like both hummus and avocado, you should take both. The problem as stated doesn’t appear to exclude this.
If you know the information observed about humans that we tend to get used to what we do repeatedly as part of your prior, then you can predict that you will come to like (whichever of avocado or hummus that you don’t currently like), if you repeatedly choose to consume it.
Then, since there’s no particular reason why doing this would make you later prefer the other option less (and indeed a certain amount of delayed gratification can increase later enjoyment), in order to achieve the most total utility you should either take both together, if you predict you would like that more at the immediate decision point, or, if you are indifferent between taking both and taking the unappealing one, take only the unappealing one, because doing that more often will allow you to obtain more utility later.
I think this would be the recommendation of UDT if the prior were to say that you would face similar choices to this one “sufficiently often”.
This is why, for example, I almost always eat salads/greens or whichever part of a meal is less appealing before the later, more enjoyable part—you get more utility both immediately (over the course of the meal) and long term by not negatively preferring the unappealing food option so much.
Here are some (very lightly edited) comments I left on Will’s draft of this post. (See also my top-level response.)
Responses to Sections II and III:
This point shows the intertwining of logical counterfactuals (counterpossibles) and logical uncertainty. I take logical induction to represent significant progress generalizing probability theory to the case of logical uncertainty, ie, objects which have many of the virtues of probability functions while not requiring certainty about entailment of known facts. So, we can substantially reply to this objection.
However, replying to this objection does not necessarily mean we can define logical counterfactuals as we would want. So far we have only been able to use logical induction to specify a kind of “logically uncertain evidential conditional”. (IE, something closer in spirit to EDT, which does behave more like FDT in some problems but not in general.)
I want to emphasize that I agree that specifying what logical counterfactuals are is a grave difficulty, so grave as to seem (to me, at present) to be damning, provided one can avoid the difficulty in some other approach. However, I don’t actually think that the difficulty can be avoided in any other approach! I think CDT ultimately has to grapple with the question as well, because physics is math, and so physical counterfactuals are ultimately mathematical counterfactuals. Even EDT has to grapple with this problem, ultimately, due to the need to handle cases where one’s own action can be logically known. (Or provide a convincing argument that such cases cannot arise, even for an agent which is computable.)
(Obligatory remark that what maximizes utility is part of what’s at issue here, and for precisely this reason, an FDTist could respond that it’s CDT and EDT which fail in the Bomb example—by failing to maximize the a priori expected utility of the action taken.)
FDT would disagree with this principle in general, since full certainty implies certainty about one’s action, and the utility to be received, as well as everything else. However, I think we can set that aside and say there’s a version of FDT which would agree with this principle in terms of prior uncertainty. It seems cases like Bomb cannot be set up without either invoking prior uncertainty (taking the form of the predictor’s failure rate) or bringing the question of how to deal with logically impossible decisions to the forefront (if we consider the case of a perfect predictor).
Why should prior uncertainty be important, in cases of posterior certainty? Because of the prior-optimality notion (in which a decision theory is judged on a decision problem based on the utility received in expectation according to the prior probability which defines the decision problem).
Prior-optimality considers the guaranteed-payoff objection to be very similar to objecting to a gambling strategy by pointing out that the gambling strategy sometimes loses. In Bomb, the problem clearly stipulates that an agent who follows the FDT recommendation has a trillion trillion to one odds of doing better than an agent who follows the CDT/EDT recommendation. Complaining about the one-in-a-trillion-trillion chance that you get the bomb while being the sort of agent who takes the bomb is, to an FDT-theorist, like a gambler who has just lost a trillion-trillion-to-one bet complaining that the bet doesn’t look so rational now that the outcome is known with certainty to be the one-in-a-trillion-trillion case where the bet didn’t pay well.
And why, on your account, is this implausible? To my eye, this is right there in the decision problem, not a weird counterintuitive consequence of FDT: the decision problem stipulates that algorithms which output ‘left’ will not end up in the situation of taking a bomb, with very, very high probability.
Again, complaining that you now know with certainty that you’re in the unlucky position of seeing the bomb seems irrelevant in the way that a gambler complaining that they now know how the dice fell seems irrelevant—it’s still best to gamble according to the odds, taking the option which gives the best chance of success.
(But what I most want to convey here is that there is a coherent sense in which FDT does the optimal thing, whether or not one agrees with it.)
One way of thinking about this is to say that the FDT notion of “decision problem” is different from the CDT or EDT notion, in that FDT considers the prior to be of primary importance, whereas CDT and EDT consider it to be of no importance. If you had instead specified ‘bomb’ with just the certain information that ‘left’ is (causally and evidentially) very bad and ‘right’ is much less bad, then CDT and EDT would regard it as precisely the same decision problem, whereas FDT would consider it to be a radically different decision problem.
Another way to think about this is to say that FDT “rejects” decision problems which are improbable according to their own specification. In cases like Bomb where the situation as described is by its own description a one in a trillion trillion chance of occurring, FDT gives the outcome only one-trillion-trillion-th consideration in the expected utility calculation, when deciding on a strategy.
Also, I note that this analysis (on the part of FDT) does not hinge in this case on exotic counterfactuals. If you set Bomb up in the Savage framework, you would be forced to either give only the certain choice between bomb and not-bomb (so you don’t represent the interesting part of the problem, involving the predictor) or to give the decision in terms of the prior, in which case the Savage framework would endorse the FDT recommendation.
Another framework in which we could arrive at the same analysis would be that of single-player extensive-form games, in which the FDT recommendation corresponds to the simple notion of optimal strategy, whereas the CDT recommendation amounts to the stipulation of subgame-optimality.
Response to Section IV:
I am basically sympathetic to this concern: I think there’s a clear intuition that FDT is 2-boxing more than we would like (and a clear formal picture, in toy formalisms which show FDT-ish DTs failing on Agent Simulates Predictor problems).
Of course, it all depends on how logical counterfactuals are supposed to work. From a design perspective, I’m happy to take challenges like this as extra requirements for the behavior of logical counterfactuals, rather than objections to the whole project. I intuitively think there is a notion of logical counterfactual which fails in this respect, but, this does not mean there isn’t some other notion which succeeds. Perhaps we can solve the easy problem of one-boxing with a strong predictor first, and then look for ways to one-box more generally (and in fact, this is what we’ve done—one-boxing with a strong predictor is not so difficult).
However, I do want to add that when Omega uses very weak prediction methods such as the examples given, it is not so clear that we want to one-box. Will is presuming that Y&S simply want to one-box in any Newcomb problem. However, we could make a distinction between evidential Newcomb problems and functional Newcomb problems. Y&S already state that they consider some things to be functional Newcomb problems despite them not being evidential Newcomb problems (such as transparent Newcomb). It stands to reason that there would be some evidential Newcomb problems which are not functional Newcomb problems, as well, and that Y&S would prefer not to one-box in such cases.
In this example, it seems quite plausible that there’s a (logico-causal) reason for the regularity, so that in the logical counterfactual where you act differently, your reference class also acts somewhat differently. Say you’re Scottish, and 10% of Scots read a particular fairy tale growing up, and this is connected with why you two-box. Then in the counterfactual in which you one-box, it is quite possible that those 10% also one-box. Of course, this greatly weakens the connection between Omega’s prediction and your action; perhaps the change of 10% is not enough to tip the scales in Omega’s prediction.
In the TDT document, Eliezer addresses this concern by pointing out that CDT also takes a description of the causal structure of a problem as given, begging the question of how we learn causal counterfactuals. In this regard, FDT and CDT are on the same level of promissory-note-ness.
It might, of course, be taken as much more plausible that a technique of learning the physical-causal structure can be provided, in contrast to a technique which learns the logical-counterfactual structure.
I want to inject a little doubt about which is easier. If a robot is interacting with an exact simulation of itself (in an iterated prisoner’s dilemma, say), won’t it be easier to infer that it directly controls the copy than it is to figure out that the two are running on different computers and thus causally independent?
Put more generally: logical uncertainty has to be handled one way or another; it cannot be entirely put aside. Existing methods of testing causality are not designed to deal with it. It stands to reason that such methods applied naively to cases including logical uncertainty would treat such uncertainty like physical uncertainty, and therefore tend to produce logical-counterfactual structure. This would not necessarily be very good for FDT purposes, being the result of unprincipled accident—and the concern for FDT’s counterfactuals is that there may be no principled foundation. Still, I tend to think that other decision theories merely brush the problem under the rug, and actually have to deal with logical counterfactuals one way or another.
To this I can only say again that FDT’s problem of defining counterfactuals seems not so different to me from CDT’s problem. A causal decision theorist should be able to work in a mathematical universe; indeed, this seems rather consistent with the ontology of modern science, though not forced by it. I find it implausible that a CDT advocate should have to deny Tegmark’s mathematical universe hypothesis, or should break down and be unable to make decisions under the supposition. So, physical counterfactuals seem like they have to be at least capable of being logical counterfactuals (perhaps a different sort of logical counterfactual than FDT would use, since physical counterfactuals are supposed to give certain different answers, but a sort of logical counterfactual nonetheless).
(But this conclusion is far from obvious, and I don’t expect ready agreement that CDT has to deal with this.)
Response to Section VIII:
I’m somewhat confused about how you can buy FDT as far as you seem to buy it in this section, while also claiming not to understand FDT to the point of saying there is no sensible perspective at all in which it can be said to achieve higher utility. From the perspective in this section, it seems you can straightforwardly interpret FDT’s notion of expected utility maximization via an evaluative focal point such as “the output of the algorithm given these inputs”.
This evaluative focal point addresses the concern you raise about how bounded ability to implement decision procedures interacts with a "best decision procedure" evaluative focal point (making it depart from FDT's recommendations insofar as the agent can't manage to act like FDT). Those concerns don't arise (at least not so clearly) when we consider what FDT would recommend for the response to one situation in particular. On the other hand, we can also make sense of the notion that taking the bomb is best, since (according to both global-CDT and global-EDT) it is best for an algorithm to output "left" when given the inputs of the bomb problem: it gives us the best news about how that agent would do in bomb problems, and it causes the agent to do well when put in bomb problems, insofar as a causal intervention on the output of the algorithm also affects a predictor running the same algorithm.
Responses to Sections V and VI:
I’m puzzled by this concern. Is the doctrine of expected utility plagued by a corresponding ‘implausible discontinuity’ problem because if action 1 has expected value .999 and action 2 has expected value 1, then you should take action 2, but a very small change could mean you should take action 1? Is CDT plagued by an implausible-discontinuity problem because two problems which EDT would treat as the same will differ in causal expected value, and there must be some in-between problem where uncertainty about the causal structure balances between the two options, so CDT’s recommendation implausibly makes a sharp shift when the uncertainty is jiggled a little? Can’t we similarly boggle at the implausibility that a tiny change in the physical structure of a problem should make such a large difference in the causal structure so as to change CDT’s recommendation? (For example, the tiny change can be a small adjustment to the coin which determines which of two causal structures will be in play, with no overall change in the evidential structure.)
It seems like what you find implausible about FDT here has nothing to do with discontinuity, unless you find CDT and EDT similarly implausible.
This is obviously a big challenge for FDT; we don’t know what logical counterfactuals look like, and invoking them is problematic until we do.
However, I can point to some toy models of FDT which lend credence to the idea that there’s something there. The most interesting may be MUDT (see the “modal UDT” section of this summary post). This decision theory uses the notion of “possible” from the modal logic of provability, so that despite being a deterministic agent and therefore only taking one particular action in fact, agents have a well-defined possible-world structure to consider in making decisions, derived from what they can prove.
I have a post planned that focuses on a different toy model, single-player extensive-form games. This has the advantage of being only as exotic as standard game theory.
In both of these cases, FDT can be well-specified (at least, to the extent we’re satisfied with calling the toy DTs examples of FDT—which is a bit awkward, since FDT is kind of a weird umbrella term for several possible DTs, but also kind of specifically supposed to use functional graphs, which MUDT doesn’t use).
It bears mentioning that a Bayesian already regards the probability distribution representing a problem as deeply indeterminate, so this seems less bad if you start from such a perspective. Logical counterfactuals can similarly be thought of as subjective objects, rather than some objective fact which we have to uncover in order to know what FDT does.
On the other hand, greater indeterminacy is still worse; just because we already have lots of degrees of freedom to mess ourselves up with doesn’t mean we happily accept even more.
Part of the reason that I’m happy for FDT to need such a fact is that I think I need such a fact anyway, in order to deal with anthropic uncertainty, and other issues.
If you don’t think there’s such a fact, then you can’t take a computationalist perspective on theory of mind—in which case, I wonder what position you take on questions such as consciousness. Obviously this leads to a number of questions which are quite aside from the point at hand, but I would personally think that questions such as whether an organism is experiencing suffering have to do with what computations are occurring. This ultimately cashes out to physical facts, yes, but it seems as if suffering should be a fundamentally computational fact which cashes out in terms of physical facts only in a substrate-independent way (ie, the physical facts of importance are precisely those which pertain to the question of which computation is running).
Indeed, I think this is one of the main obstacles to a satisfying account—a successful account should not have this property.
Response to Section VII:
You claim here that EDT and CDT can claim optimality in exactly the same way that FDT can, but I think the arguments are importantly not symmetric. CDT and EDT are optimal according to their own optimality notions, but given the choice to implement different decision procedures on later problems, both the CDT and EDT optimality notions would endorse selecting FDT over themselves in many of the problems mentioned in the paper, whereas FDT will endorse itself.
Most of this section seems to me to be an argument to make careful level distinctions, in an attempt to avoid the level-crossing argument which is FDT’s main appeal. Certainly, FDTers such as myself will often use language which confuses the various levels, since we take a position which says they should be confusable—the best decision procedures should follow the best policies, which should take the best actions. But making careful level distinctions does not block the level-crossing argument, it only clarifies it. FDT may not be the only “consistent fixed-point of normativity” (to the extent that it even is that), but CDT and EDT are clearly not that.
I basically agree that the FDT paper dropped the ball here, in that it could have given a toy setting in which ‘fair’ is rigorously defined (in a pretty standard game-theoretic setting) and FDT has the claimed optimality notion. I hope my longer writeup can make such a setting clear.
Briefly: my interpretation of the “FDT does better” claim in the FDT paper is that FDT is supposed to take UDT-optimal actions. To the extent that it doesn’t take UDT-optimal actions, I mostly don’t endorse the claim that it does better (though I plan to note in a follow-up post an alternate view in which the FDT notion of optimality may be better).
The toy setting I have in mind that makes “UDT-optimal” completely well-defined is actually fairly general. The idea is that if we can represent a decision problem as a (single-player) extensive-form game, UDT is just the idea of throwing out the requirement of subgame-optimality. In other words, we don’t even need a notion of “fairness” in the setting of extensive-form games—the setting isn’t rich enough to represent any “unfair” problems. Yet it is a pretty rich setting.
This observation was already made here: https://www.lesswrong.com/posts/W4sDWwGZ4puRBXMEZ/single-player-extensive-form-games-as-a-model-of-udt. Note that there are some concerns in the comments. I think the concerns make sense, and I’m not quite sure how I want to address them, but I also don’t think they’re damning to the toy model.
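As a toy illustration of what dropping subgame-optimality buys you, here is a sketch in the spirit of that model (this is my own example, not the construction from the linked post; the payoffs and the perfect-predictor assumption are placeholders):

```python
from itertools import product

# Transparent Newcomb as a tiny single-player policy-selection problem.
# A policy maps what the agent observes to an action; we score whole
# policies, rather than optimizing separately at each node.
BIG, SMALL = 1_000_000, 1_000
observations = ["big box full", "big box empty"]

def payoff(policy):
    # Assume a perfect predictor: the big box is filled iff the agent's
    # policy one-boxes upon seeing it full.
    filled = policy["big box full"] == "one-box"
    obs = "big box full" if filled else "big box empty"
    act = policy[obs]
    return (BIG if filled else 0) + (SMALL if act == "two-box" else 0)

policies = [dict(zip(observations, acts))
            for acts in product(["one-box", "two-box"], repeat=2)]
best = max(policies, key=payoff)
print(best, payoff(best))
# The winning policy one-boxes on seeing the full box (payoff $1,000,000),
# even though, holding the box contents fixed, two-boxing at that node
# would add $1,000; the optimal policy is not subgame-optimal.
```

No notion of "fairness" is needed to state this: the setting simply scores whole policies.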
The FDT paper may have left this model out of a desire for greater generality, which I do think is an important goal—from my perspective, it makes sense not to reduce things to the toy model in which everything works out nicely.
“Physics is math” is ontologically reductive.
Physics can often be specified as a dynamical system (along with interpretations of e.g. what high-level entities it represents, how it gets observed). Dynamical systems can be specified mathematically. Dynamical systems also have causal counterfactuals (what if you suddenly changed the system state to be this instead?).
Causal counterfactuals defined this way have problems (violation of physical law has consequences). But they are well-defined.
Yeah, agreed, I no longer endorse the argument I was making there—one has to say more than “physics is math” to establish the importance of dealing with logical counterfactuals.
It seems important to acknowledge that there’s a version of the Bomb argument that actually works, at least if we want to apply UDT to humans as opposed to AIs, and this may be part of what’s driving Will’s intuitions. (I’ll use “UDT” here because that’s what I’m more familiar with, but presumably everything transfers to FDT.)
First there’s an ambiguity in Bomb as written, namely what does my simulation see? Does it see a bomb in Left, or no bomb? Suppose the setup is that the simulation sees no bomb in Left. In that case since obviously I should take Left when there’s no bomb in it (and that’s what my simulation would do), if I am seeing a bomb in Left it must mean I’m in the 1 in a trillion trillion situation where the predictor made a mistake, therefore I should (intuitively) take Right. UDT also says I should take Right so there’s no problem here.
Now suppose the simulation is set up to see a bomb in Left. In that case, when I see a bomb in Left, I don't know if I'm a simulation or a real person. If I were selfish in an indexical way, I would think something like "If I'm a simulation then it doesn't matter what I choose. The simulation will end as soon as I make a choice, so my choice is inconsequential. But if I'm a real person, choosing Left will cause me to be burned. So I should choose Right." The thing is, UDT is incompatible with this kind of selfish value, because UDT takes a utility function that is defined over possible histories of the world and not possible centered histories of the world (i.e., histories with an additional pointer that says this is "me"). UDT essentially forces an agent to be altruistic towards its copies, and is therefore unable to give the intuitively correct answer in this case.
If we're doing decision theory for humans, then this incompatibility would be a problem, because humans plausibly do have this kind of selfish value as part of our complex values, and whatever decision theory we use should perhaps be able to handle it. However, if we're building an AI, it doesn't seem to make sense to let it have selfish values (i.e., a utility function over centered histories as opposed to uncentered histories), so UDT seems fine (at least as far as this issue is concerned) for thinking about how AIs should ideally make decisions.
It seems to me that even in this example, a person (who is selfish in an indexical way) would prefer—before opening their eyes—to make a binding commitment to choose left. If so, the “intuitively correct answer” that UDT is unable to give is actually just the result of a failure to make a beneficial binding commitment.
That’s true, but they could say, “Well, given that no binding commitment was in fact made, and given my indexically selfish values, it’s rational for me to choose Right.” And I’m not sure how to reply to that, unless we can show that such indexically selfish values are wrong somehow.
I agree. It seems that in that situation the person would be “rational” to choose Right.
I’m still confused about the “UDT is incompatible with this kind of selfish values” part. It seems that an indexically-selfish person—after failing to make a binding commitment and seeing the bomb—could still rationally commit to UDT from that moment on, by defining the utility s.t. only copies that found themselves in that situation (i.e. those who failed to make a binding commitment and saw the bomb) matter. That utility is a function over uncentered histories of the world, and would result in UDT choosing Right.
I don’t see anything wrong with what you’re saying, but if you did that you’d end up not being an indexically selfish person anymore. You’d be selfish in a different, perhaps alien or counterintuitive way. So you might be reluctant to make that kind of commitment until you’ve thought about it for a much longer time, and UDT isn’t compatible with your values in the meantime. Also, without futuristic self-modification technologies, you are probably not able to make such a commitment truly binding even if you wanted to and you tried.
Some tangentially related thoughts:
It seems that in many simple worlds (such as the Bomb world), an indexically-selfish agent with a utility function u over centered histories would prefer to commit to UDT with a utility function u′ over uncentered histories; where u′ is defined as the sum of all the “uncentered versions” of u (version i corresponds to u when the pointer is assumed to point to agent i).
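In symbols (my notation, not the original comment's): if $h$ is an uncentered history and $u_i$ denotes $u$ evaluated as if the pointer picked out copy $i$, then

\[
u'(h) \;=\; \sum_i u_i(h).
\]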
Things seem to get more confusing in messy worlds in which the inability of an agent to define a utility function (over uncentered histories) that distinguishes between agent1 and agent2 does not entail that the two agents are about to make the same decision.
By the way, selfish values seem related to the reward vs. utility distinction. An agent that pursues a reward that’s about particular events in the world rather than a more holographic valuation seems more like a selfish agent in this sense than a maximizer of a utility function with a small-in-space support. If a reward-seeking agent looks for reward channel shaped patterns instead of the instance of a reward channel in front of it, it might tile the world with reward channels or search the world for more of them or something like that.
A possible response to this argument is that the predictor may be able to accurately predict the agent without explicitly simulating them. A possible counter-response to this is to posit that any sufficiently accurate model of a conscious agent is necessarily conscious itself, whether the model takes the form of an explicit simulation or not.
It is more probable that you are misinformed about the predictor. But your conclusion is correct: take the Right box.
(I work at MIRI, and edited the Cheating Death in Damascus paper, but this comment wasn’t reviewed by anyone else at MIRI.)
But this principle prevents you from cooperating with yourself across empirical branches in the world!
Suppose a good predictor offers you a fair coin flip at favorable odds (say, 2 of their dollars to one of yours). If you called correctly, you can either forgive (no money moves) or demand payment; if you called incorrectly, you can either pay up or back out. The predictor only honors your demand that they pay up if they predict that you would yourself pay up when you lose; apart from that, this interaction doesn't affect the rest of your life.
You call heads, the coin comes up tails. The Guaranteed Payoffs principle says: you are now certain to lose a dollar by paying up, so back out.
The FDT perspective is to say: pay up, because it is the sort of agent who pays when it loses that gets offered the favorable bet at all.
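Assuming the predictor is essentially perfect, the per-flip arithmetic (my framing of the example) is

\[
\mathbb{E}[\text{pay-up policy}] = \tfrac{1}{2}(+\$2) + \tfrac{1}{2}(-\$1) = +\$0.50,
\qquad
\mathbb{E}[\text{back-out policy}] = \tfrac{1}{2}(\$0) + \tfrac{1}{2}(\$0) = \$0,
\]

since an agent predicted to back out never gets paid when it wins. Guaranteed Payoffs evaluates only the tails branch you actually find yourself in, where paying is a sure loss of a dollar.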
Note that the Bomb case is one in which we condition on the 1 in a trillion trillion failure case, and ignore the 999999999999999999999999 cases in which FDT saves $100. This is like pointing at people who got into a plane that crashed and saying “what morons, choosing to get on a plane that would crash!” instead of judging their actions from the state of uncertainty that they were in when they decided to get on the plane.
This is what Abram means when he says "with respect to the prior of the decision problem"; not that the FDT agent is expected to do well from any starting spot, but from the 'natural' one. (If the problem statement is as described, and the FDT agent sees "you'll take the right box" and then takes the left box, then it must be the case that this was the unlucky bad prediction, and it is weighted as correspondingly unlikely.) It's not that the FDT agent wanders through the world unable to determine where it is even after obtaining evidence; it's that as the FDT agent navigates the world it considers its impact across all (connected) logical space instead of just immediately downstream of itself. Note that in my coin flip case, FDT is still trying to win the reward when the coin comes up heads even though in this case it came up tails, as opposed to saying "well, every time I see this problem the coin will come up tails, therefore I shouldn't participate in the bet."
[I do think this jump, from ‘only consider things downstream of you’ to ‘consider everything’, does need justification and I think the case hasn’t been as compelling as I’d like it to be. In particular, the old name for this, ‘updatelessness’, threw me for a loop for a while because it sounded like the dumb “don’t take input from your environment” instead of the conscious “consider what impact you’re having on hypothetical versions of yourself”.]
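Concretely, writing $V$ for the disvalue of burning to death (a placeholder quantity) and $\varepsilon = 10^{-24}$ for the predictor's error rate, the ex ante comparison in Bomb is

\[
\mathbb{E}[\text{Left policy}] \approx -\varepsilon V,
\qquad
\mathbb{E}[\text{Right policy}] \approx -\$100,
\]

since a Left-taker only burns in the $\varepsilon$ case where the predictor wrongly predicted Right and placed the bomb in Left. Unless $V$ is on the order of $10^{26}$ dollars-equivalent, the Left policy does better from the prior; conditioning on the failure case hides this.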
It seems to me like either you are convinced that the predictor is using features you can control (based on whether or not you decide to one-box) or features you can’t control (like whether you’re English or Scottish). If you think the latter, you two-box (because regardless of whether the predictor is rewarding you for being Scottish or not, you benefit from the $1000), and if you think the former you one-box (because you want to move the probability that the predictor fills the large box).
According to me, the simulation is just a realistic way to instantiate an actual dependence between the decision I’m making now and the prediction. (Like, when we have AIs we’ll actually be able to put them in Newcomb-like scenarios!) If you want to posit a different, realistic version of that, then FDT is able to handle it (and the difficulty is all in moving from the English description of the problem to the subjunctive dependency graph).
I don’t think this is right; I think this is true only if the FDT agent thinks that S (a physically verifiable fact about the world, like the lesion) is logically downstream of its decision. In the simplest such graph I can construct, S is still logically upstream of the decision; are we making different graphs?
I don’t buy this as an objection; decisions are often discontinuous. Suppose I’m considering staying at two different hotels, one with price A and the other with price B with B<A; then construct a series of changes to A that moves it imperceptibly, and at some point my decision switches abruptly from staying at hotel B to staying at hotel A. Whenever you pass multiple continuous quantities through an argmin or argmax, you can get sudden changes.
(Or, put a more analogous way, you can imagine insurance against an event with probability p, and we smoothly vary p, and at some point our action discontinuously jumps from not buying the insurance to buying the insurance.)
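For instance, with a hypothetical premium $c$ and insured loss $L$, the risk-neutral rule is

\[
\text{buy insurance} \iff pL > c,
\]

whose output flips discontinuously at $p = c/L$ even though $p$, $L$, and $c$ all vary continuously.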
I am deeply confused how someone who is taking decision theory seriously can accept Guaranteed Payoffs as correct. I’m even more confused how it can seem so obvious that anyone violating it has a fatal problem.
Restricted to conditions of certainty, this principle assumes CDT is correct, when CDT seems to have many problems other than ones involving uncertainty. We can use Vaniver's examples above, or use a reliable insurance agent to remove any uncertainty, or we can use any number of classic problems without any uncertainty (or with the uncertainty removed), and see that such an agent loses (e.g. Parfit's Hitchhiker in the case where the driver has 100% accuracy).
As a further example, consider glomarization. If I haven't committed a crime, pleading the Fifth is worse than pleading innocence; but a policy of pleading innocence whenever I'm innocent means that when I have committed a crime, I have to either pay the costs of pleading guilty, pay the costs of lying, or plead the Fifth (which will code to "I'm guilty", because I never say it when I'm innocent). If I care about honesty and about being difficult to distinguish from the versions of myself who commit crimes, then I want to glomarize even before I commit any crimes.
See also Nate Soares in Decisions are for making bad outcomes inconsistent. This is sort of a generalization, where ‘decisions are for making bad outcomes unlikely.’
I have to say, I find these criticisms a bit weak. Going through them:
III. FDT sometimes makes bizarre recommendations
I'd note that successfully navigating Parfit's hitchhiker also involves violating "Guaranteed Payoffs": you pay the driver at a time when there is no uncertainty, and where you get better utility from not doing so. So I don't think Guaranteed Payoffs is that sound a principle.
Your bomb example is a bit underdefined, since the predictor is predicting your actions AND giving you the prediction. If the predictor is simulating you and asking “would you go left after reading a prediction that you are going right”, then you should go left; because, by the probabilities in the setup, you are almost certainly a simulation (this is kind of a “counterfactual Parfit hitchhiker” situation).
If the predictor doesn’t simulate you, and you KNOW they said to go right, you are in a slightly different situation, and you should go right. This is akin to waking up in the middle of the Parfit hitchhiker experiment, when the driver has already decided to save you, and deciding whether to pay them.
IV. FDT fails to get the answer Y&S want in most instances of the core example that’s supposed to motivate it
This section is incorrect, I think. In this variant, the contents of the boxes are determined not by your decision algorithm, but by your nationality. And of course two-boxing is the right decision in that situation!
But it does depend on things like this. There’s no point in one-boxing unless your one-boxing is connected with the predictor believing that you’d one-box. In a simulation, that’s the case; in some other situations where the predictor looks at your algorithm, that’s also the case. But if the predictor is predicting based on nationality, then you can freely two-box without changing the predictor’s prediction.
V. Implausible discontinuities
There’s nothing implausible about discontinuity in the optimal policy, even if the underlying data is continuous. If p is the probability that we’re in a smoking lesion vs a Newcomb problem, then as p changes from 0 to 1, the expected utility of one-boxing falls and the expected utility of two-boxing rises. At some point, the optimal action will jump discontinuously from one to the other.
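A minimal numeric sketch of that crossover (the payoffs and the mixture model are placeholders of mine, not from the post):

```python
# With probability p the predictor's prediction depends on your choice (a
# "true" Newcomb problem with a near-perfect predictor); with probability
# 1 - p the box contents are fixed regardless of what you do (a lesion /
# nationality-style case with some fixed expected amount in the big box).
def eu_one_box(p, fixed=500_000):
    return p * 1_000_000 + (1 - p) * fixed

def eu_two_box(p, fixed=500_000):
    return p * 1_000 + (1 - p) * (fixed + 1_000)

for p in [0.0, 0.0005, 0.001, 0.0015, 0.01, 1.0]:
    best = "one-box" if eu_one_box(p) > eu_two_box(p) else "two-box"
    print(f"p = {p:<7} best action: {best}")
# Both expected utilities vary continuously in p, but the recommended
# action flips discontinuously at p = 0.001, where p * $999,000 equals
# (1 - p) * $1,000.
```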
VI. FDT is deeply indeterminate
I agree FDT is indeterminate, but I don’t agree with your example. Your two calculators are clearly isomorphic, just as if we used a different numbering system for one versus the other. Talking about isomorphic algorithms avoids worrying about whether they’re the “same” algorithm.
Indeed. But since you and your simulation are isomorphic, you can look at what the consequences are of you outputting “two-box” while your simulation outputs “deux boites” (or “one-box” and “une boite”). And {one-box, une boite} is better than {two-box, deux boites}.
But why did I use those particular interpretations of me and my simulation's physical processes? Because those interpretations are the ones relevant to the problem at hand. My simulation and I have different weights, consume different amounts of power, are run at different times, and probably at different speeds. If those differences were relevant to the Newcomb problem, then the fact that we are different would become relevant. But since they aren't, we can focus in on the core of the matter. (You can also consider the example of playing the prisoner's dilemma against an almost-but-not-quite-identical copy of yourself.)
I object to the framing of the bomb scenario on the grounds that low probabilities of high stakes are a source of cognitive bias that trips people up for reasons having nothing to do with FDT. Consider the following decision problem: "There is a button. If you press the button, you will be given $100. Also, pressing the button has a very small (one in a trillion trillion) chance of causing you to burn to death." Most people would not touch that button. Using the same payoffs and probabilities in a scenario to challenge FDT thus exploits cognitive bias to make FDT look bad. A better scenario would be to replace the bomb with something that will fine you $1000 (and, if you want, also increase the chance of error).
I think the crucial difference here is how easily you can cause the predictor to be wrong. In the case where the predictor simulates you, if you two-box, then the predictor expects you to two-box. In the case where the predictor uses your nationality to predict your behavior (Scots usually one-box, and you're Scottish), if you two-box, then the predictor will still expect you to one-box, because you're Scottish.
I didn’t think that was supposed to matter at all? I haven’t actually read the FDT paper, and have mostly just been operating under the assumption that FDT is basically the same as UDT, but UDT didn’t build in any dependency on external agents, and I hadn’t heard about any such dependency being introduced in FDT; it would surprise me if it did.
(I’m not a decision theorist)
Fulfilling the Guaranteed Payoffs principle as defined here seems to entail two-boxing in the Transparent Newcomb’s Problem, and generally not being able to follow through on precommitments when facing a situation with no uncertainty.
My understanding is that a main motivation for UDT (which FDT is very similar to?) is to get an agent that, when finding itself in a situation X, follows through on any precommitment that—before learning anything about the world—the agent would have wanted to follow through on when it is in situation X. Such a behavior would tend to violate the Guaranteed Payoffs principle, but would be beneficial for the agent?
Yeah, wouldn’t someone following Guaranteed Payoffs as laid out in the post be unable to make credible promises?
There’s an interesting relationship with mathematizing of decision problems here, which I think is reflective of normal philosophy practice.
For example, in the Smoking Lesion problem, and in similar cases where you consider an agent to have "urges" or "dispositions" etc., it's important to note that these are pre-mathematical descriptions of something we'd like our decision theory to consider, and that to try to apply them directly to a mathematical theory is to commit a sort of type error.
Specifically, a decision-making procedure that “has a disposition to smoke” is not FDT. It is some other decision theory that has the capability to operate in uncertainty about its own dispositions.
I think it’s totally reasonable to say that we want to research decision theories that are capable of this, because this epistemic state of not being quite sure of your own mind is something humans have to deal with all the time. But one cannot start with a mathematically specified decision theory like proof-based UDT or causal-graph-based CDT and then ask “what it would do if it had the smoking lesion.” It’s a question that seems intuitively reasonable but, when made precise, is nonsense.
I think what this feels like to philosophers is giving the verbal concepts primacy over the math. (With positive associations to “concepts” and negative associations to “math” implied). But what it leads to in practice is people saying “but what about the tickle defense?” or “but what about different formulations of CDT” as if they were talking about different facets of unified concepts (the things that are supposed to have primacy), when these facets have totally distinct mathematizations.
At some point, if you know that a tree falling in the forest makes the air vibrate but doesn’t lead to auditory experiences, it’s time to stop worrying about whether it makes a sound.
So obviously I (and LW orthodoxy) are on the pro-math side, and I think most philosophers are on the pro-concepts side (I'd say "pro-essences," but that's a bit too on the nose). But, importantly, if we agree that this descriptive difference exists, then we can at least work to bridge it by being clear about whether we're using the math perspective or the concept perspective. Then we can keep different mathematizations strictly separate when using the math perspective, but work to amalgamate them when talking about concepts.
Not a decision theorist, but my intuition on the first example with the bomb also says “take the bomb”. I don’t think it’s obvious or universal that one should choose to avoid burning slowly to death; the example may make more sense if one optimizes over “agents like me who encounter the box”, instead of “the specific agent who sees a bomb”; ie. acting under a Rawlsian veil. The standard argument is if you could commit yourself in advance to slowly burning to death if you see a bomb, you would certainly do so; the commitment all but guarantees it does not happen. For another example, “maximize payoff for any situation you find yourself in” fails to second-strike in global thermonuclear warfare (MAD), leading to the extinction of humanity. (This is not dissimilar to slowly burning to death.) So I think your “guaranteed payoff” rule is contradicted in practice; one may argue it does little more than judge FDT by CDT.
I wrote a response here: https://www.lesswrong.com/posts/R8muGSShCXZEnuEi6/a-defense-of-functional-decision-theory, where I attempt to refute some of the points made here.
Thanks for posting Will!
Would you mind me translating all of your formulas into proper LaTeX? That will allow more browsers to render them correctly, and will also improve the layout in various small ways.
Planned summary for the Alignment Newsletter:
Planned opinion:
Regarding Guaranteed Payoffs (if I am understanding what that means), I think a relevant point was made in response to a previous review https://www.lesswrong.com/posts/BtN6My9bSvYrNw48h/open-thread-january-2019#7LXDN9WHa2fo7dYLk
Yes, FDT rejects some pretty foundational principles, yes, it’s wild, yes we know, we really do think those principles might be wrong. Would you be willing to explain what’s so important about guaranteed payoffs?
CDT makes its decisions as a pure function of the present and future. This seems reasonable, and people use that property to simplify their decision-making all the time, but it requires them to ignore promises that we would have liked them to have made in the past. This seems very similar to being unable to sign contracts or treaties because no one can trust you to keep to them when it becomes convenient for you to break them. It's a missing capability. Usually, missing a capability is not helpful.
I note that there is a common kind of agent that is cognitively transparent enough to prove whether or not it can keep a commitment: governments. They need to be able to make and hold commitments all of the time. I'd conjecture that maybe most discourse about decision-making is about the decisions of large organisations rather than individuals.
Regarding the difficulty of robustly identifying algorithms in physical processes… I'm fairly sure having that ability is going to be a fairly strict prerequisite for being able to reason abstractly about anything at all. I'm not sure how to justify this, but I might be able to disturb your preconceptions with a paradox if… I'll have to ask first: do you consider there to be anything mysterious about consciousness? If you're a Dennettian, the paradox I have in mind won't land for you and I'll have to try to think of another one.
An agent also faces a guaranteed payoffs problem in Parfit’s hitchhiker, since the driver has already made their prediction (the agent knows they’re safe in the town) so the agent’s choice is between losing $1,000 and losing $0. Is it also a bad idea for the agent to pay the $1,000 in this problem?
Here's an attempt to explain it using only causal subjunctives:
Say you build an agent, and you know in advance that it will face a certain decision problem. What choice should you have it make, to achieve as much utility as possible? Take Newcomb's problem. Your choice to make the agent one-box will cause the agent to one-box, getting however much utility is in the box. It will also cause the simulator to predict that the agent will one-box, meaning that the box will be filled. Thus CDT recommends building a one-boxing agent.
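A minimal sketch of that builder's calculation (the payoffs and the 99% accuracy figure are placeholder assumptions of mine):

```python
def builder_payoff(agent_one_boxes: bool, accuracy: float = 0.99) -> float:
    """Expected payoff of building an agent for Newcomb's problem, where the
    predictor fills the opaque box iff it predicts the agent will one-box."""
    big, small = 1_000_000, 1_000
    p_filled = accuracy if agent_one_boxes else 1 - accuracy
    expected = p_filled * big
    if not agent_one_boxes:
        expected += small  # a two-boxing agent always takes the small box too
    return expected

print(builder_payoff(True))   # ~990,000: CDT tells the builder to build a one-boxer
print(builder_payoff(False))  # ~11,000
```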
In the Sandwich problem, no one is trying to predict the agent. Therefore your choice of how to build it causes only its actions and their consequences, and so you want the agent to switch to hummus after it learns that hummus is better.
FDT generalizes this approach into a criterion of rightness. It says that the right action in a given decision problem is the one that would be taken by the agent that CDT recommends you build.
Now the point where the logical uncertainty does come in is the idea of "what you knew at the time". While that doesn't avoid the issue, it puts it into a form that's more acceptable to academic philosophy. Clearly some version of "what you knew at the time" is needed to do decision theory at all, because we want to say that if you get unlucky in a gamble with positive expected value, you still acted rationally.
I wrote a LW post as a reply to this. I explain several points of disagreement with MacAskill and Y&S alike. See here.
I feel the bomb problem could be better defined. What is the predictor predicting? Is it always predicting what you'll do when you see the note saying it predicted you'd go Right? What about when you don't see this note, because it predicts you'll go Left? Then there's the issue that if it makes its prediction by (a) trying to predict whether you'll see such a note or not, and then (b) predicting what the agent does in that case, it would already have to predict the agent's choice in order to make the prediction in stage (a). In other words, (a) depends on (b) and (b) depends on (a); the situation is circular. (Edited, since my previous comment was incorrect.)
My intuition is that two-boxing is the correct move in this scenario where the Predictor always fills the box with $1M for the Scots and never for the English. An Englishman has no hope of walking away with the $1M, so why should he one-box? He could wind up being one of the typical Englishmen who walk away with $1000, or one of the atypical Englishmen who walk away with $0, but he is not going to wind up being an Englishman who walks away with $1M because those don’t exist and he is not going to wind up being a Scottish millionaire because he is English.
EDT might also recommend two-boxing in this scenario, because empirically p($1M | English & one-box) = 0.
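One way to spell that out (assuming, per the scenario, that no Englishman ever ends up with the $1M regardless of his choice):

\[
\mathbb{E}_{\mathrm{EDT}}[\text{one-box} \mid \text{English}] = P(\$1\text{M} \mid \text{English}, \text{one-box}) \cdot \$1{,}000{,}000 = 0,
\qquad
\mathbb{E}_{\mathrm{EDT}}[\text{two-box} \mid \text{English}] = \$1{,}000,
\]

so conditioning on the evidence gives the same verdict as the dominance reasoning above.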
In part IV, can you explain more about what your examples prove?
You say FDT is motivated by an intuition in favor of one-boxing, but apparently this is false by your definition of Newcomb’s Problem. FDT was ultimately motivated by an intuition that it would win. It also seems based on intuitions regarding AI, if you read that post—specifically, that a robot programmed to use CDT would self-modify to use a more reflective decision theory if given the chance, because that choice gives it more utility. Your practical objection about humans may not be applicable to MIRI.
As far as your examples go, neither my actions nor my abstract decision procedure controls whether or not I’m Scottish. Therefore, one-boxing gives me less utility in the Scot-Box Problem and I should not do it.
Exception: Perhaps Scotland in this scenario is known for following FDT. Then FDT might in fact say to one-box (I’m not an expert) and this may well be the right answer.
I have a terminological question about Causal Decision Theory.
Now it seems to me that causation is understood to be antisymmetric, i.e. we can have at most one of “A causes B” and “B causes A”. In contrast, counterfactuals are not antisymmetric, and “if I chose A then my simulation would also do so” and “If my simulation chose A then I would also do so” are both true. Brian Hedden’s Counterfactual Decision Theory seems like a version of FDT.
Maybe I am reading the quoted sentence without taking context sufficiently into account, and I should understand “causal counterfactual” where “counterfactual” was written. Still, in that case, I think it’s worth noting that antisymmetry is a distinguishing mark of CDT in contrast to FDT.
Update: John Collins says that “Causal Decision Theory” is a misnomer because (some?) classical formulations make subjunctive conditionals, not causality as such, central. Cited by the Wolfgang Schwarz paper mentioned by wdmcaskill in the Introduction.
I wrote a response to this critique here: https://medium.com/how-to-build-an-asi/a-defense-of-functional-decision-theory-d86a9a19a755. I’m happy to receive feedback!
I get “page not found”. Why not crosspost it to LessWrong? Then it can be more easily discussed here and can be tagged so more people see it.
See: https://www.lesswrong.com/posts/R8muGSShCXZEnuEi6/a-defense-of-functional-decision-theory. Thanks again!
Thanks! I see it now, weird. No idea why that link doesn’t work, but crossposting indeed seems like a better idea. So thanks! I’ll do that instead.
There’s a period at the end of the URL that was automatically included; deleting that fixes the issue (I’ve edited your comment accordingly).
Hey, thanks! That’s awesome.
Typo:
The bolded should be “make” I think.
The post makes much more sense once you get to this part:
Edit: focusing on one thing.
So, some very brief comments then I’m off to do some serious writing.
This article was hilarious. The criticisms of FDT are reliably way off base, and it’s clear that whoever had these didn’t bother to look up the historical context for these decision theories.
A quick example. In the bomb hypothetical, the recommendation of FDT is obviously correct. Why? The fact that the predictor left a "helpful" note tells you absolutely nothing. I think people are assuming that "helpful" = honest or something; in any case, the correct thing to do regarding the note is to ignore it completely, because you have no idea, and can't ask (since the predictor is long gone), what it meant by its "helpful" note. This is the recommendation of UDT as I understand it; it's possible that FDT is general enough to contain UDT as a subset, but even if not, as real agents we can just pick the appropriate decision theory (or lack of one) to apply in each situation.
With this view, it’s extremely clear that you should pick Left which gives you the same trillion-trillion-to-one chance of survival, and doesn’t cost anything. A decision theory that is to be used by actual agents shouldn’t be vulnerable to mugging of any kind, counterfactual or Pascal’s Wager-ish.
Most of the criticism is either completely wrong or misses the point in ways similar to this. I would explain point by point but I think I’ve taken what amusement I shall out of this.
Firstly, this is false. MacAskill works in academic philosophy, and I’m confident he’s read up on decision theory a fair bit.
Secondly, it’s unkind and unfair to repeatedly describe how you’re laughing at someone, and it’s especially bad to do it instead of presenting a detailed argument, as you say you’re doing in your last sentence.
I don’t think this needs to rise to the level of a formal moderator warning, I just want to ask you to please not be mean like this on LessWrong in future. That said, I hope you do get around to writing up your critique of this post sometime.
Look, I never said it wasn’t a serious attempt to engage with the subject, and I respect that, and I respect the author(s).
Let me put it this way. If someone writes something unintentionally funny, are you laughing at them or at what they wrote? To me there is a clear separation between author and written text.
If you’ve heard of the TV show “America’s Funniest Home Videos”, that is an example of something I don’t laugh at, because it seems to be all people getting hurt.
If someone was truly hurt by my comment then I apologise. I did not mean it that way.
I still stand by the substance of my criticism, though. The fact that I was amused has nothing to do with whether what I wrote was genuine; it was. It's sort of… who is at fault when someone misinterprets your tone online? I don't think either party can really have a strong claim, because it's extremely hard to get tone right in writing, and as a reader you don't know the person who wrote it, so you could have totally different expectations of what the author is thinking. Not to mention that online you're very likely to be from different countries and cultural backgrounds, which have different norms.
As a further apology, I am very very unlikely to write any more detail on this unless the original article author messages me to ask me for it.
The note is just set-dressing; you could have both the boxes have glass windows that let you see whether or not they contain a Bomb for the same conclusions if it throws you off.
Okay, because I’m bored and have nothing to do, and I’m not going to be doing serious work today, I’ll explain my reasoning more fully on this problem. As stated:
Without reference to any particular decision theory, let’s look at which is the actually correct option, and then we can see which decision theory would output that action in order to evaluate which one might “obtain more utility.”
The situation you describe with glass windows is a completely different problem and has a possibly different conclusion, so I’m not going to analyse that one.
Given in the problem statement we have:
This is an impossible situation, in that no actual agent could actually be put in this situation. However, so far, the implications are as follows:
Our experiential knowledge may be incorrect. If it is, then the logically certain knowledge can be ignored because it is as if we have a false statement as the precondition for the material conditional.
If it isn’t, then the logical implication goes:
Okay so far.
We will not assume that this “predictor” was any particular type of thing; in particular, it need not be a person.
In order to make the problem as written less confusing, we will assume that “Left” and “Right” for the predictor refer to the same things I’ve explained above.
Since there is no possible causal source for this information, as we have been magically instantiated into an impossible situation, the above quoted knowledge must also be logically certain.
Now, in thinking through this problem, we may pause, and reason that this predictor seems helpful; in that it deliberately put the bomb in the box which it, to its best judgement, predicted we would not take. Further, given in the problem statement, there is the sentence “Helpfully, she left a note, explaining that she predicted that you would take Right, and therefore she put the bomb in Left.”
This is strong evidence that the predictor, which recall we are not assuming is any type of particular thing (but this very fact doesn’t exclude the possibility that it is an agent or person-like being), is behaving as if it were helpful. So by default, we would like to trust the predictor. So when our logically certain knowledge says that she has a failure rate of 1 in 1,000,000,000,000,000,000,000,000, we would prefer to trust that as being correct.
Now, previously we found the possibility that our experiential knowledge may be incorrect; that is, we may not in fact be faced with two open boxes, even though it looks like we are; or the boxes may not contain a bomb/be empty, or some other thing. This depends on the “magically placed” being’s confidence in the ability of itself to make inferences based on the sensory information it receives.
What we observe, from the problem statement, is that there does appear to be a bomb in the Left box, and that the Right box does appear to be empty. However, we also observe that we would prefer this observation to be wrong such that our logical knowledge that we must take a box is incorrect. Because if we can avoid taking either box, then there is no risk of death, nor of losing $100.
By default, one would wish to continue a "happy life"; however, in the problem statement we are given that we will never see another agent again. A rational agent can predict from this that their life will eventually become unhappy: happiness is known to be a temporary condition, and since other agents can in principle be physically made from natural resources given enough time, never seeing another agent implies that physical resources and/or time are too limited to make one.
Making another agent when you are the only agent in existence is probably one of the hardest possible problems, but nevertheless, if you cannot do it, then you can predict that you will eventually run out of physical resources and time no matter what happens, and therefore you are in a finite universe regardless of anything else.
Since you have definitively located yourself in a finite universe, and you also have the logically certain knowledge that the simulator/predictor is long dead and appears to be helpful, this is logically consistent as a possible world-state.
Now we have to reason about whether the experiential evidence we are seeing is more or less likely to be correct than the simulator's prediction, which by our logically certain knowledge fails only one time in 1,000,000,000,000,000,000,000,000. We know how to do this: just use probability theory, which can be reduced to a mechanical procedure.
However, since we have limited resources, before we actually do this computation, we should reason about what our decision would be in each case, since there are only three possibilities:
1. the experiential evidence is less likely, in which case the simulator probably hasn’t made an error, but the first part of the logically certain knowledge we have can be ignored.
2. or the experiential evidence is more likely, in which case it’s possible that the simulator made a mistake, and although it appears trustworthy, we would be able to say that its prediction may be wrong and (separately), perhaps its note was not helpful.
3. They are exactly equally likely, in which case we default to trusting the simulator.
In each case, what would be our action?
1. In this case, the logically certain knowledge that we must choose one of the boxes can be ignored, but it may still be correct. So we have to find some way to check independently whether it might be true, without making use of the logically certain knowledge. One way is to take the same action as option 2; in addition, you can split the propositions in the problem statement into atoms, take the power set, and consider the implications of each subset. The total information obtained by this process will inform your decision. However, logical reasoning is just another form of obtaining evidence for non-logically-omniscient agents, and so in practice this option reduces to exactly the same set of possible actions as option 2, below:
2. In this case, all we have to go on is our current experiential knowledge, because the source of all our logically certain knowledge is the simulator, and since in this branch the experiential knowledge is more likely, the simulator is more likely to have made a mistake, and we must work out for ourselves what the actual situation is.
Depending on the results of that process, you might
1. Just take right, if you have $100 on you and you observe you are under coercion of some form (including “too much” time pressure; ie, if you do not have enough time to come to a decision)
2. Take neither box, because both are net negative
3. Figure out what is going on and then come back and potentially disarm the bomb/box setup in some way. Potentially in this scenario (or 2) you may be in a universe which is not finite and so even if you observe you are completely alone, it may be possible to create other agents or to do whatever else interests you and therefore have whichever life you choose for an indefinitely long time.
4. Take left and it does in fact result in the bomb exploding and you painfully dying, if the results of your observations and reasoning process output that this is the best option for some reason.
5. Take left and nothing happens because the bomb triggering process failed, and you save yourself $100.
For the purposes of this argument, we don’t need to (and realistically, can’t) know precisely which situations would cause outcome 4 to occur, because it seems extremely unlikely that any rational agent would deliberately produce this outcome except if it had a preference for dying painfully.
Trying to imagine the possible worlds in which this could occur is a fruitless endeavour because the entire setup is already impossible. However you will notice that we have already decided in advance on a small number of potential actions that we might take if we did find ourselves in this impossible scenario.
That in itself substantially reduces the resources required to make a decision if the situation were to somehow happen even though it’s impossible—we have reduced the problem to a choice of 5 actions rather than infinite, and also helped our (counterfactual self, in this impossible world) make their choice easier.
3. Case three (exactly equal likelihood) is the same action as case 1 (and hence also case 2), because the bomb setup gives us only negative utility options and the simulator setup has both positive and negative utility options, so we trust the simulator.
Now, the only situation in which right is ever taken is if the simulator is wrong and you are under coercion.
Since in the problem statement it says:
then by the problem definition, this cannot be the case except if you do not have adequate time to make a decision. So if you can come to a decision before the universe you are in ends, then Right will never be chosen, because the only possible type of coercion (since there are no other agents) is inadequate time/resources. If you can’t, then you might take right.
However, you can just use FDT to make your decision near-instantly, since this has already been studied, and it outputs Left. Since this is the conclusion you have come to by your chain of reasoning, you can pick left.
But it may still be the case, independently of both of these things, (since we are in an impossible world), that the bomb will go off.
So for an actual agent, the actual action you would take can only be described as “make the best decision you can at the time, using everything you know.”
Since we have reasoned about the possible set of actions ahead of time, we can choose from the (vaguely specified) set of five actions above, or we can do something else, given that we know about the reasoning we have already performed and that, if actually placed in the situation, we would have more evidence to inform our actions.
However, the set of 5 actions covers all the possibilities. We also know that we would only take right if we can’t come to a decision in time, or if we are under coercion. In all other cases we prefer to take Left or take neither, or do something else entirely.
Since there are exactly two possible situations in which we take Right, and an indefinitely large number in which we take Left, the maximum-utility option is to take Left, which FDT correctly outputs.
In the bomb question, what we need is a causal theory in which the ASI agent accurately gauges that a universe of one indicates loneliness and not in fact happiness, which is predicated on friendliness (at least for an ASI) (and I would be slightly concerned, as an external observer, as to why the universe was reduced to a single agent if it were not due to entropy); then figures out that the perfect predictor was a prior ASI, not from that universe, giving it a clue; and then, adding all its available power to the bomb, following Asimov, says: "LET THERE BE LIGHT!" And with an almighty bang (and perhaps, even with all that extra explosive power, no small pain) there was light--