FDT is not directly comparable to CDT and EDT
In Functional Decision Theory: A New Theory of Instrumental Rationality and Cheating Death in Damascus, the authors go through a multitude of decision problems and compare the performance of functional decision theory (FDT), on the one hand, and classical evidential and causal decision theory (EDT and CDT) on the other; and they come to the conclusion that the former “outperforms”[1] the latter two. That is, FDT is implicitly treated as a decision theory directly comparable to EDT and CDT. I will argue that this methodology is misguided and that their respective action recommendations are incommensurable in a strong sense: FDT belongs in a different ontology (together with all other logical decision theories), and although we might be able to compare ontologies, directly comparing the respective decision theories is not meaningful since we lack a common measurement for doing so.
The 2x2x2 decision theory grid
In the MIRI/OP decision theory discussion, Garrabrant points out that we can think about different decision theories as locations in the following three-dimensional grid (simplifying of course):
Conditionals vs. causalist counterfactuals (EDT vs. CDT).
That is, P(o | a) or P(o | do(a)).[2]
Updatefulness vs. updatelessness (“from the perspective of what epistemic state should I make the decision, the prior or posterior?”).
That is, P(o | ..., obs) or P(o | ...).
(Perhaps this dimension should be split up into empirical updatelessness, on the one hand, and logical updatelessness, on the other.)
Physicalist agent ontology vs. algorithmic/logical agent ontology (“an agent is just a particular configuration of physical stuff doing physical things” vs. “an agent is just an algorithm relating inputs and outputs; independent of substrate and underlying structure”).
Notationally, this corresponds to using P(o | a) or P(o | "DT(s)=a") if we are evidentialists, and P(o | do(a)) or P(o | do("DT(s)=a")) if we are causalists.
(I am using “ontology” to mean something like “a map/model of the world”, which I think is the standard LW usage.)
Demski (also in the MIRI/OP discussion) makes this distinction even more fine-grained: “I am my one instance (physical); evaluate actions (of one instance; average over cases if there is anthropic uncertainty)” vs. “I am my instances: evaluate action (for all instances; no worries about anthropics)” vs. “I am my policy: evaluate policies”.
I think the third axis differs from the first two: it is arguably reasonable to think of ontology as something that is not fundamentally tied to the decision theory in and of itself. Instead, we might want to think of decision theories as objects in a given ontology, meaning we should decide on that ontology independently—or so I will argue.
Nonetheless, we now have (2x2x2 =) eight decision theories, corresponding respectively to the use of the following probabilities in calculating the expected utility (note that we are doing action selection here for the sake of simplicity[3]; a schematic formula follows the list):
P(o | a, obs)—updateful EDT (aka EDT)
P(o | do(a), obs)—updateful CDT (aka CDT)
P(o | a)—updateless EDT
P(o | do(a))—updateless CDT
P(o | "DT(s)=a", obs)—”TEDT” (timeless EDT)
P(o | do("DT(s)=a"), obs)—TCDT (timeless CDT, aka TDT)
P(o |"DT(s)=a")—UEDT/”FEDT”
P(o | do("DT(s)=a"))—UCDT (aka UDT)/FCDT (aka FDT)[4]
That is, FDT (as formulated in the papers) could be seen as ‘updateless CDT’ in the algorithmic agent ontology, instead of the physicalist. (And timeless decision theory, TDT, is just ‘updateful CDT’ in the algorithmic ontology.) In other words, thinking in terms of logical counterfactuals and subjunctive dependence as opposed to regular causal counterfactuals and causal dependence constitutes an ontology shift, and nothing else. And the same goes for thinking in terms of logical conditionals as opposed to regular empirical ones.
Different ontologies, different decision problems?
As said, the different ontologies correspond to different ways of thinking about what agents are. If we are then analysing a given decision problem and comparing FDT’s recommendations to the ones of EDT and CDT, we are operating in two different ontologies and using two different notions of what an agent is. In effect, since decision problems include at least one agent, decision problems considered by FDT are not the same decision problems that are considered by EDT and CDT.
And this is why the methodology in the FDT papers is dubious: the authors are (seemingly) making the mistake of treating the action recommendations of decision theories from different ontologies as commensurable by directly comparing their “performance”. But since we lack a common measurement—ontology-neutral decision problems—such a methodology cannot give us meaningful results.
Comparing FDT to CDT
Suppose that we want to situate decision theories in the logical ontology. What, then, is the specific difference between FDT and what we normally call (updateful) CDT? As said, the latter is just the causalist version of TDT, and the difference between FDT and TDT is updatelessness. Or, vice versa: if we are operating in the physicalist ontology, then what we call FDT is just updateless CDT, and the only difference between updateless CDT and CDT is, of course, again updatelessness.
That is, in arguing for (or against) FDT over CDT the focus should be on the question of “from what epistemic state should I make the decision, the prior or posterior?”—the question of updatelessness.
But let’s look at an example where the ontological difference between FDT and CDT is the reason behind their differing action recommendations: the Twin Prisoner’s Dilemma.
Example: the Twin Prisoner’s Dilemma
Consider the Psychological Twin Prisoner’s Dilemma (from the FDT paper):
Psychological Twin Prisoner’s Dilemma. An agent and her twin must both choose to either “cooperate” or “defect.” If both cooperate, they each receive $1,000,000. If both defect, they each receive $1,000. If one cooperates and the other defects, the defector gets $1,001,000 and the cooperator gets nothing. The agent and the twin know that they reason the same way, using the same considerations to come to their conclusions. However, their decisions are causally independent, made in separate rooms without communication. Should the agent cooperate with her twin?
We have the following analyses by the different decision theories:
CDT: “the two of us are causally isolated, and defecting dominates cooperating, so I will therefore defect”.
EDT: “the two of us are highly correlated such that if I cooperate then that is strong evidence that my twin will also cooperate (and if I defect then that is strong evidence that my twin will also defect), so I will therefore cooperate”.
FDT: “I am deciding now what the output of my decision algorithm is going to be (in fact, I am my algorithm), and since we are twins (and thus share the same algorithm) I am also determining the output of my twin, so I will therefore cooperate since u(C,C)>u(D,D)”.[5]
So why exactly does FDT achieve mutual cooperation and not CDT? As I see it, this is solely due to thinking about agents differently—as algorithms and not specific physical configurations. That is, if you and another agent implement the same algorithm, then you are the same agent (at least for the purposes of decision-theoretic agency). This means that if you take action a, then this other instantiation of you will also take action a, because you are them. Updatelessness—which we said was the only purely decision-theoretic difference between FDT and CDT—does not even come into the picture here. It is therefore unclear what an FDT-to-CDT comparison is supposed to tell us in the Twin Prisoner’s Dilemma if the difference in recommendations just boils down to a disagreement about whether “you” are (for the purposes of decision theory) also “your twin” or not.
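To make this concrete, here is a minimal sketch (my own toy model, not the formalism from the papers) of the two evaluations, using the payoffs from the problem statement:

```python
# Toy sketch of the Twin Prisoner's Dilemma payoffs (taken from the problem statement).
# This is an illustrative model of the two evaluations, not the FDT papers' formalism.

PAYOFF = {  # (my action, twin's action) -> my payoff in dollars
    ("C", "C"): 1_000_000,
    ("C", "D"): 0,
    ("D", "C"): 1_001_000,
    ("D", "D"): 1_000,
}

def cdt_value(my_action: str, credence_twin_cooperates: float) -> float:
    """Physicalist/causalist evaluation: the twin's choice is causally independent
    of mine, so I hold my credence about the twin fixed while varying my action."""
    p = credence_twin_cooperates
    return p * PAYOFF[(my_action, "C")] + (1 - p) * PAYOFF[(my_action, "D")]

def algorithmic_value(my_action: str) -> float:
    """Algorithmic-ontology evaluation: my twin runs the same algorithm, so fixing
    the output of 'my' algorithm fixes the twin's output too."""
    return PAYOFF[(my_action, my_action)]

if __name__ == "__main__":
    for p in (0.0, 0.5, 1.0):
        # Defection dominates for any fixed credence about the twin (the CDT verdict).
        assert cdt_value("D", p) > cdt_value("C", p)
    # Once my action and my twin's action are identified, only the diagonal payoffs
    # are live, and u(C,C) > u(D,D) (the FDT verdict).
    assert algorithmic_value("C") > algorithmic_value("D")
    print("CDT-style evaluation: defect; algorithmic evaluation: cooperate")
```

The only thing that changes between the two functions is which outcomes count as reachable by “my” choice.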
To see this, consider an agent ontology which says that “an agent” is simply genetic material; that is, “you” are simply an equivalence class with respect to the relation of genetic identity (just as “you” are an equivalence class with respect to algorithmic identity in the algorithmic ontology). And suppose we want to look at a decision theory with causalist-esque counterfactuals in this ontology. Let’s call it ‘genetic decision theory’ (GDT).
(Another way of looking at it: in the “genetic ontology”, you intervene on the purely biological parts of your predisposition when making a decision, and consider downstream effects—whatever that means.)
What does GDT recommend in the (Identical) Twin Prisoner’s Dilemma? Well, since the two agents share the same genetic material, they are the same agent, meaning that if one of them decides to cooperate the other one will also cooperate (and the same for defection). It is therefore best to cooperate since u(C,C)>u(D,D).
We could then say that GDT “outperforms” CDT in the (Identical) Twin Prisoner’s Dilemma (just as FDT “outperforms” CDT). But this is clearly neither meaningful nor interesting: (i) we never argued for why this conception of an agent is useful or intuitive (and it is the choice of ontology that is ensuring cooperation here, nothing else); and (ii) the decision problem in question—the Twin Prisoner’s Dilemma—is now a different decision problem in the genetic ontology (and we are, in fact, operating in this ontology since we are applying GDT). You are that other person in the other room; it is not just your twin with whom you are correlated.
Another analogy: comparing updateless CDT and FDT is like comparing two hedonistic utilitarian theories that merely disagree about which beings are conscious. Of course, these two theories will recommend different actions in many ethical dilemmas, but that does not tell us anything of ethical substance.
Comparing FDT to EDT
In comparing FDT to EDT, on the other hand, there are two differences conditional on fixing an ontology (physicalist or algorithmic): (i) the use of causalist counterfactuals over conditionals; and (ii) updatelessness again. That is, fixing the algorithmic ontology, what we are doing is comparing the evidential variant of TDT to FDT. Or, if we are instead fixing the physicalist ontology, what we are doing is comparing EDT to updateless CDT.
But in the papers, FDT is instead directly compared to EDT, and it is once again unclear what such a comparison is supposed to tell us, especially in the cases where the results depend on the ontology that we are fixing. The Smoking Lesion is a decision problem where this is partially the case—counterfactuals matter, but so does the ontology.
Example: the Smoking Lesion
Consider the Smoking Lesion (from the FDT paper):
Smoking Lesion. An agent is debating whether or not to smoke. She knows that smoking is correlated with an invariably fatal variety of lung cancer, but the correlation is (in this imaginary world) entirely due to a common cause: an arterial lesion that causes those afflicted with it to love smoking and also (99% of the time) causes them to develop lung cancer. There is no direct causal link between smoking and lung cancer. Agents without this lesion contract lung cancer only 1% of the time, and an agent can neither directly observe nor control whether she suffers from the lesion. The agent gains utility equivalent to $1,000 by smoking (regardless of whether she dies soon), and gains utility equivalent to $1,000,000 if she doesn’t die of cancer. Should she smoke, or refrain?
The analyses (a toy numerical sketch follows the list):
CDT: “me getting cancer is not downstream of my choice whether to smoke or not, and smoking dominates not smoking; so I will therefore smoke”.
EDT (modulo the Tickle Defence): “due to the correlation between smoking and getting cancer, choosing to smoke would give me evidence that I will get cancer, so I will therefore not smoke”.
FDT: “smoking or not is downstream of my decision theory and the lesion is in turn not upstream of my decision theory, meaning if the decision theory outputs ‘smoke’ then that has no effect on whether I have cancer; so I will therefore smoke”.
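To make the EDT/CDT split concrete, here is a toy numerical sketch. The prior on the lesion, and how strongly it drives smoking, are not given in the problem statement, so those numbers below are assumptions of mine; only the cancer rates and the payoffs come from the problem.

```python
# Toy numbers for the Smoking Lesion. P_LESION and the P_SMOKE_* values are
# assumptions made up for illustration; the cancer rates and payoffs are from
# the problem statement.

P_LESION = 0.5                   # assumed prior on having the lesion
P_SMOKE_GIVEN_LESION = 0.95      # assumed: the lesion makes smoking very likely
P_SMOKE_GIVEN_NO_LESION = 0.05   # assumed
P_CANCER_GIVEN_LESION = 0.99     # from the problem statement
P_CANCER_GIVEN_NO_LESION = 0.01  # from the problem statement

U_SMOKE = 1_000          # utility of smoking
U_NO_CANCER = 1_000_000  # utility of not dying of cancer

def p_lesion_given(smoke: bool) -> float:
    """Bayes: how much evidence does the act of smoking carry about the lesion?"""
    like_lesion = P_SMOKE_GIVEN_LESION if smoke else 1 - P_SMOKE_GIVEN_LESION
    like_no_lesion = P_SMOKE_GIVEN_NO_LESION if smoke else 1 - P_SMOKE_GIVEN_NO_LESION
    joint = like_lesion * P_LESION
    return joint / (joint + like_no_lesion * (1 - P_LESION))

def edt_value(smoke: bool) -> float:
    """EDT: condition on the act, letting it carry news about the lesion."""
    p_lesion = p_lesion_given(smoke)
    p_cancer = p_lesion * P_CANCER_GIVEN_LESION + (1 - p_lesion) * P_CANCER_GIVEN_NO_LESION
    return (U_SMOKE if smoke else 0) + (1 - p_cancer) * U_NO_CANCER

def cdt_value(smoke: bool) -> float:
    """CDT: intervene on the act; the lesion stays at its prior probability."""
    p_cancer = P_LESION * P_CANCER_GIVEN_LESION + (1 - P_LESION) * P_CANCER_GIVEN_NO_LESION
    return (U_SMOKE if smoke else 0) + (1 - p_cancer) * U_NO_CANCER

if __name__ == "__main__":
    print("EDT:", edt_value(True), "vs", edt_value(False))  # refraining wins
    print("CDT:", cdt_value(True), "vs", cdt_value(False))  # smoking wins
```

With these numbers, conditioning on the act favours refraining (the EDT verdict), while intervening on the act leaves the lesion probability at its prior and favours smoking (the CDT verdict).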
The causal graph for the decision problem, stolen from the FDT paper:
Here CDT and FDT are said to give the correct recommendation, and EDT the incorrect one (again, modulo the Tickle Defence). But note that if we think of (updateful) EDT as something algorithmic (namely as “TEDT”), then an evidential variant of the FDT-argument can be made for smoking. Namely, if I smoke because my decision theory tells me to, then that puts me in a different reference class than the rest of the population (from where we got the brute correlation specified in the problem) and the evidence is screened off, meaning I should smoke.[6] This is not surprising: the drawing of the additional “decision theory nodes” in the causal diagram does not correspond to something endogenous to the specific decision theory in question (FDT, in this case), but it is rather a stipulation about the (logi-)causal structure of the world; and that (logi-)causal structure plausibly remains even if we are doing something evidential. (Unless, of course, the motivation for being evidentialist in the first place is having metaphysical qualms with the notion of a “causal structure of the world”.) I.e., FDT does not “outperform” algorithmic EDT in this decision problem. Once again, the importance of keeping within the same ontology is illustrated.
(But although algorithmic EDT would smoke in the Smoking Lesion, there is an analogous problem where algorithmic CDT and EDT supposedly come apart and the latter is said to give the incorrect recommendation, namely Troll Bridge.)
Objection: comparing ontologies by comparing decision theories
Here is one way of understanding the methodology in the papers (a steelman if you will); or simply one way of making sense of cross-ontological decision theory comparisons in general. We have some desiderata for our agent ontology—intuitive things that we want to be captured. And some of these features are arguably related to the behaviour of decision theories within the different ontologies. So when two decision theories from two different ontologies are compared in a decision problem where purely ontological differences are the reason for the different action recommendations (rather than some other feature, e.g. updatelessness), it is the ontologies that are being compared, and such a comparison arguably is meaningful.[7]
But then—setting aside the meaningfulness of comparing ontologies themselves for the next section—what exactly are the desiderata? Given the authors’ focus on performance, one might think that one desideratum would be something like “decision theories in the ontology should perform well”. That is, saying that e.g. the Twin Prisoner’s Dilemma is an argument for FDT, and against CDT, is just saying that the algorithmic ontology is preferable because we get better results when using it. But I don’t think this makes sense at all: we can easily make up an (arbitrarily absurd) ontology that yields great decision-theoretic results from the perspective of that ontology, e.g. the genetic one with respect to the Twin Prisoner’s Dilemma.
Furthermore, why is it that we can’t compare the ontologies directly once we have the desiderata? What would a cross-ontological comparison of decision theories tell us in that case? For example, if a desideratum is something like “agents should think that they are determining the choice of other identical agents (and not merely in an evidential way)”, then plausibly we can just deduce which ontologies satisfy this criterion without having to compare decision theories in the relevant decision problems.
In the end, I think that (i) it is very unclear why comparing ontologies by comparing the recommendations of decision theories would be useful; and (ii) the authors’ focus on performance in the FDT papers is too pronounced for this to be the correct interpretation of their methodology. That is, my original argument should still be applicable—at least in the face of this particular objection.
One could respond that it is generally difficult to identify the desiderata, and in most cases, all we have is intuitions over decision problems that are not easily reducible. In particular, it might not be possible to tell if some intuition has to do with ontology or decision theory. For example, perhaps one just wants to take mutual cooperation in the Twin Prisoner’s Dilemma as a primitive, and until one has figured out why this is a desideratum (and thus figured out if it is about ontology or decision theory), comparisons of decision theories that merely involve ontological differences do in fact carry some information about what ontology is reasonable.[8] I am somewhat sympathetic to this argument in and of itself, although I disagree about the extent to which our intuitions are irreducible such that we cannot tell whether they are about ontology or decision theory.[9] Furthermore, note that the claim (as I understand it) in the FDT papers is that mutual cooperation in the Twin Prisoner’s Dilemma is desirable because of the utility that comes with it, and it is not taken as a primitive.
Ontological commensurability
Note that I am not arguing that the physicalist ontology is preferable to the algorithmic one: for all that I (currently) know, the latter might be a better way of thinking about the nature of decision-theoretic agency.[10] My point in this post is rather, as said, that it is not meaningful to compare decision theories cross-ontologically, even if this is meant as a proxy for comparing the ontologies themselves. We should rather fix an ontology exogenously and situate all decision theories within that one—at least for the purposes of assessing their respective recommendations in decision problems.
The following questions then arise:
Can we meaningfully compare ontologies in the first place?
If yes, what makes one ontology preferable to another?
I think these are difficult questions, but ultimately I think that we probably can compare ontologies; some ontologies are simply more reasonable than others, and they do not simply correspond to “different ways of looking at the world” and that’s that. For example, one might argue that ‘agency’ is a high-level emergent phenomenon and that a reductionist physicalist ontology might be too “fine-grained” to capture what we care about, whilst the algorithmic conception abstracts away the correct amount of details.[11] Or, one might think that going from physics to logic introduces too many difficult problems, such as logical uncertainty, and this is an instrumental reason to stick with physics.
Conclusion
In sum, I made the following claims:
FDT belongs in the ‘algorithmic’ or ‘logical’ ontology, where agents are simply viewed as fixed functions relating inputs (states of the world) and outputs (actions/policies).
Classical EDT and CDT belong in the standard ‘physicalist’ ontology, where agents are physical objects.
It is not meaningful to compare the action recommendations of decision theories that belong in different agent ontologies, e.g. FDT to EDT/CDT, since there does not exist a common and objective measurement for doing so; namely, decision problems that are formulated in ontology-neutral terms.
For example, the Twin Prisoner’s Dilemma that is considered by EDT and CDT is not the same as the Twin Prisoner’s Dilemma considered by FDT since the latter comes with a different inbuilt conception of what ‘agents’ are.
But, we might be able to compare ontologies themselves, and if it is the case that we prefer one, or think that one is more ‘correct’, then we should situate decision theories in that one map before comparing them. That is, in doing decision theory, we should ideally first settle on metaphysical matters (such as “what is an agent?”), and then formulate decision theories with the same underlying metaphysical assumptions.
This methodology does not imply that we are in practice paralyzed until we have solved the metaphysics, or even until we know what the relevant questions are—we can meaningfully proceed with uncertainty and unawareness.[12] The point is rather that we should keep decision theory (which, in the framework of the aforementioned 2x2x2 grid, is about updatelessness and counterfactuals) and metaphysics distinct for the purposes of comparing decision theories, and make clear when we are operating in which ontology.
Acknowledgements
Thanks to Lukas Finnveden, Tristan Cook, and James Faville for helpful feedback and discussion. All views are mine.
[1] Decision-theoretic performance is arguably underspecified in that there is no “objective” metric for comparing the success of different theories. See The lack of performance metrics for CDT versus EDT, etc. by Caspar Oesterheld for more. This is not the point I am making in this post, albeit somewhat related.
[2] Note that despite using do-operators here, I’m not necessarily committed to Pearlian causality—this is just one way of cashing out the causalist counterfactual.
[3] Everything in this post should, mutatis mutandis, hold for doing policy or program selection as well.
[4] See this post for the (fairly minor) differences between FDT and UDT.
[5] Arguably you only get mutual FDT cooperation when the agents are implementing the exact same algorithm, and this is somewhat unlikely for realistic agents who are approximating FDT. More on this here.
[6] h/t Julian Stastny. And if the lesion is upstream of my decision theory and there is a stipulated correlation between being evidentialist, say, and getting cancer, then the decision problem is both “unfair” and uninteresting.
[7] h/t Tristan Cook.
[8] h/t Lukas Finnveden for comments that led to this paragraph.
[9] For example, in the case of the Twin Prisoner’s Dilemma, it is somewhat plausible to me that a reasonable ontology should tell us that we determine the action of exactly identical agents (however we want to cash that out), and not merely in an evidential way. But I don’t think ontology has much to do with why I want mutual cooperation in the Twin Prisoner’s Dilemma when the agents are similar, but not identical. Concretely, I don’t think the algorithmic ontology gives us mutual cooperation in that case (because, arguably, logical counterfactuals are brittle), and you need evidentialism (which is about decision theory) as well.
[10] Interestingly, in the MIRI/OP discussion on decision theory, Demski writes: “So I see all axes except the ‘algorithm’ axis as ‘live debates’ — basically anyone who has thought about it very much seems to agree that you control ‘the policy of agents who sufficiently resemble you’ (rather than something more myopic like ‘your individual action’), but there are reasonable disagreements to be had about updatelessness and counterfactuals.”
I am fairly confused by this though: “you control ‘the policy of agents who sufficiently resemble you’” could just be understood as a commitment to evidentialism, regardless of the ontology; and, tritely, I think it should be noted that academic decision theorists have thought about this (if not in the context of causalism vs. evidentialism), and most of them are CDTers. See this blogpost by Wolfgang Schwarz for commentary on FDT, for example.
[11] h/t James Faville.
[12] Suppose we are agnostic with respect to the ontology-axis. Then I think that we should proceed thinking about e.g. updateless CDT and FDT as the same decision theory, and recognize that their superficial differences do not tell us anything of decision-theoretic substance.