A Reaction to Wolfgang Schwarz’s “On Functional Decision Theory”
So I finished reading On Functional Decision Theory by Wolfgang Schwarz. In this critique of FDT, Schwarz makes a number of claims that I find to be either unfair criticism of FDT or just plain wrong, and I think it’s interesting to discuss them. Let’s go through them one by one. (Note that this post will not make much sense if you aren’t familiar with FDT, which is why I linked the paper by Yudkowsky and Soares.)
Schwarz first defines three problems:
Blackmail. Donald has committed an indiscretion. Stormy has found out and considers blackmailing Donald. If Donald refuses and blows Stormy’s gaff, she is revealed as a blackmailer and his indiscretion becomes public; both suffer. It is better for Donald to pay hush money to Stormy. Knowing this, it is in Stormy’s interest to blackmail Donald. If Donald were irrational, he would blow Stormy’s gaff even though that would hurt him more than paying the hush money; knowing this, Stormy would not blackmail Donald. So Donald would be better off if he were (known to be) irrational.
Prisoner’s Dilemma with a Twin. Twinky and her clone have been arrested. If they both confess, each gets a 5-year prison sentence. If both remain silent, they can’t be convicted and only get a 1-year sentence for obstructing justice. If one confesses and the other remains silent, the one who confesses is set free and the other gets a 10-year sentence. Neither cares about what happens to the other. Here, confessing is the dominant act and the unique Nash equilibrium. So if Twinky and her clone are rational, they’ll each spend 5 years in prison. If they were irrational and remained silent, they would get away with 1 year.
Newcomb’s Problem with Transparent Boxes. A demon invites people to an experiment. Participants are placed in front of two transparent boxes. The box on the left contains a thousand dollars. The box on the right contains either a million or nothing. The participants can choose between taking both boxes (two-boxing) and taking just the box on the right (one-boxing). If the demon has predicted that a participant one-boxes, she put a million dollars into the box on the right. If she has predicted that a participant two-boxes, she put nothing into the box. The demon is very good at predicting, and the participants know this. Each participant is only interested in getting as much money as possible. Here, the rational choice is to take both boxes, because you are then guaranteed to get $1000 more than if you one-box. But almost all of those who irrationally take just one box end up with a million dollars, while most of those who rationally take both boxes leave with $1000.
Blackmail is a bit vaguely defined here, but the question is whether or not Donald should pay if he actually gets blackmailed—given that he prefers paying to blowing Stormy’s gaff and of course prefers not being blackmailed above all. Aside from this, I disagree with the definitions of rational and irrational Schwarz uses here, but that’s partly the point of this whole discussion.
Schwarz goes on to say Causal Decision Theory (CDT) will pay in Blackmail, confess in Prisoner’s Dilemma with a Twin, and two-box in Newcomb’s Problem with Transparent Boxes. FDT will not pay, remain silent, and one-box, respectively. So far we agree.
However, Schwarz also claims “there’s an obvious sense in which CDT agents fare better than FDT agents in the cases we’ve considered”. On Blackmail, he says: “You can escape the ruin by paying $1 once to a blackmailer. Of course you should pay!” (Apparently the hush money is $1.) It may seem this way, because given that Donald is already blackmailed, paying is better than not paying, and FDT recommends not paying while CDT pays. But this is totally irrelevant, since FDT agents never end up in this scenario anyway: the problem statement specifies Stormy would know an FDT agent wouldn’t pay, so she wouldn’t blackmail such an agent. Schwarz acknowledges this point later on, but doesn’t seem to realize it completely refutes his earlier claim that CDT does better in “an obvious sense”.
Surprisingly, Schwarz doesn’t analyze CDT’s and FDT’s answers to Prisoner’s Dilemma with a Twin (beyond just giving the answers). It’s worth noting FDT clearly does better than CDT here: the FDT agent and her twin both get away with 1 year in prison, while the CDT agent and her twin both get 5. This is because the agents and their twins are clones, and therefore have the same decision theory and reach the same conclusion on this problem. FDT recognizes this; CDT doesn’t. I am baffled that Schwarz calls FDT’s recommendation on this problem “insane”, as it’s easily the right answer.
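To make this concrete, here is a minimal sketch using the prison sentences from Schwarz’s own problem statement (lower is better). The only structural fact it encodes is that two clones running the same decision theory on the same problem produce the same action, so only the diagonal outcomes are reachable.

```python
# Twin Prisoner's Dilemma payoffs from Schwarz's problem statement, in years
# of prison for (Twinky, her clone).
SENTENCES = {
    ("confess", "confess"): (5, 5),
    ("silent", "silent"): (1, 1),
    ("confess", "silent"): (0, 10),
    ("silent", "confess"): (10, 0),
}

def twin_outcome(action: str) -> int:
    """Years Twinky serves, given that her clone necessarily picks the same action."""
    return SENTENCES[(action, action)][0]

for action in ("confess", "silent"):
    print(f"Both twins {action}: {twin_outcome(action)} year(s) each")
# Both twins confess: 5 year(s) each  <- what CDT recommends
# Both twins silent: 1 year(s) each   <- what FDT recommends
```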
Newcomb’s Problem with Transparent Boxes is interesting. Given the specified scenario, two-boxing outperforms one-boxing, but this is again irrelevant. Two-boxing results in a logically impossible scenario (given perfect prediction), since then Omega would have predicted you two-box and put nothing in the right box. Given less-than-perfect (but still good) prediction, the scenario is still very unlikely: it’s one two-boxers almost never end up in. It’s the one-boxers who get the million. Schwarz again acknowledges this point, and again he doesn’t seem to realize it means CDT doesn’t do better in an obvious sense.
Edit: Vladimir Nesov left a comment which made me realize my above analysis of Newcomb’s Problem with Transparent Boxes is a reaction to the formulation in Yudkowsky and Soares’ paper instead of the formulation by Schwarz. Since Schwarz left his formulation relatively unspecified, I’ll leave the above analysis for what it is. However, note that it is assumed the demon filled the right box if and only if she predicted the participant would leave the left box behind upon seeing two full boxes. The question, then, is what to do upon seeing two full boxes.
So there’s an obvious sense in which CDT agents fare better than FDT agents in the cases we’ve considered. But there’s also a sense in which FDT agents fare better. Here we don’t just compare the utilities scored in particular decision problems, but also the fact that FDT agents might face other kinds of decision problems than CDT agents. For example, FDT agents who are known as FDT agents have a lower chance of getting blackmailed and thus of facing a choice between submitting and not submitting. I agree that it makes sense to take these effects into account, at least as long as they are consequences of the agent’s own decision-making dispositions. In effect, we would then ask what decision rule should be chosen by an engineer who wants to build an agent scoring the most utility across its lifetime. Even then, however, there is no guarantee that FDT would come out better. What if someone is set to punish agents who use FDT, giving them choices between bad and worse options, while CDTers are given great options? In such an environment, the engineer would be wise not to build an FDT agent.
I largely agree. I care about FDT from the perspective of building the right decision theory for an A(S)I, in which case it is indeed about something like scoring the most utility across a lifetime. The part of the quote about FDT agents being worse off if someone directly punishes “agents who use FDT” is moot, though. What if someone decides to punish agents for using CDT?
Schwarz continues with an interesting decision problem:
Procreation. I wonder whether to procreate. I know for sure that doing so would make my life miserable. But I also have reason to believe that my father faced the exact same choice, and that he followed FDT. If FDT were to recommend not procreating, there’s a significant probability that I wouldn’t exist. I highly value existing (even miserably existing). So it would be better if FDT were to recommend procreating. So FDT says I should procreate. (Note that this (incrementally) confirms the hypothesis that my father used FDT in the same choice situation, for I know that he reached the decision to procreate.)
He says:
In Procreation, FDT agents have a much worse life than CDT agents.
True, but things are a bit more complicated than that. An FDT agent facing Procreation recognizes the subjunctive dependence she and her father have on FDT and, realizing she wants to have been born, procreates. A CDT agent with an FDT father doesn’t have this subjunctive dependence (and wouldn’t use it if she did) and doesn’t procreate, gaining more utils than the FDT agent. But note that the FDT agent faces a different problem than the CDT agent: in hers, her father follows the same decision theory she does, which is not true for the CDT agent. What if we put the FDT agent in a modified Procreation problem, one where her father is a CDT agent? Correctly realizing she can make a decision other than her father’s, she doesn’t procreate. Obviously, in this scenario, the CDT agent also doesn’t procreate, even though, through subjunctive dependence with her CDT father, her decision is exactly the same as his. So here the CDT agent does worse: her father wouldn’t have procreated either, and she isn’t even born. This gives us two scenarios: one where the FDT agent procreates and lives miserably while the CDT agent lives happily, and one where the FDT agent lives happily while the CDT agent doesn’t live at all. FDT is, again, the better decision theory.
It seems, then, we can construct a more useful version of Procreation, called Procreation*:
Procreation*. I wonder whether to procreate. I know for sure that doing so would make my life miserable. But I also have reason to believe that my father faced the exact same choice, and I know he followed the same decision theory I do. If my decision theory were to recommend not procreating, there’s a significant probability that I wouldn’t exist. I prefer a miserable life to no life at all, but obviously I prefer a happy life to a miserable one. Should I procreate?
FDT agents procreate and live miserably—CDT agents don’t procreate and, well, don’t exist since their father didn’t procreate either.
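For concreteness, here is a small sketch of Procreation*. The utility numbers are purely illustrative assumptions; the problem itself only fixes the ordering happy life > miserable life > no life at all.

```python
# Illustrative utilities for Procreation*; only their ordering matters.
U_HAPPY, U_MISERABLE, U_NONEXISTENT = 10.0, 1.0, 0.0

def procreation_star(theory_procreates: bool) -> float:
    """Outcome for an agent whose father follows the same theory she does."""
    if theory_procreates:
        # Her father (same theory) procreated, so she exists, and she now
        # procreates too and lives miserably.
        return U_MISERABLE
    # Her father (same theory) also refused to procreate, so she was never born.
    return U_NONEXISTENT

print("Theory that procreates (FDT):        ", procreation_star(True))   # 1.0
print("Theory that doesn't procreate (CDT): ", procreation_star(False))  # 0.0
```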
All that said, I agree that there’s an apparent advantage of the “irrational” choice in cases like Blackmail or Prisoner’s Dilemma with a Twin, and that this raises an important issue. The examples are artificial, but structurally similar cases arguably come up a lot, and they have come up a lot in our evolutionary history. Shouldn’t evolution have favoured the “irrational” choices?
Not necessarily. There is another way to design agents who refuse to submit to blackmail and who cooperate in Prisoner’s Dilemmas. The trick is to tweak the agents’ utility function. If Twinky cares about her clone’s prison sentence as much as about her own, remaining silent becomes the dominant option in Prisoner’s Dilemma with a Twin. If Donald develops a strong sense of pride and would rather take Stormy down with him than submit to her blackmail, refusing to pay becomes the rational choice in Blackmail.
“The trick is to tweak the agents’ utility function”? No. I mean, sure, Twinky, it’s good to care about others. I do, and so does almost everybody. But this completely misses the point. In the above problems, the utility function is specified; tweaking it gives a new problem. If Twinky indeed cares about her clone’s prison years as much as she does about her own, the payoff matrix becomes totally different. I realize that’s Schwarz’s point, since that’s what produces a new dominant option, but it doesn’t solve the actual problem. You solve a decision problem by taking one of the allowed actions, not by changing the problem itself. Deep Blue didn’t define the opening position as a winning position in order to beat Kasparov. All Schwarz does here is define new problems that CDT does solve correctly. That’s fine, but it doesn’t change the fact that CDT still fails the original problems.
FDT agents rarely find themselves in Blackmail scenarios. Neither do CDT agents with a vengeful streak. If I wanted to design a successful agent for a world like ours, I would build a CDT agent who cares what happens to others.
Of course, me too (a vengeful streak, though? That’s not caring about others). So would Yudkowsky and Soares. But don’t you think a successful agent should have a decision theory that can at least solve basic cases like Newcomb’s Problem, with or without transparent boxes? Also note how Schwarz makes ad hoc adjustments for each problem: Twinky has to care about her clone’s prison time, while Donald has to have a sense of pride or a vengeful streak.
My CDT agent would still two-box in Newcomb’s Problem with Transparent Boxes (or in the original Newcomb Problem). But this kind of situation practically never arises in worlds like ours.
But if we can set up a scenario that breaks your decision theory even when we do allow modifying utility functions, that points to a serious flaw in the theory. Would you trust it enough to build it into an Artificial Superintelligence?
Schwarz goes on to list a number of questions and unclarities he found in Yudkowsky and Soares’ paper, which I don’t find relevant for the purpose of this post. So this is where I conclude my post: FDT is still standing, and not only that, it is better than CDT.
Thanks. A few points, mostly for clarification.
I’m not assuming that the relevant predictors in my scenarios are infallible. In the Blackmail scenario, for example, I’m assuming that the blackmailer is fairly good but not perfect at predicting your reaction. So it’s perfectly possible for an FDT agent to find themselves in that scenario. If they do, they will clearly do worse than a CDT agent.
You’re right that I shouldn’t have called FDT’s recommendation in the Twin case “insane”. I do think FDT’s recommendation is insane for the other cases I discuss, but the Twin case is tricky. It’s a Newcomb Problem. I’d still say that FDT gives the wrong advice here, and CDT gives the right advice. I’m a two-boxer.
Of course making agents care about others (and about their integrity etc.) changes the utility function and therefore the decision problem. That’s exactly the point. The idea is that in many realistic scenarios such agents will tend to do better for themselves than purely egoistical agents. So if I were to build an agent with the goal that they do well for themselves, I’d give them this kind of utility function, rather than implement FDT.
“What if someone decides to punish agents for using CDT?”—Sure, this can happen. It’s what happens in Newcomb’s Problem.
“Schwarz goes on to list a number of points of questions he has/unclarities he found in Yudkowsky and Soares’ paper, which I don’t find relevant”—Their relevance is that FDT isn’t actually a theory, unlike CDT and EDT. In its present form it is only an underdeveloped sketch, and I have doubts that it can be spelled out properly.
You say that CDT “fails” the original problems. You don’t give any argument for this. My intuition is that FDT gets all the problems I discuss wrong and CDT gets them right. For what it’s worth, I’d bet that most people’s intuitions about cases like Blackmail, Procreation, and Newcomb’s Problem with Transparent Boxes are on my side. Of course intuitions can be wrong. But as a general rule, you need better arguments in support of a counter-intuitive hypothesis than in support of an intuitive hypothesis. I’m not aware of any good arguments in support of the FDT verdict.
Thanks for your reply. And I apologize: I should have checked whether you have an account on LessWrong and tagged you in the post.
Alright, then it depends on the accuracy of Stormy’s prediction. Call this p, where 0 ≤ p ≤ 1. Let’s assume paying upon getting blackmailed gives −1 utility, not paying upon blackmail gives −9 utility, and not getting blackmailed at all gives 0 utility. Then, if Donald’s decision theory says to blow the gaff, Stormy predicts this with accuracy p and thus blackmails Donald with probability 1 − p. This gives Donald an expected utility of p × 0 + (1 − p) × −9 = 9p − 9 utils for blowing the gaff. If instead Donald’s decision theory says to pay, then Stormy blackmails with probability p. This gives Donald an expected utility of p × −1 + (1 − p) × 0 = −p utils for paying. Solving 9p − 9 = −p gives 10p = 9, or p = 0.9. This means FDT recommends blowing the gaff for p > 0.9 and paying for p < 0.9.
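For readers who prefer code, here is the same calculation as a minimal Python sketch; the utilities −1, −9 and 0 and the accuracy p are exactly the ones assumed above, nothing else.

```python
# Expected utilities of the two policies in Blackmail, given Stormy's
# prediction accuracy p. Utilities: pay when blackmailed = -1, blow the
# gaff when blackmailed = -9, not blackmailed at all = 0.

def eu_blow_gaff(p: float) -> float:
    # With probability p Stormy correctly predicts "won't pay" and doesn't
    # blackmail (0); otherwise Donald is blackmailed and blows the gaff (-9).
    return p * 0 + (1 - p) * -9      # = 9p - 9

def eu_pay(p: float) -> float:
    # With probability p Stormy correctly predicts "will pay" and blackmails
    # (-1); otherwise she doesn't blackmail (0).
    return p * -1 + (1 - p) * 0      # = -p

for p in (0.5, 0.9, 0.95):
    better = "blow the gaff" if eu_blow_gaff(p) > eu_pay(p) else "pay"
    print(f"p = {p:.2f}: blow gaff = {eu_blow_gaff(p):6.2f}, pay = {eu_pay(p):6.2f} -> {better}")
# The two policies break even at p = 0.9, matching the calculation above.
```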
Confessing and two-boxing ignore the logical connection between the clones, and between the player and the demon, respectively. It’s worth noting that (given perfect prediction accuracy for the demon) two-boxers always walk away with only $1000. Given imperfect prediction, we can do an expected value calculation again, but you get my point, which is similar for the Twin case.
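As a rough sketch of that expected value calculation for the transparent-boxes case (with one extra assumption that isn’t in the problem statement: a participant who sees the right box empty takes both boxes either way, so the policy choice only concerns what to do upon seeing two full boxes):

```python
# Expected winnings of the two policies in Newcomb's Problem with Transparent
# Boxes, given a demon with prediction accuracy p. Assumption not in the
# original problem: whoever sees the right box empty takes both boxes.

def ev_policy(one_box_when_full: bool, p: float) -> float:
    if one_box_when_full:
        # With probability p the demon correctly predicts one-boxing and fills
        # the right box ($1M); otherwise it is empty and the participant only
        # gets the $1000 in the left box.
        return p * 1_000_000 + (1 - p) * 1_000
    # With probability p the demon correctly predicts two-boxing and leaves the
    # right box empty ($1000); a misprediction hands over $1,001,000.
    return p * 1_000 + (1 - p) * 1_001_000

for p in (0.6, 0.9, 0.99):
    print(f"p = {p}: one-box policy = {ev_policy(True, p):>12,.0f}, "
          f"two-box policy = {ev_policy(False, p):>12,.0f}")
# For any accuracy above roughly 0.5, the one-boxing policy has the higher
# expected payoff, even though two-boxing looks better once the boxes are full.
```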
I know that’s your point; I said it’s your point. My point is that changing the utility function of a problem ignores the original problem, which your theory still doesn’t solve. If I build a game-playing algorithm that doesn’t know how to play chess well, the right thing to do is to improve the algorithm so that it plays chess well, not to redefine what a winning position in chess is.
Your agent may do better in (some of) these modified scenarios, but FDT does well in both the modified and the original scenarios.
My point here was that you can directly punish agents for using any decision theory, so this is not a relative disadvantage of FDT. By the way, I disagree that Newcomb’s Problem punishes CDT agents: it punishes two-boxers. That CDT two-boxes is CDT’s own choice and its own problem. Not so for your original example of an environment that gives FDT’ers worse options than CDT’ers: there, FDT’ers simply don’t get the better options, whereas CDT’ers in Newcomb’s Problem do.
Note that I said “relevant for the purpose of this post”. I didn’t say they aren’t relevant in general. The point of this post was to react to points I found to be clearly wrong/unfair.
I agree I could have made a clearer argument here, even though I gave some argumentation throughout my post. I maintain that CDT fails the examples because, on these three problems, adhering to CDT would leave me worse off than adhering to FDT. CDT’ers do get blackmailed by Stormy; FDT’ers don’t. CDT’ers don’t end up in Newcomb’s Problem with Transparent Boxes as you described it: they end up with only the $1000 available. FDT’ers do end up in that scenario and get a million.
As for Procreation, note that my point was about the problems whose utility function you wanted to change, and Procreation wasn’t one of them. CDT does better on Procreation, like I said; I further explained how Procreation* is a better problem for comparing CDT and FDT.
The fundamental problem with your arguments is that the scenarios in which you imagine FDT agents “losing” are logically impossible. You’re not seeing the broader perspective: the FDT agents’ policy of not negotiating with terrorists prevents them from being blackmailed in the first place.
I personally agree that cooperating in the Twin PD is the correct choice, but I don’t think it is meaningful to argue for this on the grounds of decision-theoretic performance (as you seem to do). From The lack of performance metrics for CDT versus EDT, etc. by Caspar Oesterheld:
Indeed, Schwarz makes a similar point in the post you are responding to:
Thanks for responding!
I disagree. There’s a clear measure of performance given in the Twin PD: the utilities.
I disagree with Oesterheld’s point about CDT vs EDT and metrics; I think we know enough math to say EDT is simply a wrong decision theory. We could, in principle, even demonstrate this in real life, by having e.g. 1000 people play a version of XOR Blackmail (500 people with and 500 people without a “termite infestation”) and seeing which theory performs best. It’ll be trivial to see that EDT makes the wrong decision.
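Here is a rough simulation sketch of that demonstration. The setup follows XOR Blackmail as described in Yudkowsky and Soares’ paper: the predictor sends the letter iff exactly one of “no infestation and the owner pays upon receiving it” or “infestation and the owner refuses” holds; the $1,000,000 repair cost and $1,000 demand are illustrative numbers.

```python
# 1000 houses, half with a termite infestation. Compare the total cost of the
# policy "pay when the letter arrives" (what EDT recommends) against "never
# pay" (what CDT and FDT recommend).
REPAIR, DEMAND = 1_000_000, 1_000

def total_cost(pays_on_letter: bool, n: int = 1000) -> int:
    cost = 0
    for i in range(n):
        infested = i < n // 2                      # exactly half are infested
        letter_sent = infested != pays_on_letter   # the letter's XOR condition
        cost += REPAIR if infested else 0          # repairs happen regardless
        if letter_sent and pays_on_letter:
            cost += DEMAND                         # payment on receiving a letter
    return cost

print("Policy 'pay on receiving the letter' (EDT):", total_cost(True))
print("Policy 'never pay' (CDT/FDT):              ", total_cost(False))
# Paying adds $1,000 per uninfested house and prevents no repairs, so the
# never-pay policy comes out $500,000 ahead across the 1000 houses.
```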
Every time I see this I think, ‘What if you flip a coin?’
The formulation quoted from Schwarz’s post unnecessarily implicitly disallows unpredictability. The usual more general formulation of Transparent Newcomb is to say that $1M is in the big box iff Omega succeeds in predicting that you one-box in case the big box is full. So if instead you succeed in confusing Omega, the box will be empty. A situation where Omega can’t be confused also makes sense, in which case the two statements of the problem are equivalent.
Thank you, I edited my post.
Every discussion of decision theories that is not just “agents with max EV win”, where EV is calculated as a sum of “probability of the outcome times the value of the outcome”, ends up fighting the hypothetical, usually by yelling that in zero-probability worlds someone’s pet DT does better than the competition. A trivial calculation shows that winning agents do not succumb to blackmail, stay silent in the twin PD, one-box in all Newcomb variants, and procreate in the miserable-existence case. I don’t know if that’s what FDT does, but it is hopefully what a naive max EV calculation suggests.