I recently refereed Eliezer Yudkowsky and Nate Soares’s “Functional Decision Theory” for a philosophy journal. My recommendation was to accept resubmission with major revisions, but since the article had already undergone a previous round of revisions and still had serious problems, the editors (understandably) decided to reject it. I normally don’t publish my referee reports, but this time I’ll make an exception because the authors are well-known figures from outside academia, and I want to explain why their account has a hard time gaining traction in academic philosophy. I also want to explain why I think their account is wrong, which is a separate point.
I’m not gonna go comment on his blog because his confusion about the theory (supposedly) isn’t related to his rejection of the paper, and also because I think talking to a judge about the theory out of band would bias their judgement of the clarity of the writing in future (it would come to seem more clear and readable to them than it is, just as it would to me) and is probably bad civics, but I just have to let this out because someone is wrong on the internet, damnit
FDT says you should not pay because, if you were the kind of person who doesn’t pay, you likely wouldn’t have been blackmailed. How is that even relevant? You are being blackmailed.
So he’s using a counterexample that’s predicated on a logical inconsistency and could not happen. If a decision theory fails in situations that couldn’t really happen, that’s actually not a problem.
Again
If you are in Newcomb’s Problem with Transparent Boxes and see a million in the right-hand box, you again fare better if you follow CDT. Likewise if you see nothing in the right-hand box.
is the same deal, if you take the right box, that’s logically inconsistent with the money having been there to take, that scenario can’t happen (or happens only rarely, if he’s using that version of newcomb’s problem), and it’s no mark against a decision procedure if it doesn’t win in those conditions. It will never have to face those conditions.
What if someone is set to punish agents who use FDT, giving them choices between bad and worse options, while CDTers are given great options? In such an environment, the engineer would be wise not to build an FDT agent.
What if someone is set to punish agents who use CDT, giving them choices between bad and worse options, while FDTers are given great options? In such an environment, the engineer would be wise not to build a CDT agent.
What if a skeleton pops out in the night and demands that you must recite the magna carta or else it will munch your nose off? Will you learn how to recite the magna carta in light of this damning thought experiment?
It is impossible to build an agent that wins in scenarios that are specifically contrived to foil that kind of agent. It will always be possible to propose specifically contrived situations for any proposed decision procedure.
Aaargh this has all been addressed by the arbital articles! :<
FDT says you should not pay because, if you were the kind of person who doesn’t pay, you likely wouldn’t have been blackmailed. How is that even relevant? You are being blackmailed.
I’m quoting this because, even though it’s wrong, it’s actually an incredibly powerful naive intuition. I think many people who have internalized TDT/UDT/FDT-style reasoning have forgotten just how intuitive the quoted block is. The unstated underlying assumption here (which is unstated because Schwarz most likely doesn’t even realize it is an assumption) is extremely persuasive, extremely obvious, and extremely wrong:
If you find yourself in a particular situation, the circumstances that led you to that situation are irrelevant, because they don’t change the undeniable fact that you are already here.
This is the intuition driving causal decision theory, and it is so powerful that mainstream academic philosophers are nearly incapable of recognizing it as an assumption (and a false assumption at that). Schwarz himself demonstrates just how hard it is to question this assumption: even when the opposing argument was laid right in front of him, he managed to misunderstand the point so hard that he actually used the very mistaken assumption the paper was criticizing as ammunition against the paper. (Note: this is not intended to be dismissive toward Schwarz. Rather, it’s simply meant as an illustrative example, emphasizing exactly how hard it is for anyone, Schwarz included, to question an assumption that’s baked into their model of the world.) And even if you already understand why FDT is correct, it still shouldn’t be hard to see why the assumption in question is so compelling:
How could what happened in the past be relevant for making a decision in the present? The only thing your present decision can affect is the future, so how could there be any need to consider the past when making your decision? Surely the only relevant factors are the various possible futures each of your choices leads to? Or, to put things in Pearlian terms: it’s known that the influence of distant causal nodes is screened off by closer nodes through which they’re connected, and all future nodes are only connected to past nodes through the present—there’s no such thing as a future node that’s directly connected to a past node while not being connected to the present, after all. So doesn’t that mean the effects of the past are screened off when making a decision? Only what’s happening in the present matters, surely?
Phrased that way, it’s not immediately obvious what’s wrong with this assumption (which is the point, of course, since otherwise people wouldn’t find it so difficult to discard). What’s actually wrong is something that’s a bit hard to explain, and evidently the explanation Eliezer and Nate used in their paper didn’t manage to convey it. My favorite way of putting it, however, is this:
In certain decision problems, your counterfactual behavior matters as much—if not more—than your actual behavior. That is to say, there exists a class of decision problems where the outcome depends on something that never actually happens. Here’s a very simple toy example of such a problem:
Omega, the alien superintelligence, predicts the outcome of a chess game between you and Kasparov. If he predicted that you’d win, he gives you $500 in reality; otherwise you get nothing.
Strictly speaking, this actually isn’t a decision problem, since the real you is never faced with a choice to make, but it illustrates the concept clearly enough: the question of whether or not you receive the $500 is entirely dependent on a chess game that never actually happened. Does that mean the chess game wasn’t real? Well, maybe; it depends on your view of Platonic computations. But one thing it definitely does not mean is that Omega’s decision was arbitrary. Regardless of whether you feel Omega based his decision on a “real” chess game, you would in fact either win or not win against Kasparov, and whether you get the $500 really does depend on the outcome of that hypothetical game. (To make this point clearer, imagine that Omega actually predicted that you’d win. Surprised by his own prediction, Omega is now faced with the prospect of giving you $500 that he never expected he’d actually have to give up. Can he back out of the deal by claiming that since the chess game never actually happened, the outcome was up in the air all along and therefore he doesn’t have to give you anything? If your answer to that question is no, then you understand what I’m trying to get at.)
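To make the toy example concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than part of the original setup: the rating comparison is a crude stand-in for "who would win", and the names `would_beat_kasparov`, `omega_payout` and `real_games_played` are made up for this sketch. The only point is that the payout is a well-defined function of a game that is evaluated hypothetically and never actually played.

```python
# Toy model: Omega's payout depends on a game that is only ever evaluated
# hypothetically. All names and numbers here are illustrative assumptions.

real_games_played = []  # stays empty: no real game ever takes place


def would_beat_kasparov(your_rating: int, kasparov_rating: int = 2850) -> bool:
    """Crude deterministic stand-in for 'who would win if the game were played'."""
    return your_rating > kasparov_rating


def omega_payout(your_rating: int, prize: int = 500) -> int:
    """Omega evaluates the hypothetical game and pays out based on its result."""
    predicted_win = would_beat_kasparov(your_rating)
    return prize if predicted_win else 0


if __name__ == "__main__":
    print(omega_payout(1500))      # a club player
    print(omega_payout(3000))      # a hypothetical super-player
    print(len(real_games_played))  # number of games that really happened
```

Omega cannot claim the outcome was "up in the air": given the inputs, `omega_payout` has exactly one answer, even though `real_games_played` stays empty.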
So outcomes can depend on things that never actually happen in the real world. Cool, so what does that have to do with the past influencing the future? Well, the answer is that it doesn’t—at least, not directly. But that’s where the twist comes in:
Earlier, when I gave my toy example of a hypothetical chess game between you and Kasparov, I made sure to phrase the question so that the situation was presented from the perspective of your actual self, not your hypothetical self. (This makes sense; after all, if Omega’s prediction method was based on something other than a direct simulation, your hypothetical self might not even exist.) But there was another way of describing the situation:
You’re going about your day normally when suddenly, with no warning whatsoever, you’re teleported into a white void of nothingness. In front of you is a chessboard; on the other side of the chessboard sits Kasparov, who challenges you to a game of chess.
Here, we have the same situation, but presented from the viewpoint of the hypothetical you on whom Omega’s prediction is based. Crucially, the hypothetical you doesn’t know that they’re hypothetical, or that the real you even exists. So from their perspective, something random just happened for no reason at all. (Yes, yes, if Omega used some method other than a simulation to make his prediction, the hypothetical you wouldn’t have existed and wouldn’t have had a perspective—but hey, that doesn’t stop me from writing from their perspective, right? After all, real people write from the perspectives of unreal people all the time; that’s just called writing fiction. And besides, we’ve already established that real or unreal, the outcome of the game really does determine whether you get the $500, so the thoughts and feelings of the hypothetical you are nonetheless important in that they partially determine the outcome of the game.)
And now we come to the final, crucial point that makes sense of the blackmail scenario and all the other thought experiments in the paper, the point that Schwarz and most mainstream philosophers haven’t taken into account:
Every single one of those thought experiments could have been written from the perspective, not of the real you, but a hypothetical, counterfactual version of yourself.
When “you’re” being blackmailed, Schwarz makes the extremely natural assumption that “you” are you. But there’s no reason to suppose this is the case. The scenario never stipulates why you’re being blackmailed, only that you’re being blackmailed. So the person being blackmailed could be either the real you or a hypothetical. And the thing that determines whether it’s the real you or a mere hypothetical is...
...your decision whether or not to pay up, of course.
If you cave into the blackmail and pay up, then you’re almost certainly the real deal. On the other hand, if you refuse to give in, it’s very likely that you’re simply a counterfactual version of yourself living in an extremely low-probability (if not outright inconsistent) world. So your decision doesn’t just determine the future; it also determines (with high probability) which you “you” are. And so then the problem simplifies into this: which you do you want to be?
If you’re the real you, then life kinda sucks. You just got blackmailed and you paid up, so now you’re down a bunch of money. If, on the other hand, you’re the hypothetical version of yourself, then congratulations: “you” were never real in the first place, and by counterfactually refusing to pay, you just drastically lowered the probability of your actual self ever having to face this situation and (in the process) becoming you. And when things are put that way, well, the correct decision becomes rather obvious.
But this kind of counterfactual reasoning is extremely counterintuitive. Our brains aren’t designed for this kind of thinking (well, not explicitly, anyway). You have to think about hypothetical versions of yourself that have never existed and (if all goes well) will never exist, and therefore only exist in the space of logical possibility. What does that even mean, anyway? Well, answering confused questions like that is pretty much MIRI’s goal these days, so I dunno, maybe we can ask them.
So your decision doesn’t just determine the future; it also determines (with high probability) which you “you” are.
Worse. It doesn’t change who you are, you are the person being blackmailed. This you know. What you don’t know is whether you exist (or ever existed). Whether you ever existed is determined by your decision.
(The distinction from the quoted sentence may matter if you put less value on the worlds of people slightly different from yourself, so you may prefer to ensure your own existence, even in a blackmailed situation, over the existence of the alternative you who is not blackmailed but who is different and so less valuable. This of course involves unstable values, but motivates degree of existence phrasing of the effect of decisions, over the change in the content of the world phrasing, since the latter doesn’t let us weigh whole alternative worlds differently.)
“FDT means choosing a decision algorithm so that, if a blackmailer inspects your decision algorithm, they will know you can’t be blackmailed. If you’ve chosen such a decision algorithm, a blackmailer won’t blackmail you.”
I only just saw this comment. I think that there’s a lot of value in imagining the possibility of both a real you and a hypothetical you, but I expect Schwarz would object to this as the problem statement says, “you”. By default, this is assumed to refer to a version of you that is real within the scope of the problem, not a version that is hypothetical within this scope. Indeed, you seem to recognise that your reasoning isn’t completely solid when you ask what these hypothetical selves even mean.
Fortunately, this is where some of my arguments from Deconfusing Logical Counterfactuals can come in. If we are being blackmailed and we are only blackmailed if we pay out, then it is logically impossible for us to not pay out, so we don’t have a decision theory problem. Such a problem requires multiple possible actions and the only way to obtain this is by erasing some of the information. So if we erase the information about whether or not you are being blackmailed, we end up with two possibilities: a) you paying in the situation and getting blackmailed or b) you not being the kind of person who would pay and you not getting blackmailed. And in that case we’d pick option b).
We can then answer your question about how your decision affects the past: it doesn’t. The past is fixed and cannot be changed. So why does it look like your decision changes the past? Actually, your decision is fixed as well. You can’t actually change what it will be. However, both your decision and the past may be different in counterfactuals; after all, they aren’t factual. If we have a perfect predictor and we posit a different decision, then we must necessarily posit a different prediction in the past to maintain consistency. Even though we might talk colloquially about changing a decision, what’s actually happening is that we are positing a different counterfactual.
Belatedly, is this a fair summary of your critique?
When someone thinks about another person (e.g. to predict whether they’ll submit to blackmail), the act of thinking about the other person creates a sort of ‘mental simulation’ that has detailed conscious experiences in its own right. So you never really know whether you’re a flesh-and-blood person or a ‘mental simulation’ based on a flesh-and-blood person.
Now, suppose you seem to find yourself in a situation where you’ve been blackmailed. In this context, it’s reasonable to wonder whether you’re actually a flesh-and-blood person who’s been blackmailed—or merely a ‘mental simulation’ that exists in the mind of a potential blackmailer. If you’re a mental simulation, and you care about the flesh-and-blood person you’re based on, then you have reason to resist blackmail. The reason is that the decision you take as a simulation will determine the blackmailer’s prediction about how the flesh-and-blood person will behave. If you resist blackmail, then the blackmailer will predict the flesh-and-blood person will refuse blackmail and therefore decide not to blackmail them.
If this is roughly in the right ballpark, then I would have a couple responses:
I disagree that the act of thinking about a person will tend to create a mental simulation that has detailed conscious experiences in its own right. This seems like a surprising position that goes against the grain of conventional neuroscience and views on the philosophy of consciousness. As a simple illustrative case, suppose that Omega makes a prediction about Person A purely on the basis of their body language. Surely thinking “This guy looks really nervous, he’s probably worried he’ll be seen as the sort of guy who’ll submit to blackmail, because he is” doesn’t require bringing a whole new consciousness into existence.
Suppose that when a blackmailer predicts someone’s behavior, they do actually create a conscious mental simulation. Suppose you don’t know whether you’re this kind of simulation or the associated flesh-and-blood person, but you care about what happens to the flesh-and-blood person in either case. Then, depending on certain parameter values, CDT does actually say you should resist blackmail. This is because there is some chance that you will cause the flesh-and-blood person to avoid being blackmailed. So CDT gives the response you want in this case.
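To spell out the "depending on certain parameter values" part, here is a rough sketch of the causal expected-utility comparison. The parameters are my own assumptions for illustration, not anything from the paper or the blog post: `p_sim` is your credence that you are the simulation, `cost_pay` is the price of paying up, `cost_threat` is the harm if the threat is carried out against the flesh-and-blood person, and `loss_if_blackmailed` is what you expect the flesh-and-blood person to lose once they are actually blackmailed.

```python
# Sketch of a CDT calculation under uncertainty about being a simulation.
# All parameter names and values are hypothetical, chosen for illustration.

def cdt_utilities(p_sim: float, cost_pay: float, cost_threat: float,
                  loss_if_blackmailed: float) -> dict:
    """Causal expected utility (for the flesh-and-blood person) of each action."""
    p_real = 1.0 - p_sim
    # If you are the simulation and resist, the real person is never blackmailed.
    # If you are the real person and resist, the threat is carried out.
    resist = p_sim * 0.0 + p_real * (-cost_threat)
    # If you are the simulation and pay, the real person gets blackmailed.
    # If you are the real person and pay, you lose the payment.
    pay = p_sim * (-loss_if_blackmailed) + p_real * (-cost_pay)
    return {"resist": resist, "pay": pay}


def cdt_choice(p_sim: float, cost_pay: float, cost_threat: float,
               loss_if_blackmailed: float) -> str:
    """Pick the action with the higher causal expected utility."""
    u = cdt_utilities(p_sim, cost_pay, cost_threat, loss_if_blackmailed)
    return "resist" if u["resist"] > u["pay"] else "pay"


if __name__ == "__main__":
    # High credence in being the simulation: resisting wins.
    print(cdt_choice(0.95, cost_pay=100, cost_threat=1000, loss_if_blackmailed=100))
    # Even odds: paying wins.
    print(cdt_choice(0.5, cost_pay=100, cost_threat=1000, loss_if_blackmailed=100))
```

Under these assumptions CDT resists exactly when `p_sim * loss_if_blackmailed` exceeds `(1 - p_sim) * (cost_threat - cost_pay)`, so the recommendation flips with the parameters rather than being fixed by the theory.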
Overall, I don’t think this line of argument really damages CDT. It seems to be based on a claim about consciousness that I think is probably wrong. But even if the claim is right, all this implies is that CDT recommends a different action than one would otherwise have thought.
(If my summary is roughly in the right ballpark, then I also think it’s totally reasonable for academic decision theorists to read the FDT paper and fail to realize that a non-mainstream neuroscience/philosophy-of-consciousness view is being assumed and provides the main justification for FDT. The paper really doesn’t directly say anything about this. It seems wrong to me, then, to suggest that Schwarz only disagrees because he lacks the ability to see his own assumptions.)
[[EDIT: Oops, rereading your comment, seems like the summary is probably not fair. I didn’t process this bit:
Yes, yes, if Omega used some method other than a simulation to make his prediction, the hypothetical you wouldn’t have existed and wouldn’t have had a perspective—but hey, that doesn’t stop me from writing from their perspective, right? After all, real people write from the perspectives of unreal people all the time; that’s just called writing fiction.
But now, reading the rest of the comment in light of this point, I don’t think this reduces my qualms. The suggestion seems to be that, when you seem to find yourself in the box room, you should in some cases be uncertain about whether or not you exist at all. And in these cases you should one box, because, if it turns out that you don’t exist, then your decision to one box will (in some sense) cause a corresponding person who does exist to get more money. You also don’t personally get less money by one boxing, because you don’t get any money either way, because you don’t exist.
Naively, this line of thought seems sketchy. You can have uncertainty about the substrate your mind is being run on or about the features of the external world—e.g. you can be unsure whether or not you’re a simulation—but there doesn’t seem to be room for uncertainty about whether or not you exist. “Cogito, ergo sum” and all that.
There is presumably some set of metaphysical/epistemological positions under which this line of reasoning makes sense, but, again, the paper really doesn’t make any of these positions explicit or argue for them directly. I mainly think it’s premature to explain the paper’s failure to persuade philosophers in terms of their rigidity or inability to question assumptions.]]
So he’s using a counterexample that’s predicated on a logical inconsistency and could not happen. If a decision theory fails in situations that couldn’t really happen, that’s actually not a problem.
but that counterexample isn’t predicated on a logical inconsistency. An FDT agent can still get blackmailed, by someone who doesn’t know it’s an FDT agent or who doesn’t care that the outcome will be bad or who is lousy at predicting the behaviour of FDT agents.
What prevents the FDT agent from getting blackmailed is not that he is known to be an FDT agent. Rather, it’s simply that he’s known to be the sort of person who will blow the whistle on blackmailers. Potential blackmailers need not know or care why the FDT agent behaves like this (indeed, it does not matter at all, for the purpose of each individual case, why the agent behaves like this—whether it’s because the dictates of FDT so direct him, or because he’s naturally stubborn, or whatever; it suffices that blackmailers expect the agent to blow the whistle).
So if we stipulate an FDT agent, we also, thereby, stipulate that he is known to be a whistle-blower. (We’re assuming, in every version of this scenario, that, regardless of the agent’s motivations, his propensity toward such behavior is known.) And that does indeed make the “he’s blackmailed anyway” scenario logically inconsistent.
Even if you are known to be the sort of person who will blow the whistle on blackmailers it is still possible that someone will try to blackmail you. How could that possibly involve a logical inconsistency? (People do stupid things all the time.)
I’ll say it again in different words: I did not understand the paper (and consequently, the blog) to be talking about actual blackmail in a big messy physical world. I understood them to be talking about a specific, formalized blackmail scenario, in which the blackmailer’s decision to blackmail is entirely contingent on the victim’s counterfactual behaviour, in which case resolving to never pay and still being blackmailed isn’t possible. In full context, it’s logically inconsistent.
Different formalisations are possible, but I’d guess the strict one is what was used. In the softer ones you still generally won’t pay.
The paper discusses two specific blackmail scenarios. One (“XOR Blackmail”) is a weirdly contrived situation and I don’t think any of what wo says is talking about it. The other (“Mechanical Blackmail”) is a sort of stylized version of real blackmail scenarios, and does assume that the blackmailer is a perfect predictor. The paper’s discussion of Mechanical Blackmail then considers the case where the blackmailer is an imperfect (but still very reliable) predictor, and says that there too an FDT agent should refuse to pay.
wo’s discussion of blackmail doesn’t directly address either of the specific scenarios discussed in the FDT paper. The first blackmail scenario wo discusses (before saying much about FDT) is a generic case of blackmail (the participants being labelled “Donald” and “Stormy”, perhaps suggesting that we are not supposed to imagine either of them as any sort of perfectly rational perfect predictor...). Then, when specifically discussing FDT, wo considers a slightly different scenario, which again is clearly not meant to involve perfect prediction, simulation, etc., because he says things like “FDT says you should not pay because, if you were the kind of person who doesn’t pay, you likely wouldn’t have been blackmailed” and “FDT agents who are known as FDT agents have a lower chance of getting blackmailed” (emphasis mine, in both cases).
So. The paper considers a formalized scenario in which the blackmailer’s decision is made on the basis of a perfectly accurate prediction of the victim, but then is at pains to say that it all works just the same if the prediction is imperfect. The blog considers only imperfect-prediction scenarios. Real-world blackmail, of course, never involves anything close to the sort of perfect prediction that would make it take a logical inconsistency for an FDT agent to get blackmailed.
So taking the paper and the blogpost to be talking only about provably-perfect-prediction scenarios seems to me to require (1) reading the paper oddly selectively, (2) interpreting the blogpost very differently from me, and (3) not caring about situations that could ever occur in the actual world, even though wo is clearly concerned with real-world applicability and the paper makes at least some gestures towards such applicability.
For the avoidance of doubt, I don’t think there’s necessarily anything wrong with being interested primarily in such scenarios: the best path to understanding how a theory works in practice might well go via highly simplified scenarios. But it seems to me simply untrue that the paper, still more the blogpost, considers (or should have considered) only such scenarios when discussing blackmail.
It might. It’s pretty long. I don’t suppose you’d like to be more specific?
(If the answer is no, you wouldn’t, because all you mean is that it seems like the sort of discussion that might have useful things in it, then fair enough of course.)
I read that under a subtext where we were talking about the same blackmail scenario, but, okay, others are possible.
In cases where the blackmail truly seems not to be contingent on its policy, (and in many current real-world cases) the FDT agent will pay.
The only cases when an FDT agent actually will get blackmailed and refuse to pay are cases where being committed to not paying shifts the probabilities enough to make that profitable on average.
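As a rough illustration of the two cases above, here is a policy-level expected-value comparison. The model and all its parameters are assumptions of mine for the sake of the sketch: `p_blackmail_if_payer` and `p_blackmail_if_refuser` are the chances of being blackmailed given each policy, `cost_pay` is the payment, and `cost_threat` is the harm a refuser suffers when the threat is carried out.

```python
# Policy-level comparison: which policy does better on average, given how
# strongly the blackmailer's choice depends on the policy?
# All parameter names and numbers are hypothetical.

def policy_values(p_blackmail_if_payer: float, p_blackmail_if_refuser: float,
                  cost_pay: float, cost_threat: float) -> dict:
    """Expected value of committing to each policy before any blackmail occurs."""
    return {
        "pay": -p_blackmail_if_payer * cost_pay,
        "refuse": -p_blackmail_if_refuser * cost_threat,
    }


def best_policy(p_blackmail_if_payer: float, p_blackmail_if_refuser: float,
                cost_pay: float, cost_threat: float) -> str:
    """Return the policy with the higher expected value (ties go to 'pay')."""
    v = policy_values(p_blackmail_if_payer, p_blackmail_if_refuser,
                      cost_pay, cost_threat)
    return "refuse" if v["refuse"] > v["pay"] else "pay"


if __name__ == "__main__":
    # Blackmail strongly contingent on policy (a reliable predictor).
    print(best_policy(0.99, 0.01, cost_pay=100, cost_threat=1000))
    # Blackmail not contingent on policy at all.
    print(best_policy(0.5, 0.5, cost_pay=100, cost_threat=1000))
```

In this sketch refusing only wins when the commitment shifts the blackmail probability enough, that is, when `p_blackmail_if_refuser * cost_threat` is smaller than `p_blackmail_if_payer * cost_pay`. When the two probabilities are equal, paying wins whenever the payment is cheaper than the threat.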
It is possible to construct obstinate kinds of agents that aren’t sensitive to FDT’s acausal dealmaking faculties. Evolution might produce them often. They will not be seen as friends. As an LDT-like human, my feeling towards those sorts of blackmailers is that we should destroy all of them as quickly as we can, because their existence is a blight to ours. In light of that, I’m not sure they have a winning strategy. When you start to imagine the directed ocean of coordinated violence that an LDT-aligned faction (so, literally any real-world state with laws against blackmail) points in your direction as soon as they can tell what you are, you may start to wonder if pretending you can’t understand their source code is really a good idea.
I imagine a time when the essentials of the distinction between CDT and LDT are widely understood. By that time, the very idea of blackmailing will come to seem very strange. We will wonder how there was this era when a person could just say “If you don’t do X, then I will do the fairly self-destructive action Y which I gain nothing from doing” and have everyone just believe them unconditionally, just believe this unqualified statement about their mechanism. Wasn’t that stupid? To lie like that? And even stupider for their victims to pretend that they believe the lie? We will not be able to understand it any more.
Imagine that you see an agnostic community head walking through the park at night. You know it’s a long shot, but you amble towards him, point your gun at him and say “give me your wallet.” He looks back at you and says, “I don’t understand the deal. You’ll shoot me? How does that help you? Because you want my wallet? I don’t understand the logic there, why are those two things related? That doesn’t get you my wallet.”
Only it does, because when you shoot someone you can loot their corpse, so it occurs to me that muggers are a bad example of blackmail. I imagine they’ve got to have a certain amount of comfort with actually killing people, to do that. It’s not really threatening to do something self-destructive, in their view, they still benefit a little from killing you. They still get to empty your pockets. To an extent, mugging is often just a display of a power imbalance and consequent negotiation of a mutually beneficial alternative to violence.
The gang can profit from robbing your store at gunpoint, but you and them both will profit more if you just pay them protection money. LDT only refuses to pay protection money if it realises that having all of the other entangled LDT store owners paying protection money as well would make the gang profitable enough to grow, and that having a grown gang around would have been, on the whole, worse than the amortised risk of being robbed.
The standards for deserving publication in academic philosophy are relatively simple and self-explanatory. A paper should make a significant point, it should be clearly written, it should correctly position itself in the existing literature, and it should support its main claims by coherent arguments. The paper I read sadly fell short on all these points, except the first. (It does make a significant point.) [...]
I still think the paper could probably have been published after a few rounds of major revisions. But I also understand that the editors decided to reject it. Highly ranked philosophy journals have acceptance rates of under 5%. So almost everything gets rejected. This one got rejected not because Yudkowsky and Soares are outsiders or because the paper fails to conform to obscure standards of academic philosophy, but mainly because the presentation is not nearly as clear and accurate as it could be.
So apparently the short version of “why their account has a hard time gaining traction in academic philosophy” is (according to this author) just “the paper’s presentation and argumentation aren’t good enough for the top philosophy journals”.
On Functional Decision Theory (Wolfgang Schwarz)
I’m not gonna go comment on his blog because his confusion about the theory (supposedly) isn’t related to his rejection of the paper, and also because I think talking to a judge about the theory out of band would bias their judgement of the clarity of the writing in future (it would come to seem more clear and readable to them than it is, just as it would to me) and is probably bad civics, but I just have to let this out because someone is wrong on the internet, damnit
So he’s using a counterexample that’s predicated on a logical inconsistency and could not happen. If a decision theory fails in situations that couldn’t really happen, that’s actually not a problem.
Again
is the same deal, if you take the right box, that’s logically inconsistent with the money having been there to take, that scenario can’t happen (or happens only rarely, if he’s using that version of newcomb’s problem), and it’s no mark against a decision procedure if it doesn’t win in those conditions. It will never have to face those conditions.
What if someone is set to punish agents who use CDT, giving them choices between bad and worse options, while FDTers are given great options? In such an environment, the engineer would be wise not build an CDT agent.
What if a skeleton pops out in the night and demands that you must recite the magna carta or else it will munch your nose off? Will you learn how to recite the magna carta in light of this damning thought experiment?
It is impossible to build an agent that wins in scenarios that are specifically contrived to foil that kind of agent. It will always be possible to propose specifically contrived situations for any proposed decision procedure.
Aaargh this has all been addressed by the arbital articles! :<
I’m quoting this because, even though it’s wrong, it’s actually an incredibly powerful naive intuition. I think many people who have internalized TDT/UDT/FDT-style reasoning have forgotten just how intuitive the quoted block is. The unstated underlying assumption here (which is unstated because Schwarz most likely doesn’t even realize it is an assumption) is extremely persuasive, extremely obvious, and extremely wrong:
If you find yourself in a particular situation, the circumstances that led you to that situation are irrelevant, because they don’t change the undeniable fact that you are already here.
This is the intuition driving causal decision theory, and it is so powerful that mainstream academic philosophers are nearly incapable of recognizing it as an assumption (and a false assumption at that). Schwarz himself demonstrates just how hard it is to question this assumption: even when the opposing argument was laid right in front of him, he managed to misunderstand the point so hard that he actually used the very mistaken assumption the paper was criticizing as ammunition against the paper. (Note: this is not intended to be dismissive toward Schwarz. Rather, it’s simply meant as an illustrative example, emphasizing exactly how hard it is for anyone, Schwarz included, to question an assumption that’s baked into their model of the world.) And if even if you already understand why FDT is correct, it still shouldn’t be hard to see why the assumption in question is so compelling:
How could what happened in the past be relevant for making a decision in the present? The only thing your present decision can affect is the future, so how could there be any need to consider the past when making your decision? Surely the only relevant factors are the various possible futures each of your choices leads to? Or, to put things in Pearlian terms: it’s known that the influence of distant causal nodes is screened off by closer nodes through which they’re connected, and all future nodes are only connected to past nodes through the present—there’s no such thing as a future node that’s directly connected to a past node while not being connected to the present, after all. So doesn’t that mean the effects of the past are screened off when making a decision? Only what’s happening in the present matters, surely?
Phrased that way, it’s not immediately obvious what’s wrong with this assumption (which is the point, of course, since otherwise people wouldn’t find it so difficult to discard). What’s actually wrong is something that’s a bit hard to explain, and evidently the explanation Eliezer and Nate used in their paper didn’t manage to convey it. My favorite way of putting it, however, is this:
In certain decision problems, your counterfactual behavior matters as much—if not more—than your actual behavior. That is to say, there exists a class of decision problems where the outcome depends on something that never actually happens. Here’s a very simple toy example of such a problem:
Strictly speaking, this actually isn’t a decision problem, since the real you is never faced with a choice to make, but it illustrates the concept clearly enough: the question of whether or not you receive the $500 is entirely dependent on a chess game that never actually happened. Does that mean the chess game wasn’t real? Well, maybe; it depends on your view of Platonic computations. But one thing it definitely does not mean is that Omega’s decision was arbitrary. Regardless of whether you feel Omega based his decision on a “real” chess game, you would in fact either win or not win against Kasparov, and whether you get the $500 really does depend on the outcome of that hypothetical game. (To make this point clearer, imagine that Omega actually predicted that you’d win. Surprised by his own prediction, Omega is now faced with the prospect of giving you $500 that he never expected he’d actually have to give up. Can he back out of the deal by claiming that since the chess game never actually happened, the outcome was up in the air all along and therefore he doesn’t have to give you anything? If your answer to that question is no, then you understand what I’m trying to get at.)
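The structure of that dependence can be sketched in a few lines. This is only an illustration, not anything from the paper: the function names and the crude "higher rating wins" rule standing in for Omega's prediction are assumptions made up for the example.

```python
def predicted_winner(your_rating: int, opponent_rating: int) -> str:
    """Hypothetical stand-in for Omega's prediction: the higher-rated player wins.

    No game is played here; the function only computes what *would* happen.
    """
    return "you" if your_rating > opponent_rating else "opponent"


def omega_payout(your_rating: int, opponent_rating: int, prize: int = 500) -> int:
    """Pay the prize if and only if the hypothetical game is a win for you."""
    if predicted_winner(your_rating, opponent_rating) == "you":
        return prize
    return 0


# The payout is settled by a game that is never actually played.
print(omega_payout(your_rating=1500, opponent_rating=2800))
print(omega_payout(your_rating=2900, opponent_rating=2800))
```

Nothing in `omega_payout` is arbitrary, even though the only place the game "happens" is inside the prediction.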
So outcomes can depend on things that never actually happen in the real world. Cool, so what does that have to do with the past influencing the future? Well, the answer is that it doesn’t—at least, not directly. But that’s where the twist comes in:
Earlier, when I gave my toy example of a hypothetical chess game between you and Kasparov, I made sure to phrase the question so that the situation was presented from the perspective of your actual self, not your hypothetical self. (This makes sense; after all, if Omega’s prediction method was based on something other than a direct simulation, your hypothetical self might not even exist.) But there was another way of describing the situation:
Here, we have the same situation, but presented from the viewpoint of the hypothetical you on whom Omega’s prediction is based. Crucially, the hypothetical you doesn’t know that they’re hypothetical, or that the real you even exists. So from their perspective, something random just happened for no reason at all. (Yes, yes, if Omega used some method other than a simulation to make his prediction, the hypothetical you wouldn’t have existed and wouldn’t have had a perspective—but hey, that doesn’t stop me from writing from their perspective, right? After all, real people write from the perspectives of unreal people all the time; that’s just called writing fiction. And besides, we’ve already established that real or unreal, the outcome of the game really does determine whether you get the $500, so the thoughts and feelings of the hypothetical you are nonetheless important in that they partially determine the outcome of the game.)
And now we come to the final, crucial point that makes sense of the blackmail scenario and all the other thought experiments in the paper, the point that Schwarz and most mainstream philosophers haven’t taken into account:
Every single one of those thought experiments could have been written from the perspective, not of the real you, but a hypothetical, counterfactual version of yourself.
When “you’re” being blackmailed, Schwarz makes the extremely natural assumption that “you” are you. But there’s no reason to suppose this is the case. The scenario never stipulates why you’re being blackmailed, only that you’re being blackmailed. So the person being blackmailed could be either the real you or a hypothetical. And the thing that determines whether it’s the real you or a mere hypothetical is...
...your decision whether or not to pay up, of course.
If you cave in to the blackmail and pay up, then you’re almost certainly the real deal. On the other hand, if you refuse to give in, it’s very likely that you’re simply a counterfactual version of yourself living in an extremely low-probability (if not outright inconsistent) world. So your decision doesn’t just determine the future; it also determines (with high probability) which you “you” are. And so then the problem simplifies into this: which you do you want to be?
If you’re the real you, then life kinda sucks. You just got blackmailed and you paid up, so now you’re down a bunch of money. If, on the other hand, you’re the hypothetical version of yourself, then congratulations: “you” were never real in the first place, and by counterfactually refusing to pay, you just drastically lowered the probability of your actual self ever having to face this situation and (in the process) becoming you. And when things are put that way, well, the correct decision becomes rather obvious.
But this kind of counterfactual reasoning is extremely counterintuitive. Our brains aren’t designed for this kind of thinking (well, not explicitly, anyway). You have to think about hypothetical versions of yourself that have never existed and (if all goes well) will never exist, and therefore only exist in the space of logical possibility. What does that even mean, anyway? Well, answering confused questions like that is pretty much MIRI’s goal these days, so I dunno, maybe we can ask them.
Worse. It doesn’t change who you are, you are the person being blackmailed. This you know. What you don’t know is whether you exist (or ever existed). Whether you ever existed is determined by your decision.
(The distinction from the quoted sentence may matter if you put less value on the worlds of people slightly different from yourself, so you may prefer to ensure your own existence, even in a blackmailed situation, over the existence of the alternative you who is not blackmailed but who is different and so less valuable. This of course involves unstable values, but motivates degree of existence phrasing of the effect of decisions, over the change in the content of the world phrasing, since the latter doesn’t let us weigh whole alternative worlds differently.)
“FDT means choosing a decision algorithm so that, if a blackmailer inspects your decision algorithm, they will know you can’t be blackmailed. If you’ve chosen such a decision algorithm, a blackmailer won’t blackmail you.”
Is this an accurate summary?
I only just saw this comment. I think that there’s a lot of value in imagining the possibility of both a real you and a hypothetical you, but I expect Schwarz would object to this as the problem statement says, “you”. By default, this is assumed to refer to a version of you that is real within the scope of the problem, not a version that is hypothetical within this scope. Indeed, you seem to recognise that your reasoning isn’t completely solid when you ask what these hypothetical selves even mean.
Fortunately, this is where some of my arguments from Deconfusing Logical Counterfactuals can come in. If we are being blackmailed and we are only blackmailed if we pay out, then it is logically impossible for us to not pay out, so we don’t have a decision theory problem. Such a problem requires multiple possible actions and the only way to obtain this is by erasing some of the information. So if we erase the information about whether or not you are being blackmailed, we end up with two possibilities: a) you paying in this situation and getting blackmailed or b) you not being the kind of person who would pay and you not getting blackmailed. And in that case we’d pick option b).
We can then answer your question about how your decision affects the past: it doesn’t. The past is fixed and cannot be changed. So why does it look like your decision changes the past? Actually, your decision is fixed as well. You can’t actually change what it will be. However, both your decision and the past may be different in counterfactuals; after all, they aren’t factual. If we have a perfect predictor and we posit a different decision, then we must necessarily posit a different prediction in the past to maintain consistency. Even though we might talk colloquially about changing a decision, what’s actually happening is that we are positing a different counterfactual.
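One way to picture the consistency requirement is to list every combination of disposition and blackmail, then keep only the ones a perfect predictor allows. The following is a minimal sketch under that assumption; `BLACKMAIL_COST`, `consistent_worlds`, and the payoff numbers are illustrative and not taken from the paper.

```python
from itertools import product

BLACKMAIL_COST = 1000  # illustrative amount handed over when giving in


def consistent_worlds():
    """Enumerate (would_pay, blackmailed, payoff) worlds a perfect predictor allows.

    Assumption: the blackmailer blackmails exactly those agents who would pay.
    """
    worlds = []
    for would_pay, blackmailed in product([True, False], repeat=2):
        if blackmailed != would_pay:
            continue  # the prediction would contradict the disposition
        payoff = -BLACKMAIL_COST if blackmailed else 0
        worlds.append((would_pay, blackmailed, payoff))
    return worlds


for would_pay, blackmailed, payoff in consistent_worlds():
    print(f"would_pay={would_pay}, blackmailed={blackmailed}, payoff={payoff}")
```

Of the four combinations, only two survive the filter. Positing a different decision means moving to the other surviving world, and that world comes with a different prediction.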
Belatedly, is this a fair summary of your critique?
When someone thinks about another person (e.g. to predict whether they’ll submit to blackmail), the act of thinking about the other person creates a sort of ‘mental simulation’ that has detailed conscious experiences in its own right. So you never really know whether you’re a flesh-and-blood person or a ‘mental simulation’ based on a flesh-and-blood person.
Now, suppose you seem to find yourself in a situation where you’ve been blackmailed. In this context, it’s reasonable to wonder whether you’re actually a flesh-and-blood person who’s been blackmailed—or merely a ‘mental simulation’ that exists in the mind of a potential blackmailer. If you’re a mental simulation, and you care about the flesh-and-blood person you’re based on, then you have reason to resist blackmail. The reason is that the decision you take as a simulation will determine the blackmailer’s prediction about how the flesh-and-blood person will behave. If you resist blackmail, then the blackmailer will predict the flesh-and-blood person will refuse blackmail and therefore decide not to blackmail them.
If this is roughly in the right ballpark, then I would have a couple responses:
I disagree that the act of thinking about a person will tend to create a mental simulation that has detailed conscious experiences in its own right. This seems like a surprising position that goes against the grain of conventional neuroscience and views on the philosophy of consciousness. As a simple illustrative case, suppose that Omega makes a prediction about Person A purely on the basis of their body language. Surely thinking “This guy looks really nervous, he’s probably worried he’ll be seen as the sort of guy who’ll submit to blackmail—because he is” doesn’t require bringing a whole new consciousness into existence.
Suppose that when a blackmailer predicts someone’s behavior, they do actually create a conscious mental simulation. Suppose you don’t know whether you’re this kind of simulation or the associated flesh-and-blood person, but you care about what happens to the flesh-and-blood person in either case. Then, depending on certain parameter values, CDT does actually say you should resist blackmail. This is because there is some chance that you will cause the flesh-and-blood person to avoid being blackmailed. So CDT gives the response you want in this case.
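To make "depending on certain parameter values" concrete, here is a rough sketch of the causal expected-value comparison. The model and every parameter in it are assumptions invented for illustration: `p_sim` is your credence that you are the simulation, `cost` is the payment demanded, `harm` is the damage if a real victim refuses, and `loss_if_blackmailed` is what the flesh-and-blood person loses once blackmail actually goes ahead.

```python
def cdt_expected_values(p_sim, cost, harm, loss_if_blackmailed):
    """Causal expected value of each action under uncertainty about being simulated.

    If you are real: paying loses `cost`, refusing loses `harm`.
    If you are the simulation: paying causes the real person to be blackmailed
    (losing `loss_if_blackmailed`), while refusing causes no blackmail at all.
    """
    ev_pay = (1 - p_sim) * (-cost) + p_sim * (-loss_if_blackmailed)
    ev_refuse = (1 - p_sim) * (-harm) + p_sim * 0
    return ev_pay, ev_refuse


def cdt_choice(p_sim, cost, harm, loss_if_blackmailed):
    """Return the action with the higher causal expected value."""
    ev_pay, ev_refuse = cdt_expected_values(p_sim, cost, harm, loss_if_blackmailed)
    return "refuse" if ev_refuse > ev_pay else "pay"


# High credence in being the simulation favours refusing; low credence favours paying.
print(cdt_choice(p_sim=0.95, cost=1000, harm=10000, loss_if_blackmailed=1000))
print(cdt_choice(p_sim=0.50, cost=1000, harm=10000, loss_if_blackmailed=1000))
```

With these made-up numbers the recommendation flips somewhere around `p_sim = 0.9`. Other payoff choices would move that point.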
Overall, I don’t think this line of argument really damages CDT. It seems to be based on a claim about consciousness that I think is probably wrong. But even if the claim is right, all this implies is that CDT recommends a different action than one would otherwise have thought.
(If my summary is roughly in the right ballpark, then I also think it’s totally reasonable for academic decision theorists to read the FDT paper and fail to realize that a non-mainstream neuroscience/philosophy-of-consciousness view is being assumed and provides the main justification for FDT. The paper really doesn’t directly say anything about this. It seems wrong, then, to me to suggest that Schwarz only disagrees because he lacks the ability to see his own assumptions.)
[[EDIT: Oops, rereading your comment, seems like the summary is probably not fair. I didn’t process this bit:
But now, reading the rest of the comment in light of this point, I don’t think this reduces my qualms. The suggestion seems to be that, when you seem to find yourself in the box room, you should in some cases be uncertain about whether or not you exist at all. And in these cases you should one box, because, if it turns out that you don’t exist, then your decision to one box will (in some sense) cause a corresponding person who does exist to get more money. You also don’t personally get less money by one boxing, because you don’t get any money either way, because you don’t exist.
Naively, this line of thought seems sketchy. You can have uncertainty about the substrate your mind is being run on or about the features of the external world—e.g. you can be unsure whether or not you’re a simulation—but there doesn’t seem to be room for uncertainty about whether or not you exist. “Cogito, ergo sum” and all that.
There is presumably some set of metaphysical/epistemological positions under which this line of reasoning makes sense, but, again, the paper really doesn’t make any of these positions explicit or argue for them directly. I mainly think it’s premature to explain the paper’s failure to persuade philosophers in terms of their rigidity or inability to question assumptions.]]
You say
but that counterexample isn’t predicated on a logical inconsistency. An FDT agent can still get blackmailed, by someone who doesn’t know it’s an FDT agent or who doesn’t care that the outcome will be bad or who is lousy at predicting the behaviour of FDT agents.
No, this is wrong.
What prevents the FDT agent from getting blackmailed is not that he is known to be an FDT agent. Rather, it’s simply that he’s known to be the sort of person who will blow the whistle on blackmailers. Potential blackmailers need not know or care why the FDT agent behaves like this (indeed, it does not matter at all, for the purpose of each individual case, why the agent behaves like this—whether it’s because the dictates of FDT so direct him, or because he’s naturally stubborn, or whatever; it suffices that blackmailers expect the agent to blow the whistle).
So if we stipulate an FDT agent, we also, thereby, stipulate that he is known to be a whistle-blower. (We’re assuming, in every version of this scenario, that, regardless of the agent’s motivations, his propensity toward such behavior is known.) And that does indeed make the “he’s blackmailed anyway” scenario logically inconsistent.
Even if you are known to be the sort of person who will blow the whistle on blackmailers it is still possible that someone will try to blackmail you. How could that possibly involve a logical inconsistency? (People do stupid things all the time.)
I’ll say it again in different words: I did not understand the paper (and consequently, the blog) to be talking about actual blackmail in a big messy physical world. I understood them to be talking about a specific, formalized blackmail scenario, in which the blackmailer’s decision to blackmail is entirely contingent on the victim’s counterfactual behaviour, in which case resolving to never pay and still being blackmailed isn’t possible; in full context, it’s logically inconsistent.
Different formalisations are possible, but I’d guess the strict one is what was used. In the softer ones you still generally won’t pay.
The paper discusses two specific blackmail scenarios. One (“XOR Blackmail”) is a weirdly contrived situation and I don’t think any of what wo says is talking about it. The other (“Mechanical Blackmail”) is a sort of stylized version of real blackmail scenarios, and does assume that the blackmailer is a perfect predictor. The paper’s discussion of Mechanical Blackmail then considers the case where the blackmailer is an imperfect (but still very reliable) predictor, and says that there too an FDT agent should refuse to pay.
wo’s discussion of blackmail doesn’t directly address either of the specific scenarios discussed in the FDT paper. The first blackmail scenario wo discusses (before saying much about FDT) is a generic case of blackmail (the participants being labelled “Donald” and “Stormy”, perhaps suggesting that we are not supposed to imagine either of them as any sort of perfectly rational perfect predictor...). Then, when specifically discussing FDT, wo considers a slightly different scenario, which again is clearly not meant to involve perfect prediction, simulation, etc., because he says things like “FDT says you should not pay because, if you were the kind of person who doesn’t pay, you likely wouldn’t have been blackmailed” and “FDT agents who are known as FDT agents have a lower chance of getting blackmailed” (emphasis mine, in both cases).
So. The paper considers a formalized scenario in which the blackmailer’s decision is made on the basis of a perfectly accurate prediction of the victim, but then is at pains to say that it all works just the same if the prediction is imperfect. The blog considers only imperfect-prediction scenarios. Real-world blackmail, of course, never involves anything close to the sort of perfect prediction that would make it take a logical inconsistency for an FDT agent to get blackmailed.
So taking the paper and the blogpost to be talking only about provably-perfect-prediction scenarios seems to me to require (1) reading the paper oddly selectively, (2) interpreting the blogpost very differently from me, and (3) not caring about situations that could ever occur in the actual world, even though wo is clearly concerned with real-world applicability and the paper makes at least some gestures towards such applicability.
For the avoidance of doubt, I don’t think there’s necessarily anything wrong with being interested primarily in such scenarios: the best path to understanding how a theory works in practice might well go via highly simplified scenarios. But it seems to me simply untrue that the paper, still more the blogpost, considers (or should have considered) only such scenarios when discussing blackmail.
This thread might be relevant to your question.
It might. It’s pretty long. I don’t suppose you’d like to be more specific?
(If the answer is no, you wouldn’t, because all you mean is that it seems like the sort of discussion that might have useful things in it, then fair enough of course.)
I read that under a subtext where we were talking about the same blackmail scenario, but, okay, others are possible.
In cases where the blackmail truly seems not to be contingent on its policy (and in many current real-world cases), the FDT agent will pay.
The only cases when an FDT agent actually will get blackmailed and refuse to pay are cases where being committed to not paying shifts the probabilities enough to make that profitable on average.
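A toy calculation shows what "shifts the probabilities enough" could look like. This is a sketch under assumptions of my own, not the paper's formalism: the blackmailer blackmails exactly when they predict you will pay, `accuracy` is the chance that the prediction matches your actual policy, `cost` is the payment, and `harm` is the damage suffered when a blackmailed agent refuses.

```python
def policy_expected_values(accuracy, cost, harm):
    """Expected value of committing to each policy before any blackmail occurs."""
    # A payer is blackmailed whenever the predictor is right about them.
    ev_pay = -accuracy * cost
    # A refuser is blackmailed only when the predictor gets them wrong.
    ev_refuse = -(1 - accuracy) * harm
    return ev_pay, ev_refuse


def best_policy(accuracy, cost, harm):
    """Pick the policy with the higher expected value (ties go to paying)."""
    ev_pay, ev_refuse = policy_expected_values(accuracy, cost, harm)
    return "refuse" if ev_refuse > ev_pay else "pay"


def refusal_threshold(cost, harm):
    """Accuracy above which refusing beats paying in this toy model."""
    return harm / (harm + cost)


print(best_policy(accuracy=0.99, cost=1000, harm=10000))  # reliable predictor
print(best_policy(accuracy=0.50, cost=1000, harm=10000))  # prediction is a coin flip
print(refusal_threshold(cost=1000, harm=10000))
```

At `accuracy=0.5` the blackmail is effectively not contingent on the policy, and the model recommends paying, which matches the previous paragraph. Refusing only wins once the predictor is reliable enough relative to the stakes.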
It is possible to construct obstinate kinds of agents that aren’t sensitive to FDT’s acausal dealmaking faculties. Evolution might produce them often. They will not be seen as friends. As an LDT-like human, my feeling towards those sorts of blackmailers is that we should destroy all of them as quickly as we can, because their existence is a blight to ours. In light of that, I’m not sure they have a winning strategy. When you start to imagine the directed ocean of coordinated violence that an LDT-aligned faction (so, literally any real-world state with laws against blackmail) points in your direction as soon as they can tell what you are, you may start to wonder if pretending you can’t understand their source code is really a good idea.
I imagine a time when the distinction between CDT and LDT is widely understood, at least in its essentials. By that time, the very idea of blackmailing will come to seem very strange. We will wonder how there was ever an era when a person could just say “If you don’t do X, then I will do the fairly self-destructive action Y which I gain nothing from doing” and have everyone just believe them unconditionally, just believe this unqualified statement about their mechanism. Wasn’t that stupid? To lie like that? And even stupider for their victims to pretend that they believe the lie? We will not be able to understand it any more.
Imagine that you see an agnostic community head walking through the park at night. You know it’s a long shot, but you amble towards him, point your gun at him and say “give me your wallet.” He looks back at you and says, “I don’t understand the deal. You’ll shoot me? How does that help you? Because you want my wallet? I don’t understand the logic there, why are those two things related? That doesn’t get you my wallet.”
Only it does, because when you shoot someone you can loot their corpse, so it occurs to me that muggers are a bad example of blackmail. I imagine they’ve got to have a certain amount of comfort with actually killing people, to do that. It’s not really threatening to do something self-destructive, in their view, they still benefit a little from killing you. They still get to empty your pockets. To an extent, mugging is often just a display of a power imbalance and consequent negotiation of a mutually beneficial alternative to violence.
The gang can profit from robbing your store at gunpoint, but you and they will both profit more if you just pay them protection money. LDT only refuses to pay protection money if it realises that having all of the other entangled LDT store owners paying protection money as well would make the gang profitable enough to grow, and that having a grown gang around would have been, on the whole, worse than the amortised risk of being robbed.
Relevant excerpt for why exactly it was rejected:
So apparently the short version of “why their account has a hard time gaining traction in academic philosophy” is (according to this author) just “the paper’s presentation and argumentation aren’t good enough for the top philosophy journals”.
I feel like rejection-with-explanation is still an improvement over the norm.
Maybe pulling back and attacking the wrong intuitions Schwarz is using directly and generally would be worthwhile.