FDT says you should not pay because, if you were the kind of person who doesn’t pay, you likely wouldn’t have been blackmailed. How is that even relevant? You are being blackmailed.
I’m quoting this because, even though it’s wrong, it’s actually an incredibly powerful naive intuition. I think many people who have internalized TDT/UDT/FDT-style reasoning have forgotten just how intuitive the quoted block is. The unstated underlying assumption here (which is unstated because Schwarz most likely doesn’t even realize it is an assumption) is extremely persuasive, extremely obvious, and extremely wrong:
If you find yourself in a particular situation, the circumstances that led you to that situation are irrelevant, because they don’t change the undeniable fact that you are already here.
This is the intuition driving causal decision theory, and it is so powerful that mainstream academic philosophers are nearly incapable of recognizing it as an assumption (and a false assumption at that). Schwarz himself demonstrates just how hard it is to question this assumption: even when the opposing argument was laid right in front of him, he managed to misunderstand the point so hard that he actually used the very mistaken assumption the paper was criticizing as ammunition against the paper. (Note: this is not intended to be dismissive toward Schwarz. Rather, it’s simply meant as an illustrative example, emphasizing exactly how hard it is for anyone, Schwarz included, to question an assumption that’s baked into their model of the world.) And even if you already understand why FDT is correct, it still shouldn’t be hard to see why the assumption in question is so compelling:
How could what happened in the past be relevant for making a decision in the present? The only thing your present decision can affect is the future, so how could there be any need to consider the past when making your decision? Surely the only relevant factors are the various possible futures each of your choices leads to? Or, to put things in Pearlian terms: it’s known that the influence of distant causal nodes is screened off by closer nodes through which they’re connected, and all future nodes are only connected to past nodes through the present—there’s no such thing as a future node that’s directly connected to a past node while not being connected to the present, after all. So doesn’t that mean the effects of the past are screened off when making a decision? Only what’s happening in the present matters, surely?
Phrased that way, it’s not immediately obvious what’s wrong with this assumption (which is the point, of course, since otherwise people wouldn’t find it so difficult to discard). What’s actually wrong is something that’s a bit hard to explain, and evidently the explanation Eliezer and Nate used in their paper didn’t manage to convey it. My favorite way of putting it, however, is this:
In certain decision problems, your counterfactual behavior matters as much as, if not more than, your actual behavior. That is to say, there exists a class of decision problems where the outcome depends on something that never actually happens. Here’s a very simple toy example of such a problem:
Omega, the alien superintelligence, predicts the outcome of a chess game between you and Kasparov. If he predicted that you’d win, he gives you $500 in reality; otherwise you get nothing.
Strictly speaking, this actually isn’t a decision problem, since the real you is never faced with a choice to make, but it illustrates the concept clearly enough: the question of whether or not you receive the $500 is entirely dependent on a chess game that never actually happened. Does that mean the chess game wasn’t real? Well, maybe; it depends on your view of Platonic computations. But one thing it definitely does not mean is that Omega’s decision was arbitrary. Regardless of whether you feel Omega based his decision on a “real” chess game, you would in fact either win or not win against Kasparov, and whether you get the $500 really does depend on the outcome of that hypothetical game. (To make this point clearer, imagine that Omega actually predicted that you’d win. Surprised by his own prediction, Omega is now faced with the prospect of giving you $500 that he never expected he’d actually have to give up. Can he back out of the deal by claiming that since the chess game never actually happened, the outcome was up in the air all along and therefore he doesn’t have to give you anything? If your answer to that question is no, then you understand what I’m trying to get at.)
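To make the dependence concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than part of the original example: the function names, the Elo-style ratings, and the crude rule that the higher-rated side wins are stand-ins for whatever method Omega actually uses. The point it shows is only that the payout is a function of the predicted game, and that no game is ever played.

```python
def predicted_game_result(player_rating: int, opponent_rating: int) -> bool:
    """Omega's model of the game: True if the player is predicted to win.

    Hypothetical stand-in for Omega's method: the higher-rated side wins.
    No moves are ever played; this is a prediction, not a game.
    """
    return player_rating > opponent_rating


def omega_payout(player_rating: int, opponent_rating: int = 2850) -> int:
    """Payout in dollars, fixed entirely by the predicted (unplayed) game.

    The default opponent rating is an illustrative number, not a sourced one.
    """
    return 500 if predicted_game_result(player_rating, opponent_rating) else 0


print(omega_payout(1500))  # a club player's payout
print(omega_payout(3000))  # a hypothetical stronger player's payout
```

Note that `omega_payout` is not arbitrary even though the game is hypothetical: change the player and the payout changes with them.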
So outcomes can depend on things that never actually happen in the real world. Cool, so what does that have to do with the past influencing the future? Well, the answer is that it doesn’t—at least, not directly. But that’s where the twist comes in:
Earlier, when I gave my toy example of a hypothetical chess game between you and Kasparov, I made sure to phrase the question so that the situation was presented from the perspective of your actual self, not your hypothetical self. (This makes sense; after all, if Omega’s prediction method was based on something other than a direct simulation, your hypothetical self might not even exist.) But there was another way of describing the situation:
You’re going about your day normally when suddenly, with no warning whatsoever, you’re teleported into a white void of nothingness. In front of you is a chessboard; on the other side of the chessboard sits Kasparov, who challenges you to a game of chess.
Here, we have the same situation, but presented from the viewpoint of the hypothetical you on whom Omega’s prediction is based. Crucially, the hypothetical you doesn’t know that they’re hypothetical, or that the real you even exists. So from their perspective, something random just happened for no reason at all. (Yes, yes, if Omega used some method other than a simulation to make his prediction, the hypothetical you wouldn’t have existed and wouldn’t have had a perspective—but hey, that doesn’t stop me from writing from their perspective, right? After all, real people write from the perspectives of unreal people all the time; that’s just called writing fiction. And besides, we’ve already established that real or unreal, the outcome of the game really does determine whether you get the $500, so the thoughts and feelings of the hypothetical you are nonetheless important in that they partially determine the outcome of the game.)
And now we come to the final, crucial point that makes sense of the blackmail scenario and all the other thought experiments in the paper, the point that Schwarz and most mainstream philosophers haven’t taken into account:
Every single one of those thought experiments could have been written from the perspective, not of the real you, but a hypothetical, counterfactual version of yourself.
When “you’re” being blackmailed, Schwarz makes the extremely natural assumption that “you” are you. But there’s no reason to suppose this is the case. The scenario never stipulates why you’re being blackmailed, only that you’re being blackmailed. So the person being blackmailed could be either the real you or a hypothetical. And the thing that determines whether it’s the real you or a mere hypothetical is...
...your decision whether or not to pay up, of course.
If you cave in to the blackmail and pay up, then you’re almost certainly the real deal. On the other hand, if you refuse to give in, it’s very likely that you’re simply a counterfactual version of yourself living in an extremely low-probability (if not outright inconsistent) world. So your decision doesn’t just determine the future; it also determines (with high probability) which you “you” are. And so then the problem simplifies into this: which you do you want to be?
If you’re the real you, then life kinda sucks. You just got blackmailed and you paid up, so now you’re down a bunch of money. If, on the other hand, you’re the hypothetical version of yourself, then congratulations: “you” were never real in the first place, and by counterfactually refusing to pay, you just drastically lowered the probability of your actual self ever having to face this situation and (in the process) becoming you. And when things are put that way, well, the correct decision becomes rather obvious.
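The comparison above can be sketched numerically by scoring policies rather than individual acts. The sketch below rests on assumptions of mine that the scenario does not stipulate: the blackmailer blackmails exactly when they predict you would pay, the prediction is correct with probability `accuracy`, a refused threat is carried out at cost `harm`, and all the numbers are made up.

```python
def policy_expected_utility(pays: bool, accuracy: float,
                            payment: float, harm: float) -> float:
    """Expected utility of a policy, evaluated before any blackmail occurs.

    Assumptions (illustrative): blackmail happens exactly when the blackmailer
    predicts you would pay, the prediction is right with probability
    `accuracy`, and a refused threat is carried out at cost `harm`.
    """
    if pays:
        p_blackmailed = accuracy        # correctly predicted as a payer
        return -p_blackmailed * payment
    p_blackmailed = 1.0 - accuracy      # mispredicted as a payer
    return -p_blackmailed * harm


pay = policy_expected_utility(True, accuracy=0.99, payment=1000, harm=5000)
refuse = policy_expected_utility(False, accuracy=0.99, payment=1000, harm=5000)
print(pay, refuse)
```

With these made-up numbers the paying policy loses about 990 in expectation and the refusing policy about 50, because the refuser almost never gets blackmailed in the first place. The ordering flips if the predictor is unreliable enough or the harm is large enough relative to the payment.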
But this kind of counterfactual reasoning is extremely counterintuitive. Our brains aren’t designed for this kind of thinking (well, not explicitly, anyway). You have to think about hypothetical versions of yourself that have never existed and (if all goes well) will never exist, and therefore only exist in the space of logical possibility. What does that even mean, anyway? Well, answering confused questions like that is pretty much MIRI’s goal these days, so I dunno, maybe we can ask them.
So your decision doesn’t just determine the future; it also determines (with high probability) which you “you” are.
Worse. It doesn’t change who you are: you are the person being blackmailed. This you know. What you don’t know is whether you exist (or ever existed). Whether you ever existed is determined by your decision.
(The distinction from the quoted sentence may matter if you put less value on the worlds of people slightly different from yourself, so you may prefer to ensure your own existence, even in a blackmailed situation, over the existence of the alternative you who is not blackmailed but who is different and so less valuable. This of course involves unstable values, but motivates degree of existence phrasing of the effect of decisions, over the change in the content of the world phrasing, since the latter doesn’t let us weigh whole alternative worlds differently.)
I only just saw this comment. I think that there’s a lot of value in imagining the possibility of both a real you and a hypothetical you, but I expect Schwarz would object to this as the problem statement says, “you”. By default, this is assumed to refer to a version of you that is real within the scope of the problem, not a version that is hypothetical within this scope. Indeed, you seem to recognise that your reasoning isn’t completely solid when you ask what these hypothetical selves even mean.
Fortunately, this is where some of my arguments from Deconfusing Logical Counterfactuals can come in. If we are being blackmailed and we are only blackmailed if we pay out, then it is logically impossible for us to not pay out, so we don’t have a decision theory problem. Such a problem requires multiple possible actions, and the only way to obtain this is by erasing some of the information. So if we erase the information about whether or not you are being blackmailed, we end up with two possibilities: a) you being the kind of person who would pay and getting blackmailed or b) you not being the kind of person who would pay and you not getting blackmailed. And in that case we’d pick option b).
We can then answer your question about how your decision affects the past: it doesn’t. The past is fixed and cannot be changed. So why does it look like your decision changes the past? Actually, your decision is fixed as well. You can’t actually change what it will be. However, both your decision and the past may be different in counterfactuals; after all, they aren’t factual. If we have a perfect predictor and we posit a different decision, then we must necessarily posit a different prediction in the past to maintain consistency. Even though we might talk colloquially about changing a decision, what’s actually happening is that we are positing a different counterfactual.
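As a rough illustration of the consistency requirement, the sketch below enumerates every combination of past prediction and present disposition and keeps only those a perfect predictor allows. The names (`DECISIONS`, `consistent_worlds`) and the dictionary layout are hypothetical, chosen for this example only; it is not a formalization of the linked post.

```python
from itertools import product

DECISIONS = ("pay", "refuse")


def consistent_worlds() -> list[dict]:
    """Enumerate the worlds that survive a perfect predictor.

    A world pairs a past prediction with a present disposition. With a
    perfect predictor the two must match, and (by assumption) blackmail
    happens exactly when the prediction is "pay".
    """
    worlds = []
    for prediction, disposition in product(DECISIONS, DECISIONS):
        if prediction != disposition:
            continue  # inconsistent: the predictor would have been wrong
        worlds.append({
            "prediction": prediction,
            "disposition": disposition,
            "blackmailed": prediction == "pay",
        })
    return worlds


for world in consistent_worlds():
    print(world)
```

Only two of the four combinations survive, and they differ in the past prediction as well as in the decision. Positing the other decision means moving to the other world, not editing the past of this one.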
Belatedly, is this a fair summary of your critique?
When someone thinks about another person (e.g. to predict whether they’ll submit to blackmail), the act of thinking about the other person creates a sort of ‘mental simulation’ that has detailed conscious experiences in its own right. So you never really know whether you’re a flesh-and-blood person or a ‘mental simulation’ based on a flesh-and-blood person.
Now, suppose you seem to find yourself in a situation where you’ve been blackmailed. In this context, it’s reasonable to wonder whether you’re actually a flesh-and-blood person who’s been blackmailed—or merely a ‘mental simulation’ that exists in the mind of a potential blackmailer. If you’re a mental simulation, and you care about the flesh-and-blood person you’re based on, then you have reason to resist blackmail. The reason is that the decision you take as a simulation will determine the blackmailer’s prediction about how the flesh-and-blood person will behave. If you resist blackmail, then the blackmailer will predict the flesh-and-blood person will refuse blackmail and therefore decide not to blackmail them.
If this is roughly in the right ballpark, then I would have a couple responses:
I disagree that the act of thinking about a person will tend to create a mental simulation that has detailed conscious experiences in its own right. This seems like a surprising position that goes against the grain of conventional neuroscience and views on the philosophy of consciousness. As a simple illustrative case, suppose that Omega makes a prediction about Person A purely on the basis of their body language. Surely thinking “This guy looks really nervous, he’s probably worried he’ll be seen as the sort of guy who’ll submit to blackmail, because he is” doesn’t require bringing a whole new consciousness into existence.
Suppose that when a blackmailer predicts someone’s behavior, they do actually create a conscious mental simulation. Suppose you don’t know whether you’re this kind of simulation or the associated flesh-and-blood person, but you care about what happens to the flesh-and-blood person in either case. Then, depending on certain parameter values, CDT does actually say you should resist blackmail. This is because there is some chance that you will cause the flesh-and-blood person to avoid being blackmailed. So CDT gives the response you want in this case.
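To show what “depending on certain parameter values” could look like, here is a minimal sketch of the causal expected-utility comparison. The model is my own simplification, not something stated above: `p_sim` is your credence that you are the simulation, the simulation values the flesh-and-blood person’s outcome as its own, a flesh-and-blood person who gets blackmailed after a “pay” prediction does pay, and a refused threat costs `harm`. All numbers are hypothetical.

```python
def cdt_expected_utility(refuse: bool, p_sim: float,
                         payment: float, harm: float) -> float:
    """Causal expected utility of an act under simulation uncertainty.

    Assumed model (illustrative): with probability `p_sim` you are the
    blackmailer's simulation and your act fixes the prediction; otherwise
    you are the flesh-and-blood person who has already been blackmailed.
    """
    if not refuse:
        # Simulation pays -> the real person is blackmailed and pays.
        # Real person pays -> loses the payment. Same loss either way.
        return -payment
    # Simulation refuses -> the real person is never blackmailed (no loss).
    # Real person refuses -> the threat is carried out.
    return p_sim * 0.0 + (1.0 - p_sim) * (-harm)


def cdt_refuses(p_sim: float, payment: float, harm: float) -> bool:
    """True if refusing has strictly higher causal expected utility."""
    return (cdt_expected_utility(True, p_sim, payment, harm)
            > cdt_expected_utility(False, p_sim, payment, harm))


print(cdt_refuses(0.9, payment=1000, harm=5000))
print(cdt_refuses(0.5, payment=1000, harm=5000))
```

Under this model CDT refuses exactly when `(1 - p_sim) * harm < payment`, so with a payment of 1000 and a harm of 5000 the credence in being the simulation has to exceed 0.8 before refusing wins.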
Overall, I don’t think this line of argument really damages CDT. It seems to be based on a claim about consciousness that I think is probably wrong. But even if the claim is right, all this implies is that CDT recommends a different action than one would otherwise have thought.
(If my summary is roughly in the right ballpark, then I also think it’s totally reasonable for academic decision theorists to read the FDT paper and fail to realize that a non-mainstream neuroscience/philosophy-of-consciousness view is being assumed and provides the main justification for FDT. The paper really doesn’t directly say anything about this. It seems wrong to me, then, to suggest that Schwarz only disagrees because he lacks the ability to see his own assumptions.)
[[EDIT: Oops, rereading your comment, seems like the summary is probably not fair. I didn’t process this bit:
Yes, yes, if Omega used some method other than a simulation to make his prediction, the hypothetical you wouldn’t have existed and wouldn’t have had a perspective—but hey, that doesn’t stop me from writing from their perspective, right? After all, real people write from the perspectives of unreal people all the time; that’s just called writing fiction.
But now, reading the rest of the comment in light of this point, I don’t think this reduces my qualms. The suggestion seems to be that, when you seem to find yourself in the box room, you should in some cases be uncertain about whether or not you exist at all. And in these cases you should one box, because, if it turns out that you don’t exist, then your decision to one box will (in some sense) cause a corresponding person who does exist to get more money. You also don’t personally get less money by one boxing, because you don’t get any money either way, because you don’t exist.
Naively, this line of thought seems sketchy. You can have uncertainty about the substrate your mind is being run on or about the features of the external world—e.g. you can be unsure whether or not you’re a simulation—but there doesn’t seem to be room for uncertainty about whether or not you exist. “Cogito, ergo sum” and all that.
There is presumably some set of metaphysical/epistemological positions under which this line of reasoning makes sense, but, again, the paper really doesn’t make any of these positions explicit or argue for them directly. I mainly think it’s premature to explain the paper’s failure to persuade philosophers in terms of their rigidity or inability to question assumptions.]]
“FDT means choosing a decision algorithm so that, if a blackmailer inspects your decision algorithm, they will know you can’t be blackmailed. If you’ve chosen such a decision algorithm, a blackmailer won’t blackmail you.”
Is this an accurate summary?