It seems to me that the given defense of FDT is, to put it mildly, unsatisfactory. Whatever “fancy” reasoning is proffered, nevertheless the options on offer are “burn to death” or “pay $100”—and the choice is obvious.
FDT recommends knowingly choosing to burn to death? So much the worse for FDT!
FDT has very persuasive reasoning for why I should choose to burn to death? Uh-huh (asks the non-FDT agent), and if you’re so rational, why are you dead?
Counterfactuals, you say? Well, that’s great, but you still chose to burn to death, instead of choosing not to burn to death.
Unreasonable? I am a rationalist: what do I care about being unreasonable? I don’t have to conform to a particular ritual of cognition. I don’t have to take only box B because I believe my choice affects the box, even though Omega has already left. I can just… take only box B.
Similarly, you don’t have to take the Right box because your decision theory says you should. You can just… take the Right box.
And, you know… not burn to death.
(Maybe the real FDT is “use FDT in all the cases except where doing so will result in you burning to death, in which case use not-FDT”? That way you get the good outcome in all 1 trillion trillion cases, eh?)
P.S. Vaniver’s comment seems completely inapplicable to me, since in the “Bomb” scenario it’s not a question of uncertainty at all.
So even though you are already in the city, you choose to pay and lose utility in that specific scenario? That seems inconsistent with right-boxing on Bomb.
For the record, my answer is also to pay, but then again I also left-box on Bomb.
Parfit’s Hitchhiker is not an analogous situation, since it doesn’t take place in a context like “you’re the last person in the universe and will never interact with another agent ever”, nor does paying cause me to burn to death (in which case I wouldn’t pay; note that this would defeat the point of being rescued in the first place!).
But more importantly, in the Parfit’s Hitchhiker situation, you have in fact been provided with value (namely, your life!). Then you’re asked to pay a (vastly smaller!) price for that value.
In the Bomb scenario, on the other hand, you’re asked to give up your life (very painfully), and in exchange you get (and have gotten) absolutely nothing whatsoever.
So I really don’t see the relevance of the question…
Actually, I have thought about this a bit more and concluded Bomb and Parfit’s hitchhiker are indeed analogous in a very important sense: both problems give you the option to “pay” (be it in dollars or with torture and death), even though not paying doesn’t causally affect whether or not you die.
In the Bomb scenario, on the other hand, you’re asked to give up your life (very painfully), and in exchange you get (and have gotten) absolutely nothing whatsoever.
Like Parfit’s hitchhiker, where you are asked to pay $1000 even though you are already rescued.
since it doesn’t take place in a context like “you’re the last person in the universe and will never interact with another agent ever”
That was never relevant to begin with.
Parfit’s Hitchhiker is not an analogous situation
Well, both problems have a predictor and focus on a specific situation after the predictor has already made the prediction. Both problems have subjunctive dependence. So they are analogous, but they have differences as well. However, it seems like you don’t pay because of subjunctive dependence reasons, so never mind, I guess.
FDT recommends knowingly choosing to burn to death? So much the worse for FDT!
This is where, at least in part, your misunderstanding lies (IMO). FDT doesn’t recommend choosing to burn to death. It recommends Left-boxing, which avoids burning to death AND avoids paying $100.
In doing so, FDT beats both CDT and EDT, which both pay $100. It really is as simple as that. The Bomb is an argument for FDT, and quite an excellent one.
… huh? How does this work? The scenario, as described in the OP, is that the Left box has a bomb in it. By taking it, you burn to death. But FDT, as you say, recommends Left-boxing. Therefore, FDT recommends knowingly choosing to burn to death.
I don’t understand how you can deny this when your own post clearly describes all of this.
This works because Left-boxing means you’re in a world where the predictor’s model of you also Left-boxed when the predictor made its prediction, causing it to not put a Bomb in Left.
Put differently, the situation described by MacAskill becomes virtually impossible if you Left-box, since the probability of Left-boxing and burning to death is ~0.
OR, alternatively, we say: no, we see the Bomb. We can’t retroactively change this! If we keep that part of the world fixed, then, GIVEN the subjunctive dependence between us and the predictor (assuming it’s there), that simply means we Right-box (with probability ~1), since that’s what the predictor’s model did.
Of course, then it’s not much of a decision theoretic problem anymore, since the decision is already fixed in the problem statement. If we assume we can still make a decision, then that decision is made in 2 places: first by the predictor’s model, then by us. Left-boxing means the model Left-boxes and we get to live for free. Right-boxing means the model Right-boxes and we get to live at a cost of $100. The right decision must be Left-boxing.
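(To make the two-decision-points picture concrete, here is a minimal toy model of it in code. This is my own illustration, not part of the problem statement: the predictor simulates the same decision function the agent later runs, the bomb goes in Left exactly when Right-boxing is predicted, and the tiny error rate is ignored.)

```python
# Toy model of "the decision is made in 2 places": the predictor's model runs
# the same policy the agent runs later. Illustrative sketch only; the
# 1-in-a-trillion-trillion predictor error rate is ignored.

def outcome(policy):
    prediction = policy()                    # first, the predictor's model decides
    bomb_in_left = (prediction == "Right")   # bomb in Left iff Right-boxing was predicted
    choice = policy()                        # later, the agent runs the same policy
    if choice == "Left":
        return "burn to death" if bomb_in_left else "live, pay $0"
    return "live, pay $100"

print(outcome(lambda: "Left"))   # -> live, pay $0
print(outcome(lambda: "Right"))  # -> live, pay $100
```

With a fixed policy, the “burn to death” branch is never reached for the Left-boxer; that is the sense in which Left-boxing means the predictor’s model Left-boxed too.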
Put differently, the situation described by MacAskill becomes virtually impossible if you Left-box, since the probability of Left-boxing and burning to death is ~0.
Irrelevant, since the described scenario explicitly stipulates that you find yourself in precisely that situation.
OR, alternatively, we say: no, we see the Bomb. We can’t retroactively change this! If we keep that part of the world fixed, then, GIVEN the subjunctive dependence between us and the predictor (assuming it’s there), that simply means we Right-box (with probability ~1), since that’s what the predictor’s model did.
Yes, that’s what I’ve been saying: choosing Right in that scenario is the correct decision.
Of course, then it’s not much of a decision theoretic problem anymore, since the decision is already fixed in the problem statement.
I have no idea what you mean by this.
Left-boxing means the model Left-boxes and we get to live for free.
“Irrelevant, since the described scenario explicitly stipulates that you find yourself in precisely that situation.”
Actually, this whole problem is irrelevant to me, a Left-boxer: Left-boxers never (or extremely rarely) find themselves in the situation with a bomb in Left. That’s the point.
Firstly, there’s a difference between “never” and “extremely rarely”. And in the latter case, the question remains “and what do you do then?”. To which, it seems, you answer “choose the Right box”…? Well, I agree with that! But that’s just the view that I’ve already described as “Left-box unless there’s a bomb in Left, in which case Right-box”.
It remains unclear to me what it is you think we disagree on.
Firstly, there’s a difference between “never” and “extremely rarely”.
That difference is so small it can safely be neglected.
And in the latter case, the question remains “and what do you do then?”. To which, it seems, you answer “choose the Right box”…? Well, I agree with that! But that’s just the view that I’ve already described as “Left-box unless there’s a bomb in Left, in which case Right-box”.
It seems to me that strategy leaves you manipulatable by the predictor, who can then just always predict you will Right-box, put a bomb in Left, and let you Right-box, causing you to lose $100.
By construction it is not, because the scenario is precisely that we find ourselves in one such exceptional case; the posterior probability (having observed that we do so find ourselves) is thus ~1.
It seems to me that strategy leaves you manipulatable by the predictor
… but you have said, in a previous post, that if you find yourself in this scenario, you Right-box. How to reconcile your apparently contradictory statements…?
By construction it is not, because the scenario is precisely that we find ourselves in one such exceptional case; the posterior probability (having observed that we do so find ourselves) is thus ~1.
Except that we don’t find ourselves there if we Left-box. But we seem to be going around in a circle.
… but you have said, in a previous post, that if you find yourself in this scenario, you Right-box. How to reconcile your apparently contradictory statements…?
Right-boxing is the necessary consequence if we assume the predictor’s Right-box prediction is fixed now. So GIVEN the Right-box prediction, I apparently Right-box.
My entire point is that the prediction is NOT a given. I Left-box, and thus change the prediction to Left-box.
I have made no contradictory statements. I am and always have been saying that Left-boxing is the correct decision to resolve this dilemma.
Except that we don’t find ourselves there if we Left-box. But we seem to be going around in a circle.
There’s no “if” about it. The scenario is that we do find ourselves there. (If you’re fighting the hypothetical, you have to be very explicit about that, because then we’re just talking about two totally different, and pretty much unrelated, things. But I have so far understood you to not be doing that.)
Right-boxing is the necessary consequence if we assume the predictor’s Right-box prediction is fixed now. So GIVEN the Right-box prediction, I apparently Right-box.
I don’t know what you mean by “apparently”. You have two boxes—that’s the scenario. Which do you choose—that’s the question. You can pick either one; where does “apparently” come in?
My entire point is that the prediction is NOT a given. I Left-box, and thus change the prediction to Left-box.
What does this mean? The boxes are already in front of you.
I have made no contradictory statements. I am and always have been saying that Left-boxing is the correct decision to resolve this dilemma.
You just said in this very comment that you Right-box in the given scenario! (And also in several other comments… are you really going to make me cite each of them…?)
I’m not going to make you cite anything. I know what you mean. I said Right-boxing is a consequence, given a certain resolution of the problem; I always maintained Left-boxing is the correct decision. Apparently I didn’t explain myself well, that’s on me. But I’m kinda done, I can’t seem to get my point across (not saying it’s your fault btw).
It doesn’t. Instead, it will make it so that there will have never been a bomb in the first place.
To understand this, imagine yourself as a deterministic algorithm. Either you Left-box under all circumstances (even if there is a bomb in the left box), or you Right-box under all circumstances, or you Right-box iff there is a bomb in the left box.
Implementing the first algorithm out of these three is the best choice (the expected utility is 0).
Implementing the third algorithm (that’s what you do) is the worst choice (the expected utility is -$100).
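(To spell out the arithmetic behind those figures, a rough sketch under assumed numbers: the stipulated error rate of 1 in 10^24, a life valued at $10^13 purely for illustration, and, for the third algorithm, a predictor that resolves the self-reference by placing the bomb. Only the error rate comes from the problem statement.)

```python
# Rough expected-utility comparison of the three algorithms above, under
# assumed numbers: predictor error rate eps = 1e-24 (as stipulated), a life
# valued at V = $1e13 (an assumption for illustration), and a predictor that
# places the bomb whenever that is consistent with its prediction.

eps = 1e-24   # predictor error rate
V = 1e13      # assumed dollar value of a life

eu_always_left = eps * (-V)        # bomb only if the predictor errs: about -1e-11
eu_always_right = -100             # you take Right and pay $100 either way
eu_right_iff_bomb = -100           # the predictor can place the bomb; you then pay $100

print(eu_always_left, eu_always_right, eu_right_iff_bomb)
```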
By the way, I want to point out that you apparently disagree with Heighn on this. He says, as I understand him, that if you pick Left, you do indeed burn to death, but this is fine, because in [1 trillion trillion minus one] possible worlds, you live and pay nothing. But you instead say that if you pick Left… something happens… and the bomb in the Left box, which you were just staring directly at, disappears somehow. Or wasn’t ever there (somehow), even though, again, you were just looking right at it.
How do you reconcile this disagreement? One of you has to be wrong about the consequences of picking the Left box.
I think we agree. My stance: if you Left-box, that just means the predictor predicted that with probability close to 1. From there on, there are a trillion trillion − 1 possible worlds where you live for free, and 1 where you die.
I’m not saying “You die, but that’s fine, because there are possible worlds where you live”. I’m saying that “you die” is a possible world, and there are way more possible worlds where you live.
But apparently the consequences of this aren’t deterministic after all, since the predictor is fallible. So this doesn’t help.
If you reread my comments, I simplified it by assuming an infallible predictor.
How?
For this, it’s helpful to define another kind of causality (logical causality) as distinct from physical causality. You can’t physically cause something to have never been that way, because physical causality can’t go to the past. But you can use logical causality for that, since the output of your decision determines not only your output, but the output of all equivalent computations across the entire timeline. By Left-boxing even in case of a bomb, you will have made it so that the predictor’s simulation of you has Left-boxed as well, resulting in the bomb never having been there.
If you reread my comments, I simplified it by assuming an infallible predictor.
… so, in other words, you’re not actually talking about the scenario described in the OP. But that’s what my comments have been about, so… everything you said has been a non sequitur…?
You can’t physically cause something to have never been that way, because physical causality can’t go to the past. But you can use logical causality for that, since the output of your decision determines not only your output, but the output of all equivalent computations across the entire timeline. By Left-boxing even in case of a bomb, you will have made it so that the predictor’s simulation of you has Left-boxed as well, resulting in the bomb never having been there.
This really doesn’t answer the question.
Again, the scenario is: you’re looking at the Left box, and there’s a bomb in it. It’s right there in front of you. What do you do?
So, for example, when you say:
By Left-boxing even in case of a bomb, you will have made it so that the predictor’s simulation of you has Left-boxed as well, resulting in the bomb never having been there.
So if you take the Left box, what actually, physically happens?
… so, in other words, you’re not actually talking about the scenario described in the OP. But that’s what my comments have been about, so… everything you said has been a non sequitur…?
See my top-level comment; this is precisely the problem with the scenario described in the OP that I pointed out. Your reading is standard, but not the intended meaning.
But it’s also puzzling that you can’t ITT this point, to see both meanings, even if you disagree that it’s reasonable to allow/expect the intended one. Perhaps divesting from having an opinion on the object level question might help? Like, what is the point the others are trying to make, specifically, how does it work, regardless of if it’s a wrong point, described in a way that makes no reference to its wrongness/absurdity?
Like with bug reports, it’s not helpful to say that something “doesn’t work at all”, it’s useful to be more specific. There’s some failure of rationality at play here, you are way too intelligent to be incapable of seeing what the point is, so there is some systematic avoidance of allowing yourself to see what is going on. Heighn’s antagonistic dogmatism doesn’t help, but shouldn’t be this debilitating.
As far as your top-level comment, well, my follow-up questions about it remain unanswered…
I dropped out of that conversation because it seemed to be going in circles, and I think I’ve explained everything already. Apparently the conversation continued, green_leaf seems to be making good points, and Heighn continues needlessly upping the heat.
I don’t think object level conversation is helpful at this point, there is some methodological issue in how you think about this that I don’t see an efficient approach to. I’m already way outside the sort of conversational norms I’m trying to follow for the last few years, which is probably making this comment as hopelessly unhelpful as ever, though in 2010 that’d more likely be the default mode of response for me.
Note that it’s my argumentation that’s being called crazy, which is a large factor in the “antagonism” you seem to observe—a word choice I don’t agree with, btw.
About the “needlessly upping the heat”, I’ve tried this discussion from multiple different angles, seeing if we can come to a resolution. So far, no, alas, but not for lack of trying. I will admit some of my reactions were short and a bit provocative, but I don’t appreciate nor agree with your accusations. I have been honest in my reactions.
I’ve been you ten years ago. This doesn’t help, courtesy or honesty (purposes that tend to be at odds with each other) aren’t always sufficient, it’s also necessary to entertain strange points of view that are obviously wrong, in order to talk in another’s language, to de-escalate where escalation won’t help (it might help with feeding norms, but knowing what norms you are feeding is important). And often enough that is still useless and the best thing is to give up. Or at least more decisively overturn the chess board, as I’m doing with some of the last few comments to this post, to avoid remaining in an interminable failure mode.
These norms are interesting in how well they fade into the background, oppose being examined. If you happen to be a programmer or have enough impression of what that might be like, just imagine a programmer team where talking about bugs can be taboo in some circumstances, especially if they are hypothetical bugs imagined out of whole cloth to check if they happen to be there, or brought to attention to see if it’s cheap to put measures in place to prevent their going unnoticed, even if it eventually turns out that they were never there to begin with in actuality. With rationality, that’s hypotheses about how people think, including hypotheses about norms that oppose examination of such hypotheses and norms.
Sorry, I’m having trouble understanding your point here. I understand your analogy (I was a developer), but am not sure what you’re drawing the analogy to.
I see your point, although I have entertained Said’s view as well. But yes, I could have done better. I tend to get like this when my argumentation is being called crazy, and I should have done better.
You could have just told me this instead of complaining about me to Said though.
Yes, the situation does say the bomb is there. But it also says the bomb isn’t there if you Left-box.
At the very least, this is a contradiction, which makes the scenario incoherent nonsense.
(I don’t think it’s actually true that “it also says the bomb isn’t there if you Left-box”—but if it did say that, then the scenario would be inconsistent, and thus impossible to interpret.)
This is misleading. What happens is that the situation you found yourself in doesn’t take place with significant measure. You live mostly in different situations, not this one.
It is misleading because Said’s perspective is to focus on the current situation, without regarding the other situations as decision relevant. From UDT perspective you are advocating, the other situations remain decision relevant, and that explains much of what you are talking about in other replies. But from that same perspective, it doesn’t matter that you live in the situation Said is asking about, so it’s misleading that you keep attention on this situation in your reply without remarking on how that disagrees with the perspective you are advocating in other replies.
In the parent comment, you say “it is, in virtually all possible worlds, that you live for free”. This is confusing: are you talking about the possible worlds within the situation Said was asking about, or also about possible worlds outside that situation? The distinction matters for the argument in these comments, but you are saying this ambiguously.
… so, in other words, you’re not actually talking about the scenario described in the OP. But that’s what my comments have been about, so… everything you said has been a non sequitur…?
No, non sequitur means something else. (If I say “A, therefore B”, but B doesn’t follow from A, that’s a non sequitur.)
I simplified the problem to make it easier for you to understand.
This really doesn’t answer the question.
It does. Your question was “How?”. The answer is “through logical causality.”
So if you take the Left box, what actually, physically happens?
You take the left box with the bomb, and it has always been empty.
It is. The response to your question “So if you take the Left box, what actually, physically happens?” is “Physically, nothing.” That’s why I defined logical causality—it helps understand why the first algorithm (always Left-boxing) has the best expected utility, and why yours is worse.
Do you see how that makes absolutely no sense as an answer to the question I asked? Like, do you see what makes what you said incomprehensible, what makes it appear to be nonsense? I’m not asking you to admit that it’s nonsense, but can you see why it reads as bizarre moon logic?
I’m no longer sure; you and green_leaf appear to have different, contradictory views, and at this point that divergence has confused me enough that I could no longer say confidently what either of you seem to be saying without going back and carefully re-reading all the comments. And that, I’m afraid, isn’t something that I have time for at the moment… so perhaps it’s best to write this discussion off, after all.
Agreed, but I think it’s important to stress that it’s not like you see a bomb, Left-box, and then see it disappear or something. It’s just that Left-boxing means the predictor already predicted that, and the bomb was never there to begin with.
Put differently, you can only Left-box in a world where the predictor predicted you would.
Put differently, you can only Left-box in a world where the predictor predicted you would.
What stops you from Left-boxing in a world where the predictor didn’t predict that you would?
To make the question clearer, let’s set aside all this business about the fallibility of the predictor. Sure, yes, the predictor’s perfect, it can predict your actions with 100% accuracy somehow, something about algorithms, simulations, models, whatever… fine. We take all that as given.
So: you see the two boxes, and after thinking about it very carefully, you reach for the Right box (as the predictor always knew that you would).
But suddenly, a stray cosmic ray strikes your brain! No way this was predictable—it was random, the result of some chain of stochastic events in the universe. And though you were totally going to pick Right, you suddenly grab the Left box instead.
Surely, there’s nothing either physically or logically impossible about this, right?
So if the predictor predicted you’d pick Right, and there’s a bomb in Left, and you have every intention of picking Right, but due to the aforesaid cosmic ray you actually take the Left box… what happens?
It’s just that Left-boxing means the predictor already predicted that, and the bomb was never there to begin with.
But the scenario stipulates that the bomb is there. Given this, taking the Left box results in… what? Like, in that scenario, if you take the Left box, what actually happens?
Agreed, but I think it’s important to stress that it’s not like you see a bomb, Left-box, and then see it disappear or something. It’s just that Left-boxing means the predictor already predicted that, and the bomb was never there to begin with.
Yes, that’s correct.
By executing the first algorithm, the bomb has never been there.
Put differently, you can only Left-box in a world where the predictor predicted you would.
Here it’s useful to distinguish between agentic ‘can’ and physical ‘can.’
Since I assume a deterministic universe for simplification, there is only one physical ‘can.’ But there are two agentic ‘can’s—no matter the prediction, I can agentically choose either way. The predictor’s prediction is logically posterior to my choice, and his prediction (and the bomb’s presence) are the way they are because of my choice. So I can Left-box even if there is a bomb in the left box, even though it’s physically impossible.
(It’s better to use agentic can over physical can for decision-making, since that use of can allows us to act as if we determined the output of all computations identical to us, which brings about better results. The agent that uses the physical can as their definition will see the bomb more often.)
No, that’s just plain wrong. If you Left-box given a perfect predictor, the predictor didn’t put a bomb in Left. That’s a given. If the predictor did put a bomb in Left and you Left-box, then the predictor isn’t perfect.
“Irrelevant, since the described scenario explicitly stipulates that you find yourself in precisely that situation.”
It also stipulates the predictor predicts almost perfectly. So it’s very relevant.
“Yes, that’s what I’ve been saying: choosing Right in that scenario is the correct decision.”
No, it’s the wrong decision. Right-boxing is just the necessary consequence of the predictor predicting I Right-box. But insofar this is a decision problem, Left-boxing is correct, and then the predictor predicted I would Left-box.
“No, Left-boxing means we burn to death.”
No, it means the model Left-boxed and thus the predictor didn’t put a bomb in Left.
Do you understand how subjunctive dependence works?
It also stipulates the predictor predicts almost perfectly. So it’s very relevant.
Yes, almost perfectly (well, it has to be “almost”, because it’s also stipulated that the predictor got it wrong this time).
No, it’s the wrong decision. Right-boxing is just the necessary consequence of the predictor predicting I Right-box. But insofar this is a decision problem, Left-boxing is correct, and then the predictor predicted I would Left-box.
None of this matters, because the scenario stipulates that there’s a bomb in the Left box.
No, it means the model Left-boxed and thus the predictor didn’t put a bomb in Left.
But it’s stipulated that the predictor did put a bomb in Left. That’s part of the scenario.
Do you understand how subjunctive dependence works?
Why does it matter? We know that there’s a bomb in Left, because the scenario tells us so.
Yes, almost perfectly (well, it has to be “almost”, because it’s also stipulated that the predictor got it wrong this time).
Well, not with your answer, because you Right-box. But anyway.
Why does it matter? We know that there’s a bomb in Left, because the scenario tells us so.
It matters a lot, because in a way the problem description is contradicting itself (which happens more often in Newcomblike problems).
(1) It says there’s a bomb in Left.
(2) It also says that if I Left-box, then the predictor predicted this, and will not have put a Bomb in Left. (Unless you assume the predictor predicts so well by looking at, I don’t know, the color of your shoes or something. But it strongly seems like the predictor has some model of your decision procedure.)
You keep repeating (1), ignoring (2), even though (2) is stipulated just as much as (1).
So, yes, my question whether you understand subjunctive dependence is justified, because you keep ignoring that crucial part of the problem.
Well, first of all, if there is actually a contradiction in the scenario, then we’ve been wasting our time. What’s to talk about? In such a case the answer to “what happens in this scenario” is “nothing, it’s logically impossible in the first place”, and we’re done.
But of course there isn’t actually a contradiction. (Which you know, otherwise you wouldn’t have needed to hedge by saying “in a way”.)
It’s simply that the problem says that if you Left-box, then the predictor predicted this, and will not have put a bomb in Left… usually. Almost always! But not quite always. It very rarely makes mistakes! And this time, it would seem, is one of those times.
So there’s no contradiction, there’s just a (barely) fallible predictor.
So the scenario tells us that there’s a bomb in Left, we go “welp, guess the predictor screwed up”, and then… well, apparently FDT tells us to choose Left anyway? For some reason…? (Or does it? You tell me…) But regardless, obviously the correct choice is Right, because Left’s got a bomb in it.
I really don’t know what else there is to say about this.
But of course there isn’t actually a contradiction. (Which you know, otherwise you wouldn’t have needed to hedge by saying “in a way”.)
There is, as I explained. There’s 2 ways of resolving it, but yours isn’t one of them. You can’t have it both ways.
It’s simply that the problem says that if you Left-box, then the predictor predicted this, and will not have put a bomb in Left… usually. Almost always! But not quite always. It very rarely makes mistakes! And this time, it would seem, is one of those times.
Just… no. “The predictor predicted this”, yes, so there are a trillion trillion − 1 follow-up worlds where I don’t burn to death! And yes, 1 - just 1 - world where I do. Why choose to focus on that 1 out of a trillion trillion worlds?
Because the problem talks about a bomb in Left?
No. The problem says more than that. It clearly predicts a trillion trillion − 1 worlds where I don’t burn to death. That 1 world where I do sucks, but paying $100 to avoid it seems odd. Unless, of course, you value your life infinitely (which you do I believe?). That’s fine, it does all depend on the specific valuations.
The problem stipulates that you actually, in fact, find yourself in a world where there’s a bomb in Left. These “other worlds” are—in the scenario we’re given—entirely hypothetical (or “counterfactual”, if you like). Do they even exist? If so, in what sense? Not clear. But in the world you find yourself in (we are told), there’s a bomb in the Left box. You can either take that box, and burn to death, or… not do that.
So, “why choose to focus on” that world? Because that’s the world we find ourselves in, where we have to make the choice.
Paying $100 to avoid burning to death isn’t something that “seems odd”, it’s totally normal and the obviously correct choice.
My point is that those “other worlds” are just as much stipulated by the problem statement as that one world you focus on. So, you pay $100 and don’t burn to death. I don’t pay $100, burn to death in 1 world, and live for free in a trillion trillion − 1 worlds. Even if I value my life at $10,000,000,000,000, my choice gives more utility.
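(Rough arithmetic: a life valued at $10,000,000,000,000 lost in 1 of 10^24 worlds costs 10^13 / 10^24 = 10^−11 dollars in expectation, a minuscule fraction of a cent, versus a certain $100 for Right-boxing.)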
My point is that those “other worlds” are just as much stipulated by the problem statement as that one world you focus on.
Sorry, but no, they’re not. You may choose to infer their “existence” from what’s stated in the problem—but that’s an inference that depends on various additional assumptions (e.g. about the nature of counterfactuals, and all sorts of other things). All that’s actually stipulated is the one world you find yourself in.
You infer the existence of me burning to death from what’s stated in the problem as well. There’s no difference.
I do have the assumption of subjunctive dependence. But without that one—if, say, the predictor predicts by looking at the color of my shoes—then I don’t Left-box anyway.
You infer the existence of me burning to death from what’s stated in the problem as well. There’s no difference.
Of course there’s a difference: inferring burning to death just depends on the perfectly ordinary assumption of cause and effect, plus what is very explicitly stated in the problem. Inferring the existence of other worlds depends on much more esoteric assumptions than that. There’s really no comparison at all.
I do have the assumption of subjunctive dependence.
Not only is that not the only assumption required, it’s not even clear what it means to “assume” subjunctive dependence. Sure, it’s stipulated that the predictor is usually (but not quite always!) right about what you’ll do. What else is there to this “assumption” than that?
But how that leads to “other worlds exist” and “it’s meaningful to aggregate utility across them” and so on… I have no idea.
If they’re just possible worlds, then why do they matter? They’re not actual worlds, after all (by the time the described scenario is happening, it’s too late for any of them to be actual!). So… what’s the relevance?
The UDT convention is that other possible worlds remain relevant, even when you find yourself in a possible world that isn’t compatible with their actuality. It’s confusing to discuss this general point as if it’s specific to this contentious thought experiment.
The setting has a sample space, as in expected utility theory, with situations that take place in some event (let’s call it a situation event) and offer a choice between smaller events resulting from taking alternative actions. The misleading UDT convention is to call the situation event “actual”. It’s misleading because the goal is to optimize expected utility over the whole sample space, not just over the situation event, so the places on the sample space outside the situation event are effectively still in play, still remain relevant, not ruled out by the particular situation event being “actual”.
Alright. But by the time the situation described in the OP happens, it no longer matters whether you optimize expected utility over the whole sample space; that goal is now moot. One event out of the sample space has occurred, and the others have failed to occur. Why would you continue to attempt to achieve that goal, toward which you are no longer capable of taking any action?
by the time the situation described in the OP happens, it no longer matters whether you optimize expected utility over the whole sample space; that goal is now moot
That goal may be moot for some ways of doing decisions. For UDT it’s not moot, it’s the only thing that we care about instead. And calling some situation or another “actual” has no effect at all on the goal, and on the process of decision making in any situation, actual or otherwise, that’s what makes the goal and the decision process reflectively stable.
“But by the time the situation described in the OP happens, it no longer matters whether you optimize expected utility over the whole sample space; that goal is now moot.”
This is what we agree on. If you’re in the situation with a bomb, all that matters is the bomb.
My stance is that Left-boxers virtually never get into the situation to begin with, because of the prediction Omega makes. So with probability close to 1, they never see a bomb.
Your stance (if I understand correctly) is that the problem statement says there is a bomb, so, that’s what’s true with probability 1 (or almost 1).
And so I believe that’s where our disagreement lies. I think Newcomblike problems are often “trick questions” that can be resolved in two ways, one leaning more towards your interpretation.
In the spirit of Vladimir’s points: if I annoyed you, I do apologize. I can get quite intense in such discussions.
This is what we agree on. If you’re in the situation with a bomb, all that matters is the bomb.
But that’s false for a UDT agent, it still matters to that agent-instance-in-the-situation what happens in other situations, those without a bomb, it’s not the case that all that matters is the bomb (or even a bomb).
Hmm, interesting. I don’t know much about UDT. From an FDT perspective, I’d say that if you’re in the situation with the bomb, your decision procedure already Right-boxed and therefore you’re Right-boxing again, as logical necessity. (Making the problem very interesting.)
To explain my view more, the question I try to answer in these problems is more or less: if I were to choose a decision theory now to strictly adhere to, knowing I might run into the Bomb problem, which decision theory would I choose?
If I ever find myself in the Bomb scenario, I Right-box. Because in that scenario, the predictor’s model of me already Right-boxed, and therefore I do, too—not as a decision, per se, but as a logical consequence.
The correct decision is another question—that’s Left-boxing, because the decision is being made in two places. If I find myself in the Bomb scenario, that just means the decision to Right-box was already made.
The Bomb problem asks what the correct decision is, and makes clear (at least under my assumption) that the decision is made at 2 points in time. At that first point (in the predictor’s head), Left-boxing leads to the most utility: it avoids burning to death for free. Note that at that point, there is not yet a bomb in Left!
If I ever find myself in the Bomb scenario, I Right-box.
If we agree on that, then I don’t understand what it is that you think we disagree on! (Although the “not as a decision, per se” bit seems… contentless.)
The Bomb problem asks what the correct decision is,
No, it asks what decision you should make. And we apparently agree that the answer is “Right”.
What does it mean to say that Left-boxing is “the correct decision” if you then say that the decision you’d actually make would be to Right-box? This seems to be straightforwardly contradictory, in a way that renders the claim nonsensical.
I read all your comments in this thread. But you seem to be saying things that, in a very straightforward way, simply don’t make any sense…
Alright. The correct decision is Left-boxing, because that means the predictor’s model Left-boxed (and so do I), letting me live for free. Because, at the point where the predictor models me, the Bomb isn’t placed yet (and never will be).
However, IF I’m in the Bomb scenario, then the predictor’s model already Right-boxed. Then, because of subjunctive dependence, it’s apparently not possible for me to Left-box, just as it is impossible for two calculators to give a different result to 2 + 2.
Well, the Bomb scenario is what we’re given. So the first paragraph you just wrote there is… irrelevant? Inapplicable? What’s the point of it? It’s answering a question that’s not being asked.
As for the last sentence of your comment, I don’t understand what you mean by it. Certainly it’s possible for you to Left-box; you just go ahead and Left-box. This would be a bad idea, of course! Because you’d burn to death. But you could do it! You just shouldn’t—a point on which we, apparently, agree.
The bottom line is: to the actual single question the scenario asks—which box do you choose, finding yourself in the given situation?—we give the same answer. Yes?
The bottom line is: to the actual single question the scenario asks—which box do you choose, finding yourself in the given situation?—we give the same answer. Yes?
The bottom line is that Bomb is a decision problem. If I am still free to make a decision (which I suppose I am, otherwise it isn’t much of a problem), then the decision I make is made at 2 points in time. And then, Left-boxing is the better decision.
Yes, the Bomb is what we’re given. But with the very reasonable assumption of subjunctive dependence, it specifies what I am saying...
We agree that if I would be there, I would Right-box, but also everybody would then Right-box, as a logical necessity (well, 1 in a trillion trillion error rate, sure). It has nothing to do with correct or incorrect decisions, viewed like that: the decision is already hard coded into the problem statement, because of the subjunctive dependence.
“But you can just Left-box” doesn’t work: that’s like expecting one calculator to answer to 2 + 2 differently than another calculator.
I think it’s better to explain to such people the problem where the predictor is perfect, and then generalize to an imperfect predictor. They don’t understand the general principle of your present choices pseudo-overwriting the entire timeline and can’t think in the seemingly-noncausal way that optimal decision-making requires. By jumping right to an imperfect predictor, the principle becomes, I think, too complicated to explain.
(Btw, you can call your answer “obvious” and my side “crazy” all you want, but it won’t change a thing until you actually demonstrate why and how FDT is wrong, which you haven’t done.)
I’ve done that: FDT is wrong because it (according to you) recommends that you choose to burn to death, when you could easily choose not to burn to death. Pretty simple.
It seems to me that your argument proves too much.
Let’s set aside this specific example and consider something more everyday: making promises. It is valuable to be able to make promises that others will believe, even when they are promises to do something that (once the relevant situation arises) you will strongly prefer not to do.
Suppose I want a $1000 loan, with $1100 to be repaid one year from now. My counterparty Bob has no trust in the legal system, police, etc., and expects that next year I will be somewhere where he can’t easily find me and force me to pay up. But I really need the money. Fortunately, Bob knows some mad scientists and we agree to the following: I will have implanted in my body a device that will kill me if 366 days from now I haven’t paid up. I get the money. I pay up. Nobody dies. Yay.
I hope we are agreed that (granted the rather absurd premises involved) I should be glad to have this option, even though in the case where I don’t pay up it kills me.
Revised scenario: Bob knows some mad psychologists who by some combination of questioning, brain scanning, etc., are able to determine very reliably what future choices I will make in any given situation. He also knows that in a year’s time I might (but with extremely low probability) be in a situation where I can only save my life at the cost of the $1100 that I owe him. He has no risk tolerance to speak of and will not lend me the money if in that situation I would choose to save my life and not give him the money.
Granted these (again absurd) premises, do you agree with me that it is to my advantage to have the sort of personality that can promise to pay Bob back even if it literally kills me?
It seems to me that:
1. Your argument in this thread would tell me, a year down the line and in the surprising situation that I do in fact need to choose between Bob’s money and my life, “save your life, obviously”.
2. If my personality were such that I would do as you advise in that situation, then Bob will not lend me the money. (Which may in fact mean that in that unlikely future situation I die anyway.)
3. Your reasons for saying “FDT recommends knowingly choosing to burn to death! So much the worse for FDT!” are equally reasons to say “Being someone who can make and keep this sort of promise means knowingly choosing to pay up and die! So much the worse for being that sort of person!”.
4. Being that sort of person is not in fact worse even though there are situations in which it leads to a worse outcome.
5. There is no version of “being that sort of person” that lets you just decide to live, in that unlikely situation, because paying up at the cost of your own life is what “being that sort of person” means.
6. To whatever extent I get to choose whether to be that sort of person, I have to make the decision before I know whether I’m going to be in that unlikely situation. And, to whatever extent I get to choose, it is reasonable to choose to be that sort of person, because the net benefit is greater.
7. Once again, “be that sort of person and then change your mind” is not one of the available options; if I will change my mind about it, then I was never that sort of person after all.
What (if anything) do you disagree with in that paragraph? What (if anything) do you find relevantly disanalogous between the situation I describe here and the one with the bomb?
Granted these (again absurd) premises, do you agree with me that it is to my advantage to have the sort of personality that can promise to pay Bob back even if it literally kills me?
I do not.
What (if anything) do you disagree with in that paragraph? What (if anything) do you find relevantly disanalogous between the situation I describe here and the one with the bomb?
Your scenario omits the crucial element of the scenario in the OP, where you (the subject) find yourself in a situation where the predictor turns out to have erred in its prediction.
Hmm. I am genuinely quite baffled by this; there seems to be some very fundamental difference in how we are looking at the world. Let me just check that this is a real disagreement and not a misunderstanding (even if it is there would also be a real disagreement, but a different one): I am asking not “do you agree with me that at the point where I have to choose between dying and failing to repay Bob it is to my advantage …” but “do you agree with me that at an earlier point, say when I am negotiating with Bob it is to my advantage …”.
If I am understanding you right and you are understanding me right, then I think the following is true. Suppose that when Bob has explained his position (he is willing to lend me the money if, and only if, his mad scientists determine that I will definitely repay him even if the alternative is death), some supernatural being magically informs me that while it cannot lend me the money it can make me the sort of person who can make the kind of commitment Bob wants and actually follow through. I think you would recommend that I either not accept this offer, or at any rate not make that commitment having been empowered to do so.
Do you feel the same way about the first scenario, where instead of choosing to be a person who will pay up even at the price of death I choose to be a person who will be compelled by brute force to pay up or die? If not, why?
Your scenario omits the crucial element of the scenario in the OP, where you (the subject) find yourself in a situation where the predictor turns out to have erred in its prediction.
Why does that matter? (Maybe it doesn’t; your opinion about my scenario is AIUI the same as your opinion about the one in the OP.)
I am asking not “do you agree with me that at the point where I have to choose between dying and failing to repay Bob it is to my advantage …” but “do you agree with me that at an earlier point, say when I am negotiating with Bob it is to my advantage …”.
Yes, I understood you correctly. My answer stands. (But I appreciate the verification.)
I think you would recommend that I either not accept this offer, or at any rate not make that commitment having been empowered to do so.
Right.
Do you feel the same way about the first scenario, where instead of choosing to be a person who will pay up even at the price of death I choose to be a person who will be compelled by brute force to pay up or die? If not, why?
No, because there’s a difference between “pay up or die” and “pay up and die”.
Your scenario omits the crucial element of the scenario in the OP, where you (the subject) find yourself in a situation where the predictor turns out to have erred in its prediction.
Why does that matter? (Maybe it doesn’t; your opinion about my scenario is AIUI the same as your opinion about the one in the OP.)
The scenario in the OP seems to hinge on it. As described, the situation is that the agent has picked FDT as their decision theory, is absolutely the sort of agent who will choose the Left box and die if so predicted, who is thereby supposed to not actually encounter situations where the Left box has a bomb… but oops! The predictor messed up and there is a bomb there anyhow. And now the agent is left with a choice on which nothing depends except whether he pointlessly dies.
I agree (of course!) that there is a difference between “pay up and die” and “pay up or die”. But I don’t understand how this difference can be responsible for the difference in your opinions about the two scenarios.
Scenario 1: I choose for things to be so arranged that in unlikely situation S (where if I pay Bob back I die), if I don’t pay Bob back then I also die. You agree with me (I think—you haven’t actually said so explicitly) that it can be to my benefit for things to be this way, if this is the precondition for getting the loan from Bob.
Scenario 2: I choose for things to be so arranged that in unlikely scenario S (where, again, if I pay Bob back I die), I will definitely pay. You think this state of affairs can’t be to my advantage.
How is scenario 2 actually worse for me than scenario 1? Outside situation S, they are no different (I will not be faced with such strong incentive not to pay Bob back, and I will in fact pay him back, and I will not die). In situation S, scenario 1 means I die either way, so I might as well pay my debts; scenario 2 means I will pay up and die. I’m equally dead in each case. I choose to pay up in each case.
In scenario 1, I do have the option of saying a mental “fuck you” to Bob, not repaying my debt, and dying at the hand of his infernal machinery rather than whatever other thing I could save myself from with the money. But I’m equally dead either way, and I can’t see why I’d prefer this, and in any case it’s beyond my understanding why having this not-very-appealing extra option would be enough for scenario 1 to be good and scenario 2 to be bad.
What am I missing?
I think we are at cross purposes somehow about the “predictor turns out to have erred” thing. I do understand that this feature is present in the OP’s thought experiment and absent in mine. My thought experiment isn’t meant to be equivalent to the one in the OP, though it is meant to be similar in some ways (and I think we are agreed that it is similar in the ways I intended it to be similar). It’s meant to give me another view of something in your thinking that I don’t understand, in the hope that I might understand it better (hopefully with the eventual effect of improving either my thinking or yours, if it turns out that one of us is making a mistake rather than just starting from axioms that seem alien to one another).
Anyway, it probably doesn’t matter, because so far as I can tell you do in fact have “the same” opinion about the OP’s thought experiment and mine; I was asking about disanalogies between the two in case it turned out that you agreed with all the numbered points in the paragraph before that question. I think you don’t agree with them all, but I’m not sure exactly where the disagreements are; I might understand better if you could tell me which of those numbered points you disagree with.
But it’s stipulated that the predictor did put a bomb in Left. That’s part of the scenario.
This is instead part of the misleading framing. Putting bomb in Left is actually one of the situations being considered, not all that actually happens, even if it says that it’s what actually happens. It’s one of the possible worlds, and there is a misleading convention of saying that when you find yourself in a possible world, what you see is what actually happens. It’s because that’s how it subjectively looks like, even if other worlds are supposed to still matter by UDT convention.
The question is not which action to take. The question is which decision theory gives the most utility. Any candidate for “best decision theory” should take the left box. This results in a virtually guaranteed save of $100 - and yes, a death burn in an extremely unlikely scenario. In that unlikely scenario, yes, taking the right box gives the most utility—but that’s answering the wrong question.
This sort of reasoning makes sense if you must decide on which box to take prior to learning the details of your situation (a.k.a. in a “veil of ignorance”), and cannot change your choice even after you discover that, e.g., taking the Left box will kill you. In such a case, sure, you can say “look, it’s a gamble, and I did lose big this time, but it was a very favorable gamble, with a clearly positive expected outcome”. (Although see Robyn Dawes’ commentary on such “skewed” gambles. However, we can let this pass here.)
But that’s not the case here. Here, you’ve learned that taking the Left box kills you, but you still have a choice! You can still choose to take Right! And live!
Yes, FDT insists that actually, you must choose in advance (by “choosing your algorithm” or what have you), and must stick to the choice no matter what. But that is a feature of FDT, it is not a feature of the scenario! The scenario does not require that you stick to your choice. You’re free to take Right and live, no matter what your decision theory says.
So when selecting a decision theory, you may of course feel free to pick the one that says that you must pick Left, and knowingly burn to death, while I will pick the one that says that I can pick whatever I want. One of us will be dead, and the other will be “smiling from atop a heap of utility”.
(“But what about all those other possible worlds?”, you may ask. Well, by construction, I don’t find myself in any of those, so they’re irrelevant to my decision now, in the actual world.)
Yes, FDT insists that actually, you must choose in advance (by “choosing your algorithm” or what have you), and must stick to the choice no matter what. But that is a feature of FDT, it is not a feature of the scenario! The scenario does not require that you stick to your choice. You’re free to take Right and live, no matter what your decision theory says.
Well, I’d say FDT recognizes that you do choose in advance, because you are predictable. Apparently you have an algorithm running that makes these choices, and the predictor simulates that algorithm. It’s not that you “must” stick to your choice. It’s about constructing a theory that consistently recommends the actions that maximize expected utility.
I know I keep repeating that—but it seems that’s where our disagreement lies. You look at which action is best in a specific scenario, I look at what decision theory produces the most utility. An artificial superintelligence running a decision theory can’t choose freely no matter what the decision theory says: running the decision theory means doing what it says.
An artificial superintelligence running a decision theory can’t choose freely no matter what the decision theory says: running the decision theory means doing what it says.
That seems like an argument against “running a decision theory”, then!
Now, that statement may seem like it doesn’t make sense. I agree! But that’s because, as I see it, your view doesn’t make sense; what I just wrote is consistent with what you write…
Clearly, I, a human agent placed in the described scenario, could choose either Left or Right. Well, then we should design our AGI in such a way that it also has this same capability.
Obviously, the AGI will in fact (definitionally) be running some algorithm. But whatever algorithm that is, ought to be one that results in it being able to choose (and in fact choosing) Right in the “Bomb” scenario.
What decision theory does that correspond to? You tell me…
That seems like an argument against “running a decision theory”, then!
Now, that statement may seem like it doesn’t make sense. I agree! But that’s because, as I see it, your view doesn’t make sense; what I just wrote is consistent with what you write…
Exactly, it doesn’t make sense. It is in fact nonsense, unless you are saying it’s impossible to specify a coherent, utility-maximizing decision theory at all?
Btw, please explain how it’s consistent with what I wrote, because it seems obvious to me it’s not.
And if I select FDT, I would be the one “smiling from atop a heap of utility” in (10^24 − 1) out of 10^24 worlds.
But that’s not the case here. Here, you’ve learned that taking the Left box kills you, but you still have a choice! You can still choose to take Right! And live!
Yes, but the point is to construct a decision theory that recommends actions in a way that maximizes expected utility. Recommending left-boxing does that, because it saves you $100 in virtually every world. That’s it, really. You keep focusing on that 1 out of 10^24 possibility where you burn to death, but that doesn’t take anything away from FDT. Like I said: it’s not about which action to take, let alone which action in such an improbable scenario. It’s about what decision theory we need.
And if I select FDT, I would be the one “smiling from atop a heap of utility” in (10^24 − 1) out of 10^24 worlds.
So you say. But in the scenario (and in any situation we actually find ourselves in), only the one, actual, world is available for inspection. In that actual world, I’m the one with the heap of utility, and you’re dead.
Who knows what I would do in any of those worlds, and what would happen as a result? Who knows what you would do?
In the given scenario, FDT loses, period, and loses really badly and, what is worse, loses in a completely avoidable manner.
You keep focusing on that 1 out of 10^24 possibility where you burn to death, but that doesn’t take anything away from FDT.
As I said, this reasoning makes sense if, at the time of your decision, you don’t know what possibility you will end up with (and are thus making a gamble). It makes no sense at all if you are deciding while in full possession of all relevant facts.
Like I said: it’s not about which action to take, let alone which action in such an improbable scenario. It’s about what decision theory we need.
Totally, and the decision theory we need is one that doesn’t make such terrible missteps!
Of course, it is possible to make an argument like: “yes, FDT fails badly in this improbable scenario, but all other available decision theories fail worse / more often, so the best thing to do is to go with FDT”. But that’s not the argument being made here—indeed, you’ve explicitly disclaimed it…
So you say. But in the scenario (and in any situation we actually find ourselves in), only the one, actual, world is available for inspection. In that actual world, I’m the one with the heap of utility, and you’re dead.
No. We can inspect more worlds. We know what happens given the agent’s choice and the predictor’s prediction. There are multiple paths, each with its own probability. The problem description focuses on that one world, yes. But the point remains—we need a decision theory, we need it to recommend an action (left-boxing or right-boxing), and left-boxing gives the most utility if we consider the bigger picture.
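To make “the bigger picture” concrete, here is a minimal sketch of the comparison I have in mind (the $1,000,000 value of a life and the 1-in-10^24 error rate are illustrative assumptions, not part of the problem statement):

```python
# Illustrative assumptions: life valued at $1,000,000; predictor errs once per 10^24 runs.
VALUE_OF_LIFE = 1_000_000
ERROR_RATE = 1e-24

# If your algorithm outputs Left, the predictor (almost always) predicts Left,
# leaves Left empty, and you pay nothing; only when she errs do you burn.
eu_commit_left = -ERROR_RATE * VALUE_OF_LIFE   # -1e-18

# If your algorithm outputs Right, the predictor (almost always) predicts Right,
# puts the bomb in Left, and you pay the $100 fee.
eu_commit_right = -100

print(eu_commit_left, eu_commit_right)  # -1e-18 vs -100
```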
Totally, and the decision theory we need is one that doesn’t make such terrible missteps!
Do you agree that recommending left-boxing before the predictor makes its prediction is rational?
No. We can inspect more worlds. We know what happens given the agent’s choice and the predictor’s prediction.
Well, no. We can reason about more worlds. But we can’t actually inspect them.
Here’s the question I have, though, which I have yet to see a good answer to. You say:
But the point remains—we need a decision theory, we need it to recommend an action (left-boxing or right-boxing), and left-boxing gives the most utility if we consider the bigger picture.
But why can’t our decision theory recommend “choose Left if and only if it contains no bomb; otherwise choose Right”? (Remember, the boxes are open; we can see what’s in there…)
Do you agree that recommending left-boxing before the predictor makes its prediction is rational?
I think that recommending no-bomb-boxing is rational. Or, like: “Take the left box, unless of course the predictor made a mistake and put a bomb in there, in which case, of course, take the right box.”
As to inspection, maybe I’m not familiar enough with the terminology there.
Re your last point: I was just thinking about that too. And strangely enough I missed that the boxes are open. But wouldn’t the note be useless in that case?
I will think about this more, but it seems to me your decision theory can’t recommend “Left-box, unless you see a bomb in left.”, and FDT doesn’t do this. The problem is, in that case the prediction influences what you end up doing. What if the predictor is malevolent, and predicts you choose right, placing the bomb in left? It could make you lose $100 easily. Maybe if you believed the predictor to be benevolent?
And strangely enough I missed that the boxes are open.
Well, uh… that is rather an important aspect of the scenario…
… it seems to me your decision theory can’t recommend “Left-box, unless you see a bomb in left.” …
Why not?
The problem is, in that case the prediction influences what you end up doing.
Yes, it certainly does. And that’s a problem for the predictor, perhaps, but why should it be a problem for me? People condition their actions on knowledge of past events (including predictions of their actions!) all the time.
What if the predictor is malevolent, and predicts you choose right, placing the bomb in left? It could make you lose $100 easily.
Indeed, the predictor doesn’t have to predict anything to make me lose $100; it can just place the bomb in the left box, period. This then boils down to a simple threat: “pay $100 or die!”. Hardly a tricky decision theory problem…
Well, uh… that is rather an important aspect of the scenario…
Sure. But given the note, I had the knowledge needed already, it seems. But whatever.
Indeed, the predictor doesn’t have to predict anything to make me lose $100; it can just place the bomb in the left box, period. This then boils down to a simple threat: “pay $100 or die!”. Hardly a tricky decision theory problem…
Didn’t say it was a tricky decision problem. My point was that your strategy is easily exploitable and may therefore not be a good strategy.
If your strategy is “always choose Left”, then a malevolent “predictor” can put a bomb in Left and be guaranteed to kill you. That seems much worse than being mugged for $100.
I don’t see how that’s relevant. In the original problem, you’ve been placed in this weird situation against your will, where something bad will happen to you (either the loss of $100 or … death). If we’re supposing that the predictor is malevolent, she could certainly do all sorts of things… are we assuming that the predictor is constrained in some way? Clearly, she can make mistakes, so that opens up her options to any kind of thing you like. In any case, your choice (by construction) is as stated: pay $100, or die.
Yes, FDT insists that actually, you must choose in advance (by “choosing your algorithm” or what have you), and must stick to the choice no matter what. But that is a feature of FDT, it is not a feature of the scenario!
FDT doesn’t insist on this at all. FDT recognizes that IF your decision procedure is modelled prior to your current decision, then you did in fact choose in advance. If an FDT’er playing Bomb doesn’t believe her decision procedure was being modelled this way, she wouldn’t take Left!
If and only if it is a feature of the scenario, then FDT recognizes it. FDT doesn’t insist that the world be a certain way. I wouldn’t be a proponent of it if it did.
If a model of you predicts that you will choose A, but in fact you can choose B, and want to choose B, and do choose B, then clearly the model was wrong. Thinking “the model says I will choose A, therefore I have to (???) choose A” is total nonsense.
(Is there some other way to interpret what you’re saying? I don’t see it.)
“Thinking “the model says I will choose A, therefore I have to (???) choose A” is total nonsense.”
I choose whatever I want, knowing that it means the predictor predicted that choice.
In Bomb, if I choose Left, the predictor will have predicted that (given subjunctive dependence). Yes, the predictor said it predicted Right in the problem description; but if I choose Left, that simply means the problem ran differently from the start. It means that, from the beginning, the predictor predicts I will choose Left, doesn’t put a bomb in Left, doesn’t leave the “I predicted you will pick Right” note (but maybe leaves an “I predicted you will pick Left” note), and then I indeed choose Left, letting me live for free.
If the model is in fact (near) perfect, then choosing B means the model chose B too. That may seem like changing the past, but it really isn’t, that’s just the confusing way these problems are set up.
Claiming you can choose something a (near) perfect model of you didn’t predict is like claiming two identical calculators can give a different answer to 2 + 2.
This sort of reasoning makes sense if you must decide on which box to take prior to learning the details of your situation (a.k.a. in a “veil of ignorance”), and cannot change your choice even after you discover that, e.g., taking the Left box will kill you. In such a case, sure, you can say “look, it’s a gamble, and I did lose big this time, but it was a very favorable gamble, with a clearly positive expected outcome”. (Although see Robyn Dawes’ commentary on such “skewed” gambles. However, we can let this pass here.)
But that’s not the case here.
It is the case, in a way. Otherwise the predictor could not have predicted your action. I’m not saying you actively decide what to do beforehand, but apparently you are running a predictable decision procedure.
FDT has very persuasive reasoning for why I should choose to burn to death? Uh-huh (asks the non-FDT agent), and if you’re so rational, why are you dead?
I think the more fundamental issue is that you can construct these sorts of dilemmas for all decision theories. For example, you can easily come up with scenarios where Omega punishes you for following a certain decision theory and rewards you otherwise.
The right question to ask is not whether a decision theory recommends something that makes you burn to death in some scenario, but whether it recommends you do so across a broad class of fair dilemmas. I’m not convinced that FDT does that, and the bomb dilemma did not move me much.
You can of course construct scenarios where Omega punishes you for all sorts of things, but in the given case, FDT recommends a manifestly self-destructive action, in a circumstance where you’re entirely free to instead not take that action. Other decision theories do not do this (whatever their other faults may be).
The right question to ask is not whether a decision theory recommends something that makes you burn to death in some scenario
But of course it is the right question. The given dilemma is perfectly fair. FDT recommends that you knowingly choose to burn to death, when you could instead not choose to burn to death, and incur no bad consequences thereby. This is a clear failure.
What makes the bomb dilemma seem unfair to me is the fact that it’s conditioning on an extremely unlikely event. The only way we blow up is if the predictor predicted incorrectly. But by assumption, the predictor is near-perfect. So it seems implausible that this outcome would ever happen.
What makes the bomb dilemma seem unfair to me is the fact that it’s conditioning on an extremely unlikely event.
Why is this unfair?
Look, I keep saying this, but it doesn’t seem to me like anyone’s really engaged with it, so I’ll try again:
If the scenario were “pick Left or Right; after you pick, then the boxes are opened and the contents revealed; due to [insert relevant causal mechanisms involving a predictor or whatever else here], the Left box should be empty; unfortunately, one time in a trillion trillion, there’ll be some chance mistake, and Left will turn out (after you’ve chosen it) to have a bomb, and you’ll blow up”…
… then FDT telling you to take Left would be perfectly reasonable. I mean, it’s a gamble, right? A gamble with an unambiguously positive expected outcome; a gamble you’ll end up winning in the utterly overwhelming majority of cases. Once in a trillion trillion times, you suffer a painful death—but hey, that’s better odds than each of us take every day when we cross the street on our way to the corner store. In that case, it would surely be unfair to say “hey, but in this extremely unlikely outcome, you end up burning to death!”.
But that’s not the scenario!
In the given scenario, we already know what the boxes have in them. They’re open; the contents are visible. We already know that Left has a bomb. We know, to a certainty, that choosing Left means we burn to death. It’s not a gamble with an overwhelming, astronomical likelihood of a good outcome, and only a microscopically tiny chance of painful death—instead, it’s knowingly choosing a certain death!
Yes, the predictor is near-perfect. But so what? In the given scenario, that’s no longer relevant! The predictor has already predicted, and its prediction has already been evaluated, and has already been observed to have erred! There’s no longer any reason at all to choose Left, and every reason not to choose Left.
And yet FDT still tells us to choose Left. This is a catastrophic failure; and what’s more, it’s an obvious failure, and a totally preventable one.
Now, again: it would be reasonable to say: “Fine, yes, FDT fails horribly in this very, very rare circumstance; this is clearly a terrible mistake. Yet other decision theories fail, at least this badly, or in far more common situations, or both, so FDT still comes out ahead, on net.”
But that’s not the claim in the OP; the claim is that, somehow, knowingly choosing a guaranteed painful death (when it would be trivial to avoid it) is the correct choice, in this scenario.
But that’s not the claim in the OP; the claim is that, somehow, knowingly choosing a guaranteed painful death (when it would be trivial to avoid it) is the correct choice, in this scenario.
And that’s just crazy.
Like I’ve said before, it’s not about which action to take, it’s about which strategy to have. It’s obvious right-boxing gives the most utility in this specific scenario only, but that’s not what it’s about.
It’s obvious right-boxing gives the most utility in this specific scenario only, but that’s not what it’s about.
I reject this. If Right-boxing gives the most utility in this specific scenario, then you should Right-box in this specific scenario. Because that’s the scenario that—by construction—is actually happening to you.
In other scenarios, perhaps you should do other things. But in this scenario, Right is the right answer.
I reject this. If Right-boxing gives the most utility in this specific scenario, then you should Right-box in this specific scenario. Because that’s the scenario that—by construction—is actually happening to you.
In other scenarios, perhaps you should do other things. But in this scenario, Right is the right answer.
And this is the key point. It seems to me impossible to have a decision theory that right-boxes in Bomb but still does as well as FDT does in all other scenarios.
Utility is often measured in dollars. If I had created the Bomb scenario, I would have specified life/death in terms of dollars as well. Like, “Life is worth $1,000,000 to you.” That way, you can easily compare the loss of your life to the $100 cost of Right-boxing.
Look, I keep saying this, but it doesn’t seem to me like anyone’s really engaged with it, so I’ll try again:
If the scenario were “pick Left or Right; after you pick, then the boxes are opened and the contents revealed; due to [insert relevant causal mechanisms involving a predictor or whatever else here], the Left box should be empty; unfortunately, one time in a trillion trillion, there’ll be some chance mistake, and Left will turn out (after you’ve chosen it) to have a bomb, and you’ll blow up”…
… then FDT telling you to take Left would be perfectly reasonable. I mean, it’s a gamble, right? A gamble with an unambiguously positive expected outcome; a gamble you’ll end up winning in the utterly overwhelming majority of cases. Once in a trillion trillion times, you suffer a painful death—but hey, that’s better odds than each of us take every day when we cross the street on our way to the corner store. In that case, it would surely be unfair to say “hey, but in this extremely unlikely outcome, you end up burning to death!”.
But that’s not the scenario!
Yes, you keep saying this, and I still think you’re wrong. Our candidate decision theory has to recommend something for this scenario—and that recommendation gets picked up by the predictor beforehand. You have to take that into account. You seem to be extremely focused on this extremely unlikely scenario, which is odd to me.
And yet FDT still tells us to choose Left. This is a catastrophic failure; and what’s more, it’s an obvious failure, and a totally preventable one.
How exactly is it preventable? I’m honestly asking. If you have a strategy that, if the agent commits to it before the predictor makes her prediction, does better than FDT, I’m all ears.
You seem to have misunderstood the problem statement [1]. If you commit to doing “FDT, except that if the predictor makes a mistake and there’s a bomb in the Left, take Right instead”, then you will almost surely have to pay $100 (since the predictor predicts that you will take Right), whereas if you commit to using pure FDT, then you will almost surely have to pay nothing (with a small chance of death). There really is no “strategy that, if the agent commits to it before the predictor makes her prediction, does better than FDT”.
[1] Which is fair enough, as it wasn’t actually specified correctly: the predictor is actually trying to predict whether you will take Left or Right if it leaves its helpful note, not in the general case. But this assumption has to be added, since otherwise FDT says to take Right.
It sounds like you’re saying that I correctly understood the problem statement as it was written (but it was written incorrectly); but that the post erroneously claims that in the scenario as (incorrectly) written, FDT says to take Left, when in fact FDT in that scenario-as-written says to take right. Do I understand you?
But this assumption has to be added, since otherwise FDT says to take Right.
Why? FDT isn’t influenced in its decision by the note, so there is no loss of subjunctive dependence when this assumption isn’t added. (Or so it seems to me: I am operating at the limits of my FDT-knowledge here.)
FDT, except that if the predictor makes a mistake and there’s a bomb in the Left, take Right instead.
How would this work? Your strategy seems to be “Left-box unless the note says there’s a bomb in Left”. This ensures the predictor is right whether she puts a bomb in Left or not, and doesn’t optimize expected utility.
It costs you p * $100 for 0 ≤ p ≤ 1, where p depends on how “mean” you believe the predictor is.
Left-boxing costs 10^-24 * $1,000,000 = $10^-18 if you value life at a million dollars. Then if p > 10^-20, Left-boxing beats your strategy.
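(A minimal sketch of that threshold, with the same illustrative $1,000,000 value of a life; p is your assumed probability that the predictor is “mean” and plants the bomb against a “Left unless I see a bomb” agent:)

```python
VALUE_OF_LIFE = 1_000_000   # illustrative valuation of a life
ERROR_RATE = 1e-24          # predictor's assumed error rate

def cost_left_unless_bomb(p):
    # With probability p a "mean" predictor plants the bomb, and you go Right and pay $100.
    return p * 100

def cost_always_left():
    # You only lose if the predictor errs, in which case you burn.
    return ERROR_RATE * VALUE_OF_LIFE  # 1e-18, i.e. $10^-18

# Always-Left beats "Left unless bomb" whenever p * 100 > 10^-18, i.e. p > 10^-20.
for p in (1e-22, 1e-19):
    print(p, cost_left_unless_bomb(p) > cost_always_left())
```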
Note that FDT Right-boxes when you give life infinite value.
What’s special in this scenario with regards to valuing life finitely?
If you always value life infinitely, it seems to me all actions you can ever take get infinite values, as there is always a chance you die, which makes decision-making on the basis of utility pointless.
FDT, except that if the predictor makes a mistake and there’s a bomb in the Left, take Right instead.
Unfortunately, that doesn’t work. The predictor, if malevolent, could then easily make you choose Right and pay $100.
Left-boxing is the best strategy possible as far as I can tell. As in, yes, that extremely unlikely scenario where you burn to death sucks big time, but there is no better strategy possible (unless there is a superior strategy that I, and it appears everybody else, haven’t thought of).
If you commit to taking Left, then the predictor, if malevolent, can “mistakenly” “predict” that you’ll take Right, making you burn to death. Just like in the given scenario: “Whoops, a mistaken prediction! How unfortunate and improbable! Guess you have no choice but to kill yourself now, how sad…”
There absolutely is a better strategy: don’t knowingly choose to burn to death.
For the record, I read Nate’s comments again, and I now think of it like this:
To the extent that the predictor was accurate in her line of reasoning, then you left-boxing does NOT result in you slowly burning to death. It results in, well, the problem statement being wrong, because the following can’t all be true:
The predictor is accurate
The predictor predicts you right-box, and places the bomb in left
You left-box
And yes, apparently the predictor can be wrong, but I’d say, who even cares? The probability of the predictor being wrong is supposed to be virtually zero anyway (although as Nate notes, the problem description isn’t complete in that regard).
What makes the bomb dilemma seem unfair to me is the fact that it’s conditioning on an extremely unlikely event. The only way we blow up is if the predictor predicted incorrectly. But by assumption, the predictor is near-perfect. So it seems implausible that this outcome would ever happen.
Although I strongly disagree with Achmiz on the Bomb scenario in general, here we agree: Bomb is perfectly fair. You just have to take the probabilities into account, after which (if we value life at, say, $1,000,000) Left-boxing is the only correct strategy.
For the record: I completely agree with Said on this specific point. Bomb is a fair problem. Each decision theory entering this problem gets dealt the exact same hand.
FDT recommends that you knowingly choose to burn to death, when you could instead not choose to burn to death, and incur no bad consequences thereby. This is a clear failure.
No. Ironically, Bomb is an argument for FDT, not against it: for if I adhere to FDT, I will never* burn to death AND save myself $100 if I do face this predictor.
*“never” here means only a 1-in-1-trillion-trillion chance, if you meet the predictor
If there is some nontrivial chance that the predictor is adversarial but constrained to be accurate and truthful (within the bounds given), then on the balance of probability people taking the right box upon seeing a note predicting right are worse off. Yes, it sucks that you in particular got screwed, but the chances of that were astronomically low.
This shows up more obviously if you look at repeated iterations, compare performance of decision theories in large populations, or weight outcomes across possible worlds.
Edit: The odds were not astronomically low. I misinterpreted the statement about Predictor’s accuracy to be stronger than it actually was. FDT recommends taking the right box, and paying $100.
on the balance of probability people taking the right box upon seeing a note predicting right are worse off
No, because the scenario stipulates that you find yourself facing a Left box with a bomb. Anyone who finds themselves in this scenario is worse off taking Left than Right, because taking Left kills you painfully, and taking Right does no such thing. There is no question of any “balance of probability”.
Yes, it sucks that you in particular got screwed, but the chances of that were astronomically low.
But you didn’t “get screwed”! You have a choice! You can take Left, or Right.
Again: the scenario stipulates that taking Left kills you, and FDT agrees that taking Left kills you; and likewise it is stipulated (and FDT does not dispute) that you can indeed take whichever box you like.
This shows up more obviously if you look at repeated iterations, compare performance of decision theories in large populations, or weight outcomes across possible worlds.
All of that is completely irrelevant, because in the actual world that you (the agent in the scenario) find yourself in, you can either burn to death, or not. It’s completely up to you. You don’t have to do what FDT says to do, regardless of what happens in any other possible worlds or counterfactuals or what have you.
It really seems to me like anyone who takes Left in the “Bomb” scenario is making almost exactly the same mistake as people who two-box in the classic Newcomb’s problem. Most of the point of “Newcomb’s Problem and Regret of Rationality” is that you don’t have to, and shouldn’t, do things like this.
But actually, it’s a much worse mistake! In the Newcomb case, there’s a disagreement about whether one-boxing can actually somehow cause there to be a million dollars in the box; CDT denies this possibility (because it takes no account of sufficiently accurate predictors), while timeless/logical/functional/whatever decision theories accept it. But here, there is no disagreement at all; FDT admits that choosing Left causes you to die painfully, but says you should do it anyway! That is obviously much worse.
The other point of “Newcomb’s Problem and Regret of Rationality” is that it is a huge mistake to redefine losing (such as, say, burning to death) as winning. That, also, seems like a mistake that’s being made here.
I don’t see that there’s any way of rescuing this result.
According to me, the correct rejoinder to Will is: I have confidently asserted that X is false for X to which I assign probability much greater than 1 in a trillion trillion, and so I hereby confidently assert that no, I do not see the bomb on the left. You see the bomb on the left, and lose $100. I see no bombs, and lose $0.
I can already hear the peanut gallery objecting that we can increase the fallibility of the predictor to reasonable numbers and I’d still take the bomb, so before we go further, let’s all agree that sometimes you’re faced with uncertainty, and the move that is best given your uncertainty is not the same as the move that is best given perfect knowledge. For example, suppose there are three games (“lowball”, “highball”, and “extremeball”) that work as follows. In each game, I have three actions—low, middle, and high. In the lowball game, my payouts are $5, $4, and $0 respectively. In the highball game, my payouts are $0, $4, and $5 respectively. In the extremeball game, my payouts are $5, $4, and $5 respectively. Now suppose that the real game I’m facing is that one of these games is chosen uniformly at random by an unobserved die roll. What action should I choose? Clearly ‘middle’, with an expected utility of $4 (compared to $3.33 for either ‘low’ or ‘high’). And when I do choose middle, I hope we can all agree that it’s foul play to say “you fool, you should have chosen low because the game is lowball”, or “you fool, there is no possible world in which that’s the best action”, or “you idiot, that’s literally the worst available action because the game was extremeball”. If I knew which game I was playing, I’d play the best move for that game. But insofar as I must enter a single action played against the whole mixture of games, I might have to choose something that’s not the best action in your favorite subgame.
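(For concreteness, here is the expected-value arithmetic for that mixture, as a minimal sketch:)

```python
# Payout of each action (low, middle, high) in each of the three games.
payouts = {
    "lowball":     {"low": 5, "middle": 4, "high": 0},
    "highball":    {"low": 0, "middle": 4, "high": 5},
    "extremeball": {"low": 5, "middle": 4, "high": 5},
}

# Each game is chosen uniformly at random, so the expected payout of an
# action is just the average of its payouts across the three games.
for action in ("low", "middle", "high"):
    ev = sum(game[action] for game in payouts.values()) / len(payouts)
    print(action, round(ev, 2))
# low 3.33, middle 4.0, high 3.33 -> 'middle' maximizes expected payout,
# even though it is the best action in none of the individual games.
```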
With that in mind, we can now decompose Will’s problem with the bomb into two subgames that I’m bound to play simultaneously.
In one subgame (that happens with probability 2 in a trillion trillion, although feel free to assume it’s more likely than that), the predictor is stumped and guesses randomly. We all agree that in that subgame, the best action is to avoid the bomb.
In the other subgame, the predictor is accurate. And insofar as the predictor is accurate, and supposing that we’ve seen the bomb, we can consider our two available actions (taking the bomb, or spending $100 to avoid it). But here something curious happens: insofar as the predictor is accurate, and we see the bomb, and we take it, we refute the problem statement.
That’s not my problem, as an agent. That’s your problem, as the person who stated the problem. If you say “assume you’re facing a perfect predictor, and see the bomb”, then I can validly say “no”. Similar to how if you say “assume you’re going to take the $5 bill, and you can either take the $5 bill or the $10 bill, but if you violate the laws of logic then you get a $100 fine, what do you do?” I can validly say “no”. It’s not my fault that you named a decision problem whose premises I can flatly refute.
Hopefully we all agree that insofar as the predictor is perfect (which, remember, is a case in the case analysis when the predictor is fallible), the problem statement here is deeply flawed, because I can by an action of mine refute it outright. The standard rejoinder is a bit of sleight-of-hand, where the person posing the problem says “ah, but the predictor is fallible”. But as we’ve already seen, I can just decompose it right back into two subproblems that we then aggregate across (much like the highball/lowball/extremeball case), at which point one of our case-analyses reveals that insofar as the predictor is accurate, the whole problem-statement is still flawed.
And this isn’t me saying “I wish to be evaluated from an epistemic vantage point that takes into account the other imaginary branches of reality”. This is me saying, your problem statement was wrong. It’s me pointing out that you’re a liar, or at least that I can by a clever choice of actions render you a liar. When you say “the predictor was accurate and you saw the bomb, what do you do?”, and I say “take the bomb”, I don’t get blown up, I reveal your mistake. Your problem statement is indeterminate. You shouldn’ta given me a problem I could refute. I’m not saying “there’s other hypothetical branches of reality that benefit from me taking this bomb”, I’m saying “WRONG, tell me what really happened”. Your story was false, my dude.
There’s some question of what to do when an obviously ill-formed game is mixed in with a properly-formed game, by, eg, adding some uncertainty about whether the predictor is fallible. Like, how are we supposed to analyze games comprising subgames where the problem statement can be refuted in one subgame but not others? And according to me, the obvious answer is that if you say “you are 1% playing problem A and 99% playing problem B”, and if I can by some act refute that I’m playing problem B, then I am perfectly licensed in saying “WRONG (99%)”. Mixing in a little uncertainty (or even a lot of uncertainty!) doesn’t stop you from being wrong (at my will) in the cases where you’re asserting falsehoods about my actions.
So, no, I don’t see the bomb. That’s all but 1 in a trillion trillion ‘WRONG’. What really happened, according to this problem, is—well, it’s still indeterminate, because Will’s problem statement isn’t complete. He doesn’t tell us what happens insofar as the predictor is accurate and I go left if and only if I see the bomb, ie if I always spite the predictor’s prediction. If the predictor would give me a 1 trillion dollar coin for my ingenuity in spiting them, then what really happens is I see a $1T coin (all but 1 in a trillion trillion times). If instead (as is the oral tradition when someone leaves their decision problem indeterminate) counterfactually-spiting the predictor causes me to find myself in a room full of hornets rather than exits, then what really happened is that I saw no bomb (and no hornets), almost certainly.
If you want me to stop denying your problem-statements outright, you’ve gotta stop giving me problem statements that I can (probabilistically) refute by my actions.
After reading your comments, here’s my current explanation of what’s up with the bomb argument:
The problem with claiming that taking Left is wrong in the bomb-problem, is in the inference that “because I see the bomb is in the left box” this necessarily implies that “I am in the one-in-a-trillion-trillion situation where the predictor was inaccurate”.
However, this is forgetting the other option, where this setup is one of the vastly larger other worlds where I am being predicted by the predictor. In those worlds, it’s important that I make the worlds logically inconsistent by taking the left box, and so the predictor cannot accurately predict me taking the right box. Otherwise it may notice that this is a fixed point (i.e. leaving a note saying he predicted I’d take the right box does in fact lead to me taking the right box) and then call it a day and execute an action (that I don’t want) with an accurate prediction.
So there’s a 1-in-a-trillion-trillion chance that I am in the real game, and a notably higher chance that I’m in the mind of some agent predicting what I’ll do here. (And in that simulation-world, I don’t want to do the thing I don’t want them to predict that I’ll do.)
So make your estimates of the probability that you’re being perfectly simulated and the probability that you’re in the real game, then compare them, multiply by utilities, and you’re done.
Then I’m a bit confused about how to estimate that probability, but I suspect the reasoning goes like this:
It’s near-certain that I will be simulated in some scenarios by the predictor, and it’s one-in-a-trillion-trillions that this is the real scenario. This scenario seems “plausible” as a scenario the predictor would simulate me in, especially given that the alternative ground-truth is that this is the scenario that they actually went with! I’m going to put it somewhere between “10%” and “100%”. So I think the odds ratio is around the order of magnitude of “one-in-ten” to “one-in-a-trillion-trillion”. And when I multiply them by the expected utility, the ratio is still well in favor of taking the bomb and making it very likely that in reality I will not lose 100 dollars.
Sanity check
As a sanity-check, I note this implies that if the utilities-times-probabilities are different, I would not mind taking the $100 hit. Let’s see what the math says here, and then check whether my intuitions agree.
Suppose I value my life at $1 million. Then I think that I should become more indifferent here when the probability of a mistaken simulation approaches 1 in 100,000, or where the money on the line is closer to $10^-17.
[You can skip this, but here’s me stating the two multiplications I compared:
World 1: I fake-kill myself to save $X, with probability 1/10
World 2: I actually kill myself (cost: $1MM), with probability 1/Y
To find the indifference point I want the two multiplications of utility-to-probability to come out to be equal. If X = $100, then Y equals 100,000. If Y is a trillion trillion (10^24), then X = $10^-17. (Unless I did the math wrong.)]
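(A minimal sketch of that indifference arithmetic, using the same assumed 1/10 probability of being a mistaken simulation and $1MM value of a life:)

```python
LIFE = 1_000_000   # assumed dollar value of a life
P_SIM = 0.1        # assumed probability this is a (mistaken) simulation

# Indifference condition: P_SIM * X = (1 / Y) * LIFE
def y_given_x(x):
    return LIFE / (P_SIM * x)

def x_given_y(y):
    return LIFE / (P_SIM * y)

print(y_given_x(100))   # 100000.0 -> X = $100 gives Y = 100,000
print(x_given_y(1e24))  # 1e-17    -> Y = 10^24 gives X = $10^-17
```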
I think this doesn’t obviously clash with my intuitions, and somewhat matches them.
If the simulator was getting things wrong 1 in 100,000 times, I think I’d be more careful with my life in the “real world case” (insofar as that is a sensible concept). Going further, if you told me they were wrong 1 in 10 times, this would change my action, so there’s got to be a tipping point somewhere, and this seems reasonable for many people (though I actually value my life at more than $1MM).
And if the money was that tiny ($10^-17), I’d be fairly open to “not taking even the one-in-a-trillion-trillion chance”. (Though really my intuition is that I don’t care about money way before $10^-17, and would probably not risk anything serious starting at like 0.1 cents, because that sort of money seems kind of irritating to have to deal with. So my intuition doesn’t match perfectly here. Though I think that if I were expecting to play trillions of such games, then I would start to actively care about such tiny amounts of money.)
In the other subgame, the predictor is accurate. And insofar as the predictor is accurate, and supposing that we’ve seen the bomb, we can consider our two available actions (taking the bomb, or spending $100 to avoid it). But here something curious happens: insofar as the predictor is accurate, and we see the bomb, and we take it, we refute the problem statement.
That’s not my problem, as an agent. That’s your problem, as the person who stated the problem. If you say “assume you’re facing a perfect predictor, and see the bomb”, then I can validly say “no”.
Whether the predictor is accurate isn’t specified in the problem statement, and indeed can’t be specified in the problem statement (lest the scenario be incoherent, or posit impossible epistemic states of the agent being tested). What is specified is what existing knowledge you, the agent, have about the predictor’s accuracy, and what you observe in the given situation (from which you can perhaps infer additional things about the predictor, but that’s up to you).
In other words, the scenario is: as per the information you have, so far, the predictor has predicted 1 trillion trillion times, and been wrong once (or, some multiple of those numbers—predicted 2 trillion trillion times and been wrong twice, etc.).
You now observe the given situation (note predicting Right, bomb in Left, etc.). What do you do?
Now, we might ask: but is the predictor perfect? How perfect is she? Well… you know that she’s erred once in a trillion trillion times so far—ah, no, make that twice in a trillion trillion times, as of this iteration you now find yourself in. That’s the information you have at your disposal. What can you conclude from that? That’s up to you.
Likewise, you say:
So, no, I don’t see the bomb. That’s all but 1 in a trillion trillion ‘WRONG’. What really happened, according to this problem, is—well, it’s still indeterminate, because Will’s problem statement isn’t complete. He doesn’t tell us what happens insofar as the predictor is accurate and I go left if and only if I see the bomb, ie if I always spite the predictor’s prediction. If the predictor would give me a 1 trillion dollar coin for my ingenuity in spiting them, then what really happens is I see a $1T coin (all but 1 in a trillion trillion times).
The problem statement absolutely is complete. It asks what you would/should do in the given scenario. There is no need to specify what “would” happen in other (counterfactual) scenarios, because you (the agent) do not observe those scenarios. There’s also no question of what would happen if you “always spite the predictor’s prediction”, because there is no “always”; there’s just the given situation, where we know what happens if you choose Left: you burn to death.
You can certainly say “this scenario has very low probability”. That is reasonable. What you can’t say is “this scenario is logically impossible”, or any such thing. There’s no impossibility or incoherence here.
It’s not complete enough to determine what I do when I don’t see a bomb. And so when the problem statement is corrected to stop flatly asserting consequences of my actions as if they’re facts, you’ll find that my behavior in the corrected problem is underdefined. (If this still isn’t clear, try working out what the predictor does to the agent that takes the bomb if it’s present, but pays the $100 if it isn’t.)
And if we’re really technical, it’s not actually quite complete enough to determine what I do when I see the bomb. That depends on what the predictor does when there are two consistent possible outcomes. Like, if I would go left when there was no bomb and right when there was a bomb, what would the predictor do? If they only place the bomb if I insist on going right when there’s no bomb, then I have no incentive to go left upon seeing a bomb insofar as they’re accurate. To force me to go left, the predictor has to be trigger-happy, dropping the bomb given the slightest opportunity.
What is specified is what existing knowledge you, the agent, have about the predictor’s accuracy, and what you observe in the given situation
Nitpick: you seem to be trying to say something like “let us sidestep questions of predictor mechanics and assume that your only observations were the game-history”. But that doesn’t work. If I wake up in a room with a bomb about to go off in the left exit and a $100 fee to use the right exit, I pay the $100 fee and use the right exit. I have to see a host of other specific observations to start hypothesizing the existence of a predictor who would stop placing bombs if-counterfactually I took them, before I believe I’m in this problem.
I’m happy to leave unspecified the prior knowledge that causes us to believe there’s a predictor, but I object to the claim that we’ve fully specified my epistemic state by letting me see the problem history.
Now, we might ask: but is the predictor perfect? How perfect is she?
I’m not going to ask whether the predictor is perfect, no. I’m going to assume she isn’t, and then do some calculations, and then assume she is, and do some other calculations. Because it’s often useful to solve decision problems using case-by-case analysis.
And in the second case, your problem statement is revealed to be a lie.
Now, it’s perfectly possible for you to rephrase the problem such that my analysis does not, in some cases, reveal it to be a lie. You could say “the predictor is on the fritz, and you see the bomb” (in which case I’ll go right, thanks). Or you could say “the predictor is 1% on the fritz and 99% accurate, and in the 1% case you see a bomb and in the 99% case you may or may not see a bomb depending what you do”. That’s also fine. But insofar as you flatly postulate a (almost certain) consequence of my action, be prepared for me to assert that you’re (almost certainly) wrong.
You can certainly say “this scenario has very low probability”. That is reasonable. What you can’t say is “this scenario is logically impossible”, or any such thing. There’s no impossibility or incoherence here.
There’s impossibility here precisely insofar as the predictor is accurate.
One intuition pump is: You say “but if-counterfactually you see the bomb on the left, and you take it, you die”. I protest “no, if-counterfactually I see the bomb on the left, then I die insofar as the predictor was on the fritz, and I get a puppy insofar as the predictor was accurate, by the principle of explosion. So I 1-in-a-hundred-trillion-trillion% die, and almost certainly get a puppy. Win.” Like, from my perspective, trying to analyze outcomes under assumptions that depend on my own actions can lead to nonsense.
(My model of a critic says “but surely you can see the intuition that, when you’re actually faced with the bomb, you actually shouldn’t take it?”. And ofc I have that intuition, as a human, I just think it lost in a neutral and fair arbitration process, but that’s a story for a different day. Speaking from the intuition that won, I’d say that it sure would be sad to take the bomb in real life, and that in real life you should spare no expense searching for the evidence that the predictor is on the fritz, which almost certainly would exist and let you avoid the bomb iff the predictor is on the fritz in real life, and if you run your real life right you should never once actually eat a bomb, but also yes, the correct intuition is that you take the bomb knowing it kills you, and that you roll your eyes at the person saying it was a bad move for roughly the same reason you roll your eyes at the person saying that ‘middle’ is never your best option in the high/low/extremeball game—such is the fate of people who are bound to submit a single action to be played against a mixture of games.)
Separately, I note that if you think an agent should behave very differently when a possibility is logically impossible, vs when they assess a probability on the order of 1 in a trillion trillion, then I suspect you’re doing something wrong. (Not least that you’re likely wrong about how much confidence a human can get in any particular logical fact.)
To check, do we have a disagreement about what to do insofar as the predictor is accurate? Like, in the case of the transparent Newcomb’s problem with a literally perfect predictor, would you one-box even if the box were empty?
It’s not complete enough to determine what I do when I don’t see a bomb.
I don’t understand this objection. The given scenario is that you do see a bomb. The question is: what do you do in the given scenario?
You are welcome to imagine any other scenarios you like, or talk about counterfactuals or what have you. But the scenario, as given, tells you that you know certain things and observe certain things. The scenario does not appear to be in any way impossible. “What do I do when I don’t see a bomb” seems irrelevant to the question, which posits that you do see a bomb.
… flatly asserting consequences of my actions as if they’re facts …
Er, what? If you take the bomb, you burn to death. Given the scenario, that’s a fact. How can it not be a fact? (Except if the bomb happens to malfunction, or some such thing, which I assume is not what you mean…?)
(If this still isn’t clear, try working out what the predictor does to the agent that takes the bomb [Left] if it’s present, but pays the $100 [Right] if it isn’t.)
Well, let’s see. The problem says:
If the predictor predicted that you would choose Right, then she put a bomb in Left. If the predictor predicted that you would choose Left, then she did not put a bomb in Left, and the box is empty.
So, if the predictor predicts that I will choose Right, she will put a bomb in Left, in which case I will choose Left. If she predicts that I will choose Left, then she puts no bomb in Left, in which case I will choose Right. This appears to be paradoxical, but that seems to me to be the predictor’s fault (for making an unconditional prediction of the behavior of an agent that will certainly condition its behavior on the prediction), and thus the predictor’s problem.
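(A minimal sketch of that deadlock, taking the predictor’s rule as quoted above and the agent from the parenthetical, i.e. one who takes Left iff the bomb is present:)

```python
def spite_agent(bomb_in_left):
    # Takes the bomb (Left) if it's present, pays the $100 (Right) if it isn't.
    return "Left" if bomb_in_left else "Right"

for prediction in ("Left", "Right"):
    # The predictor's rule: predict Right -> bomb in Left; predict Left -> no bomb.
    bomb_in_left = (prediction == "Right")
    choice = spite_agent(bomb_in_left)
    print(f"predict {prediction}: agent takes {choice}, "
          f"prediction correct: {choice == prediction}")
# Neither prediction comes out correct: the agent always does the opposite of
# what was predicted, so the predictor has no consistent (fixed-point) prediction.
```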
I… don’t see what bearing this has on the disagreement, though.
Nitpick: you seem to be trying to say something like “let us sidestep questions of predictor mechanics and assume that your only observations were the game-history”. But that doesn’t work. If I wake up in a room with a bomb about to go off in the left exit and a $100 fee to use the right exit, I pay the $100 fee and use the right exit. I have to see a host of other specific observations to start hypothesizing the existence of a predictor who would stop placing bombs if-counterfactually I took them, before I believe I’m in this problem.
I’m happy to leave unspecified the prior knowledge that causes us to believe there’s a predictor, but I object to the claim that we’ve fully specified my epistemic state by letting me see the problem history.
What I am saying is that we don’t have access to “questions of predictor mechanics”, only to the agent’s knowledge of “predictor mechanics”. In other words, we’ve fully specified your epistemic state by specifying your epistemic state—that’s all. I don’t know what you mean by calling it “the problem history”. There’s nothing odd about knowing (to some degree of certainty) that certain things have happened. You know there’s a (supposed) predictor, you know that she has (apparently) made such-and-such predictions, this many times, with these-and-such outcomes, etc. What are her “mechanics”? Well, you’re welcome to draw any conclusions about that from what you know about what’s gone before. Again, there is nothing unusual going on here.
I’m not going to ask whether the predictor is perfect, no. I’m going to assume she isn’t, and then do some calculations, and then assume she is, and do some other calculations. Because it’s often useful to solve decision problems using case-by-case analysis.
And in the second case, your problem statement is revealed to be a lie.
I’m sorry, but this really makes very little sense to me. We know the predictor is not quite perfect. That’s one of the things we’re told as a known fact. She’s pretty close to perfect, but not quite. There’s just the one “case” here, and that’s it…
Indeed, the entire “your problem statement is revealed to be a lie” approach seems to me to be nonsensical. This is the given scenario. If you assume that the predictor is perfect and, from that, conclude that the scenario couldn’t’ve come to pass, doesn’t that entail that, by construction, the predictor isn’t perfect (proof by contradiction, as it were)? But then… we already know she isn’t perfect, so what was the point of assuming otherwise?
There’s impossibility here precisely insofar as the predictor is accurate.
Well, we know she’s at least slightly inaccurate… obviously, because she got it wrong this time! We know she can get it wrong, because, by construction, she did get it wrong. (Also, of course, we know she’s gotten it wrong before, but that merely overdetermines the conclusion that she’s not quite perfect.)
One intuition pump is: You say “but if-counterfactually you see the bomb on the left, and you take it, you die”.
Well, no. In the given scenario, you do see the bomb on the left. There’s no counterfactuals involved.
… you roll your eyes at the person saying it was a bad move for roughly the same reason you roll your eyes at the person saying that ‘middle’ is never your best option in the high/low/extremeball game—such is the fate of people who are bound to submit a single action to be played against a mixture of games.
But you’re not bound to submit a single action to be played against a mixture of games. In the given scenario, you can choose your action knowing which game you’re playing!
… in real life you should spare no expense searching for the evidence that the predictor is on the fritz, which almost certainly would exist and let you avoid the bomb iff the predictor is on the fritz in real life …
… or, you could just… choose Right. That seems to me to be a clear win.
Separately, I note that if you think an agent should behave differently when a possibility is logically impossible, vs when they assess a probability on the order of 1 in a googolplex, then I suspect you’re doing something wrong. (Not least that you’re likely wrong about how much confidence a human can get in any particular logical fact.)
If an agent finds themselves in a scenario that they think is logically impossible, then they’re obviously wrong about that scenario being logically impossible, as demonstrated by the fact that actually, it did come to pass. So finding oneself in such a scenario should cause one to immediately re-examine one’s beliefs on what is, and is not, logically impossible, and/or one’s beliefs about what scenario one finds oneself in…
To check, do we have a disagreement about what to do insofar as the predictor is accurate? Like, in the case of the transparent Newcomb’s problem with a literally perfect predictor, would you one-box even if the box were empty?
How would I come to the belief that the predictor is literally perfect (or even mostly perfect)? Wouldn’t that just look like a string of observations of cases where either (a) there’s money in the one box and the agent takes the one box full of money, or (b) there’s no money in the one box and the agent takes the two boxes (and gets only the lesser quantity)? How exactly would I distinguish between the “perfect predictor” hypothesis and the “people take the one box if it has a million dollars, two boxes otherwise” hypothesis, as an explanation for those observations? (Or, to put it another way, what do I think happens to people who take an empty one box? Well, I can’t have any information about that, right? [Beyond the common sense of “obviously, they get zilch”…] Because if that had ever happened before, well, there goes the “perfect predictor” hypothesis, yes?)
The scenario does not appear to be in any way impossible.
The scenario says “the predictor is likely to be accurate” and then makes an assertion that is (for me, at least) false insofar as the predictor is accurate. You can’t have it both ways. The problem statement (at least partially) contradicts itself. You and I have a disagreement about how to evaluate counterfactuals in cases where the problem statement is partly-self-contradictory.
This appears to be paradoxical, but that seems to me to be the predictor’s fault
Sure, it’s the predictor’s problem, and the behavior that I expect of the predictor in the case that I force them to notice they have a problem has a direct effect on what I do if I don’t see a bomb. In particular, if they reward me for showing them their problem, then I’d go right when I see no bomb, whereas if they’d smite me, then I wouldn’t. But you’re right that this is inconsequential when I do see the bomb.
In the given scenario, you do see the bomb on the left. There’s no counterfactuals involved.
Well for one thing, I just looked and there’s no bomb on the left, so the whole discussion is counter-to-fact. And for another, if I pretend I’m in the scenario, then I choose my action by visualizing the (counterfactual) consequences of taking the bomb, and visualizing the (counterfactual) consequences of refraining. So there are plenty of counterfactuals involved.
I assert that the (counterfactual) consequences of taking the bomb include (almost certainly) rendering the whole scenario impossible, and rendering some other hypothetical (that I don’t need to pay $100 to leave) possible instead. And so according to me, the correct response to someone saying “assume you see the bomb” is to say “no, I shall assume that I see no bomb instead”, because that’s the consequence I visualize of (counterfactually) taking the bomb.
You’re welcome to test it empirically (well, maybe after adding at least $1k to all outcomes to incentivise me to play), if you have an all-but-one-in-a-trillion-trillion accurate predictor-of-me lying around. (I expect your empiricism will prove me right, in that the counterfactuals where it shows me a bomb are all in fact rendered impossible, and what happens in reality instead is that I get to leave with $900).
...Also, just to be clear, you’re aware that these are two different internally-consistent but contradictory theories of how to evaluate counterfactuals? Like, we can be pretty confident that there’s no argument a CDT agent can hear that causes them to wish to adopt FDT counterfactuals, and vice versa. Humans come equipped with both intuitions (I can give you other problems that pump the other intuitions, if you’d like), and we have to find some other way to arbitrate the conflict.
(Currently I’m in a mode of attempting to convey the second set of intuitions to you in this one problem, in light of your self-professed difficulty grasping them. Feel free to demonstrate understanding of the second set of intuitions and request that we instead switch to discussing why I think the latter wins in arbitration.)
How would I come to the belief that the predictor is literally perfect (or even mostly perfect)?
I’m asking you to do a case-analysis. For concreteness, suppose you get to inspect the predictor’s computing substrate and source code, and you do a bunch of pondering and thinking, and you’re like “hmm, yes, after lots of careful consideration, I now think that I am 90% likely to be facing a transparent Newcomb’s problem, and 10% likely to be in some other situation, eg, where the code doesn’t work how I think it does, or cosmic rays hit the computer, or what-have-you”. Now we agree that one possible thing that could have happened, is that the predictor accurately predicted you in the past, and we can freely discuss what you should do insofar as that’s the case.
Like, presumably when I present you with the high/low/extremeball game, you have no problem granting statements like “insofar as the die came up ‘highball’, I should choose ‘high’”. Saying “but how could I possibly have come to the epistemic state where I was 100% certain that the die came up ‘highball’???” is a non sequitur. Your analysis in the face of uncertainty can be decomposed into an aggregation of your analyses in each scenario that might obtain. And so I ask again, insofar as the predictor accurately predicted you—which is surely one of many possible cases to analyze, after examining the mind of the entity that sure looks like it predicted you—what action would you prescribe?
The scenario says “the predictor is likely to be accurate”
Well… no, the scenario says “the predictor has predicted correctly 1 trillion trillion minus one times, and incorrectly one time”. Does that make it “likely to be accurate”? You tell me, I guess, but that seems like an unnecessarily vague characterization of a precise description.
and then makes an assertion that is (for me, at least) false insofar as the predictor is accurate. You can’t have it both ways. The problem statement (at least partially) contradicts itself.
What do you mean by this? What’s contradictory about the predictor making a mistake? Clearly, it’s not perfect. We know this because it made at least one mistake in the past, and then another mistake just now.
Is the predictor “accurate”? Well, it’s approximately as accurate as it takes to guess 1 trillion trillion times and only be wrong once…
I assert that the (counterfactual) consequences of taking the bomb include (almost certainly) rendering the whole scenario impossible, and rendering some other hypothetical (that I don’t need to pay $100 to leave) possible instead. And so according to me, the correct response to someone saying “assume you see the bomb” is to say “no, I shall assume that I see no bomb instead”, because that’s the consequence I visualize of (counterfactually) taking the bomb.
I confess that this reads like moon logic to me. It’s possible that there’s something fundamental I don’t understand about what you’re saying.
...Also, just to be clear, you’re aware that these are two different internally-consistent but contradictory theories of how to evaluate counterfactuals? Like, we can be pretty confident that there’s no argument a CDT agent can hear that causes them to wish to adopt FDT counterfactuals, and vice versa. Humans come equipped with both intuitions (I can give you other problems that pump the other intuitions, if you’d like), and we have to find some other way to arbitrate the conflict.
I am not familiar with this, no. If you have explanatory material / intuition pumps / etc. to illustrate this, I’d certainly appreciate it!
Saying “but how could I possibly have come to the epistemic state where I was 100% certain that the die came up ‘highball’???” is a non sequitur.
I am not asking how I could come to believe the “literally perfect predictor” thing with 100% certainty; I am asking how I could come to believe it at all (with, let’s say, > 50% certainty).
Now we agree that one possible thing that could have happened, is that the predictor accurately predicted you in the past, and we can freely discuss what you should do insofar as that’s the case.
Hold on, hold on. Are we talking about repeated plays of the same game? Where I face the same situation repeatedly? Or are we talking about observing (or learning about) the predictor playing the game with other people before me?
The “Bomb” scenario described in the OP says nothing about repeated play. If that’s an assumption you’re introducing, I think it needs to be made explicit…
that seems like an unnecessarily vague characterization of a precise description
I deny that we have a precise description. If you listed out a specific trillion trillion observations that I allegedly made, then we could talk about whether those particular observations justify thinking that we’re in the game with the bomb. (If those trillion trillion observations were all from me waking up in a strange room and interacting with it, with no other context, then as noted above, I would have no reason to believe I’m in this game as opposed to any variety of other games consistent with those observations.) The scenario vaguely alleges that we think we’re facing an accurate predictor, and then alleges that their observed failure rate (on an unspecified history against unspecified players) is 1 per trillion-trillion. It does not say how or why we got into the epistemic state of thinking that there’s an accurate predictor there; we assume this by fiat.
(To be clear, I’m fine with assuming this by fiat. I’m simply arguing that your reluctance to analyze the problem by cases seems strange and likely erroneous to me.)
I am not familiar with this, no. If you have explanatory material / intuition pumps / etc. to illustrate this, I’d certainly appreciate it!
The canonical explanatory text is the FDT paper (PDF warning) (that the OP is responding to a critique of, iirc), and there’s a bunch of literature on LW (maybe start at the wiki page on UDT? Hopefully we have one of those) exploring various intuitions. If you’re not familiar with this style of logic, I recommend starting there (ah look we do have a UDT wiki page). I might write up some fresh intuition pumps later, to try to improve the exposition. (We’ve sure got a lot of exposition if you dig through the archives, but I think there are still a bunch of gaps.)
I am not asking how I could come to believe the “literally perfect predictor” thing with 100% certainty; I am asking how I could come to believe it at all (with, let’s say, > 50% certainty).
Uh, I mean, we could play a version of this game where the sums were positive (such that you’d want to play) and smaller (such that I could fund the prizes), and then I could say “yo Said Achmiz, I decided to act as a predictor of you in this game, I’m about to show you either a box with $10 and a box with $1, or an empty box and a box with $1, depending on how I predicted you’ll behave”. And then I think that at least the other readers on this post would predict that I have >50% probability of predicting your behavior correctly. You haven’t exactly been subtle about how you make decisions.
Predicting someone accurately with >50% likelihood isn’t all that hard. A coin predicts you “perfectly accurately” 50% of the time; you’ve only gotta do slightly better than that.
I suspect you’re confused here in putting “perfect prediction” on some big pedestal. The predictor is trying to reason about you, and they either reason basically-correctly, or their reasoning includes some mistake. And insofar as they reason validly to a conclusion about the right person, they’re going to get the answer right.
And again, note that you’re not exactly a tricky case yourself. You’re kinda broadcasting your decision-making procedure all over the internet. You don’t have to be a supermind studying a brainscan to figure out that Said Achmiz takes the extra $1.
My girlfriend often reasons accurately about whether I’m getting dinner tonight. Sometimes she reasons wrongly about it (eg, she neglects that I had a late lunch) but gets the right answer by chance anyway. Sometimes she reasons wrongly about it and gets the wrong answer by chance. But often she reasons correctly about it, and gets the right answer for the right reasons. And if I’m standing in the supermarket wondering whether she already ate or whether I should buy enough cheese to make her some food too, I have no trouble thinking “well, insofar as she got the right answer for the right reasons tonight, she knew I’d be hungry when I got home, and so she hasn’t eaten yet”. None of this case-analysis requires me to believe she’s a supermind who studied a scan of my brain for twelve thousand years. Believing that she was right about me for the right reasons is just not a very difficult epistemic state to enter. My reasons for doing most of what I do just aren’t all that complicated. It’s often not all that tricky to draw the right conclusions about what people do for the right reasons.
Furthermore, I note that in the high/low/extremeball game, I have no trouble answering questions about what I should do insofar as the game is highball, even though I have <50% (and in fact ~33%) probability on that being the case. In fact, I could analyze payouts insofar as the game is highball, even if the game was only 0.1% likely to be highball! In fact, the probability of me actually believing I face a given decision problem is almost irrelevant to my ability to analyze the payouts of different actions. As evidenced in part by the fact that I am prescribing actions in this incredibly implausible scenario with bombs and predictors.
And so in this wild hypothetical situation where we’ve somehow (by fiat) become convinced that there’s an ancient predictor predicting our actions (which is way weirder than my girlfriend predicting my action, tbh), presumably one of our hypotheses for what’s going on is that the predictor correctly deduced what we would do, for the correct reasons. Like, that’s one of many hypotheses that could explain our past observations. And so, assuming that hypothesis, for the purpose of analysis—and recalling that I can analyze which action is best assuming the game is highball, without making any claim about how easy or hard it is to figure out whether the game is highball—assuming that hypothesis, what action would you prescribe?
Well… no, the scenario says “the predictor has predicted correctly 1 trillion trillion minus one times, and incorrectly one time”. Does that make it “likely to be accurate”? You tell me, I guess, but that seems like an unnecessarily vague characterization of a precise description.
Let’s be more precise, then, and speak in terms of “correctness” rather than “accuracy”. There are then two possibilities in the “bomb” scenario as stipulated:
The predictor thought I would take the right box, and was correct.
The predictor thought I would take the right box, and was incorrect.
Now, note the following interesting property of the above two possibilities: I get to choose which of them is realized. I cannot change what the predictor thought, nor can I change its actions conditional on its prediction (which in the stipulated case involves placing a bomb in the left box), but I can choose to make its prediction correct or incorrect, depending on whether I take the box it predicted I would take.
Observe now the following interesting corollary of the above argument: it implies the existence of a certain strategy, which we might call “ObeyBot”, which always chooses the action that confirms the predictor’s prediction, and does so regardless of payoffs: by hypothesis, ObeyBot does not care about burning to death, nor does it care about losing $100; it cares only for making the predictor right as often as it can.
Now, suppose I were to tell you that the predictor’s stipulated track record in this game (10^24 − 1 successes, and 1 failure) were achieved against a population consisting (almost) entirely of ObeyBots. What would you make of this claim?
It should be quite obvious that there is only one conclusion you can draw: the predictor’s track record establishes absolutely nothing about its predictive capabilities in general. When faced with a population of agents that always do whatever the predictor says they will do, any “predictor”—from the superintelligent Omega to a random number generator—will achieve an accuracy of 100%.
This is an extremely important observation! What it shows is that the predictor’s claimed track record can mean very different things depending on what kind of agents it was playing against… which in turn means that the problem as stated is underdetermined: knowledge of the predictor’s track record does not by itself suffice to pin down the predictor’s actual behavior, unless we are also given information about what kind of agents the predictor was playing against.
Now, obviously whoever came up with the “bomb” scenario presumably did not want us to assume that their predictor is only accurate against ObeyBots and no one else. But to capture this notion requires more than merely stating the predictor’s track record; it requires the further supposition that the predictor’s accuracy is a persistent feature of the predictor, rather than of the agents it was playing against. It requires, in short, the very thing you criticized as an “unnecessarily vague characterization”: the statement that the predictor is, in fact, “likely to be accurate”.
With this as a prerequisite, we are now equipped to address your next question:
What do you mean by this? What’s contradictory about the predictor making a mistake? Clearly, it’s not perfect. We know this because it made at least one mistake in the past, and then another mistake just now.
(First, a brief recap: we are asked to assume the existence of a predictor with an extremely high level of predictive accuracy. Moreover, although the problem statement did not actually specify, it is likely we are meant to assume that this predictor’s accuracy persists across a large reference class of possible agent designs; specifically—and crucially—this reference class must be large enough to include our own decision algorithm, whatever it may be.)
Now, with this in mind, your (Said’s) question is: what could possibly be contradictory about this scenario? What could be contradictory about a scenario that asks us to assume the existence of a predictor that can accurately predict the behavior of a large reference class of agents, including ourselves?
At this point, it should be quite obvious where the potential contradiction lies: it arises directly from the fact that an agent who finds itself in the described scenario can choose to make the predictor right or wrong. Earlier, I exploited this property to construct a strategy (ObeyBot) which makes the predictor right no matter what; but now I will do the opposite, and construct a strategy that always chooses whichever action falsifies the predictor’s prediction. For thematic appropriateness, let us call this strategy “SpiteBot”.
I assert that it is impossible for any predictor to achieve any accuracy against SpiteBot higher than 0%. Therefore, if we assume that I myself am a SpiteBot, the contradiction becomes immediately clear: it is impossible for the predictor to achieve the stipulated predictive accuracy across any reference class of agents broad enough to include SpiteBot, and yet in order for the predictor’s reference class to include me, it must include SpiteBot, since I am a SpiteBot. Contradiction!
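(A minimal sketch of both claims, in Python; the agent and predictor functions here are just the hypothetical strategies defined above, not anything from the original problem. Against ObeyBot any predictor at all, even a coin flip, scores ~100%; against SpiteBot no predictor can score above 0%.)

```python
import random

def obeybot(prediction):
    # Takes whichever box the predictor said it would take.
    return prediction

def spitebot(prediction):
    # Takes whichever box the predictor said it would NOT take.
    return "Left" if prediction == "Right" else "Right"

def accuracy(agent, predictor, trials=100_000):
    hits = 0
    for _ in range(trials):
        prediction = predictor()    # the predictor commits first (and sets up the boxes)
        action = agent(prediction)  # the agent then acts, having seen the setup
        hits += (action == prediction)
    return hits / trials

coin_flip = lambda: random.choice(["Left", "Right"])
always_predict_right = lambda: "Right"

print(accuracy(obeybot, coin_flip))              # ~1.0: even a coin is "perfect" against ObeyBot
print(accuracy(spitebot, always_predict_right))  # 0.0: no predictor beats SpiteBot
```

The point being: a raw track record of 10^24 − 1 hits tells you nothing until you know what population of agents it was earned against.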
Of course, this only establishes that the problem statement is contradictory if the reader happens to be a SpiteBot. And while that might be somewhat amusing to imagine, it is unlikely that any of us (the intended audience of the problem) are SpiteBots, or use any decision theory akin to SpiteBot. So let us ask: are there any other decision theories that manage to achieve a similar effect—any other decision theories that, under at least some subset of circumstances, turn into SpiteBot, thereby rendering the problem formulation contradictory?
Asked that way, the question answers itself: any decision theory that behaves like SpiteBot at least some of the time, will generate contradictions with the problem statement at least some of the time. And (here is the key point) if those contradictions are to be avoided, the hypothesized predictor must avoid whatever set of circumstances triggers the SpiteBot-like behavior, lest it falsify its own alleged predictive accuracy… which implies the following corollaries:
The predictor cannot show its face (or its boxes) before SpiteBot. Any native SpiteBots in this universe will therefore never find themselves in this particular “bomb” scenario, nor any relatives.
Any strategies that behave like SpiteBot some of the time, will not encounter any “bomb”-like scenarios during that time. For example, the strategy “Choose your actions normally except on Wednesdays; behave like SpiteBot on Wednesdays” will not encounter any bomb-like scenarios on Wednesdays.
If you don’t like the idea of ending up in a “bomb”-like scenario, you should precommit to behaving like SpiteBot in the face of such a scenario. If you actually do this properly, you will never see a real predictor give you a bomb-like scenario, which means that the only time you will be faced with any kind of bomb-like scenario will be in your imagination, perhaps at the prompting of some inquisitive philosopher. At this point you are free to ask the philosopher how his stipulated predictor reacts to SpiteBot and SpiteBot-like strategies, and—if he fails to give you a convincing answer—you are free to dismiss his premise as contradictory, and his scenario as something you don’t have to worry about.
(The general case of “behaving like SpiteBot when confronted with situations you’d really rather not have encountered to begin with” is called “fuck-you decision theory”, or FDT for short.)
Yes, it was this exact objection that I addressed in my previous replies that relied upon a misreading of the problem. I missed that the boxes were open and thought that the only clue to the prediction was the note that was left.
The only solution was to assume that the predictor does not always leave a note, and this solution also works for the stated scenario. You see that the boxes are open, and the left one contains a bomb, but did everyone else? Did anyone else? The problem setup doesn’t say.
This sort of vagueness leaves holes big enough to drive a truck through. The stated FDT support for picking Left depends absolutely critically on the subjunctive dependency odds being at least many millions to one, and the stated evidence is nowhere near strong enough to support that. Failing that, FDT recommends picking Right.
So the whole scenario is pointless. It doesn’t explore what it was intended to explore.
You can modify the problem to say that the predictor really is that reliable for every agent, but doesn’t always leave the boxes open for you or write a note. This doesn’t mean that the predictor is perfectly reliable, so a SpiteBot can still face this scenario but is just extremely unlikely to.
There’s also no question of what would happen if you “always spite the predictor’s prediction”
There IS a question of what would happen if you “always spite the predictor’s prediction”, since doing so seems to make the 1 in a trillion trillion error rate impossible.
In the Newcomb case, there’s a disagreement about whether one-boxing can actually somehow cause there to be a million dollars in the box; CDT denies this possibility (because it takes no account of sufficiently accurate predictors), while timeless/logical/functional/whatever decision theories accept it.
To be clear, FDT does not accept causation that happens backwards in time. It’s not claiming that the action of one-boxing itself causes there to be a million dollars in the box. It’s the agent’s algorithm, and, further down the causal diagram, Omega’s simulation of this algorithm that causes the million dollars. The causation happens before the prediction and is nothing special in that sense.
Yes, sure. Indeed we don’t need to accept causation of any kind, in any temporal direction. We can simply observe that one-boxers get a million dollars, and two-boxers do not. (In fact, even if we accept shminux’s model, this changes nothing about what the correct choice is.)
The main point of FDT is that it gives the optimal expected utility on average for agents using it. It does not guarantee optimal expected utility for every instance of an agent using it.
Suppose you have a population of two billion agents, each going through this scenario every day. Upon seeing a note predicting right, one billion would pick left and one billion would pick right. We can assume that they all pick left if they see a note predicting left or no note at all.
Every year, the Right agents essentially always see a note predicting right, and pay more than $30000 each. The Left agents essentially always see a note predicting left (or no note) and pay $0 each.
The average rate of deaths is comparable: roughly one death every few trillion years in each group, which is to say, essentially never. They all know that it could happen, of course.
Which group is better off?
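(Putting rough numbers on that, under the assumptions above: one play per day, $100 per Right-box, and a 1-in-10^24 chance per play that a Left-boxer meets a bomb.)

$$\text{Right group: } 365 \times \$100 = \$36{,}500 \text{ per agent per year}$$
$$\text{Left group: } 10^{9}\ \text{agents} \times 365\ \text{plays/year} \times 10^{-24} \approx 3.7 \times 10^{-13}\ \text{expected deaths per year}$$

That is roughly one death in the Left group every 2.7 trillion years.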
Edit: I misread Predictor’s accuracy. It does not say that it is in all scenarios 1 − 10^-24, just that in some unknown sample of scenarios, it was 1 − 10^-24. This changes the odds so much that FDT does not recommend taking the left box.
Edit: I misread Predictor’s accuracy. It does not say that it is in all scenarios 1 − 10^-24, just that in some unknown sample of scenarios, it was 1 − 10^-24. This changes the odds so much that FDT does not recommend taking the left box.
Two questions, if I may:
Why do you read it this way? The problem simply states the failure rate is 1 in a trillion trillion.
If we go with your interpretation, why exactly does that change things? It seems to me that the sample size would have to be extremely huge in order to determine a failure rate that low.
It depends upon what the meaning of the word “is” is:
1. The failure rate has been tested over an immense number of predictions, and evaluated as 10^-24 (to one significant figure). That is the currently accepted estimate for the predictor’s error rate for scenarios randomly selected from the sample.
2. The failure rate is theoretically 10^-24, over some assumed distribution of agent types. Your decision model may or may not appear anywhere in this distribution.
3. The failure rate is bounded above by 10^-24 for every possible scenario.
A self-harming agent in this scenario cannot be consistently predicted by Predictor at all (success rate 0%), so we know that (3) is definitely false.
(1) and (2) aren’t strong enough, because they give little information about Predictor’s error rate concerning your scenario and your decision model.
We have essentially zero information about Predictor’s true error bounds regarding agents that sometimes carry out self-harming actions. An FDT agent that takes the left box is exactly such an agent, and recommending that action requires that the upper bound on Predictor’s rate of subjunctive-dependence failure be less than the ratio of the utilities of paying $100 and of burning to death (which, in this scenario, means the end of all intelligent life in the universe).
We do not have anywhere near enough information to justify that tight a bound. So FDT can’t recommend such an action. Maybe someone else can write a scenario that is in similar spirit, but isn’t so flawed.
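(To spell out the bound being invoked here: write ε for the chance that the subjunctive dependence fails, i.e. that the bomb is present even though your algorithm Left-boxes, and D for the disutility of burning to death. FDT prefers Left over Right only when, roughly,)

$$\varepsilon \cdot D \;<\; \$100 \qquad\Longleftrightarrow\qquad \varepsilon \;<\; \frac{\$100}{D},$$

and with D anywhere near the value of a life (let alone the last life in the universe), that is a far tighter bound on ε than interpretations (1) or (2) above can justify.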
Another way of phrasing it: you don’t get the $100 marginal payoff if you’re not prepared to knowingly go to your death in the incredibly unlikely event of a particular type of misprediction.
That’s the sense in which I meant “you got screwed”. You entered the scenario knowing that it was incredibly unlikely that you would die regardless of what you decide, but were prepared to accept that incredibly microscopic chance of death in exchange for keeping your $100. The odds just went against you.
Edit: If Predictor’s actual bound on error rate was 10^-24, this would be valid. However, Predictor’s bound on error rate cannot be 10^-24 in all scenarios, so this is all irrelevant. What a waste of time.
Re: the Bomb scenario:
It seems to me that the given defense of FDT is, to put it mildly, unsatisfactory. Whatever “fancy” reasoning is proffered, nevertheless the options on offer are “burn to death” or “pay $100”—and the choice is obvious.
FDT recommends knowingly choosing to burn to death? So much the worse for FDT!
FDT has very persuasive reasoning for why I should choose to burn to death? Uh-huh (asks the non-FDT agent), and if you’re so rational, why are you dead?
Counterfactuals, you say? Well, that’s great, but you still chose to burn to death, instead of choosing not to burn to death.
In “Newcomb’s Problem and Regret of Rationality”, Eliezer wrote:
Similarly, you don’t have to take the Right box because your decision theory says you should. You can just… take the Right box.
And, you know… not burn to death.
(Maybe the real FDT is “use FDT in all the cases except where doing so will result in you burning to death, in which case use not-FDT”? That way you get the good outcome in all 1 trillion trillion cases, eh?)
P.S. Vaniver’s comment seems completely inapplicable to me, since in the “Bomb” scenario it’s not a question of uncertainty at all.
I’m gonna try this one more time from a different angle: what’s your answer on Parfit’s Hitchhiker? To pay or not to pay?
Pay.
So even though you are already in the city, you choose to pay and lose utility in that specific scenario? That seems inconsistent with right-boxing on Bomb.
For the record, my answer is also to pay, I but then again I also left-box on Bomb.
Parfit’s Hitchhiker is not an analogous situation, since it doesn’t take place in a context like “you’re the last person in the universe and will never interact with another agent ever”, nor does paying cause me to burn to death (in which case I wouldn’t pay; note that this would defeat the point of being rescued in the first place!).
But more importantly, in the Parfit’s Hitchhiker situation, you have in fact been provided with value (namely, your life!). Then you’re asked to pay a (vastly smaller!) price for that value.
In the Bomb scenario, on the other hand, you’re asked to give up your life (very painfully), and in exchange you get (and have gotten) absolutely nothing whatsoever.
So I really don’t see the relevance of the question…
Actually, I have thought about this a bit more and concluded Bomb and Parfit’s hitchhiker are indeed analogous in a very important sense: both problems give you the option to “pay” (be it in dollars or with torture and death), even though not paying doesn’t causally affect whether or not you die.
Like Partfit’s hitchhiker, where you are asked to pay $1000 even though you are already rescued.
That was never relevant to begin with.
Well, both problems have a predictor and focus on a specific situation after the predictor has already made the prediction. Both problems have subjunctive dependence. So they are analogous, but they have differences as well. However, it seems like you don’t pay because of subjunctive dependence reasons, so never mind, I guess.
This is where, at least in part, your misunderstanding lies (IMO). FDT doesn’t recommend choosing to burn to death. It recommends Left-boxing, which avoids burning to death AND avoids paying $100.
In doing so, FDT beats both CDT and EDT, which both pay $100. It really is as simple as that. The Bomb is an argument for FDT, and quite an excellent one.
… huh? How does this work? The scenario, as described in the OP, is that the Left box has a bomb in it. By taking it, you burn to death. But FDT, as you say, recommends Left-boxing. Therefore, FDT recommends knowingly choosing to burn to death.
I don’t understand how you can deny this when your own post clearly describes all of this.
This works because Left-boxing means you’re in a world where the predictor’s model of you also Left-boxed when the predictor made its prediction, causing it to not put a Bomb in Left.
Put differently, the situation described by MacAskill becomes virtually impossible if you Left-box, since the probability of Left-boxing and burning to death is ~0.
OR, alternatively, we say: no, we see the Bomb. We can’t retroactively change this! If we keep that part of the world fixed, then, GIVEN the subjunctive dependence between us and the predictor (assuming it’s there), that simply means we Right-box (with probability ~1), since that’s what the predictor’s model did.
Of course, then it’s not much of a decision theoretic problem anymore, since the decision is already fixed in the problem statement. If we assume we can still make a decision, then that decision is made in 2 places: first by the predictor’s model, then by us. Left-boxing means the model Left-boxes and we get to live for free. Right-boxing means the model Right-boxes and we get to live at a cost of $100. The right decision must be Left-boxing.
Irrelevant, since the described scenario explicitly stipulates that you find yourself in precisely that situation.
Yes, that’s what I’ve been saying: choosing Right in that scenario is the correct decision.
I have no idea what you mean by this.
No, Left-boxing means we burn to death.
“Irrelevant, since the described scenario explicitly stipulates that you find yourself in precisely that situation.”
Actually, this whole problem is irrelevant to me, a Left-boxer: Left-boxers never (or extremely rarely) find themselves in the situation with a bomb in Left. That’s the point.
Firstly, there’s a difference between “never” and “extremely rarely”. And in the latter case, the question remains “and what do you do then?”. To which, it seems, you answer “choose the Right box”…? Well, I agree with that! But that’s just the view that I’ve already described as “Left-box unless there’s a bomb in Left, in which case Right-box”.
It remains unclear to me what it is you think we disagree on.
That difference is so small as to be neglected.
It seems to me that that strategy leaves you manipulable by the predictor, who can then just always predict you will Right-box, put a bomb in Left, and let you Right-box, causing you to lose $100 every time.
By construction it is not, because the scenario is precisely that we find ourselves in one such exceptional case; the posterior probability (having observed that we do so find ourselves) is thus ~1.
… but you have said, in a previous post, that if you find yourself in this scenario, you Right-box. How to reconcile your apparently contradictory statements…?
Except that we don’t find ourselves there if we Left-box. But we seem to be going around in a circle.
Right-boxing is the necessary consequence if we assume the predictor’s Right-box prediction is fixed now. So GIVEN the Right-box prediction, I apparently Right-box.
My entire point is that the prediction is NOT a given. I Left-box, and thus change the prediction to Left-box.
I have made no contradictory statements. I am and always have been saying that Left-boxing is the correct decision to resolve this dilemma.
There’s no “if” about it. The scenario is that we do find ourselves there. (If you’re fighting the hypothetical, you have to be very explicit about that, because then we’re just talking about two totally different, and pretty much unrelated, things. But I have so far understood you to not be doing that.)
I don’t know what you mean by “apparently”. You have two boxes—that’s the scenario. Which do you choose—that’s the question. You can pick either one; where does “apparently” come in?
What does this mean? The boxes are already in front of you.
You just said in this very comment that you Right-box in the given scenario! (And also in several other comments… are you really going to make me cite each of them…?)
I’m not going to make you cite anything. I know what you mean. I said Right-boxing is a consequence, given a certain resolution of the problem; I always maintained Left-boxing is the correct decision. Apparently I didn’t explain myself well, that’s on me. But I’m kinda done, I can’t seem to get my point across (not saying it’s your fault btw).
Do you understand why one should Left-box for a perfect predictor if there’s a bomb in the left box?
Of course one should not; if there’s a bomb in Left, doing so leads to you dying.
It doesn’t. Instead, it will make it so that there will have never been a bomb in the first place.
To understand this, imagine yourself as a deterministic algorithm. Either you Left-box under all circumstances (even if there is a bomb in the left box), or you Right-box under all circumstances, or you Right-box iff there is a bomb in the left box.
Implementing the first algorithm out of these three is the best choice (the expected utility is 0).
Implementing the third algorithm (that’s what you do) is the worst choice (the expected utility is -$100).
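(A minimal sketch of that comparison, assuming a perfect predictor as above. One caveat: for the third algorithm both predictions are self-consistent, so the sketch adds a tie-breaking assumption of my own, scoring each policy by its worst self-consistent outcome; that assumption is what makes the third come out at −$100.)

```python
def outcome(action, bomb_in_left):
    # Utility of an action given the contents of the Left box.
    if action == "Left":
        return float("-inf") if bomb_in_left else 0  # burn to death vs. live for free
    return -100                                      # taking Right always costs $100

policies = {
    "always Left":    lambda bomb: "Left",
    "always Right":   lambda bomb: "Right",
    "Right iff bomb": lambda bomb: "Right" if bomb else "Left",
}

for name, policy in policies.items():
    consistent_outcomes = []
    for predicted in ("Left", "Right"):
        bomb = (predicted == "Right")   # bomb in Left iff the predictor predicts Right
        if policy(bomb) == predicted:   # a perfect predictor's prediction must come true
            consistent_outcomes.append(outcome(policy(bomb), bomb))
    print(name, "-> worst self-consistent outcome:", min(consistent_outcomes))

# always Left    -> 0     (no bomb, no payment)
# always Right   -> -100  (bomb in Left, you pay to take Right)
# Right iff bomb -> -100  (worst case: predictor predicts Right, places the bomb, you pay)
```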
By the way, I want to point out that you apparently disagree with Heighn on this. He says, as I understand him, that if you pick Left, you do indeed burn to death, but this is fine, because in [1 trillion trillion minus one] possible worlds, you live and pay nothing. But you instead say that if you pick Left… something happens… and the bomb in the Left box, which you were just staring directly at, disappears somehow. Or wasn’t ever there (somehow), even though, again, you were just looking right at it.
How do you reconcile this disagreement? One of you has to be wrong about the consequences of picking the Left box.
I think we agree. My stance: if you Left-box, that just means the predictor predicted that with probability close to 1. From there on, there are a trillion trillion − 1 possible worlds where you live for free, and 1 where you die.
I’m not saying “You die, but that’s fine, because there are possible worlds where you live”. I’m saying that “you die” is a possible world, and there are way more possible worlds where you live.
How?
But apparently the consequences of this aren’t deterministic after all, since the predictor is fallible. So this doesn’t help.
If you reread my comments, I simplified it by assuming an infallible predictor.
For this, it’s helpful to define another kind of causality (logical causality) as distinct from physical causality. You can’t physically cause something to have never been that way, because physical causality can’t go to the past. But you can use logical causality for that, since the output of your decision determines not only your output, but the output of all equivalent computations across the entire timeline. By Left-boxing even in case of a bomb, you will have made it so that the predictor’s simulation of you has Left-boxed as well, resulting in the bomb never having been there.
… so, in other words, you’re not actually talking about the scenario described in the OP. But that’s what my comments have been about, so… everything you said has been a non sequitur…?
This really doesn’t answer the question.
Again, the scenario is: you’re looking at the Left box, and there’s a bomb in it. It’s right there in front of you. What do you do?
So, for example, when you say:
So if you take the Left box, what actually, physically happens?
See my top-level comment; this is precisely the problem with the scenario described in the OP that I pointed out. Your reading is standard, but not the intended meaning.
But it’s also puzzling that you can’t ITT this point, to see both meanings, even if you disagree that it’s reasonable to allow/expect the intended one. Perhaps divesting from having an opinion on the object level question might help? Like, what is the point the others are trying to make, specifically, how does it work, regardless of if it’s a wrong point, described in a way that makes no reference to its wrongness/absurdity?
If a point seems to me to be absurd, then how can I understand or explain how it works (given that I don’t think it works at all)?
As far as your top-level comment, well, my follow-up questions about it remain unanswered…
Like with bug reports, it’s not helpful to say that something “doesn’t work at all”, it’s useful to be more specific. There’s some failure of rationality at play here, you are way too intelligent to be incapable of seeing what the point is, so there is some systematic avoidance of allowing yourself to see what is going on. Heighn’s antagonistic dogmatism doesn’t help, but shouldn’t be this debilitating.
I dropped out of that conversation because it seemed to be going in circles, and I think I’ve explained everything already. Apparently the conversation continued, green_leaf seems to be making good points, and Heighn continues needlessly upping the heat.
I don’t think object level conversation is helpful at this point, there is some methodological issue in how you think about this that I don’t see an efficient approach to. I’m already way outside the sort of conversational norms I’m trying to follow for the last few years, which is probably making this comment as hopelessly unhelpful as ever, though in 2010 that’d more likely be the default mode of response for me.
Note that it’s my argumentation that’s being called crazy, which is a large factor in the “antagonism” you seem to observe—a word choice I don’t agree with, btw.
About the “needlessly upping the heat”, I’ve tried this discussion from multiple different angles, seeing if we can come to a resolution. So far, no, alas, but not for lack of trying. I will admit some of my reactions were short and a bit provocative, but I neither appreciate nor agree with your accusations. I have been honest in my reactions.
I’ve been you ten years ago. This doesn’t help, courtesy or honesty (purposes that tend to be at odds with each other) aren’t always sufficient, it’s also necessary to entertain strange points of view that are obviously wrong, in order to talk in another’s language, to de-escalate where escalation won’t help (it might help with feeding norms, but knowing what norms you are feeding is important). And often enough that is still useless and the best thing is to give up. Or at least more decisively overturn the chess board, as I’m doing with some of the last few comments to this post, to avoid remaining in an interminable failure mode.
Just… no. Don’t act like you know me, because you don’t. I appreciate you trying to help, but this isn’t the way.
These norms are interesting in how well they fade into the background, oppose being examined. If you happen to be a programmer or have enough impression of what that might be like, just imagine a programmer team where talking about bugs can be taboo in some circumstances, especially if they are hypothetical bugs imagined out of whole cloth to check if they happen to be there, or brought to attention to see if it’s cheap to put measures in place to prevent their going unnoticed, even if it eventually turns out that they were never there to begin with in actuality. With rationality, that’s hypotheses about how people think, including hypotheses about norms that oppose examination of such hypotheses and norms.
Sorry, I’m having trouble understanding your point here. I understand your analogy (I was a developer), but am not sure what you’re drawing the analogy to.
I see your point, although I have entertained Said’s view as well. But yes, I could have done better. I tend to get like this when my argumentation is being called crazy, and I should have done better.
You could have just told me this instead of complaining about me to Said though.
“So if you take the Left box, what actually, physically happens?”
You live. For free. Because the bomb was never there to begin with.
Yes, the situation does say the bomb is there. But it also says the bomb isn’t there if you Left-box.
At the very least, this is a contradiction, which makes the scenario incoherent nonsense.
(I don’t think it’s actually true that “it also says the bomb isn’t there if you Left-box”—but if it did say that, then the scenario would be inconsistent, and thus impossible to interpret.)
That’s what I’ve been saying to you: a contradiction.
And there are two ways to resolve it.
This is misleading. What happens is that the situation you found yourself in doesn’t take place with significant measure. You live mostly in different situations, not this one.
I don’t see how it is misleading. Achmiz asked what actually happens; it is, in virtually all possible worlds, that you live for free.
It is misleading because Said’s perspective is to focus on the current situation, without regarding the other situations as decision relevant. From UDT perspective you are advocating, the other situations remain decision relevant, and that explains much of what you are talking about in other replies. But from that same perspective, it doesn’t matter that you live in the situation Said is asking about, so it’s misleading that you keep attention on this situation in your reply without remarking on how that disagrees with the perspective you are advocating in other replies.
In the parent comment, you say “it is, in virtually all possible worlds, that you live for free”. This is confusing: are you talking about the possible worlds within the situation Said was asking about, or also about possible worlds outside that situation? The distinction matters for the argument in these comments, but you are saying this ambiguously.
No, non sequitur means something else. (If I say “A, therefore B”, but B doesn’t follow from A, that’s a non sequitur.)
I simplified the problem to make it easier for you to understand.
It does. Your question was “How?”. The answer is “through logical causality.”
You take the left box with the bomb, and it has always been empty.
This doesn’t even resemble a coherent answer. Do you really not see how absurd this is?
It doesn’t seem coherent if you don’t understand logical causality.
There is nothing incoherent about both of these being true:
1. You Left-box under all circumstances (even if there is a bomb in the box)
2. The expected utility of executing this algorithm is 0 (the best possible)
These two statements can both be true at the same time, and (1) implies (2).
None of that is responsive to the question I actually asked.
It is. The response to your question “So if you take the Left box, what actually, physically happens?” is “Physically, nothing.” That’s why I defined logical causality—it helps understand why (1) is the algorithm with the best expected utility, and why yours is worse.
What do you mean by “Physically, nothing.”? There’s a bomb in there—does it somehow fail to explode? How?
It fails to have ever been there.
Do you see how that makes absolutely no sense as an answer to the question I asked? Like, do you see what makes what you said incomprehensible, what makes it appear to be nonsense? I’m not asking you to admit that it’s nonsense, but can you see why it reads as bizarre moon logic?
I can, although I indeed don’t think it is nonsense.
What do you think our (or specifically my) viewpoint is?
I’m no longer sure; you and green_leaf appear to have different, contradictory views, and at this point that divergence has confused me enough that I could no longer say confidently what either of you seem to be saying without going back and carefully re-reading all the comments. And that, I’m afraid, isn’t something that I have time for at the moment… so perhaps it’s best to write this discussion off, after all.
Of course! Thanks for your time.
You’re still neglecting the other kind of causality, so “nothing” makes no sense to you (since something clearly happens).
I’m tapping out, since I don’t see you putting any effort into understanding this topic.
Agreed, but I think it’s important to stress that it’s not like you see a bomb, Left-box, and then see it disappear or something. It’s just that Left-boxing means the predictor already predicted that, and the bomb was never there to begin with.
Put differently, you can only Left-box in a world where the predictor predicted you would.
What stops you from Left-boxing in a world where the predictor didn’t predict that you would?
To make the question clearer, let’s set aside all this business about the fallibility of the predictor. Sure, yes, the predictor’s perfect, it can predict your actions with 100% accuracy somehow, something about algorithms, simulations, models, whatever… fine. We take all that as given.
So: you see the two boxes, and after thinking about it very carefully, you reach for the Right box (as the predictor always knew that you would).
But suddenly, a stray cosmic ray strikes your brain! No way this was predictable—it was random, the result of some chain of stochastic events in the universe. And though you were totally going to pick Right, you suddenly grab the Left box instead.
Surely, there’s nothing either physically or logically impossible about this, right?
So if the predictor predicted you’d pick Right, and there’s a bomb in Left, and you have every intention of picking Right, but due to the aforesaid cosmic ray you actually take the Left box… what happens?
But the scenario stipulates that the bomb is there. Given this, taking the Left box results in… what? Like, in that scenario, if you take the Left box, what actually happens?
The scenario also stipulates the bomb isn’t there if you Left-box.
What actually happens? Not much. You live. For free.
Yes, that’s correct.
By executing the first algorithm, the bomb has never been there.
Here it’s useful to distinguish between agentic ‘can’ and physical ‘can.’
Since I assume a deterministic universe for simplification, there is only one physical ‘can.’ But there are two agentic ‘can’s—no matter the prediction, I can agentically choose either way. The predictor’s prediction is logically posterior to my choice, and his prediction (and the bomb’s presence) are the way they are because of my choice. So I can Left-box even if there is a bomb in the left box, even though it’s physically impossible.
(It’s better to use agentic can over physical can for decision-making, since that use of can allows us to act as if we determined the output of all computations identical to us, which brings about better results. The agent that uses the physical can as their definition will see the bomb more often.)
Unless I’m missing something.
No, that’s just plain wrong. If you Left-box given a perfect predictor, the predictor didn’t put a bomb in Left. That’s a given. If the predictor did put a bomb in Left and you Left-box, then the predictor isn’t perfect.
“Irrelevant, since the described scenario explicitly stipulates that you find yourself in precisely that situation.”
It also stipulates the predictor predicts almost perfectly. So it’s very relevant.
“Yes, that’s what I’ve been saying: choosing Right in that scenario is the correct decision.”
No, it’s the wrong decision. Right-boxing is just the necessary consequence of the predictor predicting I Right-box. But insofar this is a decision problem, Left-boxing is correct, and then the predictor predicted I would Left-box.
“No, Left-boxing means we burn to death.”
No, it means the model Left-boxed and thus the predictor didn’t put a bomb in Left.
Do you understand how subjunctive dependence works?
Yes, almost perfectly (well, it has to be “almost”, because it’s also stipulated that the predictor got it wrong this time).
None of this matters, because the scenario stipulates that there’s a bomb in the Left box.
But it’s stipulated that the predictor did put a bomb in Left. That’s part of the scenario.
Why does it matter? We know that there’s a bomb in Left, because the scenario tells us so.
Well, not with your answer, because you Right-box. But anyway.
It matters a lot, because in a way the problem description is contradicting itself (which happens more often in Newcomblike problems).
1. It says there’s a bomb in Left.
2. It also says that if I Left-box, then the predictor predicted this, and will not have put a Bomb in Left. (Unless you assume the predictor predicts so well by looking at, I don’t know, the color of your shoes or something. But it strongly seems like the predictor has some model of your decision procedure.)
You keep repeating (1), ignoring (2), even though (2) is stipulated just as much as (1).
So, yes, my question whether you understand subjunctive dependence is justified, because you keep ignoring that crucial part of the problem.
Well, first of all, if there is actually a contradiction in the scenario, then we’ve been wasting our time. What’s to talk about? In such a case the answer to “what happens in this scenario” is “nothing, it’s logically impossible in the first place”, and we’re done.
But of course there isn’t actually a contradiction. (Which you know, otherwise you wouldn’t have needed to hedge by saying “in a way”.)
It’s simply that the problem says that if you Left-box, then the predictor predicted this, and will not have put a bomb in Left… usually. Almost always! But not quite always. It very rarely makes mistakes! And this time, it would seem, is one of those times.
So there’s no contradiction, there’s just a (barely) fallible predictor.
So the scenario tells us that there’s a bomb in Left, we go “welp, guess the predictor screwed up”, and then… well, apparently FDT tells us to choose Left anyway? For some reason…? (Or does it? You tell me…) But regardless, obviously the correct choice is Right, because Left’s got a bomb in it.
I really don’t know what else there is to say about this.
There is, as I explained. There’s 2 ways of resolving it, but yours isn’t one of them. You can’t have it both ways.
Just… no. “The predictor predicted this”, yes, so there are a trillion trillion − 1 follow-up worlds where I don’t burn to death! And yes, 1 - just 1 - world where I do. Why choose to focus on that 1 out of a trillion trillion worlds?
Because the problem talks about a bomb in Left?
No. The problem says more than that. It clearly predicts a trillion trillion − 1 worlds where I don’t burn to death. That 1 world where I do sucks, but paying $100 to avoid it seems odd. Unless, of course, you value your life infinitely (which you do I believe?). That’s fine, it does all depend on the specific valuations.
The problem stipulates that you actually, in fact, find yourself in a world where there’s a bomb in Left. These “other worlds” are—in the scenario we’re given—entirely hypothetical (or “counterfactual”, if you like). Do they even exist? If so, in what sense? Not clear. But in the world you find yourself in (we are told), there’s a bomb in the Left box. You can either take that box, and burn to death, or… not do that.
So, “why choose to focus on” that world? Because that’s the world we find ourselves in, where we have to make the choice.
Paying $100 to avoid burning to death isn’t something that “seems odd”, it’s totally normal and the obviously correct choice.
My point is that those “other worlds” are just as much stipulated by the problem statement as that one world you focus on. So, you pay $100 and don’t burn to death. I don’t pay $100, burn to death in 1 world, and live for free in a trillion trillion − 1 worlds. Even if I value my life at $10,000,000,000,000, my choice gives more utility.
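(The arithmetic behind that claim, granting the subjunctive dependence and the 1-in-10^24 error rate, and valuing a life at $10^13:)

$$\mathbb{E}[U(\text{Left})] \approx -\frac{\$10^{13}}{10^{24}} = -\$10^{-11}, \qquad \mathbb{E}[U(\text{Right})] = -\$100,$$

so on these numbers Left comes out ahead unless a life is valued at more than about 100 × 10^24 dollars (or infinitely, as noted above).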
Sorry, but no, they’re not. You may choose to infer their “existence” from what’s stated in the problem—but that’s an inference that depends on various additional assumptions (e.g. about the nature of counterfactuals, and all sorts of other things). All that’s actually stipulated is the one world you find yourself in.
You infer the existence of me burning to death from what’s stated in the problem as well. There’s no difference.
I do have the assumption of subjunctive dependence. But without that one—if, say, the predictor predicts by looking at the color of my shoes—then I don’t Left-box anyway.
Of course there’s a difference: inferring burning to death just depends on the perfectly ordinary assumption of cause and effect, plus what is very explicitly stated in the problem. Inferring the existence of other worlds depends on much more esoteric assumptions than that. There’s really no comparison at all.
Not only is that not the only assumption required, it’s not even clear what it means to “assume” subjunctive dependence. Sure, it’s stipulated that the predictor is usually (but not quite always!) right about what you’ll do. What else is there to this “assumption” than that?
But how that leads to “other worlds exist” and “it’s meaningful to aggregate utility across them” and so on… I have no idea.
Inferring that I don’t burn to death depends on
Omega modelling my decision procedure
Cause and effect from there.
That’s it. No esoteric assumptions. I’m not talking about a multiverse with worlds existing next to each other or whatever, just possible worlds.
If they’re just possible worlds, then why do they matter? They’re not actual worlds, after all (by the time the described scenario is happening, it’s too late for any of them to be actual!). So… what’s the relevance?
The world you’re describing is just as much a possible world as the ones I describe. That’s my point.
Huh? It’s the world that’s stipulated to be the actual world, in the scenario.
No, it isn’t. In the world that’s stipulated, you still have to make your decision.
That decision is made in my head and in the predictor’s head. That’s the key.
But if you choose Left, you will burn to death. I’ve already quoted that. Says so right in the OP.
That’s one possible world. There are many more where I don’t burn to death.
But… there aren’t, though. They’ve already failed to be possible, at that point.
The UDT convention is that other possible worlds remain relevant, even when you find yourself in a possible world that isn’t compatible with their actuality. It’s confusing to discuss this general point as if it’s specific to this contentious thought experiment.
Well, we’re discussing it in the context of this thought experiment. If the point applies more generally, then so be it.
Can you explain (or link to an explanation of) what is meant by “convention” and “remain relevant” here?
The setting has a sample space, as in expected utility theory, with situations that take place in some event (let’s call it a situation event) and offer a choice between smaller events resulting from taking alternative actions. The misleading UDT convention is to call the situation event “actual”. It’s misleading because the goal is to optimize expected utility over the whole sample space, not just over the situation event, so the places on the sample space outside the situation event are effectively still in play, still remain relevant, not ruled out by the particular situation event being “actual”.
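(Roughly, in my own notation: with Ω the sample space, E ⊆ Ω the situation event, and π a candidate policy, the quantity UDT optimizes is)

$$\arg\max_{\pi}\; \mathbb{E}_{\omega \sim P}\bigl[U(\mathrm{outcome}(\omega, \pi))\bigr] \quad \text{taken over all of } \Omega,$$

not the conditional expectation restricted to E; declaring E “actual” changes nothing about which π maximizes the unconditional quantity.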
Alright. But by the time the situation described in the OP happens, it no longer matters whether you optimize expected utility over the whole sample space; that goal is now moot. One event out of the sample space has occurred, and the others have failed to occur. Why would you continue to attempt to achieve that goal, toward which you are no longer capable of taking any action?
That goal may be moot for some ways of doing decisions. For UDT it’s not moot, it’s the only thing that we care about instead. And calling some situation or another “actual” has no effect at all on the goal, and on the process of decision making in any situation, actual or otherwise, that’s what makes the goal and the decision process reflectively stable.
“But by the time the situation described in the OP happens, it no longer matters whether you optimize expected utility over the whole sample space; that goal is now moot.”
This is what we agree on. If you’re in the situation with a bomb, all that matters is the bomb.
My stance is that Left-boxers virtually never get into the situation to begin with, because of the prediction Omega makes. So with probability close to 1, they never see a bomb.
Your stance (if I understand correctly) is that the problem statement says there is a bomb, so, that’s what’s true with probability 1 (or almost 1).
And so I believe that’s where our disagreement lies. I think Newcomblike problems are often “trick questions” that can be resolved in two ways, one leaning more towards your interpretation.
In the spirit of Vladimir’s points, if I annoyed you, I do apologize. I can get quite intense in such discussions.
But that’s false for a UDT agent, it still matters to that agent-instance-in-the-situation what happens in other situations, those without a bomb, it’s not the case that all that matters is the bomb (or even a bomb).
Hmm, interesting. I don’t know much about UDT. From an FDT perspective, I’d say that if you’re in the situation with the bomb, your decision procedure already Right-boxed and therefore you’re Right-boxing again, as a logical necessity. (Making the problem very interesting.)
To explain my view more, the question I try to answer in these problems is more or less: if I were to choose a decision theory now to strictly adhere to, knowing I might run into the Bomb problem, which decision theory would I choose?
Not at the point in time where Omega models my decision procedure.
One thing we do agree on:
If I ever find myself in the Bomb scenario, I Right-box. Because in that scenario, the predictor’s model of me already Right-boxed, and therefore I do, too—not as a decision, per se, but as a logical consequence.
The correct decision is another question—that’s Left-boxing, because the decision is being made in two places. If I find myself in the Bomb scenario, that just means the decision to Right-box was already made.
The Bomb problem asks what the correct decision is, and makes clear (at least under my assumption) that the decision is made at 2 points in time. At that first point (in the predictor’s head), Left-boxing leads to the most utility: it avoids burning to death for free. Note that at that point, there is not yet a bomb in Left!
If we agree on that, then I don’t understand what it is that you think we disagree on! (Although the “not as a decision, per se” bit seems… contentless.)
No, it asks what decision you should make. And we apparently agree that the answer is “Right”.
Hmmm, I thought that comment might clear things up, but apparently it doesn’t. And I’m left wondering if you even read it.
Anyway, Left-boxing is the correct decision. But since you didn’t really engage with my points, I’ll be leaving now.
What does it mean to say that Left-boxing is “the correct decision” if you then say that the decision you’d actually make would be to Right-box? This seems to be straightforwardly contradictory, in a way that renders the claim nonsensical.
I read all your comments in this thread. But you seem to be saying things that, in a very straightforward way, simply don’t make any sense…
Alright. The correct decision is Left-boxing, because that means the predictor’s model Left-boxed (and so do I), letting me live for free. Because, at the point where the predictor models me, the Bomb isn’t placed yet (and never will be).
However, IF I’m in the Bomb scenario, then the predictor’s model already Right-boxed. Then, because of subjunctive dependence, it’s apparently not possible for me to Left-box, just as it is impossible for two calculators to give a different result to 2 + 2.
Well, the Bomb scenario is what we’re given. So the first paragraph you just wrote there is… irrelevant? Inapplicable? What’s the point of it? It’s answering a question that’s not being asked.
As for the last sentence of your comment, I don’t understand what you mean by it. Certainly it’s possible for you to Left-box; you just go ahead and Left-box. This would be a bad idea, of course! Because you’d burn to death. But you could do it! You just shouldn’t—a point on which we, apparently, agree.
The bottom line is: to the actual single question the scenario asks—which box do you choose, finding yourself in the given situation?—we give the same answer. Yes?
The bottom line is that Bomb is a decision problem. If I am still free to make a decision (which I suppose I am, otherwise it isn’t much of a problem), then the decision I make is made at 2 points in time. And then, Left-boxing is the better decision.
Yes, the Bomb is what we’re given. But with the very reasonable assumption of subjunctive dependence, it specifies what I am saying...
We agree that if I would be there, I would Right-box, but also everybody would then Right-box, as a logical necessity (well, 1 in a trillion trillion error rate, sure). It has nothing to do with correct or incorrect decisions, viewed like that: the decision is already hard coded into the problem statement, because of the subjunctive dependence.
“But you can just Left-box” doesn’t work: that’s like expecting one calculator to answer to 2 + 2 differently than another calculator.
Unless I’m missing something, it’s possible you’re in the predictor’s simulation, in which case it’s possible you will Left-box.
Excellent point!
I think it’s better to explain to such people the problem where the predictor is perfect, and then generalize to an imperfect predictor. They don’t understand the general principle of your present choices pseudo-overwriting the entire timeline and can’t think in the seemingly-noncausal way that optimal decision-making requires. By jumping right to an imperfect predictor, the principle becomes, I think, too complicated to explain.
(Btw, you can call your answer “obvious” and my side “crazy” all you want, but it won’t change a thing until you actually demonstrate why and how FDT is wrong, which you haven’t done.)
I’ve done that: FDT is wrong because it (according to you) recommends that you choose to burn to death, when you could easily choose not to burn to death. Pretty simple.
It seems to me that your argument proves too much.
Let’s set aside this specific example and consider something more everyday: making promises. It is valuable to be able to make promises that others will believe, even when they are promises to do something that (once the relevant situation arises) you will strongly prefer not to do.
Suppose I want a $1000 loan, with $1100 to be repaid one year from now. My counterparty Bob has no trust in the legal system, police, etc., and expects that next year I will be somewhere where he can’t easily find me and force me to pay up. But I really need the money. Fortunately, Bob knows some mad scientists and we agree to the following: I will have implanted in my body a device that will kill me if 366 days from now I haven’t paid up. I get the money. I pay up. Nobody dies. Yay.
I hope we are agreed that (granted the rather absurd premises involved) I should be glad to have this option, even though in the case where I don’t pay up it kills me.
Revised scenario: Bob knows some mad psychologists who by some combination of questioning, brain scanning, etc., are able to determine very reliably what future choices I will make in any given situation. He also knows that in a year’s time I might (but with extremely low probability) be in a situation where I can only save my life at the cost of the $1100 that I owe him. He has no risk tolerance to speak of and will not lend me the money if in that situation I would choose to save my life and not give him the money.
Granted these (again absurd) premises, do you agree with me that it is to my advantage to have the sort of personality that can promise to pay Bob back even if it literally kills me?
It seems to me that:
1. Your argument in this thread would tell me, a year down the line and in the surprising situation that I do in fact need to choose between Bob’s money and my life, “save your life, obviously”.
2. If my personality were such that I would do as you advise in that situation, then Bob will not lend me the money. (Which may in fact mean that in that unlikely future situation I die anyway.)
3. Your reasons for saying “FDT recommends knowingly choosing to burn to death! So much the worse for FDT!” are equally reasons to say “Being someone who can make and keep this sort of promise means knowingly choosing to pay up and die! So much the worse for being that sort of person!”.
4. Being that sort of person is not in fact worse, even though there are situations in which it leads to a worse outcome.
5. There is no version of “being that sort of person” that lets you just decide to live, in that unlikely situation, because paying up at the cost of your own life is what “being that sort of person” means.
6. To whatever extent I get to choose whether to be that sort of person, I have to make the decision before I know whether I’m going to be in that unlikely situation. And, to whatever extent I get to choose, it is reasonable to choose to be that sort of person, because the net benefit is greater.
7. Once again, “be that sort of person and then change your mind” is not one of the available options; if I will change my mind about it, then I was never that sort of person after all.
What (if anything) do you disagree with in those numbered points? What (if anything) do you find relevantly disanalogous between the situation I describe here and the one with the bomb?
I do not.
Your scenario omits the crucial element of the scenario in the OP, where you (the subject) find yourself in a situation where the predictor turns out to have erred in its prediction.
Hmm. I am genuinely quite baffled by this; there seems to be some very fundamental difference in how we are looking at the world. Let me just check that this is a real disagreement and not a misunderstanding (even if it is, there would also be a real disagreement, but a different one): I am asking not “do you agree with me that at the point where I have to choose between dying and failing to repay Bob it is to my advantage …” but “do you agree with me that at an earlier point, say when I am negotiating with Bob, it is to my advantage …”.
If I am understanding you right and you are understanding me right, then I think the following is true. Suppose that when Bob has explained his position (he is willing to lend me the money if, and only if, his mad scientists determine that I will definitely repay him even if the alternative is death), some supernatural being magically informs me that while it cannot lend me the money it can make me the sort of person who can make the kind of commitment Bob wants and actually follow through. I think you would recommend that I either not accept this offer, or at any rate not make that commitment having been empowered to do so.
Do you feel the same way about the first scenario, where instead of choosing to be a person who will pay up even at the price of death I choose to be a person who will be compelled by brute force to pay up or die? If not, why?
Why does that matter? (Maybe it doesn’t; your opinion about my scenario is AIUI the same as your opinion about the one in the OP.)
Yes, I understood you correctly. My answer stands. (But I appreciate the verification.)
Right.
No, because there’s a difference between “pay up or die” and “pay up and die”.
The scenario in the OP seems to hinge on it. As described, the situation is that the agent has picked FDT as their decision theory, is absolutely the sort of agent who will choose the Left box and die if so predicted, who is thereby supposed to not actually encounter situations where the Left box has a bomb… but oops! The predictor messed up and there is a bomb there anyhow. And now the agent is left with a choice on which nothing depends except whether he pointlessly dies.
I see no analogous feature of your scenarios…
I agree (of course!) that there is a difference between “pay up and die” and “pay up or die”. But I don’t understand how this difference can be responsible for the difference in your opinions about the two scenarios.
Scenario 1: I choose for things to be so arranged that in unlikely situation S (where if I pay Bob back I die), if I don’t pay Bob back then I also die. You agree with me (I think—you haven’t actually said so explicitly) that it can be to my benefit for things to be this way, if this is the precondition for getting the loan from Bob.
Scenario 2: I choose for things to be so arranged that in unlikely scenario S (where, again, if I pay Bob back I die), I will definitely pay. You think this state of affairs can’t be to my advantage.
How is scenario 2 actually worse for me than scenario 1? Outside situation S, they are no different (I will not be faced with such strong incentive not to pay Bob back, and I will in fact pay him back, and I will not die). In situation S, scenario 1 means I die either way, so I might as well pay my debts; scenario 2 means I will pay up and die. I’m equally dead in each case. I choose to pay up in each case.
In scenario 1, I do have the option of saying a mental “fuck you” to Bob, not repaying my debt, and dying at the hand of his infernal machinery rather than whatever other thing I could save myself from with the money. But I’m equally dead either way, and I can’t see why I’d prefer this, and in any case it’s beyond my understanding why having this not-very-appealing extra option would be enough for scenario 1 to be good and scenario 2 to be bad.
What am I missing?
I think we are at cross purposes somehow about the “predictor turns out to have erred” thing. I do understand that this feature is present in the OP’s thought experiment and absent in mine. My thought experiment isn’t meant to be equivalent to the one in the OP, though it is meant to be similar in some ways (and I think we are agreed that it is similar in the ways I intended it to be similar). It’s meant to give me another view of something in your thinking that I don’t understand, in the hope that I might understand it better (hopefully with the eventual effect of improving either my thinking or yours, if it turns out that one of us is making a mistake rather than just starting from axioms that seem alien to one another).
Anyway, it probably doesn’t matter, because so far as I can tell you do in fact have “the same” opinion about the OP’s thought experiment and mine; I was asking about disanalogies between the two in case it turned out that you agreed with all the numbered points before that question. I think you don’t agree with them all, but I’m not sure exactly where the disagreements are; I might understand better if you could tell me which of those numbered points you disagree with.
Yeah you keep repeating that. Stating it. Saying it’s simple, obvious, whatever. Saying I’m being crazy. But it’s just wrong. So there’s that.
Which part of what I said do you deny…?
That I’m being crazy
That Left-boxing means burning to death
That your answer is obviously correct
Take your pick.
The scenario stipulates this:
This is instead part of the misleading framing. Putting the bomb in Left is actually just one of the situations being considered, not all of what actually happens, even if the problem says that it’s what actually happens. It’s one of the possible worlds, and there is a misleading convention of saying that, when you find yourself in a possible world, what you see is what actually happens. That’s because that’s what it subjectively looks like, even if other worlds are supposed to still matter by UDT convention.
The question is not which action to take. The question is which decision theory gives the most utility. Any candidate for “best decision theory” should take the left box. This results in a virtually guaranteed saving of $100, and yes, burning to death in an extremely unlikely scenario. In that unlikely scenario, yes, taking the right box gives the most utility—but that’s answering the wrong question.
This sort of reasoning makes sense if you must decide on which box to take prior to learning the details of your situation (a.k.a. in a “veil of ignorance”), and cannot change your choice even after you discover that, e.g., taking the Left box will kill you. In such a case, sure, you can say “look, it’s a gamble, and I did lose big this time, but it was a very favorable gamble, with a clearly positive expected outcome”. (Although see Robyn Dawes’ commentary on such “skewed” gambles. However, we can let this pass here.)
But that’s not the case here. Here, you’ve learned that taking the Left box kills you, but you still have a choice! You can still choose to take Right! And live!
Yes, FDT insists that actually, you must choose in advance (by “choosing your algorithm” or what have you), and must stick to the choice no matter what. But that is a feature of FDT, it is not a feature of the scenario! The scenario does not require that you stick to your choice. You’re free to take Right and live, no matter what your decision theory says.
So when selecting a decision theory, you may of course feel free to pick the one that says that you must pick Left, and knowingly burn to death, while I will pick the one that says that I can pick whatever I want. One of us will be dead, and the other will be “smiling from atop a heap of utility”.
(“But what about all those other possible worlds?”, you may ask. Well, by construction, I don’t find myself in any of those, so they’re irrelevant to my decision now, in the actual world.)
Well, I’d say FDT recognizes that you do choose in advance, because you are predictable. Apparently you have an algorithm running that makes these choices, and the predictor simulates that algorithm. It’s not that you “must” stick to your choice. It’s about constructing a theory that consistently recommends the actions that maximize expected utility.
I know I keep repeating that—but it seems that’s where our disagreement lies. You look at which action is best in a specific scenario, I look at what decision theory produces the most utility. An artificial superintelligence running a decision theory can’t choose freely no matter what the decision theory says: running the decision theory means doing what it says.
That seems like an argument against “running a decision theory”, then!
Now, that statement may seem like it doesn’t make sense. I agree! But that’s because, as I see it, your view doesn’t make sense; what I just wrote is consistent with what you write…
Clearly, I, a human agent placed in the described scenario, could choose either Left or Right. Well, then we should design our AGI in such a way that it also has this same capability.
Obviously, the AGI will in fact (definitionally) be running some algorithm. But whatever algorithm that is, ought to be one that results in it being able to choose (and in fact choosing) Right in the “Bomb” scenario.
What decision theory does that correspond to? You tell me…
CDT
CDT indeed Right-boxes, thereby losing utility.
Exactly, it doesn’t make sense. It is in fact nonsense, unless you are saying it’s impossible to specify a coherent, utility-maximizing decision theory at all?
Btw, please explain how it’s consistent with what I wrote, because it seems obvious to me it’s not.
And if I select FDT, I would be the one “smiling from atop a heap of utility” in (10^24 − 1) out of 10^24 worlds.
Yes, but the point is to construct a decision theory that recommends actions in a way that maximizes expected utility. Recommending left-boxing does that, because it saves you $100 in virtually every world. That’s it, really. You keep focusing on that 1 out of 10^24 possibility where you burn to death, but that doesn’t take anything away from FDT. Like I said: it’s not about which action to take, let alone which action in such an improbable scenario. It’s about what decision theory we need.
So you say. But in the scenario (and in any situation we actually find ourselves in), only the one, actual, world is available for inspection. In that actual world, I’m the one with the heap of utility, and you’re dead.
Who knows what I would do in any of those worlds, and what would happen as a result? Who knows what you would do?
In the given scenario, FDT loses, period, and loses really badly and, what is worse, loses in a completely avoidable manner.
As I said, this reasoning makes sense if, at the time of your decision, you don’t know what possibility you will end up with (and are thus making a gamble). It makes no sense at all if you are deciding while in full possession of all relevant facts.
Totally, and the decision theory we need is one that doesn’t make such terrible missteps!
Of course, it is possible to make an argument like: “yes, FDT fails badly in this improbable scenario, but all other available decision theories fail worse / more often, so the best thing to do is to go with FDT”. But that’s not the argument being made here—indeed, you’ve explicitly disclaimed it…
No. We can inspect more worlds. We know what happens given the agent’s choice and the predictor’s prediction. There are multiple paths, each with its own probability. The problem description focuses on that one world, yes. But the point remains—we need a decision theory, we need it to recommend an action (left-boxing or right-boxing), and left-boxing gives the most utility if we consider the bigger picture.
Do you agree that recommending left-boxing before the predictor makes its prediction is rational?
Well, no. We can reason about more worlds. But we can’t actually inspect them.
Here’s the question I have, though, which I have yet to see a good answer to. You say:
But why can’t our decision theory recommend “choose Left if and only if it contains no bomb; otherwise choose Right”? (Remember, the boxes are open; we can see what’s in there…)
I think that recommending no-bomb-boxing is rational. Or, like: “Take the left box, unless of course the predictor made a mistake and put a bomb in there, in which case, of course, take the right box.”
As to inspection, maybe I’m not familiar enough with the terminology there.
Re your last point: I was just thinking about that too. And strangely enough I missed that the boxes are open. But wouldn’t the note be useless in that case?
I will think about this more, but it seems to me your decision theory can’t recommend “Left-box, unless you see a bomb in left.”, and FDT doesn’t do this. The problem is, in that case the prediction influences what you end up doing. What if the predictor is malevolent, and predicts you choose right, placing the bomb in left? It could make you lose $100 easily. Maybe if you believed the predictor to be benevolent?
Well, uh… that is rather an important aspect of the scenario…
Why not?
Yes, it certainly does. And that’s a problem for the predictor, perhaps, but why should it be a problem for me? People condition their actions on knowledge of past events (including predictions of their actions!) all the time.
Indeed, the predictor doesn’t have to predict anything to make me lose $100; it can just place the bomb in the left box, period. This then boils down to a simple threat: “pay $100 or die!”. Hardly a tricky decision theory problem…
Sure. But given the note, I had the knowledge needed already, it seems. But whatever.
Didn’t say it was a tricky decision problem. My point was that your strategy is easily exploitable and may therefore not be a good strategy.
If your strategy is “always choose Left”, then a malevolent “predictor” can put a bomb in Left and be guaranteed to kill you. That seems much worse than being mugged for $100.
The problem description explicitly states the predictor doesn’t do that, so no.
I don’t see how that’s relevant. In the original problem, you’ve been placed in this weird situation against your will, where something bad will happen to you (either the loss of $100 or … death). If we’re supposing that the predictor is malevolent, she could certainly do all sorts of things… are we assuming that the predictor is constrained in some way? Clearly, she can make mistakes, so that opens up her options to any kind of thing you like. In any case, your choice (by construction) is as stated: pay $100, or die.
You don’t see how the problem description preventing it is relevant?
The description doesn’t prevent malevolence, but it does prevent putting a bomb in left if the agent left-boxes.
FDT doesn’t insist on this at all. FDT recognizes that IF your decision procedure is modelled prior to your current decision, then you did in fact choose in advance. If an FDT’er playing Bomb doesn’t believe her decision procedure was being modelled this way, she wouldn’t take Left!
If and only if it is a feature of the scenario does FDT recognize it. FDT isn’t insisting that the world be a certain way. I wouldn’t be a proponent of it if it did.
If a model of you predicts that you will choose A, but in fact you can choose B, and want to choose B, and do choose B, then clearly the model was wrong. Thinking “the model says I will choose A, therefore I have to (???) choose A” is total nonsense.
(Is there some other way to interpret what you’re saying? I don’t see it.)
“Thinking “the model says I will choose A, therefore I have to (???) choose A” is total nonsense.”
I choose whatever I want, knowing that it means the predictor predicted that choice.
In Bomb, if I choose Left, the predictor will have predicted that (given subjunctive dependence). Yes, the predictor said it predicted Right in the problem description; but if I choose Left, that simply means the problem ran differently from the start. It means that, starting from the beginning, the predictor predicts I will choose Left, doesn’t put a bomb in Left, doesn’t leave the “I predicted you will pick Right” note (but maybe leaves an “I predicted you will pick Left” note), and then I indeed choose Left, letting me live for free.
If the model is in fact (near) perfect, then choosing B means the model chose B too. That may seem like changing the past, but it really isn’t, that’s just the confusing way these problems are set up.
Claiming you can choose something a (near) perfect model of you didn’t predict is like claiming two identical calculators can give a different answer to 2 + 2.
It is the case, in a way. Otherwise the predictor could not have predicted your action. I’m not saying you actively decide what to do beforehand, but apparently you are running a predictable decision procedure.
I think the more fundamental issue is that you can construct these sorts of dilemmas for all decision theories. For example, you can easily come up with scenarios where Omega punishes you for following a certain decision theory and rewards you otherwise.
The right question to ask is not whether a decision theory recommends something that makes you burn to death in some scenario, but whether it recommends you do so across a broad class of fair dilemmas. I’m not convinced that FDT does that, and the bomb dilemma did not move me much.
You can of course construct scenarios where Omega punishes you for all sorts of things, but in the given case, FDT recommends a manifestly self-destructive action, in a circumstance where you’re entirely free to instead not take that action. Other decision theories do not do this (whatever their other faults may be).
But of course it is the right question. The given dilemma is perfectly fair. FDT recommends that you knowingly choose to burn to death, when you could instead not choose to burn to death, and incur no bad consequences thereby. This is a clear failure.
What makes the bomb dilemma seem unfair to me is the fact that it’s conditioning on an extremely unlikely event. The only way we blow up is if the predictor predicted incorrectly. But by assumption, the predictor is near-perfect. So it seems implausible that this outcome would ever happen.
Why is this unfair?
Look, I keep saying this, but it doesn’t seem to me like anyone’s really engaged with it, so I’ll try again:
If the scenario were “pick Left or Right; after you pick, then the boxes are opened and the contents revealed; due to [insert relevant causal mechanisms involving a predictor or whatever else here], the Left box should be empty; unfortunately, one time in a trillion trillion, there’ll be some chance mistake, and Left will turn out (after you’ve chosen it) to have a bomb, and you’ll blow up”…
… then FDT telling you to take Left would be perfectly reasonable. I mean, it’s a gamble, right? A gamble with an unambiguously positive expected outcome; a gamble you’ll end up winning in the utterly overwhelming majority of cases. Once in a trillion trillion times, you suffer a painful death—but hey, that’s better odds than each of us take every day when we cross the street on our way to the corner store. In that case, it would surely be unfair to say “hey, but in this extremely unlikely outcome, you end up burning to death!”.
But that’s not the scenario!
In the given scenario, we already know what the boxes have in them. They’re open; the contents are visible. We already know that Left has a bomb. We know, to a certainty, that choosing Left means we burn to death. It’s not a gamble with an overwhelming, astronomical likelihood of a good outcome, and only a microscopically tiny chance of painful death—instead, it’s knowingly choosing a certain death!
Yes, the predictor is near-perfect. But so what? In the given scenario, that’s no longer relevant! The predictor has already predicted, and its prediction has already been evaluated, and has already been observed to have erred! There’s no longer any reason at all to choose Left, and every reason not to choose Left.
And yet FDT still tells us to choose Left. This is a catastrophic failure; and what’s more, it’s an obvious failure, and a totally preventable one.
Now, again: it would be reasonable to say: “Fine, yes, FDT fails horribly in this very, very rare circumstance; this is clearly a terrible mistake. Yet other decision theories fail, at least this badly, or in far more common situations, or both, so FDT still comes out ahead, on net.”
But that’s not the claim in the OP; the claim is that, somehow, knowingly choosing a guaranteed painful death (when it would be trivial to avoid it) is the correct choice, in this scenario.
And that’s just crazy.
My updated defense of FDT, should you be interested.
Like I’ve said before, it’s not about which action to take, it’s about which strategy to have. It’s obvious right-boxing gives the most utility in this specific scenario only, but that’s not what it’s about.
Why? Why is it not about which action to take?
I reject this. If Right-boxing gives the most utility in this specific scenario, then you should Right-box in this specific scenario. Because that’s the scenario that—by construction—is actually happening to you.
In other scenarios, perhaps you should do other things. But in this scenario, Right is the right answer.
And this is the key point. It seems to me impossible to have a decision theory that right-boxes in Bomb but still does as well as FDT does in all other scenarios.
It’s about which strategy you should adhere to. The strategy of right-boxing loses you $100 virtually all the time.
If it’s about utility, then specify it in terms of utility, not death or dollars.
Utility is often measured in dollars. If I had created the Bomb scenario, I would have specified life/death in terms of dollars as well. Like, “Life is worth $1,000,000 to you.” That way, you can easily compare the loss of your life to the $100 cost of Right-boxing.
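To make that comparison concrete, here is a minimal sketch of the ex-ante expected costs, assuming (purely for illustration) the $1,000,000 value on life suggested above and the roughly one-in-a-trillion-trillion error rate from the problem statement:

```python
# Ex-ante expected cost of each commitment in Bomb, under assumed numbers:
# life valued at $1,000,000 and a predictor error rate of about 1 in 10^24.
LIFE_VALUE = 1_000_000   # dollars (illustrative assumption)
ERROR_RATE = 1e-24       # predictor's assumed error rate
RIGHT_FEE = 100          # dollars paid for taking Right

# Committing to Left: you pay nothing unless the predictor errs,
# in which case there is a bomb in Left and you die.
expected_cost_left = ERROR_RATE * LIFE_VALUE   # = 1e-18 dollars

# Committing to Right: you take Right and pay the $100 fee,
# regardless of where the bomb is.
expected_cost_right = RIGHT_FEE                # = 100 dollars

print(expected_cost_left, expected_cost_right)
```

On those assumed numbers, the ex-ante gap is a factor of about 10^20 in favor of the Left commitment, which is the comparison being appealed to here; the dispute is over whether that comparison still governs once the bomb is already visible.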
Yes, you keep saying this, and I still think you’re wrong. Our candidate decision theory has to recommend something for this scenario—and that recommendation gets picked up by the predictor beforehand. You have to take that into account. You seem to be extremely focused on this extremely unlikely scenario, which is odd to me.
How exactly is it preventable? I’m honestly asking. If you have a strategy that, if the agent commits to it before the predictor makes her prediction, does better than FDT, I’m all ears.
It’s preventable by taking the Right box. If you take Left, you burn to death. If you take Right, you don’t burn to death.
Totally, here it is:
FDT, except that if the predictor makes a mistake and there’s a bomb in the Left, take Right instead.
You seem to have misunderstood the problem statement [1]. If you commit to doing “FDT, except that if the predictor makes a mistake and there’s a bomb in the Left, take Right instead”, then you will almost surely have to pay $100 (since the predictor predicts that you will take Right), whereas if you commit to using pure FDT, then you will almost surely have to pay nothing (with a small chance of death). There really is no “strategy that, if the agent commits to it before the predictor makes her prediction, does better than FDT”.
[1] Which is fair enough, as it wasn’t actually specified correctly: the predictor is actually trying to predict whether you will take Left or Right if it leaves its helpful note, not in the general case. But this assumption has to be added, since otherwise FDT says to take Right.
It sounds like you’re saying that I correctly understood the problem statement as it was written (but it was written incorrectly); but that the post erroneously claims that in the scenario as (incorrectly) written, FDT says to take Left, when in fact FDT in that scenario-as-written says to take right. Do I understand you?
Yes.
Why? FDT isn’t influenced in its decision by the note, so there is no loss of subjunctive dependence when this assumption isn’t added. (Or so it seems to me: I am operating at the limits of my FDT-knowledge here.)
How would this work? Your strategy seems to be “Left-box unless the note says there’s a bomb in Left”. This ensures the predictor is right whether she puts a bomb in Left or not, and doesn’t optimize expected utility.
It doesn’t kill you in a case when you can choose not to be killed, though, and that’s the important thing.
It costs you p * $100 for 0 ≤ p ≤ 1, where p depends on how “mean” you believe the predictor is. Left-boxing costs 10^-24 * $1,000,000 = $10^-18 if you value life at a million dollars. Then if p > 10^-20, Left-boxing beats your strategy.
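A quick sketch of that threshold, using the same assumed numbers as above ($1,000,000 for a life, a 10^-24 error rate), with p standing for the assumed probability that a “mean” predictor puts the bomb in Left anyway:

```python
# Indifference point between pure Left-boxing and "Left unless I see a bomb",
# under the assumed numbers above.
LIFE_VALUE = 1_000_000   # dollars (assumed)
ERROR_RATE = 1e-24       # predictor's assumed error rate
FEE = 100                # dollars

# Pure Left-boxing: you only die if the predictor errs.
cost_pure_left = ERROR_RATE * LIFE_VALUE   # 1e-18 dollars

# "Left unless bomb": a mean predictor can put a bomb in Left and
# charge you the $100 fee, which it does with probability p.
threshold_p = cost_pure_left / FEE         # 1e-20

print(threshold_p)  # for p above this, pure Left-boxing is cheaper in expectation
```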
Why would I value my life finitely in this case? (Well, ever, really, but especially in this scenario…)
Also, were you operating under the life-has-infinite-value assumption all along? If so, then
You were incorrect about FDT’s decision in this specific problem
You should probably have mentioned you had this unusual assumption, so we could have resolved this discussion way earlier
Note that FDT Right-boxes when you give life infinite value.
What’s special in this scenario with regards to valuing life finitely?
If you always value life infinitely, it seems to me all actions you can ever take get infinite values, as there is always a chance you die, which makes decision making on basis of utility pointless.
Unfortunately, that doesn’t work. The predictor, if malevolent, could then easily make you choose right and pay $100.
Left-boxing is the best strategy possible as far as I can tell. As in, yes, that extremely unlikely scenario where you burn to death sucks big time, but there is no better strategy possible (unless there is a superior strategy that I, and it appears everybody else, haven’t thought of).
If you commit to taking Left, then the predictor, if malevolent, can “mistakenly” “predict” that you’ll take Right, making you burn to death. Just like in the given scenario: “Whoops, a mistaken prediction! How unfortunate and improbable! Guess you have no choice but to kill yourself now, how sad…”
There absolutely is a better strategy: don’t knowingly choose to burn to death.
We know the error rate of the predictor, so this point is moot.
I still have to see a strategy incorporating this that doesn’t overall lose by losing utility in other scenarios.
How do we know it? If the predictor is malevolent, then it can “err” as much as it wants.
For the record, I read Nate’s comments again, and I now think of it like this:
To the extent that the predictor was accurate in her line of reasoning, your Left-boxing does NOT result in you slowly burning to death. It results in, well, the problem statement being wrong, because the following can’t all be true:
The predictor is accurate
The predictor predicts you right-box, and places the bomb in left
You left-box
And yes, apparently the predictor can be wrong, but I’d say, who even cares? The probability of the predictor being wrong is supposed to be virtually zero anyway (although as Nate notes, the problem description isn’t complete in that regard).
We know it because it is given in the problem description, which you violate if the predictor ‘can “err” as much as it wants’.
Although I strongly disagree with Achmiz on the Bomb scenario in general, here we agree: Bomb is perfectly fair. You just have to take the probabilities into account; after that, if we value life at, say, $1,000,000, Left-boxing is the only correct strategy.
Well, it’s only unlikely if the agent left-boxes. If she right-boxes, the scenario is very likely.
I don’t think the problem itself is unfair—what’s unfair is saying FDT is wrong for left-boxing.
For the record: I completely agree with Said on this specific point. Bomb is a fair problem. Each decision theory entering this problem gets dealt the exact same hand.
No. Ironically, Bomb is an argument for FDT, not against it: for if I adhere to FDT, I will never* burn to death AND save myself $100 if I do face this predictor.
*“Never” here means only a 1 in a trillion trillion chance, if you meet the predictor.
If there is some nontrivial chance that the predictor is adversarial but constrained to be accurate and truthful (within the bounds given), then on the balance of probability, people taking the right box upon seeing a note predicting right are worse off. Yes, it sucks that you in particular got screwed, but the chances of that were astronomically low. This shows up more obviously if you look at repeated iterations, compare performance of decision theories in large populations, or weight outcomes across possible worlds. Edit: The odds were not astronomically low. I misinterpreted the statement about the predictor’s accuracy to be stronger than it actually was. FDT recommends taking the right box, and paying $100.
No, because the scenario stipulates that you find yourself facing a Left box with a bomb. Anyone who finds themselves in this scenario is worse off taking Left than Right, because taking Left kills you painfully, and taking Right does no such thing. There is no question of any “balance of probability”.
But you didn’t “get screwed”! You have a choice! You can take Left, or Right.
Again: the scenario stipulates that taking Left kills you, and FDT agrees that taking Left kills you; and likewise it is stipulated (and FDT does not dispute) that you can indeed take whichever box you like.
All of that is completely irrelevant, because in the actual world that you (the agent in the scenario) find yourself in, you can either burn to death, or not. It’s completely up to you. You don’t have to do what FDT says to do, regardless of what happens in any other possible worlds or counterfactuals or what have you.
It really seems to me like anyone who takes Left in the “Bomb” scenario is making almost exactly the same mistake as people who two-box in the classic Newcomb’s problem. Most of the point of “Newcomb’s Problem and Regret of Rationality” is that you don’t have to, and shouldn’t, do things like this.
But actually, it’s a much worse mistake! In the Newcomb case, there’s a disagreement about whether one-boxing can actually somehow cause there to be a million dollars in the box; CDT denies this possibility (because it takes no account of sufficiently accurate predictors), while timeless/logical/functional/whatever decision theories accept it. But here, there is no disagreement at all; FDT admits that choosing Left causes you to die painfully, but says you should do it anyway! That is obviously much worse.
The other point of “Newcomb’s Problem and Regret of Rationality” is that it is a huge mistake to redefine losing (such as, say, burning to death) as winning. That, also, seems like a mistake that’s being made here.
I don’t see that there’s any way of rescuing this result.
According to me, the correct rejoinder to Will is: I have confidently asserted that X is false for Xs to which I assign much greater probability than 1 in a trillion trillion, and so I hereby confidently assert that no, I do not see the bomb on the left. You see the bomb on the left, and lose $100. I see no bombs, and lose $0.
I can already hear the peanut gallery objecting that we can increase the fallibility of the predictor to reasonable numbers and I’d still take the bomb, so before we go further, let’s all agree that sometimes you’re faced with uncertainty, and the move that is best given your uncertainty is not the same as the move that is best given perfect knowledge. For example, suppose there are three games (“lowball”, “highball”, and “extremeball”) that work as follows. In each game, I have three actions—low, middle, and high. In the lowball game, my payouts are $5, $4, and $0 respectively. In the highball game, my payouts are $0, $4, and $5 respectively. In the extremeball game, my payouts are $5, $4, and $5 respectively. Now suppose that the real game I’m facing is that one of these games is chosen uniformly at random by an unobserved die roll. What action should I choose? Clearly ‘middle’, with an expected utility of $4 (compared to $3.33 for either ‘low’ or ‘high’). And when I do choose middle, I hope we can all agree that it’s foul play to say “you fool, you should have chosen low because the game is lowball”, or “you fool, there is no possible world in which that’s the best action”, or “you idiot, that’s literally the worst available action because the game was extremeball”. If I knew which game I was playing, I’d play the best move for that game. But insofar as I must enter a single action played against the whole mixture of games, I might have to choose something that’s not the best action in your favorite subgame.
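For concreteness, a minimal sketch of the expected values in that three-game mixture, with the payout table exactly as stated above:

```python
# Expected payouts in the lowball/highball/extremeball example,
# with the three games chosen uniformly at random.
payouts = {
    "lowball":     {"low": 5, "middle": 4, "high": 0},
    "highball":    {"low": 0, "middle": 4, "high": 5},
    "extremeball": {"low": 5, "middle": 4, "high": 5},
}

for action in ("low", "middle", "high"):
    ev = sum(game[action] for game in payouts.values()) / len(payouts)
    print(action, round(ev, 2))   # low 3.33, middle 4.0, high 3.33
```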
With that in mind, we can now decompose Will’s problem with the bomb into two subgames that I’m bound to play simultaneously.
In one subgame (that happens with probability 2 in a trillion trillion, although feel free to assume it’s more likely than that), the predictor is stumped and guesses randomly. We all agree that in that subgame, the best action is to avoid the bomb.
In the other subgame, the predictor is accurate. And insofar as the predictor is accurate, and supposing that we’ve seen the bomb, we can consider our two available actions (taking the bomb, or spending $100 to avoid it). But here something curious happens: insofar as the predictor is accurate, and we see the bomb, and we take it, we refute the problem statement.
That’s not my problem, as an agent. That’s your problem, as the person who stated the problem. If you say “assume you’re facing a perfect predictor, and see the bomb”, then I can validly say “no”. Similar to how if you say “assume you’re going to take the $5 bill, and you can either take the $5 bill or the $10 bill, but if you violate the laws of logic then you get a $100 fine, what do you do?” I can validly say “no”. It’s not my fault that you named a decision problem whose premises I can flatly refute.
Hopefully we all agree that insofar as the predictor is perfect (which, remember, is a case in the case analysis when the predictor is fallible), the problem statement here is deeply flawed, because I can by an action of mine refute it outright. The standard rejoinder is a bit of sleight-of-hand, where the person posing the problem says “ah, but the predictor is fallible”. But as we’ve already seen, I can just decompose it right back into two subproblems that we then aggregate across (much like the highball/lowball/extremeball case), at which point one of our case-analyses reveals that insofar as the predictor is accurate, the whole problem-statement is still flawed.
And this isn’t me saying “I wish to be evaluated from an epistemic vantage point that takes into account the other imaginary branches of reality”. This is me saying, your problem statement was wrong. It’s me pointing out that you’re a liar, or at least that I can by a clever choice of actions render you a liar. When you say “the predictor was accurate and you saw the bomb, what do you do?”, and I say “take the bomb”, I don’t get blown up, I reveal your mistake. Your problem statement is indeterminate. You shouldn’ta given me a problem I could refute. I’m not saying “there’s other hypothetical branches of reality that benefit from me taking this bomb”, I’m saying “WRONG, tell me what really happened”. Your story was false, my dude.
There’s some question of what to do when an obviously ill-formed game is mixed in with a properly-formed game, by, eg, adding some uncertainty about whether the predictor is fallible. Like, how are we supposed to analyze games comprising subgames where the problem statement can be refuted in one subgame but not others? And according to me, the obvious answer is that if you say “you are 1% playing problem A and 99% playing problem B”, and if I can by some act refute that I’m playing problem B, then I am perfectly licensed in saying “WRONG (99%)”. Mixing in a little uncertainty (or even a lot of uncertainty!) doesn’t stop you from being wrong (at my will) in the cases where you’re asserting falsehoods about my actions.
So, no, I don’t see the bomb. That’s all but 1 in a trillion trillion ‘WRONG’. What really happened, according to this problem, is—well, it’s still indeterminate, because Will’s problem statement isn’t complete. He doesn’t tell us what happens insofar as the predictor is accurate and I go left if and only if I see the bomb, ie if I always spite the predictor’s prediction. If the predictor would give me a 1 trillion dollar coin for my ingenuity in spiting them, then what really happens is I see a $1T coin (all but 1 in a trillion trillion times). If instead (as is the oral tradition when someone leaves their decision problem indeterminate) counterfactually-spiting the predictor causes me to find myself in a room full of hornets rather than exits, then what really happened is that I saw no bomb (and no hornets), almost certainly.
If you want me to stop denying your problem-statements outright, you’ve gotta stop giving me problem statements that I can (probabilistically) refute by my actions.
Thanks, this comment thread was pretty helpful.
After reading your comments, here’s my current explanation of what’s up with the bomb argument:
Then I’m a bit confused about how to estimate that probability, but I suspect the reasoning goes like this:
Sanity check
As a sanity-check, I note this implies that if the utilities-times-probabilities are different, I would not mind taking the $100 hit. Let’s see what the math says here, and then check whether my intuitions agree.
Suppose I value my life at $1 million. Then I think that I should become more indifferent here when the probability of a mistaken simulation approaches 1 in 100,000, or where the money on the line is closer to $10^-17.
[You can skip this, but here’s me stating the two multiplications I compared:
World 1: I fake-kill myself to save $X, with probability 1/10
World 2: I actually kill myself (cost: $1MM), with probability 1/Y
To find the indifference point I want the two multiplications of utility-to-probability to come out to be equal. If X = $100, then Y equals 100,000. If Y is a trillion trillion (10^24), then X = $10^-17. (Unless I did the math wrong.)]
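A small sketch of that arithmetic, keeping the 1/10 and 1/Y probabilities exactly as in the bracketed note (the helper names are just for illustration):

```python
# Indifference condition from the bracketed note above:
#   X * (1/10) == 1_000_000 * (1/Y)
LIFE_VALUE = 1_000_000  # dollars (assumed value of a life)

def indifference_Y(X):
    """Error rate denominator Y at which the two sides balance, for stake X."""
    return LIFE_VALUE / (X / 10)

def indifference_X(Y):
    """Stake X at which the two sides balance, for error rate 1/Y."""
    return (LIFE_VALUE / Y) * 10

print(indifference_Y(100))    # 100_000
print(indifference_X(1e24))   # 1e-17
```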
I think this doesn’t obviously clash with my intuitions, and somewhat matches them.
If the simulator was getting things wrong 1 in 100,000 times, I think I’d be more careful with my life in the “real world case” (insofar as that is a sensible concept). Going further, if you told me they were wrong 1 in 10 times, this would change my action, so there’s got to be a tipping point somewhere, and this seems reasonable for many people (though I actually value my life at more than $1MM).
And if the money was that tiny ($10^-17), I’d be fairly open to “not taking even the one-in-a-trillion-trillion chance”. (Though really my intuition is that I don’t care about money way before $10^-17, and would probably not risk anything serious starting at like 0.1 cents, because that sort of money seems kind of irritating to have to deal with. So my intuition doesn’t match perfectly here. Though I think that if I were expecting to play trillions of such games, then I would start to actively care about such tiny amounts of money.)
Whether the predictor is accurate isn’t specified in the problem statement, and indeed can’t be specified in the problem statement (lest the scenario be incoherent, or posit impossible epistemic states of the agent being tested). What is specified is what existing knowledge you, the agent, have about the predictor’s accuracy, and what you observe in the given situation (from which you can perhaps infer additional things about the predictor, but that’s up to you).
In other words, the scenario is: as per the information you have, so far, the predictor has predicted 1 trillion trillion times, and been wrong once (or, some multiple of those numbers—predicted 2 trillion trillion times and been wrong twice, etc.).
You now observe the given situation (note predicting Right, bomb in Left, etc.). What do you do?
Now, we might ask: but is the predictor perfect? How perfect is she? Well… you know that she’s erred once in a trillion trillion times so far—ah, no, make that twice in a trillion trillion times, as of this iteration you now find yourself in. That’s the information you have at your disposal. What can you conclude from that? That’s up to you.
Likewise, you say:
The problem statement absolutely is complete. It asks what you would/should do in the given scenario. There is no need to specify what “would” happen in other (counterfactual) scenarios, because you (the agent) do not observe those scenarios. There’s also no question of what would happen if you “always spite the predictor’s prediction”, because there is no “always”; there’s just the given situation, where we know what happens if you choose Left: you burn to death.
You can certainly say “this scenario has very low probability”. That is reasonable. What you can’t say is “this scenario is logically impossible”, or any such thing. There’s no impossibility or incoherence here.
It’s not complete enough to determine what I do when I don’t see a bomb. And so when the problem statement is corrected to stop flatly asserting consequences of my actions as if they’re facts, you’ll find that my behavior in the corrected problem is underdefined. (If this still isn’t clear, try working out what the predictor does to the agent that takes the bomb if it’s present, but pays the $100 if it isn’t.)
And if we’re really technical, it’s not actually quite complete enough to determine what I do when I see the bomb. That depends on what the predictor does when there are two consistent possible outcomes. Like, if I would go left when there was no bomb and right when there was a bomb, what would the predictor do? If they only place the bomb if I insist on going right when there’s no bomb, then I have no incentive to go left upon seeing a bomb insofar as they’re accurate. To force me to go left, the predictor has to be trigger-happy, dropping the bomb given the slightest opportunity.
Nitpick: you seem to be trying to say something like “let us sidestep questions of predictor mechanics and assume that your only observations were the game-history”. But that doesn’t work. If I wake up in a room with a bomb about to go off in the left exit and a $100 fee to use the right exit, I pay the $100 fee and use the right exit. I have to see a host of other specific observations to start hypothesizing the existence of a predictor who would stop placing bombs if-counterfactually I took them, before I believe I’m in this problem.
I’m happy to leave unspecified the prior knowledge that causes us to believe there’s a predictor, but I object to the claim that we’ve fully specified my epistemic state by letting me see the problem history.
I’m not going to ask whether the predictor is perfect, no. I’m going to assume she isn’t, and then do some calculations, and then assume she is, and do some other calculations. Because it’s often useful to solve decision problems using case-by-case analysis.
And in the second case, your problem statement is revealed to be a lie.
Now, it’s perfectly possible for you to rephrase the problem such that my analysis does not, in some cases, reveal it to be a lie. You could say “the predictor is on the fritz, and you see the bomb” (in which case I’ll go right, thanks). Or you could say “the predictor is 1% on the fritz and 99% accurate, and in the 1% case you see a bomb and in the 99% case you may or may not see a bomb depending what you do”. That’s also fine. But insofar as you flatly postulate a (almost certain) consequence of my action, be prepared for me to assert that you’re (almost certainly) wrong.
There’s impossibility here precisely insofar as the predictor is accurate.
One intuition pump is: You say “but if-counterfactually you see the bomb on the left, and you take it, you die”. I protest “no, if-counterfactually I see the bomb on the left, then I die insofar as the predictor was on the fritz, and I get a puppy insofar as the predictor was accurate, by the principle of explosion. So I 1-in-a-hundred-trillion-trillion% die, and almost certainly get a puppy. Win.” Like, from my perspective, trying to analyze outcomes under assumptions that depend on my own actions can lead to nonsense.
(My model of a critic says “but surely you can see the intuition that, when you’re actually faced with the bomb, you actually shouldn’t take it?”. And ofc I have that intuition, as a human, I just think it lost in a neutral and fair arbitration process, but that’s a story for a different day. Speaking from the intuition that won, I’d say that it sure would be sad to take the bomb in real life, and that in real life you should spare no expense searching for the evidence that the predictor is on the fritz, which almost certainly would exist and let you avoid the bomb iff the predictor is on the fritz in real life, and if you run your real life right you should never once actually eat a bomb, but also yes, the correct intuition is that you take the bomb knowing it kills you, and that you roll your eyes at the person saying it was a bad move for roughly the same reason you roll your eyes at the person saying that ‘middle’ is never your best option in the high/low/extremeball game—such is the fate of people who are bound to submit a single action to be played against a mixture of games.)
Separately, I note that if you think an agent should behave very differently when a possibility is logically impossible, vs when they assess a probability on the order of 1 in a trillion trillion, then I suspect you’re doing something wrong. (Not least that you’re likely wrong about how much confidence a human can get in any particular logical fact.)
To check, do we have a disagreement about what to do insofar as the predictor is accurate? Like, in the case of the transparent Newcomb’s problem with a literally perfect predictor, would you one-box even if the box were empty?
I don’t understand this objection. The given scenario is that you do see a bomb. The question is: what do you do in the given scenario?
You are welcome to imagine any other scenarios you like, or talk about counterfactuals or what have you. But the scenario, as given, tells you that you know certain things and observe certain things. The scenario does not appear to be in any way impossible. “What do I do when I don’t see a bomb” seems irrelevant to the question, which posits that you do see a bomb.
Er, what? If you take the bomb, you burn to death. Given the scenario, that’s a fact. How can it not be a fact? (Except if the bomb happens to malfunction, or some such thing, which I assume is not what you mean…?)
Well, let’s see. The problem says:
So, if the predictor predicts that I will choose Right, she will put a bomb in Left, in which case I will choose Left. If she predicts that I will choose Left, then she puts no bomb in Left, in which case I will choose Right. This appears to be paradoxical, but that seems to me to be the predictor’s fault (for making an unconditional prediction of the behavior of an agent that will certainly condition its behavior on the prediction), and thus the predictor’s problem.
I… don’t see what bearing this has on the disagreement, though.
What I am saying is that we don’t have access to “questions of predictor mechanics”, only to the agent’s knowledge of “predictor mechanics”. In other words, we’ve fully specified your epistemic state by specifying your epistemic state—that’s all. I don’t know what you mean by calling it “the problem history”. There’s nothing odd about knowing (to some degree of certainty) that certain things have happened. You know there’s a (supposed) predictor, you know that she has (apparently) made such-and-such predictions, this many times, with these-and-such outcomes, etc. What are her “mechanics”? Well, you’re welcome to draw any conclusions about that from what you know about what’s gone before. Again, there is nothing unusual going on here.
I’m sorry, but this really makes very little sense to me. We know the predictor is not quite perfect. That’s one of the things we’re told as a known fact. She’s pretty close to perfect, but not quite. There’s just the one “case” here, and that’s it…
Indeed, the entire “your problem statement is revealed to be a lie” approach seems to me to be nonsensical. This is the given scenario. If you assume that the predictor is perfect and, from that, conclude that the scenario couldn’t’ve come to pass, doesn’t that entail that, by construction, the predictor isn’t perfect (proof by contradiction, as it were)? But then… we already know she isn’t perfect, so what was the point of assuming otherwise?
Well, we know she’s at least slightly inaccurate… obviously, because she got it wrong this time! We know she can get it wrong, because, by construction, she did get it wrong. (Also, of course, we know she’s gotten it wrong before, but that merely overdetermines the conclusion that she’s not quite perfect.)
Well, no. In the given scenario, you do see the bomb on the left. There’s no counterfactuals involved.
But you’re not bound to submit a single action to be played against a mixture of games. In the given scenario, you can choose your action knowing which game you’re playing!
… or, you could just… choose Right. That seems to me to be a clear win.
If an agent finds themselves in a scenario that they think is logically impossible, then they’re obviously wrong about that scenario being logically impossible, as demonstrated by the fact that actually, it did come to pass. So finding oneself in such a scenario should cause one to immediately re-examine one’s beliefs on what is, and is not, logically impossible, and/or one’s beliefs about what scenario one finds oneself in…
How would I come to the belief that the predictor is literally perfect (or even mostly perfect)? Wouldn’t that just look like a string of observations of cases where either (a) there’s money in the one box and the agent takes the one box full of money, or (b) there’s no money in the one box and the agent takes the two boxes (and gets only the lesser quantity)? How exactly would I distinguish between the “perfect predictor” hypothesis and the “people take the one box if it has a million dollars, two boxes otherwise” hypothesis, as an explanation for those observations? (Or, to put it another way, what do I think happens to people who take an empty one box? Well, I can’t have any information about that, right? [Beyond the common sense of “obviously, they get zilch”…] Because if that had ever happened before, well, there goes the “perfect predictor” hypothesis, yes?)
The scenario says “the predictor is likely to be accurate” and then makes an assertion that is (for me, at least) false insofar as the predictor is accurate. You can’t have it both ways. The problem statement (at least partially) contradicts itself. You and I have a disagreement about how to evaluate counterfactuals in cases where the problem statement is partly-self-contradictory.
Sure, it’s the predictor’s problem, and the behavior that I expect of the predictor in the case that I force them to notice they have a problem has a direct effect on what I do if I don’t see a bomb. In particular, if they reward me for showing them their problem, then I’d go right when I see no bomb, whereas if they’d smite me, then I wouldn’t. But you’re right that this is inconsequential when I do see the bomb.
Well for one thing, I just looked and there’s no bomb on the left, so the whole discussion is counter-to-fact. And for another, if I pretend I’m in the scenario, then I choose my action by visualizing the (counterfactual) consequences of taking the bomb, and visualizing the (counterfactual) consequences of refraining. So there are plenty of counterfactuals involved.
I assert that the (counterfactual) consequences of taking the bomb include (almost certainly) rendering the whole scenario impossible, and rendering some other hypothetical (that I don’t need to pay $100 to leave) possible instead. And so according to me, the correct response to someone saying “assume you see the bomb” is to say “no, I shall assume that I see no bomb instead”, because that’s the consequence I visualize of (counterfactually) taking the bomb.
You’re welcome to test it empirically (well, maybe after adding at least $1k to all outcomes to incentivise me to play), if you have an all-but-one-in-a-trillion-trillion accurate predictor-of-me lying around. (I expect your empiricism will prove me right, in that the counterfactuals where it shows me a bomb are all in fact rendered impossible, and what happens in reality instead is that I get to leave with $900).
...Also, just to be clear, you’re aware that these are two different internally-consistent but contradictory theories of how to evaluate counterfactuals? Like, we can be pretty confident that there’s no argument a CDT agent can hear that causes them to wish to adopt FDT counterfactuals, and vice versa. Humans come equipped with both intuitions (I can give you other problems that pump the other intuitions, if you’d like), and we have to find some other way to arbitrate the conflict.
(Currently I’m in a mode of attempting to convey the second set of intuitions to you in this one problem, in light of your self-professed difficulty grasping them. Feel free to demonstrate understanding of the second set of intuitions and request that we instead switch to discussing why I think the latter wins in arbitration.)
I’m asking you to do a case-analysis. For concreteness, suppose you get to inspect the predictor’s computing substrate and source code, and you do a bunch of pondering and thinking, and you’re like “hmm, yes, after lots of careful consideration, I now think that I am 90% likely to be facing a transparent Newcomb’s problem, and 10% likely to be in some other situation, eg, where the code doesn’t work how I think it does, or cosmic rays hit the computer, or what-have-you”. Now we agree that one possible thing that could have happened, is that the predictor accurately predicted you in the past, and we can freely discuss what you should do insofar as that’s the case.
Like, presumably when I present you with the high/low/extremeball game, you have no problem granting statements like “insofar as the die came up ‘highball’, I should choose ‘high’”. Saying “but how could I possibly have come to the epistemic state where I was 100% certain that the die came up ‘highball’???” is a non sequitur. Your analysis in the face of uncertainty can be decomposed into an aggregation of your analyses in each scenario that might obtain. And so I ask again, insofar as the predictor accurately predicted you—which is surely one of many possible cases to analyze, after examining the mind of the entity that sure looks like it predicted you—what action would you prescribe?
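(If it helps, here’s a minimal sketch of the decomposition I’m appealing to, in Python, with placeholder actions and payoffs rather than whatever was actually stipulated for high/low/extremeball upthread. The point is just that the per-scenario analyses stand on their own, however probable or improbable each scenario is.)

```python
# EU(action) = sum over scenarios of P(scenario) * U(action | scenario).
# Payoff numbers below are placeholders, purely for illustration.
PAYOFFS = {
    "highball":    {"high": 10, "low": 1},
    "lowball":     {"high": 1,  "low": 10},
    "extremeball": {"high": 5,  "low": 5},
}

def expected_utility(action, beliefs):
    return sum(p * PAYOFFS[scenario][action] for scenario, p in beliefs.items())

# "Insofar as the die came up highball, choose high" is a fact about the row
# PAYOFFS["highball"]; it doesn't depend on P(highball) being large:
beliefs = {"highball": 0.001, "lowball": 0.4995, "extremeball": 0.4995}
assert PAYOFFS["highball"]["high"] > PAYOFFS["highball"]["low"]
print(expected_utility("high", beliefs), expected_utility("low", beliefs))
```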
Well… no, the scenario says “the predictor has predicted correctly 1 trillion trillion minus one times, and incorrectly one time”. Does that make it “likely to be accurate”? You tell me, I guess, but that seems like an unnecessarily vague characterization of a precise description.
What do you mean by this? What’s contradictory about the predictor making a mistake? Clearly, it’s not perfect. We know this because it made at least one mistake in the past, and then another mistake just now.
Is the predictor “accurate”? Well, it’s approximately as accurate as it takes to guess 1 trillion trillion times and only be wrong once…
I confess that this reads like moon logic to me. It’s possible that there’s something fundamental I don’t understand about what you’re saying.
I am not familiar with this, no. If you have explanatory material / intuition pumps / etc. to illustrate this, I’d certainly appreciate it!
I am not asking how I could come to believe the “literally perfect predictor” thing with 100% certainty; I am asking how I could come to believe it at all (with, let’s say, > 50% certainty).
Hold on, hold on. Are we talking about repeated plays of the same game? Where I face the same situation repeatedly? Or are we talking about observing (or learning about) the predictor playing the game with other people before me?
The “Bomb” scenario described in the OP says nothing about repeated play. If that’s an assumption you’re introducing, I think it needs to be made explicit…
I deny that we have a precise description. If you listed out a specific trillion trillion observations that I allegedly made, then we could talk about whether those particular observations justify thinking that we’re in the game with the bomb. (If those trillion trillion observations were all from me waking up in a strange room and interacting with it, with no other context, then as noted above, I would have no reason to believe I’m in this game as opposed to any variety of other games consistent with those observations.) The scenario vaguely alleges that we think we’re facing an accurate predictor, and then alleges that their observed failure rate (on an unspecified history against unspecified players) is 1 per trillion-trillion. It does not say how or why we got into the epistemic state of thinking that there’s an accurate predictor there; we assume this by fiat.
(To be clear, I’m fine with assuming this by fiat. I’m simply arguing that your reluctance to analyze the problem by cases seems strange and likely erroneous to me.)
The canonical explanatory text is the FDT paper (PDF warning) (that the OP is responding to a critique of, iirc), and there’s a bunch of literature on LW (maybe start at the wiki page on UDT? Hopefully we have one of those) exploring various intuitions. If you’re not familiar with this style of logic, I recommend starting there (ah look we do have a UDT wiki page). I might write up some fresh intuition pumps later, to try to improve the exposition. (We’ve sure got a lot of exposition if you dig through the archives, but I think there are still a bunch of gaps.)
Uh, I mean, we could play a version of this game where the sums were positive (such that you’d want to play) and smaller (such that I could fund the prizes), and then I could say “yo Said Achmiz, I decided to act as a predictor of you in this game, I’m about to show you either a box with $10 and a box with $1, or an empty box and a box with $1, depending on how I predicted you’ll behave”. And then I think that at least the other readers on this post would predict that I have >50% probability of predicting your behavior correctly. You haven’t exactly been subtle about how you make decisions.
Predicting someone accurately with >50% likelihood isn’t all that hard. A coin predicts you “perfectly accurately” 50% of the time; you’ve only gotta do slightly better than that.
I suspect you’re confused here in putting “perfect prediction” on some big pedestal. The predictor is trying to reason about you, and they either reason basically-correctly, or their reasoning includes some mistake. And insofar as they reason validly to a conclusion about the right person, they’re going to get the answer right.
And again, note that you’re not exactly a tricky case yourself. You’re kinda broadcasting your decision-making procedure all over the internet. You don’t have to be a supermind studying a brainscan to figure out that Said Achmiz takes the extra $1.
My girlfriend often reasons accurately about whether I’m getting dinner tonight. Sometimes she reasons wrongly about it (eg, she neglects that I had a late lunch) but gets the right answer by chance anyway. Sometimes she reasons wrongly about it and gets the wrong answer by chance. But often she reasons correctly about it, and gets the right answer for the right reasons. And if I’m standing in the supermarket wondering whether she already ate or whether I should buy enough cheese to make her some food too, I have no trouble thinking “well, insofar as she got the right answer for the right reasons tonight, she knew I’d be hungry when I got home, and so she hasn’t eaten yet”. None of this case-analysis requires me to believe she’s a supermind who studied a scan of my brain for twelve thousand years. Believing that she was right about me for the right reasons is just not a very difficult epistemic state to enter. My reasons for doing most of what I do just aren’t all that complicated. It’s often not all that tricky to draw the right conclusions about what people do for the right reasons.
Furthermore, I note that in the high/low/extremeball game, I have no trouble answering questions about what I should do insofar as the game is highball, even though I have <50% (and in fact ~33%) probability on that being the case. In fact, I could analyze payouts insofar as the game is highball, even if the game was only 0.1% likely to be highball! In fact, the probability of me actually believing I face a given decision problem is almost irrelevant to my ability to analyze the payouts of different actions. As evidenced in part by the fact that I am prescribing actions in this incredibly implausible scenario with bombs and predictors.
And so in this wild hypothetical situation where we’ve somehow (by fiat) become convinced that there’s an ancient predictor predicting our actions (which is way weirder than my girlfriend predicting my action, tbh), presumably one of our hypotheses for what’s going on is that the predictor correctly deduced what we would do, for the correct reasons. Like, that’s one of many hypotheses that could explain our past observations. And so, assuming that hypothesis, for the purpose of analysis—and recalling that I can analyze which action is best assuming the game is highball, without making any claim about how easy or hard it is to figure out whether the game is highball—assuming that hypothesis, what action would you prescribe?
Let’s be more precise, then, and speak in terms of “correctness” rather than “accuracy”. There are then two possibilities in the “bomb” scenario as stipulated:
The predictor thought I would take the right box, and was correct.
The predictor thought I would take the right box, and was incorrect.
Now, note the following interesting property of the above two possibilities: I get to choose which of them is realized. I cannot change what the predictor thought, nor can I change its actions conditional on its prediction (which in the stipulated case involves placing a bomb in the left box), but I can choose to make its prediction correct or incorrect, depending on whether I take the box it predicted I would take.
Observe now the following interesting corollary of the above argument: it implies the existence of a certain strategy, which we might call “ObeyBot”, which always chooses the action that confirms the predictor’s prediction, and does so regardless of payoffs: by hypothesis, ObeyBot does not care about burning to death, nor does it care about losing $100; it cares only for making the predictor right as often as it can.
Now, suppose I were to tell you that the predictor’s stipulated track record in this game (10^24 − 1 successes, and 1 failure) were achieved against a population consisting (almost) entirely of ObeyBots. What would you make of this claim?
It should be quite obvious that there is only one conclusion you can draw: the predictor’s track record establishes absolutely nothing about its predictive capabilities in general. When faced with a population of agents that always do whatever the predictor says they will do, any “predictor”—from the superintelligent Omega to a random number generator—will achieve an accuracy of 100%.
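(A minimal simulation sketch of this claim, with invented names like `obeybot` and `coin_flip_predictor` chosen purely for illustration:)

```python
import random

def coin_flip_predictor(agent):
    """A 'predictor' that ignores the agent entirely and guesses at random."""
    return random.choice(["Left", "Right"])

def obeybot(prediction):
    """ObeyBot: always does whatever the predictor said it would do.
    (In the Bomb setup it can infer the prediction from the open boxes.)"""
    return prediction

def track_record(predictor, agent, trials=1_000_000):
    hits = 0
    for _ in range(trials):
        prediction = predictor(agent)  # the prediction is fixed first
        action = agent(prediction)     # then the agent acts
        hits += (action == prediction)
    return hits / trials

print(track_record(coin_flip_predictor, obeybot))  # 1.0: a flawless "track record"
```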
This is an extremely important observation! What it shows is that the predictor’s claimed track record can mean very different things depending on what kind of agents it was playing against… which in turn means that the problem as stated is underdetermined: knowledge of the predictor’s track record does not by itself suffice to pin down the predictor’s actual behavior, unless we are also given information about what kind of agents the predictor was playing against.
Now, obviously whoever came up with the “bomb” scenario presumably did not want us to assume that their predictor is only accurate against ObeyBots and no one else. But to capture this notion requires more than merely stating the predictor’s track record; it requires the further supposition that the predictor’s accuracy is a persistent feature of the predictor, rather than of the agents it was playing against. It requires, in short, the very thing you criticized as an “unnecessarily vague characterization”: the statement that the predictor is, in fact, “likely to be accurate”.
With this as a prerequisite, we are now equipped to address your next question:
(First, a brief recap: we are asked to assume the existence of a predictor with an extremely high level of predictive accuracy. Moreover, although the problem statement did not actually specify, it is likely we are meant to assume that this predictor’s accuracy persists across a large reference class of possible agent designs; specifically—and crucially—this reference class must be large enough to include our own decision algorithm, whatever it may be.)
Now, with this in mind, your (Said’s) question is: what could possibly be contradictory about this scenario? What could be contradictory about a scenario that asks us to assume the existence of a predictor that can accurately predict the behavior of a large reference class of agents, including ourselves?
At this point, it should be quite obvious where the potential contradiction lies: it arises directly from the fact that an agent who finds itself in the described scenario can choose to make the predictor right or wrong. Earlier, I exploited this property to construct a strategy (ObeyBot) which makes the predictor right no matter what; but now I will do the opposite, and construct a strategy that always chooses whichever action falsifies the predictor’s prediction. For thematic appropriateness, let us call this strategy “SpiteBot”.
I assert that it is impossible for any predictor to achieve any accuracy against SpiteBot higher than 0%. Therefore, if we assume that I myself am a SpiteBot, the contradiction becomes immediately clear: it is impossible for the predictor to achieve the stipulated predictive accuracy across any reference class of agents broad enough to include SpiteBot, and yet in order for the predictor’s reference class to include me, it must include SpiteBot, since I am a SpiteBot. Contradiction!
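(The same illustrative sketch as above, now run against SpiteBot: the identical measurement yields 0%, and it would do so for any predictor whatsoever.)

```python
import random

def coin_flip_predictor(agent):
    """Stand-in predictor; any other predictor gives the same result below."""
    return random.choice(["Left", "Right"])

def spitebot(prediction):
    """SpiteBot: always does the opposite of whatever the predictor said."""
    return "Right" if prediction == "Left" else "Left"

def track_record(predictor, agent, trials=1_000_000):
    hits = 0
    for _ in range(trials):
        prediction = predictor(agent)
        action = agent(prediction)
        hits += (action == prediction)
    return hits / trials

# SpiteBot's action is a function of the already-fixed prediction that always
# differs from it, so no prediction can ever come out correct:
print(track_record(coin_flip_predictor, spitebot))  # 0.0
```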
Of course, this only establishes that the problem statement is contradictory if the reader happens to be a SpiteBot. And while that might be somewhat amusing to imagine, it is unlikely that any of us (the intended audience of the problem) are SpiteBots, or use any decision theory akin to SpiteBot. So let us ask: are there any other decision theories that manage to achieve a similar effect—any other decision theories that, under at least some subset of circumstances, turn into SpiteBot, thereby rendering the problem formulation contradictory?
Asked that way, the question answers itself: any decision theory that behaves like SpiteBot at least some of the time will generate contradictions with the problem statement at least some of the time. And (here is the key point) if those contradictions are to be avoided, the hypothesized predictor must avoid whatever set of circumstances triggers the SpiteBot-like behavior, lest it falsify its own alleged predictive accuracy… which implies the following corollaries:
The predictor cannot show its face (or its boxes) before SpiteBot. Any native SpiteBots in this universe will therefore never find themselves in this particular “bomb” scenario, nor any of its relatives.
Any strategy that behaves like SpiteBot some of the time will not encounter any “bomb”-like scenarios during that time. For example, the strategy “Choose your actions normally except on Wednesdays; behave like SpiteBot on Wednesdays” will never encounter a bomb-like scenario on a Wednesday.
If you don’t like the idea of ending up in a “bomb”-like scenario, you should precommit to behaving like SpiteBot in the face of such a scenario. If you actually do this properly, you will never see a real predictor give you a bomb-like scenario, which means that the only time you will be faced with any kind of bomb-like scenario will be in your imagination, perhaps at the prompting of some inquisitive philosopher. At this point you are free to ask the philosopher how his stipulated predictor reacts to SpiteBot and SpiteBot-like strategies, and—if he fails to give you a convincing answer—you are free to dismiss his premise as contradictory, and his scenario as something you don’t have to worry about.
(The general case of “behaving like SpiteBot when confronted with situations you’d really rather not have encountered to begin with” is called “fuck-you decision theory”, or FDT for short.)
Yes, it was this exact objection that I addressed in my previous replies that relied upon a misreading of the problem. I missed that the boxes were open and thought that the only clue to the prediction was the note that was left.
The only solution was to assume that the predictor does not always leave a note, and this solution also works for the stated scenario. You see that the boxes are open, and the left one contains a bomb, but did everyone else? Did anyone else? The problem setup doesn’t say.
This sort of vagueness leaves holes big enough to drive a truck through. The stated FDT support for picking Left depends absolutely critically on the subjunctive dependency odds being at least many millions to one, and the stated evidence is nowhere near strong enough to support that. Failing that, FDT recommends picking Right.
So the whole scenario is pointless. It doesn’t explore what it was intended to explore.
You can modify the problem to say that the predictor really is that reliable for every agent, but doesn’t always leave the boxes open for you or write a note. This doesn’t mean that the predictor is perfectly reliable, so a SpiteBot can still face this scenario but is just extremely unlikely to.
There IS a question of what would happen if you “always spite the predictor’s prediction”, since doing so seems to make the 1 in a trillion trillion error rate impossible.
To be clear, FDT does not accept causation that happens backwards in time. It’s not claiming that the action of one-boxing itself causes there to be a million dollars in the box. It’s the agent’s algorithm, and, further down the causal diagram, Omega’s simulation of this algorithm that causes the million dollars. The causation happens before the prediction and is nothing special in that sense.
Yes, sure. Indeed we don’t need to accept causation of any kind, in any temporal direction. We can simply observe that one-boxers get a million dollars, and two-boxers do not. (In fact, even if we accept shminux’s model, this changes nothing about what the correct choice is.)
Eh? This kind of reasoning leads to failing to smoke on Smoking Lesion.
The main point of FDT is that it gives the optimal expected utility on average for agents using it. It does not guarantee optimal expected utility for every instance of an agent using it.

Suppose you have a population of two billion agents, each going through this scenario every day. Upon seeing a note predicting right, one billion would pick left and one billion would pick right. We can assume that they all pick left if they see a note predicting left or no note at all.

Every year, the Right agents essentially always see a note predicting right, and pay more than $30000 each. The Left agents essentially always see a note predicting left (or no note) and pay $0 each.

The average rate of deaths is comparable: one death per few trillion years in each group, which is to say, essentially never. They all know that it could happen, of course.

Which group is better off?

Edit: I misread Predictor’s accuracy. It does not say that it is in all scenarios 1 − 10^-24, just that in some unknown sample of scenarios, it was 1 − 10^-24. This changes the odds so much that FDT does not recommend taking the left box.
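(For concreteness, a back-of-the-envelope version of the comparison above, under the original 1 − 10^-24 reading that the Edit walks back; the Python below is just my own arithmetic, not part of the scenario.)

```python
ERROR_RATE = 1e-24              # per-play failure rate, under the original reading
AGENTS_PER_GROUP = 1_000_000_000
DAYS_PER_YEAR = 365
COST_RIGHT = 100                # dollars paid on each Right-boxing play

# Right-boxers: pay $100 every day.
right_cost_per_agent_year = COST_RIGHT * DAYS_PER_YEAR  # 36500 -> "more than $30000 each"

# Left-boxers: pay $0, and die only on the rare plays where the predictor erred
# and left the bomb in the Left box anyway.
left_group_deaths_per_year = AGENTS_PER_GROUP * DAYS_PER_YEAR * ERROR_RATE
years_per_left_group_death = 1 / left_group_deaths_per_year

print(right_cost_per_agent_year)    # 36500
print(years_per_left_group_death)   # ~2.7e12, i.e. one death per few trillion years
```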
Obviously, the group that’s better off is the third group: the one that picks Left if there’s no bomb in there, Right otherwise.
… I mean, seriously, what the heck? The scenario specifies that the boxes are open! You can see what’s in there! How is this even a question?
(Bonus question: what will the predictor say about the behavior of this third group? What choice will she predict a member of this group will make?)
Two questions, if I may:
Why do you read it this way? The problem simply states the failure rate is 1 in a trillion trillion.
If we go with your interpretation, why exactly does that change things? It seems to me that the sample size would have to be extremely large in order to determine a failure rate that low.
It depends upon what the meaning of the word “is” is:
The failure rate has been tested over an immense number of predictions, and evaluated as 10^-24 (to one significant figure). That is the currently accepted estimate for the predictor’s error rate for scenarios randomly selected from the sample.
The failure rate is theoretically 10^-24, over some assumed distribution of agent types. Your decision model may or may not appear anywhere in this distribution.
The failure rate is bounded above by 10^-24 for every possible scenario.
A self-harming agent in this scenario cannot be consistently predicted by Predictor at all (success rate 0%), so we know that (3) is definitely false.
(1) and (2) aren’t strong enough, because they give little information about Predictor’s error rate concerning your scenario and your decision model.
We have essentially zero information about Predictor’s true error bounds for agents that sometimes carry out self-harming actions. An FDT agent that takes the left box here is exactly such an agent, and recommending that choice requires the upper bound on Predictor’s rate of subjunctive-dependency failure to be less than the ratio between the disutility of paying $100 and the disutility of burning to death all intelligent life in the universe.
We do not have anywhere near enough information to justify that tight a bound. So FDT can’t recommend such an action. Maybe someone else can write a scenario that is in similar spirit, but isn’t so flawed.
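(To put a rough number on “that tight a bound”: a minimal expected-cost comparison, with an invented disutility figure D standing in for the death outcome, since the scenario doesn’t supply one.)

```python
def left_beats_right(p_failure, death_disutility, pay_cost=100):
    """Left-boxing wins on expected cost iff p_failure * D < pay_cost."""
    expected_cost_left = p_failure * death_disutility  # die only if the predictor failed
    expected_cost_right = pay_cost                     # always pay $100, never die
    return expected_cost_left < expected_cost_right

D = 1e18  # hypothetical dollar-equivalent disutility of the death outcome
print(left_beats_right(1e-24, D))  # True:  fine *if* the failure bound really is 1e-24
print(left_beats_right(1e-12, D))  # False: not fine if all we can justify is far weaker
```

So the recommendation flips entirely on how tight a bound on the failure probability the stated track record actually licenses.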
Thanks, I appreciate this. Your answer clarifies a lot, and I will think about it more.
Another way of phrasing it: you don’t get the $100 marginal payoff if you’re not prepared to knowingly go to your death in the incredibly unlikely event of a particular type of misprediction.

That’s the sense in which I meant “you got screwed”. You entered the scenario knowing that it was incredibly unlikely that you would die regardless of what you decide, but were prepared to accept that incredibly microscopic chance of death in exchange for keeping your $100. The odds just went against you.

Edit: If Predictor’s actual bound on error rate was 10^-24, this would be valid. However, Predictor’s bound on error rate cannot be 10^-24 in all scenarios, so this is all irrelevant. What a waste of time.