If there is some nontrivial chance that the predictor is adversarial but constrained to be accurate and truthful (within the bounds given), then on the balance of probability people taking the right box upon seeing a note predicting right are worse off. Yes, it sucks that you in particular got screwed, but the chances of that were astronomically low.
This shows up more obviously if you look at repeated iterations, compare performance of decision theories in large populations, or weight outcomes across possible worlds.
Edit: The odds were not astronomically low. I misinterpreted the statement about Predictor’s accuracy to be stronger than it actually was. FDT recommends taking the right box, and paying $100.
on the balance of probability people taking the right box upon seeing a note predicting right are worse off
No, because the scenario stipulates that you find yourself facing a Left box with a bomb. Anyone who finds themselves in this scenario is worse off taking Left than Right, because taking Left kills you painfully, and taking Right does no such thing. There is no question of any “balance of probability”.
Yes, it sucks that you in particular got screwed, but the chances of that were astronomically low.
But you didn’t “get screwed”! You have a choice! You can take Left, or Right.
Again: the scenario stipulates that taking Left kills you, and FDT agrees that taking Left kills you; and likewise it is stipulated (and FDT does not dispute) that you can indeed take whichever box you like.
This shows up more obviously if you look at repeated iterations, compare performance of decision theories in large populations, or weight outcomes across possible worlds.
All of that is completely irrelevant, because in the actual world that you (the agent in the scenario) find yourself in, you can either burn to death, or not. It’s completely up to you. You don’t have to do what FDT says to do, regardless of what happens in any other possible worlds or counterfactuals or what have you.
It really seems to me like anyone who takes Left in the “Bomb” scenario is making almost exactly the same mistake as people who two-box in the classic Newcomb’s problem. Most of the point of “Newcomb’s Problem and Regret of Rationality” is that you don’t have to, and shouldn’t, do things like this.
But actually, it’s a much worse mistake! In the Newcomb case, there’s a disagreement about whether one-boxing can actually somehow cause there to be a million dollars in the box; CDT denies this possibility (because it takes no account of sufficiently accurate predictors), while timeless/logical/functional/whatever decision theories accept it. But here, there is no disagreement at all; FDT admits that choosing Left causes you to die painfully, but says you should do it anyway! That is obviously much worse.
The other point of “Newcomb’s Problem and Regret of Rationality” is that it is a huge mistake to redefine losing (such as, say, burning to death) as winning. That, also, seems like a mistake that’s being made here.
I don’t see that there’s any way of rescuing this result.
According to me, the correct rejoinder to Will is: I have confidently asserted that X is false for X to which I assign probability much greater than 1 in a trillion trillion, and so I hereby confidently assert that no, I do not see the bomb on the left. You see the bomb on the left, and lose $100. I see no bombs, and lose $0.
I can already hear the peanut gallery objecting that we can increase the fallibility of the predictor to reasonable numbers and I’d still take the bomb, so before we go further, let’s all agree that sometimes you’re faced with uncertainty, and the move that is best given your uncertainty is not the same as the move that is best given perfect knowledge. For example, suppose there are three games (“lowball”, “highball”, and “extremeball”) that work as follows. In each game, I have three actions—low, middle, and high. In the lowball game, my payouts are $5, $4, and $0 respectively. In the highball game, my payouts are $0, $4, and $5 respectively. In the extremeball game, my payouts are $5, $4, and $5 respectively. Now suppose that the real game I’m facing is that one of these games is chosen uniformly at random by an unobserved die roll. What action should I choose? Clearly ‘middle’, with an expected utility of $4 (compared to $3.33 for either ‘low’ or ‘high’). And when I do choose middle, I hope we can all agree that it’s foul play to say “you fool, you should have chosen low because the game is lowball”, or “you fool, there is no possible world in which that’s the best action”, or “you idiot, that’s literally the worst available action because the game was extremeball”. If I knew which game I was playing, I’d play the best move for that game. But insofar as I must enter a single action played against the whole mixture of games, I might have to choose something that’s not the best action in your favorite subgame.
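For concreteness, here is a minimal sketch of that expected-value arithmetic, using the payout tables exactly as given above (the variable names are mine):

```python
# Expected payout of each single action against the uniform mixture of games.
payouts = {
    "lowball":     {"low": 5, "middle": 4, "high": 0},
    "highball":    {"low": 0, "middle": 4, "high": 5},
    "extremeball": {"low": 5, "middle": 4, "high": 5},
}

for action in ("low", "middle", "high"):
    # Each game is chosen with probability 1/3 by the unobserved die roll.
    ev = sum(game[action] for game in payouts.values()) / 3
    print(f"{action}: expected payout ${ev:.2f}")
# middle -> $4.00; low and high -> $3.33 each, as in the argument above.
```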
With that in mind, we can now decompose Will’s problem with the bomb into two subgames that I’m bound to play simultaneously.
In one subgame (that happens with probability 2 in a trillion trillion, although feel free to assume it’s more likely than that), the predictor is stumped and guesses randomly. We all agree that in that subgame, the best action is to avoid the bomb.
In the other subgame, the predictor is accurate. And insofar as the predictor is accurate, and supposing that we’ve seen the bomb, we can consider our two available actions (taking the bomb, or spending $100 to avoid it). But here something curious happens: insofar as the predictor is accurate, and we see the bomb, and we take it, we refute the problem statement.
That’s not my problem, as an agent. That’s your problem, as the person who stated the problem. If you say “assume you’re facing a perfect predictor, and see the bomb”, then I can validly say “no”. Similar to how if you say “assume you’re going to take the $5 bill, and you can either take the $5 bill or the $10 bill, but if you violate the laws of logic then you get a $100 fine, what do you do?” I can validly say “no”. It’s not my fault that you named a decision problem whose premises I can flatly refute.
Hopefully we all agree that insofar as the predictor is perfect (which, remember, is a case in the case analysis when the predictor is fallible), the problem statement here is deeply flawed, because I can by an action of mine refute it outright. The standard rejoinder is a bit of sleight-of-hand, where the person posing the problem says “ah, but the predictor is fallible”. But as we’ve already seen, I can just decompose it right back into two subproblems that we then aggregate across (much like the highball/lowball/extremeball case), at which point one of our case-analyses reveals that insofar as the predictor is accurate, the whole problem-statement is still flawed.
And this isn’t me saying “I wish to be evaluated from an epistemic vantage point that takes into account the other imaginary branches of reality”. This is me saying, your problem statement was wrong. It’s me pointing out that you’re a liar, or at least that I can by a clever choice of actions render you a liar. When you say “the predictor was accurate and you saw the bomb, what do you do?”, and I say “take the bomb”, I don’t get blown up, I reveal your mistake. Your problem statement is indeterminate. You shouldn’ta given me a problem I could refute. I’m not saying “there’s other hypothetical branches of reality that benefit from me taking this bomb”, I’m saying “WRONG, tell me what really happened”. Your story was false, my dude.
There’s some question of what to do when an obviously ill-formed game is mixed in with a properly-formed game, by, eg, adding some uncertainty about whether the predictor is fallible. Like, how are we supposed to analyze games comprising subgames where the problem statement can be refuted in one subgame but not others? And according to me, the obvious answer is that if you say “you are 1% playing problem A and 99% playing problem B”, and if I can by some act refute that I’m playing problem B, then I am perfectly licensed in saying “WRONG (99%)”. Mixing in a little uncertainty (or even a lot of uncertainty!) doesn’t stop you from being wrong (at my will) in the cases where you’re asserting falsehoods about my actions.
So, no, I don’t see the bomb. That’s all but 1 in a trillion trillion ‘WRONG’. What really happened, according to this problem, is—well, it’s still indeterminate, because Will’s problem statement isn’t complete. He doesn’t tell us what happens insofar as the predictor is accurate and I go left if and only if I see the bomb, ie if I always spite the predictor’s prediction. If the predictor would give me a 1 trillion dollar coin for my ingenuity in spiting them, then what really happens is I see a $1T coin (all but 1 in a trillion trillion times). If instead (as is the oral tradition when someone leaves their decision problem indeterminate) counterfactually-spiting the predictor causes me to find myself in a room full of hornets rather than exits, then what really happened is that I saw no bomb (and no hornets), almost certainly.
If you want me to stop denying your problem-statements outright, you’ve gotta stop giving me problem statements that I can (probabilistically) refute by my actions.
After reading your comments, here’s my current explanation of what’s up with the bomb argument:
The problem with claiming that taking Left is wrong in the bomb problem lies in the inference that “I see the bomb in the left box” necessarily implies “I am in the one-in-a-trillion-trillion situation where the predictor was inaccurate”.
However, this forgets the other option: that this setup is one of the vastly more numerous worlds in which I am being predicted by the predictor. In those worlds, it’s important that I make the worlds logically inconsistent by taking the left box, so that the predictor cannot accurately predict me taking the right box. Otherwise it may notice that this is a fixed point (i.e. leaving a note saying it predicted I’d take the right box does in fact lead to me taking the right box), call it a day, and execute an action (that I don’t want) backed by an accurate prediction.
So there’s a 1-in-a-trillion-trillion chance that I am in the real game, and a notably higher chance that I’m in the mind of some agent predicting what I’ll do here. (And in that simulation-world, I don’t want to do the thing I don’t want them to predict that I’ll do.)
So make your estimates of the probability that you’re being perfectly simulated and the probability that you’re in the real game, multiply each by its utilities, compare them, and you’re done.
Then I’m a bit confused about how to estimate that probability, but I suspect the reasoning goes like this:
It’s near-certain that I will be simulated in some scenarios by the predictor, and it’s one-in-a-trillion-trillion that this is the real scenario. This scenario seems “plausible” as a scenario the predictor would simulate me in, especially given that the alternative ground-truth is that this is the scenario that they actually went with! I’m going to put it somewhere between “10%” and “100%”. So I think the odds ratio is around the order of magnitude of “one-in-ten” to “one-in-a-trillion-trillion”. And when I multiply them by the expected utility, the ratio is still well in favor of taking the bomb and making it very likely that in reality I will not lose 100 dollars.
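A back-of-the-envelope version of that multiplication is sketched below. The 1-in-10 chance of being the predictor's simulation and the $1MM valuation of a life are assumed numbers (the latter is also used in the sanity check below), not part of the problem statement.

```python
# Rough expected-value comparison behind the reasoning above (assumed numbers).
p_simulated   = 0.1    # chance this experience is inside the predictor's simulation
p_real_error  = 1e-24  # chance this is the real game and the predictor actually erred
value_of_life = 1_000_000
fee = 100

# Per the reasoning above: taking Left inside the simulation steers the predictor
# away from predicting Right, saving the $100 fee in the real game; taking Left
# in the real-game-with-error branch costs your life.
net_gain_from_left = p_simulated * fee - p_real_error * value_of_life
print(net_gain_from_left)  # roughly +$10 in expectation: well in favor of taking the bomb
```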
Sanity check
As a sanity-check, I note this implies that if the utilities-times-probabilities are different, I would not mind taking the $100 hit. Let’s see what the math says here, and then check whether my intuitions agree.
Suppose I value my life at $1 million. Then I think I should become indifferent here when the probability of a mistaken simulation approaches 1 in 100,000, or when the money on the line is closer to $10^-17.
[You can skip this, but here’s me stating the two multiplications I compared:
World 1: I fake-kill myself to save $X, with probability 1/10
World 2: I actually kill myself (cost: $1MM), with probability 1/Y
To find the indifference point I want the two multiplications of utility-to-probability to come out to be equal. If X = $100, then Y equals 100,000. If Y is a trillion trillion (10^24), then X = $10^-17. (Unless I did the math wrong.)]
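If it helps, here is a quick check of that bracketed arithmetic, using the same assumed numbers (life valued at $1MM, a 1/10 chance of being the simulation):

```python
# Indifference condition from the bracketed calculation: (1/10) * X == (1/Y) * 1_000_000
value_of_life = 1_000_000
p_simulated = 1 / 10

def indifference_Y(X):
    # Predictor error odds (1 in Y) at which the two sides balance, given stakes $X.
    return value_of_life / (p_simulated * X)

def indifference_X(Y):
    # Stakes $X at which the two sides balance, given error odds of 1 in Y.
    return value_of_life / (p_simulated * Y)

print(indifference_Y(100))   # 100,000
print(indifference_X(1e24))  # 1e-17
```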
I think this doesn’t obviously clash with my intuitions, and somewhat matches them.
If the simulator was getting things wrong 1 in 100,000 times, I think I’d be more careful with my life in the “real world case” (insofar as that is a sensible concept). Going further, if you told me they were wrong 1 in 10 times, this would change my action, so there’s got to be a tipping point somewhere, and this seems reasonable for many people (though I actually value my life at more than $1MM).
And if the money was that tiny ($10^-17), I’d be fairly open to “not taking even the one-in-a-trillion-trillion chance”. (Though really my intuition is that I don’t care about money way before $10^-17, and would probably not risk anything serious starting at like 0.1 cents, because that sort of money seems kind of irritating to have to deal with. So my intuition doesn’t match perfectly here. Though I think that if I were expecting to play trillions of such games, then I would start to actively care about such tiny amounts of money.)
In the other subgame, the predictor is accurate. And insofar as the predictor is accurate, and supposing that we’ve seen the bomb, we can consider our two available actions (taking the bomb, or spending $100 to avoid it). But here something curious happens: insofar as the predictor is accurate, and we see the bomb, and we take it, we refute the problem statement.
That’s not my problem, as an agent. That’s your problem, as the person who stated the problem. If you say “assume you’re facing a perfect predictor, and see the bomb”, then I can validly say “no”.
Whether the predictor is accurate isn’t specified in the problem statement, and indeed can’t be specified in the problem statement (lest the scenario be incoherent, or posit impossible epistemic states of the agent being tested). What is specified is what existing knowledge you, the agent, have about the predictor’s accuracy, and what you observe in the given situation (from which you can perhaps infer additional things about the predictor, but that’s up to you).
In other words, the scenario is: as per the information you have, so far, the predictor has predicted 1 trillion trillion times, and been wrong once (or, some multiple of those numbers—predicted 2 trillion trillion times and been wrong twice, etc.).
You now observe the given situation (note predicting Right, bomb in Left, etc.). What do you do?
Now, we might ask: but is the predictor perfect? How perfect is she? Well… you know that she’s erred once in a trillion trillion times so far—ah, no, make that twice in a trillion trillion times, as of this iteration you now find yourself in. That’s the information you have at your disposal. What can you conclude from that? That’s up to you.
Likewise, you say:
So, no, I don’t see the bomb. That’s all but 1 in a trillion trillion ‘WRONG’. What really happened, according to this problem, is—well, it’s still indeterminate, because Will’s problem statement isn’t complete. He doesn’t tell us what happens insofar as the predictor is accurate and I go left if and only if I see the bomb, ie if I always spite the predictor’s prediction. If the predictor would give me a 1 trillion dollar coin for my ingenuity in spiting them, then what really happens is I see a $1T coin (all but 1 in a trillion trillion times).
The problem statement absolutely is complete. It asks what you would/should do in the given scenario. There is no need to specify what “would” happen in other (counterfactual) scenarios, because you (the agent) do not observe those scenarios. There’s also no question of what would happen if you “always spite the predictor’s prediction”, because there is no “always”; there’s just the given situation, where we know what happens if you choose Left: you burn to death.
You can certainly say “this scenario has very low probability”. That is reasonable. What you can’t say is “this scenario is logically impossible”, or any such thing. There’s no impossibility or incoherence here.
It’s not complete enough to determine what I do when I don’t see a bomb. And so when the problem statement is corrected to stop flatly asserting consequences of my actions as if they’re facts, you’ll find that my behavior in the corrected problem is underdefined. (If this still isn’t clear, try working out what the predictor does to the agent that takes the bomb if it’s present, but pays the $100 if it isn’t.)
And if we’re really technical, it’s not actually quite complete enough to determine what I do when I see the bomb. That depends on what the predictor does when there are two consistent possible outcomes. Like, if I would go left when there was no bomb and right when there was a bomb, what would the predictor do? If they only place the bomb if I insist on going right when there’s no bomb, then I have no incentive to go left upon seeing a bomb insofar as they’re accurate. To force me to go left, the predictor has to be trigger-happy, dropping the bomb given the slightest opportunity.
What is specified is what existing knowledge you, the agent, have about the predictor’s accuracy, and what you observe in the given situation
Nitpick: you seem to be trying to say something like “let us sidestep questions of predictor mechanics and assume that your only observations were the game-history”. But that doesn’t work. If I wake up in a room with a bomb about to go off in the left exit and a $100 fee to use the right exit, I pay the $100 fee and use the right exit. I have to see a host of other specific observations to start hypothesizing the existence of a predictor who would stop placing bombs if-counterfactually I took them, before I believe I’m in this problem.
I’m happy to leave unspecified the prior knowledge that causes us to believe there’s a predictor, but I object to the claim that we’ve fully specified my epistemic state by letting me see the problem history.
Now, we might ask: but is the predictor perfect? How perfect is she?
I’m not going to ask whether the predictor is perfect, no. I’m going to assume she isn’t, and then do some calculations, and then assume she is, and do some other calculations. Because it’s often useful to solve decision problems using case-by-case analysis.
And in the second case, your problem statement is revealed to be a lie.
Now, it’s perfectly possible for you to rephrase the problem such that my analysis does not, in some cases, reveal it to be a lie. You could say “the predictor is on the fritz, and you see the bomb” (in which case I’ll go right, thanks). Or you could say “the predictor is 1% on the fritz and 99% accurate, and in the 1% case you see a bomb and in the 99% case you may or may not see a bomb depending what you do”. That’s also fine. But insofar as you flatly postulate a (almost certain) consequence of my action, be prepared for me to assert that you’re (almost certainly) wrong.
You can certainly say “this scenario has very low probability”. That is reasonable. What you can’t say is “this scenario is logically impossible”, or any such thing. There’s no impossibility or incoherence here.
There’s impossibility here precisely insofar as the predictor is accurate.
One intuition pump is: You say “but if-counterfactually you see the bomb on the left, and you take it, you die”. I protest “no, if-counterfactually I see the bomb on the left, then I die insofar as the predictor was on the fritz, and I get a puppy insofar as the predictor was accurate, by the principle of explosion. So I 1-in-a-hundred-trillion-trillion% die, and almost certainly get a puppy. Win.” Like, from my perspective, trying to analyze outcomes under assumptions that depend on my own actions can lead to nonsense.
(My model of a critic says “but surely you can see the intuition that, when you’re actually faced with the bomb, you actually shouldn’t take it?”. And ofc I have that intuition, as a human, I just think it lost in a neutral and fair arbitration process, but that’s a story for a different day. Speaking from the intuition that won, I’d say that it sure would be sad to take the bomb in real life, and that in real life you should spare no expense searching for the evidence that the predictor is on the fritz, which almost certainly would exist and let you avoid the bomb iff the predictor is on the fritz in real life, and if you run your real life right you should never once actually eat a bomb, but also yes, the correct intuition is that you take the bomb knowing it kills you, and that you roll your eyes at the person saying it was a bad move for roughly the same reason you roll your eyes at the person saying that ‘middle’ is never your best option in the high/low/extremeball game—such is the fate of people who are bound to submit a single action to be played against a mixture of games.)
Separately, I note that if you think an agent should behave very differently when a possibility is logically impossible, vs when they assess a probability on the order of 1 in a trillion trillion, then I suspect you’re doing something wrong. (Not least that you’re likely wrong about how much confidence a human can get in any particular logical fact.)
To check, do we have a disagreement about what to do insofar as the predictor is accurate? Like, in the case of the transparent Newcomb’s problem with a literally perfect predictor, would you one-box even if the box were empty?
It’s not complete enough to determine what I do when I don’t see a bomb.
I don’t understand this objection. The given scenario is that you do see a bomb. The question is: what do you do in the given scenario?
You are welcome to imagine any other scenarios you like, or talk about counterfactuals or what have you. But the scenario, as given, tells you that you know certain things and observe certain things. The scenario does not appear to be in any way impossible. “What do I do when I don’t see a bomb” seems irrelevant to the question, which posits that you do see a bomb.
… flatly asserting consequences of my actions as if they’re facts …
Er, what? If you take the bomb, you burn to death. Given the scenario, that’s a fact. How can it not be a fact? (Except if the bomb happens to malfunction, or some such thing, which I assume is not what you mean…?)
(If this still isn’t clear, try working out what the predictor does to the agent that takes the bomb [Left] if it’s present, but pays the $100 [Right] if it isn’t.)
Well, let’s see. The problem says:
If the predictor predicted that you would choose Right, then she put a bomb in Left. If the predictor predicted that you would choose Left, then she did not put a bomb in Left, and the box is empty.
So, if the predictor predicts that I will choose Right, she will put a bomb in Left, in which case I will choose Left. If she predicts that I will choose Left, then she puts no bomb in Left, in which case I will choose Right. This appears to be paradoxical, but that seems to me to be the predictor’s fault (for making an unconditional prediction of the behavior of an agent that will certainly condition its behavior on the prediction), and thus the predictor’s problem.
I… don’t see what bearing this has on the disagreement, though.
Nitpick: you seem to be trying to say something like “let us sidestep questions of predictor mechanics and assume that your only observations were the game-history”. But that doesn’t work. If I wake up in a room with a bomb about to go off in the left exit and a $100 fee to use the right exit, I pay the $100 fee and use the right exit. I have to see a host of other specific observations to start hypothesizing the existence of a predictor who would stop placing bombs if-counterfactually I took them, before I believe I’m in this problem.
I’m happy to leave unspecified the prior knowledge that causes us to believe there’s a predictor, but I object to the claim that we’ve fully specified my epistemic state by letting me see the problem history.
What I am saying is that we don’t have access to “questions of predictor mechanics”, only to the agent’s knowledge of “predictor mechanics”. In other words, we’ve fully specified your epistemic state by specifying your epistemic state—that’s all. I don’t know what you mean by calling it “the problem history”. There’s nothing odd about knowing (to some degree of certainty) that certain things have happened. You know there’s a (supposed) predictor, you know that she has (apparently) made such-and-such predictions, this many times, with these-and-such outcomes, etc. What are her “mechanics”? Well, you’re welcome to draw any conclusions about that from what you know about what’s gone before. Again, there is nothing unusual going on here.
I’m not going to ask whether the predictor is perfect, no. I’m going to assume she isn’t, and then do some calculations, and then assume she is, and do some other calculations. Because it’s often useful to solve decision problems using case-by-case analysis.
And in the second case, your problem statement is revealed to be a lie.
I’m sorry, but this really makes very little sense to me. We know the predictor is not quite perfect. That’s one of the things we’re told as a known fact. She’s pretty close to perfect, but not quite. There’s just the one “case” here, and that’s it…
Indeed, the entire “your problem statement is revealed to be a lie” approach seems to me to be nonsensical. This is the given scenario. If you assume that the predictor is perfect and, from that, conclude that the scenario couldn’t’ve come to pass, doesn’t that entail that, by construction, the predictor isn’t perfect (proof by contradiction, as it were)? But then… we already know she isn’t perfect, so what was the point of assuming otherwise?
There’s impossibliity here precisely insofar as the predictor is accurate.
Well, we know she’s at least slightly inaccurate… obviously, because she got it wrong this time! We know she can get it wrong, because, by construction, she did get it wrong. (Also, of course, we know she’s gotten it wrong before, but that merely overdetermines the conclusion that she’s not quite perfect.)
One intuition pump is: You say “but if-counterfactually you see the bomb on the left, and you take it, you die”.
Well, no. In the given scenario, you do see the bomb on the left. There’s no counterfactuals involved.
… you roll your eyes at the person saying it was a bad move for roughly the same reason you roll your eyes at the person saying that ‘middle’ is never your best option in the high/low/extremeball game—such is the fate of people who are bound to submit a single action to be played against a mixture of games.
But you’re not bound to submit a single action to be played against a mixture of games. In the given scenario, you can choose your action knowing which game you’re playing!
… in real life you should spare no expense searching for the evidence that the predictor is on the fritz, which almost certainly would exist and let you avoid the bomb iff the predictor is on the fritz in real life …
… or, you could just… choose Right. That seems to me to be a clear win.
Separately, I note that if you think an agent should behave differently when a possibility is logically impossible, vs when they assess a probability on the order of 1 in a googolplex, then I suspect you’re doing something wrong. (Not least that you’re likely wrong about how much confidence a human can get in any particular logical fact.)
If an agent finds themselves in a scenario that they think is logically impossible, then they’re obviously wrong about that scenario being logically impossible, as demonstrated by the fact that actually, it did come to pass. So finding oneself in such a scenario should cause one to immediately re-examine one’s beliefs on what is, and is not, logically impossible, and/or one’s beliefs about what scenario one finds oneself in…
To check, do we have a disagreement about what to do insofar as the predictor is accurate? Like, in the case of the transparent Newcomb’s problem with a literally perfect predictor, would you one-box even if the box were empty?
How would I come to the belief that the predictor is literally perfect (or even mostly perfect)? Wouldn’t that just look like a string of observations of cases where either (a) there’s money in the one box and the agent takes the one box full of money, or (b) there’s no money in the one box and the agent takes the two boxes (and gets only the lesser quantity)? How exactly would I distinguish between the “perfect predictor” hypothesis and the “people take the one box if it has a million dollars, two boxes otherwise” hypothesis, as an explanation for those observations? (Or, to put it another way, what do I think happens to people who take an empty one box? Well, I can’t have any information about that, right? [Beyond the common sense of “obviously, they get zilch”…] Because if that had ever happened before, well, there goes the “perfect predictor” hypothesis, yes?)
The scenario does not appear to be in any way impossible.
The scenario says “the predictor is likely to be accurate” and then makes an assertion that is (for me, at least) false insofar as the predictor is accurate. You can’t have it both ways. The problem statement (at least partially) contradicts itself. You and I have a disagreement about how to evaluate counterfactuals in cases where the problem statement is partly-self-contradictory.
This appears to be paradoxical, but that seems to me to be the predictor’s fault
Sure, it’s the predictor’s problem, and the behavior that I expect of the predictor in the case that I force them to notice they have a problem has a direct effect on what I do if I don’t see a bomb. In particular, if they reward me for showing them their problem, then I’d go right when I see no bomb, whereas if they’d smite me, then I wouldn’t. But you’re right that this is inconsequential when I do see the bomb.
In the given scenario, you do see the bomb on the left. There’s no counterfactuals involved.
Well for one thing, I just looked and there’s no bomb on the left, so the whole discussion is counter-to-fact. And for another, if I pretend I’m in the scenario, then I choose my action by visualizing the (counterfactual) consequences of taking the bomb, and visualizing the (counterfactual) consequences of refraining. So there are plenty of counterfactuals involved.
I assert that the (counterfactual) consequences of taking the bomb include (almost certainly) rendering the whole scenario impossible, and rendering some other hypothetical (that I don’t need to pay $100 to leave) possible instead. And so according to me, the correct response to someone saying “assume you see the bomb” is to say “no, I shall assume that I see no bomb instead”, because that’s the consequence I visualize of (counterfactually) taking the bomb.
You’re welcome to test it empirically (well, maybe after adding at least $1k to all outcomes to incentivise me to play), if you have an all-but-one-in-a-trillion-trillion accurate predictor-of-me lying around. (I expect your empiricism will prove me right, in that the counterfactuals where it shows me a bomb are all in fact rendered impossible, and what happens in reality instead is that I get to leave with $900).
...Also, just to be clear, you’re aware that these are two different internally-consistent but contradictory theories of how to evaluate counterfactuals? Like, we can be pretty confident that there’s no argument a CDT agent can hear that causes them to wish to adopt FDT counterfactuals, and vice versa. Humans come equipped with both intuitions (I can give you other problems that pump the other intuitions, if you’d like), and we have to find some other way to arbitrate the conflict.
(Currently I’m in a mode of attempting to convey the second set of intuitions to you in this one problem, in light of your self-professed difficulty grasping them. Feel free to demonstrate understanding of the second set of intuitions and request that we instead switch to discussing why I think the latter wins in arbitration.)
How would I come to the belief that the predictor is literally perfect (or even mostly perfect)?
I’m asking you to do a case-analysis. For concreteness, suppose you get to inspect the predictor’s computing substrate and source code, and you do a bunch of pondering and thinking, and you’re like “hmm, yes, after lots of careful consideration, I now think that I am 90% likely to be facing a transparent Newcomb’s problem, and 10% likely to be in some other situation, eg, where the code doesn’t work how I think it does, or cosmic rays hit the computer, or what-have-you”. Now we agree that one possible thing that could have happened, is that the predictor accurately predicted you in the past, and we can freely discuss what you should do insofar as that’s the case.
Like, presumably when I present you with the high/low/extremeball game, you have no problem granting statements like “insofar as the die came up ‘highball’, I should choose ‘high’”. Saying “but how could I possibly have come to the epistemic state where I was 100% certain that the die came up ‘highball’???” is a nonsequitur. Your analysis in the face of uncertainty can be decomposed into an aggregation of your analyses in each scenario that might obtain. And so I ask again, insofar as the predictor accurately predicted you—which is surely one of many possible cases to analyze, after examining the mind of the entity that sure looks like it predicted you—what action would you prescribe?
The scenario says “the predictor is likely to be accurate”
Well… no, the scenario says “the predictor has predicted correctly 1 trillion trillion minus one times, and incorrectly one time”. Does that make it “likely to be accurate”? You tell me, I guess, but that seems like an unnecessarily vague characterization of a precise description.
and then makes an assertion that is (for me, at least) false insofar as the predictor is accurate. You can’t have it both ways. The problem statement (at least partially) contradicts itself.
What do you mean by this? What’s contradictory about the predictor making a mistake? Clearly, it’s not perfect. We know this because it made at least one mistake in the past, and then another mistake just now.
Is the predictor “accurate”? Well, it’s approximately as accurate as it takes to guess 1 trillion trillion times and only be wrong once…
I assert that the (counterfactual) consequences of taking the bomb include (almost certainly) rendering the whole scenario impossible, and rendering some other hypothetical (that I don’t need to pay $100 to leave) possible instead. And so according to me, the correct response to someone saying “assume you see the bomb” is to say “no, I shall assume that I see no bomb instead”, because that’s the consequence I visualize of (counterfactually) taking the bomb.
I confess that this reads like moon logic to me. It’s possible that there’s something fundamental I don’t understand about what you’re saying.
...Also, just to be clear, you’re aware that these are two different internally-consistent but contradictory theories of how to evaluate counterfactuals? Like, we can be pretty confident that there’s no argument a CDT agent can hear that causes them to wish to adopt FDT counterfactuals, and vice versa. Humans come equipped with both intuitions (I can give you other problems that pump the other intuitions, if you’d like), and we have to find some other way to arbitrate the conflict.
I am not familiar with this, no. If you have explanatory material / intuition pumps / etc. to illustrate this, I’d certainly appreciate it!
Saying “but how could I possibly have come to the epistemic state where I was 100% certain that the die came up ‘highball’???” is a nonsequitur.
I am not asking how I could come to believe the “literally perfect predictor” thing with 100% certainty; I am asking how I could come to believe it at all (with, let’s say, > 50% certainty).
Now we agree that one possible thing that could have happened, is that the predictor accurately predicted you in the past, and we can freely discuss what you should do insofar as that’s the case.
Hold on, hold on. Are we talking about repeated plays of the same game? Where I face the same situation repeatedly? Or are we talking about observing (or learning about) the predictor playing the game with other people before me?
The “Bomb” scenario described in the OP says nothing about repeated play. If that’s an assumption you’re introducing, I think it needs to be made explicit…
that seems like an unnecessarily vague characterization of a precise description
I deny that we have a precise description. If you listed out a specific trillion trillion observations that I allegedly made, then we could talk about whether those particular observations justify thinking that we’re in the game with the bomb. (If those trillion trillion observations were all from me waking up in a strange room and interacting with it, with no other context, then as noted above, I would have no reason to believe I’m in this game as opposed to any variety of other games consistent with those observations.) The scenario vaguely alleges that we think we’re facing an accurate predictor, and then alleges that their observed failure rate (on an unspecified history against unspecified players) is 1 per trillion-trillion. It does not say how or why we got into the epistemic state of thinking that there’s an accurate predictor there; we assume this by fiat.
(To be clear, I’m fine with assuming this by fiat. I’m simply arguing that your reluctance to analyze the problem by cases seems strange and likely erroneous to me.)
I am not familiar with this, no. If you have explanatory material / intuition pumps / etc. to illustrate this, I’d certainly appreciate it!
The canonical explanatory text is the FDT paper (PDF warning) (that the OP is responding to a critique of, iirc), and there’s a bunch of literature on LW (maybe start at the wiki page on UDT? Hopefully we have one of those) exploring various intuitions. If you’re not familiar with this style of logic, I recommend starting there (ah look we do have a UDT wiki page). I might write up some fresh intuition pumps later, to try to improve the exposition. (We’ve sure got a lot of exposition if you dig through the archives, but I think there are still a bunch of gaps.)
I am not asking how I could come to believe the “literally perfect predictor” thing with 100% certainty; I am asking how I could come to believe it at all (with, let’s say, > 50% certainty).
Uh, I mean, we could play a version of this game where the sums were positive (such that you’d want to play) and smaller (such that I could fund the prizes), and then I could say “yo Said Achmiz, I decided to act as a predictor of you in this game, I’m about to show you either a box with $10 and a box with $1, or an empty box and a box with $1, depending on how I predicted you’ll behave”. And then I think that at least the other readers on this post would predict that I have >50% probability of predicting your behavior correctly. You haven’t exactly been subtle about how you make decisions.
Predicting someone accurately with >50% likelihood isn’t all that hard. A coin predicts you “perfectly accurately” 50% of the time, you’ve only gotta do slightly better than that.
I suspect you’re confused here in putting “perfect prediction” on some big pedestal. The predictor is trying to reason about you, and they either reason basically-correctly, or their reasoning includes some mistake. And insofar as they reason validly to a conclusion about the right person, they’re going to get the answer right.
And again, note that you’re not exactly a tricky case yourself. You’re kinda broadcasting your decision-making procedure all over the internet. You don’t have to be a supermind studying a brainscan to figure out that Said Achmiz takes the extra $1.
My girlfriend often reasons accurately about whether I’m getting dinner tonight. Sometimes she reasons wrongly about it (eg, she neglects that I had a late lunch) but gets the right answer by chance anyway. Sometimes she reasons wrongly about it and gets the wrong answer by chance. But often she reasons correctly about it, and gets the right answer for the right reasons. And if I’m standing in the supermarket wondering whether she already ate or whether I should buy enough cheese to make her some food too, I have no trouble thinking “well, insofar as she got the right answer for the right reasons tonight, she knew I’d be hungry when I got home, and so she hasn’t eaten yet”. None of this case-analysis requires me to believe she’s a supermind who studied a scan of my brain for twelve thousand years. Believing that she was right about me for the right reasons is just not a very difficult epistemic state to enter. My reasons for doing most of what I do just aren’t all that complicated. It’s often not all that tricky to draw the right conclusions about what people do for the right reasons.
Furthermore, I note that in the high/low/extremeball game, I have no trouble answering questions about what I should do insofar as the game is highball, even though I have <50% (and in fact ~33%) probability on that being the case. In fact, I could analyze payouts insofar as the game is highball, even if the game was only 0.1% likely to be highball! In fact, the probability of me actually believing I face a given decision problem is almost irrelevant to my ability to analyze the payouts of different actions. As evidenced in part by the fact that I am prescribing actions in this incredibly implausible scenario with bombs and predictors.
And so in this wild hypothetical situation where we’ve somehow (by fiat) become convinced that there’s an ancient predictor predicting our actions (which is way weirder than my girlfriend predicting my action, tbh), presumably one of our hypotheses for what’s going on is that the predictor correctly deduced what we would do, for the correct reasons. Like, that’s one of many hypotheses that could explain our past observations. And so, assuming that hypothesis, for the purpose of analysis—and recalling that I can analyze which action is best assuming the game is highball, without making any claim about how easy or hard it is to figure out whether the game is highball—assuming that hypothesis, what action would you prescribe?
Well… no, the scenario says “the predictor has predicted correctly 1 trillion trillion minus one times, and incorrectly one time”. Does that make it “likely to be accurate”? You tell me, I guess, but that seems like an unnecessarily vague characterization of a precise description.
Let’s be more precise, then, and speak in terms of “correctness” rather than “accuracy”. There are then two possibilities in the “bomb” scenario as stipulated:
The predictor thought I would take the right box, and was correct.
The predictor thought I would take the right box, and was incorrect.
Now, note the following interesting property of the above two possibilities: I get to choose which of them is realized. I cannot change what the predictor thought, nor can I change its actions conditional on its prediction (which in the stipulated case involves placing a bomb in the left box), but I can choose to make its prediction correct or incorrect, depending on whether I take the box it predicted I would take.
Observe now the following interesting corollary of the above argument: it implies the existence of a certain strategy, which we might call “ObeyBot”, which always chooses the action that confirms the predictor’s prediction, and does so regardless of payoffs: by hypothesis, ObeyBot does not care about burning to death, nor does it care about losing $100; it cares only for making the predictor right as often as it can.
Now, suppose I were to tell you that the predictor’s stipulated track record in this game (10^24 − 1 successes, and 1 failure) were achieved against a population consisting (almost) entirely of ObeyBots. What would you make of this claim?
It should be quite obvious that there is only one conclusion you can draw: the predictor’s track record establishes absolutely nothing about its predictive capabilities in general. When faced with a population of agents that always do whatever the predictor says they will do, any “predictor”—from the superintelligent Omega to a random number generator—will achieve an accuracy of 100%.
This is an extremely important observation! What it shows is that the predictor’s claimed track record can mean very different things depending on what kind of agents it was playing against… which in turn means that the problem as stated is underdetermined: knowledge of the predictor’s track record does not by itself suffice to pin down the predictor’s actual behavior, unless we are also given information about what kind of agents the predictor was playing against.
Now, obviously whoever came up with the “bomb” scenario presumably did not want us to assume that their predictor is only accurate against ObeyBots and no one else. But to capture this notion requires more than merely stating the predictor’s track record; it requires the further supposition that the predictor’s accuracy is a persistent feature of the predictor, rather than of the agents it was playing against. It requires, in short, the very thing you criticized as an “unnecessarily vague characterization”: the statement that the predictor is, in fact, “likely to be accurate”.
With this as a prerequisite, we are now equipped to address your next question:
What do you mean by this? What’s contradictory about the predictor making a mistake? Clearly, it’s not perfect. We know this because it made at least one mistake in the past, and then another mistake just now.
(First, a brief recap: we are asked to assume the existence of a predictor with an extremely high level of predictive accuracy. Moreover, although the problem statement did not actually specify, it is likely we are meant to assume that this predictor’s accuracy persists across a large reference class of possible agent designs; specifically—and crucially—this reference class must be large enough to include our own decision algorithm, whatever it may be.)
Now, with this in mind, your (Said’s) question is: what could possibly be contradictory about this scenario? What could be contradictory about a scenario that asks us to assume the existence of a predictor that can accurately predict the behavior of a large reference class of agents, including ourselves?
At this point, it should be quite obvious where the potential contradiction lies: it arises directly from the fact that an agent who finds itself in the described scenario can choose to make the predictor right or wrong. Earlier, I exploited this property to construct a strategy (ObeyBot) which makes the predictor right no matter what; but now I will do the opposite, and construct a strategy that always chooses whichever action falsifies the predictor’s prediction. For thematic appropriateness, let us call this strategy “SpiteBot”.
I assert that it is impossible for any predictor to achieve any accuracy against SpiteBot higher than 0%. Therefore, if we assume that I myself am a SpiteBot, the contradiction becomes immediately clear: it is impossible for the predictor to achieve the stipulated predictive accuracy across any reference class of agents broad enough to include SpiteBot, and yet in order for the predictor’s reference class to include me, it must include SpiteBot, since I am a SpiteBot. Contradiction!
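A toy sketch of that point follows: even a coin-flipping “predictor” scores 100% against ObeyBot and 0% against SpiteBot. (In the Bomb setup the prediction is revealed to the agent via the note, so letting the agent condition on it directly is fair; everything else here is illustrative.)

```python
import random

def random_predictor():
    # A stand-in "predictor" that just flips a coin.
    return random.choice(["Left", "Right"])

def obeybot(prediction):
    # Always confirms the prediction, regardless of payoffs.
    return prediction

def spitebot(prediction):
    # Always falsifies the prediction, regardless of payoffs.
    return "Left" if prediction == "Right" else "Right"

def accuracy(agent, trials=100_000):
    hits = sum(agent(p) == p for p in (random_predictor() for _ in range(trials)))
    return hits / trials

print(accuracy(obeybot))   # 1.0: even a coin flip is "perfect" against ObeyBot
print(accuracy(spitebot))  # 0.0: no predictor can beat 0% against SpiteBot
```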
Of course, this only establishes that the problem statement is contradictory if the reader happens to be a SpiteBot. And while that might be somewhat amusing to imagine, it is unlikely that any of us (the intended audience of the problem) are SpiteBots, or use any decision theory akin to SpiteBot. So let us ask: are there any other decision theories that manage to achieve a similar effect—any other decision theories that, under at least some subset of circumstances, turn into SpiteBot, thereby rendering the problem formulation contradictory?
Asked that way, the question answers itself: any decision theory that behaves like SpiteBot at least some of the time, will generate contradictions with the problem statement at least some of the time. And (here is the key point) if those contradictions are to be avoided, the hypothesized predictor must avoid whatever set of circumstances triggers the SpiteBot-like behavior, lest it falsify its own alleged predictive accuracy… which implies the following corollaries:
The predictor cannot show its face (or its boxes) before SpiteBot. Any native SpiteBots in this universe will therefore never find themselves in this particular “bomb” scenario, nor any relatives.
Any strategies that behave like SpiteBot some of the time, will not encounter any “bomb”-like scenarios during that time. For example, the strategy “Choose your actions normally except on Wednesdays; behave like SpiteBot on Wednesdays” will not encounter any bomb-like scenarios on Wednesday.
If you don’t like the idea of ending up in a “bomb”-like scenario, you should precommit to behaving like SpiteBot in the face of such a scenario. If you actually do this properly, you will never see a real predictor give you a bomb-like scenario, which means that the only time you will be faced with any kind of bomb-like scenario will be in your imagination, perhaps at the prompting of some inquisitive philosopher. At this point you are free to ask the philosopher how his stipulated predictor reacts to SpiteBot and SpiteBot-like strategies, and—if he fails to give you a convincing answer—you are free to dismiss his premise as contradictory, and his scenario as something you don’t have to worry about.
(The general case of “behaving like SpiteBot when confronted with situations you’d really rather not have encountered to begin with” is called “fuck-you decision theory”, or FDT for short.)
Yes, it was this exact objection that I addressed in my previous replies that relied upon a misreading of the problem. I missed that the boxes were open and thought that the only clue to the prediction was the note that was left.
The only solution was to assume that the predictor does not always leave a note, and this solution also works for the stated scenario. You see that the boxes are open, and the left one contains a bomb, but did everyone else? Did anyone else? The problem setup doesn’t say.
This sort of vagueness leaves holes big enough to drive a truck through. The stated FDT support for picking Left depends absolutely critically on the subjunctive dependency odds being at least many millions to one, and the stated evidence is nowhere near strong enough to support that. Failing that, FDT recommends picking Right.
So the whole scenario is pointless. It doesn’t explore what it was intended to explore.
You can modify the problem to say that the predictor really is that reliable for every agent, but doesn’t always leave the boxes open for you or write a note. This doesn’t mean that the predictor is perfectly reliable, so a SpiteBot can still face this scenario but is just extremely unlikely to.
There’s also no question of what would happen if you “always spite the predictor’s prediction”
There IS a question of what would happen if you “always spite the predictor’s prediction”, since doing so seems to make the 1 in a trillion trillion error rate impossible.
In the Newcomb case, there’s a disagreement about whether one-boxing can actually somehow cause there to be a million dollars in the box; CDT denies this possibility (because it takes no account of sufficiently accurate predictors), while timeless/logical/functional/whatever decision theories accept it.
To be clear, FDT does not accept causation that happens backwards in time. It’s not claiming that the action of one-boxing itself causes there to be a million dollars in the box. It’s the agent’s algorithm, and, further down the causal diagram, Omega’s simulation of this algorithm that causes the million dollars. The causation happens before the prediction and is nothing special in that sense.
Yes, sure. Indeed we don’t need to accept causation of any kind, in any temporal direction. We can simply observe that one-boxers get a million dollars, and two-boxers do not. (In fact, even if we accept shminux’s model, this changes nothing about what the correct choice is.)
The main point of FDT is that it gives the optimal expected utility on average for agents using it. It does not guarantee optimal expected utility for every instance of an agent using it.
Suppose you have a population of two billion agents, each going through this scenario every day. Upon seeing a note predicting right, one billion would pick left and one billion would pick right. We can assume that they all pick left if they see a note predicting left or no note at all.
Every year, the Right agents essentially always see a note predicting right, and pay more than $30000 each. The Left agents essentially always see a note predicting left (or no note) and pay $0 each.
The average rate of deaths is comparable: one death per few trillion years in each group, which is to say, essentially never. They all know that it could happen, of course.
Which group is better off?
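For concreteness, here is the arithmetic behind the Left group's death rate and the Right group's annual cost, under the stipulations above (a billion agents per group, one play per day, a one-in-a-trillion-trillion error rate per play):

```python
# Assumed stipulations from the scenario above.
agents_per_group = 1_000_000_000
plays_per_year   = 365
error_rate       = 1e-24

# Left-takers: pay $0, and die only when the predictor errs against them.
left_deaths_per_year = agents_per_group * plays_per_year * error_rate
years_per_left_death = 1 / left_deaths_per_year

# Right-takers: essentially always see a Right-predicting note and pay $100.
right_cost_per_agent_per_year = 100 * plays_per_year

print(f"about one death per {years_per_left_death:.1e} years among Left-takers")  # ~2.7e12 years
print(f"${right_cost_per_agent_per_year} paid per Right-taker per year")          # $36500
```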
Edit: I misread Predictor’s accuracy. It does not say that it is in all scenarios 1 − 10^-24, just that in some unknown sample of scenarios, it was 1 − 10^-24. This changes the odds so much that FDT does not recommend taking the left box.
Edit: I misread Predictor’s accuracy. It does not say that it is in all scenarios 1 − 10^-24, just that in some unknown sample of scenarios, it was 1 − 10^-24. This changes the odds so much that FDT does not recommend taking the left box.
Two questions, if I may:
Why do you read it this way? The problem simply states the failure rate is 1 in a trillion trillion.
If we go with your interpretation, why exactly does that change things? It seems to me that the sample size would have to be extremely large in order to determine a failure rate that low.
It depends upon what the meaning of the word “is” is:
The failure rate has been tested over an immense number of predictions, and evaluated as 10^-24 (to one significant figure). That is the currently accepted estimate for the predictor’s error rate for scenarios randomly selected from the sample.
The failure rate is theoretically 10^-24, over some assumed distribution of agent types. Your decision model may or may not appear anywhere in this distribution.
The failure rate is bounded above by 10^-24 for every possible scenario.
A self-harming agent in this scenario cannot be consistently predicted by Predictor at all (success rate 0%), so we know that (3) is definitely false.
(1) and (2) aren’t strong enough, because they give little information about Predictor’s error rate concerning your scenario and your decision model.
We have essentially zero information about Predictor’s true error bounds regarding agents that sometimes carry out self-harming actions. An FDT agent that takes the left box here is exactly such an agent, and recommending that action requires the upper bound on Predictor’s failure of subjunctive dependency to be smaller than the ratio between the utility cost of paying $100 and the utility cost of burning to death all intelligent life in the universe.
We do not have anywhere near enough information to justify that tight a bound. So FDT can’t recommend such an action. Maybe someone else can write a scenario that is in similar spirit, but isn’t so flawed.
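A rough sketch of the bound being described, with the value placed on the catastrophic outcome left as a free parameter (the example valuations are illustrative assumptions, not from the problem):

```python
# Taking Left only beats paying the fee if p * U(catastrophe) < $100, where p is
# the chance that subjunctive dependency fails; i.e. roughly p < 100 / U(catastrophe).
def max_tolerable_failure_rate(catastrophe_cost_in_dollars):
    return 100 / catastrophe_cost_in_dollars

print(max_tolerable_failure_rate(1_000_000))  # 1e-4 if only one $1MM life is at stake
print(max_tolerable_failure_rate(1e30))       # 1e-28 for astronomically larger stakes
```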
Another way of phrasing it: you don’t get the $100 marginal payoff if you’re not prepared to knowingly go to your death in the incredibly unlikely event of a particular type of misprediction.
That’s the sense in which I meant “you got screwed”. You entered the scenario knowing that it was incredibly unlikely that you would die regardless of what you decide, but were prepared to accept that incredibly microscopic chance of death in exchange for keeping your $100. The odds just went against you.
Edit: If Predictor’s actual bound on error rate was 10^-24, this would be valid. However, Predictor’s bound on error rate cannot be 10^-24 in all scenarios, so this is all irrelevant. What a waste of time.
If there is some nontrivial chance that the predictor is adversarial but constrained to be accurate and truthful (within the bounds given), then on the balance of probability people taking the right box upon seeing a note predicting right are worse off. Yes, it sucks thatyouin particular got screwed, but the chances of that were astronomically low.This shows up more obviously if you look at repeated iterations, compare performance of decision theories in large populations, or weight outcomes across possible worlds.Edit: The odds were not astronomically low. I misinterpreted the statement about Predictor’s accuracy to be stronger than it actually was. FDT recommends taking the right box, and paying $100.
No, because the scenario stipulates that you find yourself facing a Left box with a bomb. Anyone who finds themselves in this scenario is worse off taking Left than Right, because taking Left kills you painfully, and taking Right does no such thing. There is no question of any “balance of probability”.
But you didn’t “get screwed”! You have a choice! You can take Left, or Right.
Again: the scenario stipulates that taking Left kills you, and FDT agrees that taking Left kills you; and likewise it is stipulated (and FDT does not dispute) that you can indeed take whichever box you like.
All of that is completely irrelevant, because in the actual world that you (the agent in the scenario) find yourself in, you can either burn to death, or not. It’s completely up to you. You don’t have to do what FDT says to do, regardless of what happens in any other possible worlds or counterfactuals or what have you.
It really seems to me like anyone who takes Left in the “Bomb” scenario is making almost exactly the same mistake as people who two-box in the classic Newcomb’s problem. Most of the point of “Newcomb’s Problem and Regret of Rationality” is that you don’t have to, and shouldn’t, do things like this.
But actually, it’s a much worse mistake! In the Newcomb case, there’s a disagreement about whether one-boxing can actually somehow cause there to be a million dollars in the box; CDT denies this possibility (because it takes no account of sufficiently accurate predictors), while timeless/logical/functional/whatever decision theories accept it. But here, there is no disagreement at all; FDT admits that choosing Left causes you to die painfully, but says you should do it anyway! That is obviously much worse.
The other point of “Newcomb’s Problem and Regret of Rationality” is that it is a huge mistake to redefine losing (such as, say, burning to death) as winning. That, also, seems like a mistake that’s being made here.
I don’t see that there’s any way of rescuing this result.
According to me, the correct rejoinder to Will is: I have confidently asserted that X is false for X whose probabliity I assign much greater probability than 1 in a trillion trillion, and so I hereby confidently assert that no, I do not see the bomb on the left. You see the bomb on the left, and lose $100. I see no bombs, and lose $0.
I can already hear the peanut gallery objecting that we can increase the fallibility of the predictor to reasonable numbers and I’d still take the bomb, so before we go further, let’s all agree that sometimes you’re faced with uncertainty, and the move that is best given your uncertainty is not the same as the move that is best given perfect knowledge. For example, suppose there are three games (“lowball”, “highball”, and “extremeball”) that work as follows. In each game, I have three actions—low, middle, and high. In the lowball game, my payouts are $5, $4, and $0 respectively. In the highball game, my payouts are $0, $4, and $5 respectively. In the extremeball game, my payouts are $5, $4, and $5 respectively. Now suppose that the real game I’m facing is that one of these games is chosen at uniform random by unobserved die roll. What action should I choose? Clearly ‘middle’, with an expected utility of $4 (compared to $3.33 for either ‘low’ or ‘high’). And when I do choose middle, I hope we can all agree that it’s foul play to say “you fool, you should have chosen low because the game is lowball”, or “you fool, there is no possible world in which that’s the best action”, or “you idiot, that’s literally the worst available action because the game was exrtemeball”. If I knew which game I was playing, I’d play the best move for that game. But insofar as I must enter a single action played against the whole mixture of games, I might have to choose something that’s not the best action in your favorite subgame.
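For concreteness, here is a tiny sketch of the expected-value arithmetic for the mixture of games just described; it uses only the payouts already stated above.

```python
# Expected payout of each action when lowball / highball / extremeball is chosen
# uniformly at random, using the payouts stated above.
payouts = {
    "lowball":     {"low": 5, "middle": 4, "high": 0},
    "highball":    {"low": 0, "middle": 4, "high": 5},
    "extremeball": {"low": 5, "middle": 4, "high": 5},
}

for action in ("low", "middle", "high"):
    ev = sum(game[action] for game in payouts.values()) / len(payouts)
    print(f"{action}: ${ev:.2f}")
# middle comes out at $4.00; low and high come out at $3.33 each.
```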
With that in mind, we can now decompose Will’s problem with the bomb into two subgames that I’m bound to play simultaneously.
In one subgame (that happens with probability 2 in a trillion trillion, although feel free to assume it’s more likely than that), the predictor is stumped and guesses randomly. We all agree that in that subgame, the best action is to avoid the bomb.
In the other subgame, the predictor is accurate. And insofar as the predictor is accurate, and supposing that we’ve seen the bomb, we can consider our two available actions (taking the bomb, or spending $100 to avoid it). But here something curious happens: insofar as the predictor is accurate, and we see the bomb, and we take it, we refute the problem statement.
That’s not my problem, as an agent. That’s your problem, as the person who stated the problem. If you say “assume you’re facing a perfect predictor, and see the bomb”, then I can validly say “no”. Similar to how if you say “assume you’re going to take the $5 bill, and you can either take the $5 bill or the $10 bill, but if you violate the laws of logic then you get a $100 fine, what do you do?” I can validly say “no”. It’s not my fault that you named a decision problem whose premises I can flatly refute.
Hopefully we all agree that insofar as the predictor is perfect (which, remember, is a case in the case analysis when the predictor is fallible), the problem statement here is deeply flawed, because I can by an action of mine refute it outright. The standard rejoinder is a bit of sleight-of-hand, where the person posing the problem says “ah, but the predictor is fallible”. But as we’ve already seen, I can just decompose it right back into two subproblems that we then aggregate across (much like the highball/lowball/extremeball case), at which point one of our case-analyses reveals that insofar as the predictor is accurate, the whole problem-statement is still flawed.
And this isn’t me saying “I wish to be evaluated from an epistemic vantage point that takes into account the other imaginary branches of reality”. This is me saying, your problem statement was wrong. It’s me pointing out that you’re a liar, or at least that I can by a clever choice of actions render you a liar. When you say “the predictor was accurate and you saw the bomb, what do you do?”, and I say “take the bomb”, I don’t get blown up, I reveal your mistake. Your problem statement is indeterminate. You shouldn’ta given me a problem I could refute. I’m not saying “there’s other hypothetical branches of reality that benefit from me taking this bomb”, I’m saying “WRONG, tell me what really happened”. Your story was false, my dude.
There’s some question of what to do when an obviously ill-formed game is mixed in with a properly-formed game, by, eg, adding some uncertainty about whether the predictor is fallible. Like, how are we supposed to analyze games comprising subgames where the problem statement can be refuted in one subgame but not others? And according to me, the obvious answer is that if you say “you are 1% playing problem A and 99% playing problem B”, and if I can by some act refute that I’m playing problem B, then I am perfectly licensed in saying “WRONG (99%)”. Mixing in a little uncertainty (or even a lot of uncertainty!) doesn’t stop you from being wrong (at my will) in the cases where you’re asserting falsehoods about my actions.
So, no, I don’t see the bomb. That’s all but 1 in a trillion trillion ‘WRONG’. What really happened, according to this problem, is—well, it’s still indeterminate, because Will’s problem statement isn’t complete. He doesn’t tell us what happens insofar as the predictor is accurate and I go left if and only if I see the bomb, ie if I always spite the predictor’s prediction. If the predictor would give me a 1 trillion dollar coin for my ingenuity in spiting them, then what really happens is I see a $1T coin (all but 1 in a trillion trillion times). If instead (as is the oral tradition when someone leaves their decision problem indeterminate) counterfactually-spiting the predictor causes me to find myself in a room full of hornets rather than exits, then what really happened is that I saw no bomb (and no hornets), almost certainly.
If you want me to stop denying your problem-statements outright, you’ve gotta stop giving me problem statements that I can (probabilistically) refute by my actions.
Thanks, this comment thread was pretty helpful.
After reading your comments, here’s my current explanation of what’s up with the bomb argument:
Then I’m a bit confused about how to estimate that probability, but I suspect the reasoning goes like this:
Sanity check
As a sanity-check, I note this implies that if the utilities-times-probabilities are different, I would not mind taking the $100 hit. Let’s see what the math says here, and then check whether my intuitions agree.
Suppose I value my life at $1 million. Then I think that I should become more indifferent here when the probability of a mistaken simulation approaches 1 in 100,000, or where the money on the line is closer to $10^-17.
[You can skip this, but here’s me stating the two multiplications I compared:
World 1: I fake-kill myself to save $X, with probability 1/10
World 2: I actually kill myself (cost: $1MM), with probability 1/Y
To find the indifference point I want the two multiplications of utility-to-probability to come out to be equal. If X = $100, then Y equals 100,000. If Y is a trillion trillion (10^24), then X = $10^-17. (Unless I did the math wrong.)]
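For what it’s worth, the bracketed arithmetic checks out; here is the same indifference calculation written out, keeping the commenter’s assumed numbers (a life valued at $1MM and the 1/10 weighting on World 1):

```python
# Indifference condition from the note above: X * (1/10) == $1MM * (1/Y).
LIFE_VALUE = 1_000_000
W1_WEIGHT = 1 / 10  # weighting on World 1, as assumed in the bracketed note

def indifference_y(x):
    """The 1-in-Y error rate at which saving $x stops being worth it."""
    return LIFE_VALUE / (x * W1_WEIGHT)

def indifference_x(y):
    """The stake $X at which a 1-in-y error rate leaves you indifferent."""
    return LIFE_VALUE / (y * W1_WEIGHT)

print(indifference_y(100))   # ~100000  -> 1 in 100,000, as stated
print(indifference_x(1e24))  # ~1e-17   -> about $10^-17, as stated
```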
I think this doesn’t obviously clash with my intuitions, and somewhat matches them.
If the simulator was getting things wrong 1 in 100,000 times, I think I’d be more careful with my life in the “real world case” (insofar as that is a sensible concept). Going further, if you told me they were wrong 1 in 10 times, this would change my action, so there’s got to be a tipping point somewhere, and this seems reasonable for many people (though I actually value my life at more than $1MM).
And if the money was that tiny ($10^-17), I’d be fairly open to “not taking even the one-in-a-trillion-trillion chance”. (Though really my intuition is that I don’t care about money way before $10^-17, and would probably not risk anything serious starting at like 0.1 cents, because that sort of money seems kind of irritating to have to deal with. So my intuition doesn’t match perfectly here. Though I think that if I were expecting to play trillions of such games, then I would start to actively care about such tiny amounts of money.)
Whether the predictor is accurate isn’t specified in the problem statement, and indeed can’t be specified in the problem statement (lest the scenario be incoherent, or posit impossible epistemic states of the agent being tested). What is specified is what existing knowledge you, the agent, have about the predictor’s accuracy, and what you observe in the given situation (from which you can perhaps infer additional things about the predictor, but that’s up to you).
In other words, the scenario is: as per the information you have, so far, the predictor has predicted 1 trillion trillion times, and been wrong once (or, some multiple of those numbers—predicted 2 trillion trillion times and been wrong twice, etc.).
You now observe the given situation (note predicting Right, bomb in Left, etc.). What do you do?
Now, we might ask: but is the predictor perfect? How perfect is she? Well… you know that she’s erred once in a trillion trillion times so far—ah, no, make that twice in a trillion trillion times, as of this iteration you now find yourself in. That’s the information you have at your disposal. What can you conclude from that? That’s up to you.
Likewise, you say:
The problem statement absolutely is complete. It asks what you would/should do in the given scenario. There is no need to specify what “would” happen in other (counterfactual) scenarios, because you (the agent) do not observe those scenarios. There’s also no question of what would happen if you “always spite the predictor’s prediction”, because there is no “always”; there’s just the given situation, where we know what happens if you choose Left: you burn to death.
You can certainly say “this scenario has very low probability”. That is reasonable. What you can’t say is “this scenario is logically impossible”, or any such thing. There’s no impossibility or incoherence here.
It’s not complete enough to determine what I do when I don’t see a bomb. And so when the problem statement is corrected to stop flatly asserting consequences of my actions as if they’re facts, you’ll find that my behavior in the corrected problem is underdefined. (If this still isn’t clear, try working out what the predictor does to the agent that takes the bomb if it’s present, but pays the $100 if it isn’t.)
And if we’re really technical, it’s not actually quite complete enough to determine what I do when I see the bomb. That depends on what the predictor does when there are two consistent possible outcomes. Like, if I would go left when there was no bomb and right when there was a bomb, what would the predictor do? If they only place the bomb if I insist on going right when there’s no bomb, then I have no incentive to go left upon seeing a bomb insofar as they’re accurate. To force me to go left, the predictor has to be trigger-happy, dropping the bomb given the slightest opportunity.
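Both points can be seen by brute-force enumeration. The sketch below is a minimal illustration, assuming the simplest reading of the setup (the bomb sits in Left exactly when “Right” is predicted); the policy labels are just for illustration, not anything from the thread.

```python
# Each policy maps what the agent observes (bomb present or not) to an action.
# A prediction is self-consistent with a policy if, given the bomb placement that
# the prediction produces (bomb in Left iff "Right" was predicted), the policy
# actually takes the predicted action.
policies = {
    "always Left":                  {"bomb": "Left",  "no bomb": "Left"},
    "always Right":                 {"bomb": "Right", "no bomb": "Right"},
    "take bomb iff present":        {"bomb": "Left",  "no bomb": "Right"},
    "Left unless bomb, else Right": {"bomb": "Right", "no bomb": "Left"},
}

for name, policy in policies.items():
    consistent = [p for p in ("Left", "Right")
                  if policy["bomb" if p == "Right" else "no bomb"] == p]
    print(f"{name}: consistent predictions = {consistent}")
# "take bomb iff present" has no consistent prediction (the predictor must err),
# while "Left unless bomb, else Right" has two, so the problem statement would
# need to say which of them a trigger-happy (or not) predictor picks.
```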
Nitpick: you seem to be trying to say something like “let us sidestep questions of predictor mechanics and assume that your only observations were the game-history”. But that doesn’t work. If I wake up in a room with a bomb about to go off in the left exit and a $100 fee to use the right exit, I pay the $100 fee and use the right exit. I have to see a host of other specific observations to start hypothesizing the existence of a predictor who would stop placing bombs if-counterfactually I took them, before I believe I’m in this problem.
I’m happy to leave unspecified the prior knowledge that causes us to believe there’s a predictor, but I object to the claim that we’ve fully specified my epistemic state by letting me see the problem history.
I’m not going to ask whether the predictor is perfect, no. I’m going to assume she isn’t, and then do some calculations, and then assume she is, and do some other calculations. Because it’s often useful to solve decision problems using case-by-case analysis.
And in the second case, your problem statement is revealed to be a lie.
Now, it’s perfectly possible for you to rephrase the problem such that my analysis does not, in some cases, reveal it to be a lie. You could say “the predictor is on the fritz, and you see the bomb” (in which case I’ll go right, thanks). Or you could say “the predictor is 1% on the fritz and 99% accurate, and in the 1% case you see a bomb and in the 99% case you may or may not see a bomb depending what you do”. That’s also fine. But insofar as you flatly postulate a (almost certain) consequence of my action, be prepared for me to assert that you’re (almost certainly) wrong.
There’s impossibility here precisely insofar as the predictor is accurate.
One intuition pump is: You say “but if-counterfactually you see the bomb on the left, and you take it, you die”. I protest “no, if-counterfactually I see the bomb on the left, then I die insofar as the predictor was on the fritz, and I get a puppy insofar as the predictor was accurate, by the principle of explosion. So I 1-in-a-hundred-trillion-trillion% die, and almost certainly get a puppy. Win.” Like, from my perspective, trying to analyze outcomes under assumptions that depend on my own actions can lead to nonsense.
(My model of a critic says “but surely you can see the intuition that, when you’re actually faced with the bomb, you actually shouldn’t take it?”. And ofc I have that intuition, as a human, I just think it lost in a neutral and fair arbitration process, but that’s a story for a different day. Speaking from the intuition that won, I’d say that it sure would be sad to take the bomb in real life, and that in real life you should spare no expense searching for the evidence that the predictor is on the fritz, which almost certainly would exist and let you avoid the bomb iff the predictor is on the fritz in real life, and if you run your real life right you should never once actually eat a bomb, but also yes, the correct intuition is that you take the bomb knowing it kills you, and that you roll your eyes at the person saying it was a bad move for roughly the same reason you roll your eyes at the person saying that ‘middle’ is never your best option in the high/low/extremeball game—such is the fate of people who are bound to submit a single action to be played against a mixture of games.)
Separately, I note that if you think an agent should behave very differently when a possibility is logically impossible, vs when they assess a probability on the order of 1 in a trillion trillion, then I suspect you’re doing something wrong. (Not least that you’re likely wrong about how much confidence a human can get in any particular logical fact.)
To check, do we have a disagreement about what to do insofar as the predictor is accurate? Like, in the case of the transparent Newcomb’s problem with a literally perfect predictor, would you one-box even if the box were empty?
I don’t understand this objection. The given scenario is that you do see a bomb. The question is: what do you do in the given scenario?
You are welcome to imagine any other scenarios you like, or talk about counterfactuals or what have you. But the scenario, as given, tells you that you know certain things and observe certain things. The scenario does not appear to be in any way impossible. “What do I do when I don’t see a bomb” seems irrelevant to the question, which posits that you do see a bomb.
Er, what? If you take the bomb, you burn to death. Given the scenario, that’s a fact. How can it not be a fact? (Except if the bomb happens to malfunction, or some such thing, which I assume is not what you mean…?)
Well, let’s see. The problem says:
So, if the predictor predicts that I will choose Right, she will put a bomb in Left, in which case I will choose Left. If she predicts that I will choose Left, then she puts no bomb in Left, in which case I will choose Right. This appears to be paradoxical, but that seems to me to be the predictor’s fault (for making an unconditional prediction of the behavior of an agent that will certainly condition its behavior on the prediction), and thus the predictor’s problem.
I… don’t see what bearing this has on the disagreement, though.
What I am saying is that we don’t have access to “questions of predictor mechanics”, only to the agent’s knowledge of “predictor mechanics”. In other words, we’ve fully specified your epistemic state by specifying your epistemic state—that’s all. I don’t know what you mean by calling it “the problem history”. There’s nothing odd about knowing (to some degree of certainty) that certain things have happened. You know there’s a (supposed) predictor, you know that she has (apparently) made such-and-such predictions, this many times, with these-and-such outcomes, etc. What are her “mechanics”? Well, you’re welcome to draw any conclusions about that from what you know about what’s gone before. Again, there is nothing unusual going on here.
I’m sorry, but this really makes very little sense to me. We know the predictor is not quite perfect. That’s one of the things we’re told as a known fact. She’s pretty close to perfect, but not quite. There’s just the one “case” here, and that’s it…
Indeed, the entire “your problem statement is revealed to be a lie” approach seems to me to be nonsensical. This is the given scenario. If you assume that the predictor is perfect and, from that, conclude that the scenario couldn’t’ve come to pass, doesn’t that entail that, by construction, the predictor isn’t perfect (proof by contradiction, as it were)? But then… we already know she isn’t perfect, so what was the point of assuming otherwise?
Well, we know she’s at least slightly inaccurate… obviously, because she got it wrong this time! We know she can get it wrong, because, by construction, she did get it wrong. (Also, of course, we know she’s gotten it wrong before, but that merely overdetermines the conclusion that she’s not quite perfect.)
Well, no. In the given scenario, you do see the bomb on the left. There’s no counterfactuals involved.
But you’re not bound to submit a single action to be played against a mixture of games. In the given scenario, you can choose your action knowing which game you’re playing!
… or, you could just… choose Right. That seems to me to be a clear win.
If an agent finds themselves in a scenario that they think is logically impossible, then they’re obviously wrong about that scenario being logically impossible, as demonstrated by the fact that actually, it did come to pass. So finding oneself in such a scenario should cause one to immediately re-examine one’s beliefs on what is, and is not, logically impossible, and/or one’s beliefs about what scenario one finds oneself in…
How would I come to the belief that the predictor is literally perfect (or even mostly perfect)? Wouldn’t that just look like a string of observations of cases where either (a) there’s money in the one box and the agent takes the one box full of money, or (b) there’s no money in the one box and the agent takes the two boxes (and gets only the lesser quantity)? How exactly would I distinguish between the “perfect predictor” hypothesis and the “people take the one box if it has a million dollars, two boxes otherwise” hypothesis, as an explanation for those observations? (Or, to put it another way, what do I think happens to people who take an empty one box? Well, I can’t have any information about that, right? [Beyond the common sense of “obviously, they get zilch”…] Because if that had ever happened before, well, there goes the “perfect predictor” hypothesis, yes?)
The scenario says “the predictor is likely to be accurate” and then makes an assertion that is (for me, at least) false insofar as the predictor is accurate. You can’t have it both ways. The problem statement (at least partially) contradicts itself. You and I have a disagreement about how to evaluate counterfactuals in cases where the problem statement is partly-self-contradictory.
Sure, it’s the predictor’s problem, and the behavior that I expect of the predictor in the case that I force them to notice they have a problem has a direct effect on what I do if I don’t see a bomb. In particular, if they reward me for showing them their problem, then I’d go right when I see no bomb, whereas if they’d smite me, then I wouldn’t. But you’re right that this is inconsequential when I do see the bomb.
Well for one thing, I just looked and there’s no bomb on the left, so the whole discussion is counter-to-fact. And for another, if I pretend I’m in the scenario, then I choose my action by visualizing the (counterfactual) consequences of taking the bomb, and visualizing the (counterfactual) consequences of refraining. So there are plenty of counterfactuals involved.
I assert that the (counterfactual) consequences of taking the bomb include (almost certainly) rendering the whole scenario impossible, and rendering some other hypothetical (that I don’t need to pay $100 to leave) possible instead. And so according to me, the correct response to someone saying “assume you see the bomb” is to say “no, I shall assume that I see no bomb instead”, because that’s the consequence I visualize of (counterfactually) taking the bomb.
You’re welcome to test it empirically (well, maybe after adding at least $1k to all outcomes to incentivise me to play), if you have an all-but-one-in-a-trillion-trillion accurate predictor-of-me lying around. (I expect your empiricism will prove me right, in that the counterfactuals where it shows me a bomb are all in fact rendered impossible, and what happens in reality instead is that I get to leave with $900).
...Also, just to be clear, you’re aware that these are two different internally-consistent but contradictory theories of how to evaluate counterfactuals? Like, we can be pretty confident that there’s no argument a CDT agent can hear that causes them to wish to adopt FDT counterfactuals, and vice versa. Humans come equipped with both intuitions (I can give you other problems that pump the other intuitions, if you’d like), and we have to find some other way to arbitrate the conflict.
(Currently I’m in a mode of attempting to convey the second set of intuitions to you in this one problem, in light of your self-professed difficulty grasping them. Feel free to demonstrate understanding of the second set of intuitions and request that we instead switch to discussing why I think the latter wins in arbitration.)
I’m asking you to do a case-analysis. For concreteness, suppose you get to inspect the predictor’s computing substrate and source code, and you do a bunch of pondering and thinking, and you’re like “hmm, yes, after lots of careful consideration, I now think that I am 90% likely to be facing a transparent Newcomb’s problem, and 10% likely to be in some other situation, eg, where the code doesn’t work how I think it does, or cosmic rays hit the computer, or what-have-you”. Now we agree that one possible thing that could have happened, is that the predictor accurately predicted you in the past, and we can freely discuss what you should do insofar as that’s the case.
Like, presumably when I present you with the high/low/extremeball game, you have no problem granting statements like “insofar as the die came up ‘highball’, I should choose ‘high’”. Saying “but how could I possibly have come to the epistemic state where I was 100% certain that the die came up ‘highball’???” is a nonsequitur. Your analysis in the face of uncertainty can be decomposed into an aggregation of your analyses in each scenario that might obtain. And so I ask again, insofar as the predictor accurately predicted you—which is surely one of many possible cases to analyze, after examining the mind of the entity that sure looks like it predicted you—what action would you prescribe?
Well… no, the scenario says “the predictor has predicted correctly 1 trillion trillion minus one times, and incorrectly one time”. Does that make it “likely to be accurate”? You tell me, I guess, but that seems like an unnecessarily vague characterization of a precise description.
What do you mean by this? What’s contradictory about the predictor making a mistake? Clearly, it’s not perfect. We know this because it made at least one mistake in the past, and then another mistake just now.
Is the predictor “accurate”? Well, it’s approximately as accurate as it takes to guess 1 trillion trillion times and only be wrong once…
I confess that this reads like moon logic to me. It’s possible that there’s something fundamental I don’t understand about what you’re saying.
I am not familiar with this, no. If you have explanatory material / intuition pumps / etc. to illustrate this, I’d certainly appreciate it!
I am not asking how I could come to believe the “literally perfect predictor” thing with 100% certainty; I am asking how I could come to believe it at all (with, let’s say, > 50% certainty).
Hold on, hold on. Are we talking about repeated plays of the same game? Where I face the same situation repeatedly? Or are we talking about observing (or learning about) the predictor playing the game with other people before me?
The “Bomb” scenario described in the OP says nothing about repeated play. If that’s an assumption you’re introducing, I think it needs to be made explicit…
I deny that we have a precise description. If you listed out a specific trillion trillion observations that I allegedly made, then we could talk about whether those particular observations justify thinking that we’re in the game with the bomb. (If those trillion trillion observations were all from me waking up in a strange room and interacting with it, with no other context, then as noted above, I would have no reason to believe I’m in this game as opposed to any variety of other games consistent with those observations.) The scenario vaguely alleges that we think we’re facing an accurate predictor, and then alleges that their observed failure rate (on an unspecified history against unspecified players) is 1 per trillion-trillion. It does not say how or why we got into the epistemic state of thinking that there’s an accurate predictor there; we assume this by fiat.
(To be clear, I’m fine with assuming this by fiat. I’m simply arguing that your reluctance to analyze the problem by cases seems strange and likely erroneous to me.)
The canonical explanatory text is the FDT paper (PDF warning) (that the OP is responding to a critique of, iirc), and there’s a bunch of literature on LW (maybe start at the wiki page on UDT? Hopefully we have one of those) exploring various intuitions. If you’re not familiar with this style of logic, I recommend starting there (ah look we do have a UDT wiki page). I might write up some fresh intuition pumps later, to try to improve the exposition. (We’ve sure got a lot of exposition if you dig through the archives, but I think there are still a bunch of gaps.)
Uh, I mean, we could play a version of this game where the sums were positive (such that you’d want to play) and smaller (such that I could fund the prizes), and then I could say “yo Said Achmiz, I decided to act as a predictor of you in this game, I’m about to show you either a box with $10 and a box with $1, or an empty box and a box with $1, depending on how I predicted you’ll behave”. And then I think that at least the other readers on this post would predict that I have >50% probability of predicting your behavior correctly. You haven’t exactly been subtle about how you make decisions.
Predicting someone accurately with >50% likelihood isn’t all that hard. A coin predicts you “perfectly accurately” 50% of the time, you’ve only gotta do slightly better than that.
I suspect you’re confused here in putting “perfect prediction” on some big pedestal. The predictor is trying to reason about you, and they either reason basically-correctly, or their reasoning includes some mistake. And insofar as they reason validly to a conclusion about the right person, they’re going to get the answer right.
And again, note that you’re not exactly a tricky case yourself. You’re kinda broadcasting your decision-making procedure all over the internet. You don’t have to be a supermind studying a brainscan to figure out that Said Achmiz takes the extra $1.
My girlfriend often reasons accurately about whether I’m getting dinner tonight. Sometimes she reasons wrongly about it (eg, she neglects that I had a late lunch) but gets the right answer by chance anyway. Sometimes she reasons wrongly about it and gets the wrong answer by chance. But often she reasons correctly about it, and gets the right answer for the right reasons. And if I’m standing in the supermarket wondering whether she already ate or whether I should buy enough cheese to make her some food too, I have no trouble thinking “well, insofar as she got the right answer for the right reasons tonight, she knew I’d be hungry when I got home, and so she hasn’t eaten yet”. None of this case-analysis requires me to believe she’s a supermind who studied a scan of my brain for twelve thousand years. Believing that she was right about me for the right reasons is just not a very difficult epistemic state to enter. My reasons for doing most of what I do just aren’t all that complicated. It’s often not all that tricky to draw the right conclusions about what people do for the right reasons.
Furthermore, I note that in the high/low/extremeball game, I have no trouble answering questions about what I should do insofar as the game is highball, even though I have <50% (and in fact ~33%) probability on that being the case. In fact, I could analyze payouts insofar as the game is highball, even if the game was only 0.1% likely to be highball! In fact, the probability of me actually believing I face a given decision problem is almost irrelevant to my ability to analyze the payouts of different actions. As evidenced in part by the fact that I am prescribing actions in this incredibly implausible scenario with bombs and predictors.
And so in this wild hypothetical situation where we’ve somehow (by fiat) become convinced that there’s an ancient predictor predicting our actions (which is way weirder than my girlfriend predicting my action, tbh), presumably one of our hypotheses for what’s going on is that the predictor correctly deduced what we would do, for the correct reasons. Like, that’s one of many hypotheses that could explain our past observations. And so, assuming that hypothesis, for the purpose of analysis—and recalling that I can analyze which action is best assuming the game is highball, without making any claim about how easy or hard it is to figure out whether the game is highball—assuming that hypothesis, what action would you prescribe?
Let’s be more precise, then, and speak in terms of “correctness” rather than “accuracy”. There are then two possibilities in the “bomb” scenario as stipulated:
The predictor thought I would take the right box, and was correct.
The predictor thought I would take the right box, and was incorrect.
Now, note the following interesting property of the above two possibilities: I get to choose which of them is realized. I cannot change what the predictor thought, nor can I change its actions conditional on its prediction (which in the stipulated case involves placing a bomb in the left box), but I can choose to make its prediction correct or incorrect, depending on whether I take the box it predicted I would take.
Observe now the following interesting corollary of the above argument: it implies the existence of a certain strategy, which we might call “ObeyBot”, which always chooses the action that confirms the predictor’s prediction, and does so regardless of payoffs: by hypothesis, ObeyBot does not care about burning to death, nor does it care about losing $100; it cares only for making the predictor right as often as it can.
Now, suppose I were to tell you that the predictor’s stipulated track record in this game (10^24 − 1 successes, and 1 failure) were achieved against a population consisting (almost) entirely of ObeyBots. What would you make of this claim?
It should be quite obvious that there is only one conclusion you can draw: the predictor’s track record establishes absolutely nothing about its predictive capabilities in general. When faced with a population of agents that always do whatever the predictor says they will do, any “predictor”—from the superintelligent Omega to a random number generator—will achieve an accuracy of 100%.
This is an extremely important observation! What it shows is that the predictor’s claimed track record can mean very different things depending on what kind of agents it was playing against… which in turn means that the problem as stated is underdetermined: knowledge of the predictor’s track record does not by itself suffice to pin down the predictor’s actual behavior, unless we are also given information about what kind of agents the predictor was playing against.
Now, obviously whoever came up with the “bomb” scenario presumably did not want us to assume that their predictor is only accurate against ObeyBots and no one else. But to capture this notion requires more than merely stating the predictor’s track record; it requires the further supposition that the predictor’s accuracy is a persistent feature of the predictor, rather than of the agents it was playing against. It requires, in short, the very thing you criticized as an “unnecessarily vague characterization”: the statement that the predictor is, in fact, “likely to be accurate”.
With this as a prerequisite, we are now equipped to address your next question:
(First, a brief recap: we are asked to assume the existence of a predictor with an extremely high level of predictive accuracy. Moreover, although the problem statement did not actually specify, it is likely we are meant to assume that this predictor’s accuracy persists across a large reference class of possible agent designs; specifically—and crucially—this reference class must be large enough to include our own decision algorithm, whatever it may be.)
Now, with this in mind, your (Said’s) question is: what could possibly be contradictory about this scenario? What could be contradictory about a scenario that asks us to assume the existence of a predictor that can accurately predict the behavior of a large reference class of agents, including ourselves?
At this point, it should be quite obvious where the potential contradiction lies: it arises directly from the fact that an agent who finds itself in the described scenario can choose to make the predictor right or wrong. Earlier, I exploited this property to construct a strategy (ObeyBot) which makes the predictor right no matter what; but now I will do the opposite, and construct a strategy that always chooses whichever action falsifies the predictor’s prediction. For thematic appropriateness, let us call this strategy “SpiteBot”.
I assert that it is impossible for any predictor to achieve any accuracy against SpiteBot higher than 0%. Therefore, if we assume that I myself am a SpiteBot, the contradiction becomes immediately clear: it is impossible for the predictor to achieve the stipulated predictive accuracy across any reference class of agents broad enough to include SpiteBot, and yet in order for the predictor’s reference class to include me, it must include SpiteBot, since I am a SpiteBot. Contradiction!
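The point about track records is easy to simulate. In the toy sketch below, the “predictor” is just a coin flip and the agent sees the prediction directly; both simplifications are mine, made purely to illustrate the 100% / 0% claims about ObeyBot and SpiteBot.

```python
import random

def observed_accuracy(agent, rounds=100_000, seed=0):
    """Fraction of rounds a pure coin-flip 'predictor' gets right against this agent."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(rounds):
        prediction = rng.choice(["Left", "Right"])
        correct += (agent(prediction) == prediction)
    return correct / rounds

def obeybot(prediction):   # always confirms the prediction
    return prediction

def spitebot(prediction):  # always refutes the prediction
    return "Left" if prediction == "Right" else "Right"

print(observed_accuracy(obeybot))   # 1.0 -- even a coin flip looks perfect against ObeyBots
print(observed_accuracy(spitebot))  # 0.0 -- and nothing scores above 0% against SpiteBot
```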
Of course, this only establishes that the problem statement is contradictory if the reader happens to be a SpiteBot. And while that might be somewhat amusing to imagine, it is unlikely that any of us (the intended audience of the problem) are SpiteBots, or use any decision theory akin to SpiteBot. So let us ask: are there any other decision theories that manage to achieve a similar effect—any other decision theories that, under at least some subset of circumstances, turn into SpiteBot, thereby rendering the problem formulation contradictory?
Asked that way, the question answers itself: any decision theory that behaves like SpiteBot at least some of the time, will generate contradictions with the problem statement at least some of the time. And (here is the key point) if those contradictions are to be avoided, the hypothesized predictor must avoid whatever set of circumstances triggers the SpiteBot-like behavior, lest it falsify its own alleged predictive accuracy… which implies the following corollaries:
The predictor cannot show its face (or its boxes) before SpiteBot. Any native SpiteBots in this universe will therefore never find themselves in this particular “bomb” scenario, nor any relatives.
Any strategies that behave like SpiteBot some of the time, will not encounter any “bomb”-like scenarios during that time. For example, the strategy “Choose your actions normally except on Wednesdays; behave like SpiteBot on Wednesdays” will not encounter any bomb-like scenarios on Wednesday.
If you don’t like the idea of ending up in a “bomb”-like scenario, you should precommit to behaving like SpiteBot in the face of such a scenario. If you actually do this properly, you will never see a real predictor give you a bomb-like scenario, which means that the only time you will be faced with any kind of bomb-like scenario will be in your imagination, perhaps at the prompting of some inquisitive philosopher. At this point you are free to ask the philosopher how his stipulated predictor reacts to SpiteBot and SpiteBot-like strategies, and—if he fails to give you a convincing answer—you are free to dismiss his premise as contradictory, and his scenario as something you don’t have to worry about.
(The general case of “behaving like SpiteBot when confronted with situations you’d really rather not have encountered to begin with” is called “fuck-you decision theory”, or FDT for short.)
Yes, it was this exact objection that I addressed in my previous replies that relied upon a misreading of the problem. I missed that the boxes were open and thought that the only clue to the prediction was the note that was left.
The only solution was to assume that the predictor does not always leave a note, and this solution also works for the stated scenario. You see that the boxes are open, and the left one contains a bomb, but did everyone else? Did anyone else? The problem setup doesn’t say.
This sort of vagueness leaves holes big enough to drive a truck through. The stated FDT support for picking Left depends absolutely critically on the subjunctive dependency odds being at least many millions to one, and the stated evidence is nowhere near strong enough to support that. Failing that, FDT recommends picking Right.
So the whole scenario is pointless. It doesn’t explore what it was intended to explore.
You can modify the problem to say that the predictor really is that reliable for every agent, but doesn’t always leave the boxes open for you or write a note. This doesn’t mean that the predictor is perfectly reliable, so a SpiteBot can still face this scenario but is just extremely unlikely to.
There IS a question of what would happen if you “always spite the predictor’s prediction”, since doing so seems to make the 1 in a trillion trillion error rate impossible.
To be clear, FDT does not accept causation that happens backwards in time. It’s not claiming that the action of one-boxing itself causes there to be a million dollars in the box. It’s the agent’s algorithm, and, further down the causal diagram, Omega’s simulation of this algorithm that causes the million dollars. The causation happens before the prediction and is nothing special in that sense.
Yes, sure. Indeed we don’t need to accept causation of any kind, in any temporal direction. We can simply observe that one-boxers get a million dollars, and two-boxers do not. (In fact, even if we accept shminux’s model, this changes nothing about what the correct choice is.)
Eh? This kind of reasoning leads to failing to smoke on Smoking Lesion.
The main point of FDT is that it gives the optimal expected utility on average for agents using it. It does not guarantee optimal expected utility for every instance of an agent using it.
Suppose you have a population of two billion agents, each going through this scenario every day. Upon seeing a note predicting right, one billion would pick left and one billion would pick right. We can assume that they all pick left if they see a note predicting left or no note at all.
Every year, the Right agents essentially always see a note predicting right, and pay more than $30000 each. The Left agents essentially always see a note predicting left (or no note) and pay $0 each.
The average rate of deaths is comparable: one death per few trillion years in each group, which is to say, essentially never. They all know that it could happen, of course.
Which group is better off?
Edit: I misread Predictor’s accuracy. It does not say that it is in all scenarios 1 − 10^-24, just that in some unknown sample of scenarios, it was 1 − 10^-24. This changes the odds so much that FDT does not recommend taking the left box.
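As a quick check of the figures in that comment, under its own assumptions (a billion agents per group, one play per day, and taking the stated 10^-24 error rate at face value):

```python
# Verifying the dollar totals and the order of magnitude of the death rate above.
agents_per_group = 1_000_000_000
days_per_year = 365
error_rate = 1e-24  # per prediction, taking the stated figure at face value

# Right-takers pay $100 essentially every day.
print(100 * days_per_year)  # 36500 -> "more than $30000" per agent per year

# Expected mispredictions (hence deaths among bomb-takers) per group per year.
deaths_per_year = agents_per_group * days_per_year * error_rate
print(f"about one death per {1 / deaths_per_year:.1e} years")  # ~2.7e12 -> a few trillion years
```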
Obviously, the group that’s better off is the third group: the one that picks Left if there’s no bomb in there, Right otherwise.
… I mean, seriously, what the heck? The scenario specifies that the boxes are open! You can see what’s in there! How is this even a question?
(Bonus question: what will the predictor say about the behavior of this third group? What choice will she predict a member of this group will make?)
Thanks, I appreciate this. Your answer clarifies a lot, and I will think about it more.