It’s not complete enough to determine what I do when I don’t see a bomb. And so when the problem statement is corrected to stop flatly asserting consequences of my actions as if they’re facts, you’ll find that my behavior in the corrected problem is underdefined. (If this still isn’t clear, try working out what the predictor does to the agent that takes the bomb if it’s present, but pays the $100 if it isn’t.)
And if we’re really technical, it’s not actually quite complete enough to determine what I do when I see the bomb. That depends on what the predictor does when there are two consistent possible outcomes. Like, if I would go left when there was no bomb and right when there was a bomb, what would the predictor do? If they only place the bomb if I insist on going right when there’s no bomb, then I have no incentive to go left upon seeing a bomb insofar as they’re accurate. To force me to go left, the predictor has to be trigger-happy, dropping the bomb given the slightest opportunity.
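To make that underdetermination concrete, here's a quick enumeration sketch (the encoding is mine, not something the problem statement specifies): treat an agent policy as a map from what you see in Left to an action, and call a predictor state consistent with a policy exactly when the prediction it encodes matches the action the policy takes upon seeing it.

```python
# Minimal sketch, assuming the stated setup: bomb in Left iff the predictor
# predicted Right, and the agent sees Left's contents before choosing.
from itertools import product

ACTIONS = ("Left", "Right")

def consistent_states(policy):
    """Predictor states consistent with `policy` (a dict: observation -> action).

    State "bomb" encodes the prediction "Right"; state "no_bomb" encodes "Left".
    A state is consistent iff the policy's response to it matches that prediction.
    """
    states = []
    if policy["bomb"] == "Right":    # predicted Right, placed bomb, agent indeed goes Right
        states.append("bomb")
    if policy["no_bomb"] == "Left":  # predicted Left, left no bomb, agent indeed goes Left
        states.append("no_bomb")
    return states

for if_bomb, if_no_bomb in product(ACTIONS, repeat=2):
    policy = {"bomb": if_bomb, "no_bomb": if_no_bomb}
    print(f"if bomb: {if_bomb:5s} | if no bomb: {if_no_bomb:5s} | "
          f"consistent predictor states: {consistent_states(policy) or 'none'}")
```

The agent in the parenthetical above (Left if the bomb is there, Right if it isn't) is the one with no consistent state at all, and the agent described just above (Right if bomb, Left if no bomb) has two, which is exactly why the predictor's tie-breaking disposition has to be specified.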
What is specified is what existing knowledge you, the agent, have about the predictor’s accuracy, and what you observe in the given situation.
Nitpick: you seem to be trying to say something like “let us sidestep questions of predictor mechanics and assume that your only observations were the game-history”. But that doesn’t work. If I wake up in a room with a bomb about to go off in the left exit and a $100 fee to use the right exit, I pay the $100 fee and use the right exit. I have to see a host of other specific observations to start hypothesizing the existence of a predictor who would stop placing bombs if-counterfactually I took them, before I believe I’m in this problem.
I’m happy to leave unspecified the prior knowledge that causes us to believe there’s a predictor, but I object to the claim that we’ve fully specified my epistemic state by letting me see the problem history.
Now, we might ask: but is the predictor perfect? How perfect is she?
I’m not going to ask whether the predictor is perfect, no. I’m going to assume she isn’t, and then do some calculations, and then assume she is, and do some other calculations. Because it’s often useful to solve decision problems using case-by-case analysis.
And in the second case, your problem statement is revealed to be a lie.
Now, it’s perfectly possible for you to rephrase the problem such that my analysis does not, in some cases, reveal it to be a lie. You could say “the predictor is on the fritz, and you see the bomb” (in which case I’ll go right, thanks). Or you could say “the predictor is 1% on the fritz and 99% accurate, and in the 1% case you see a bomb and in the 99% case you may or may not see a bomb depending on what you do”. That’s also fine. But insofar as you flatly postulate an (almost certain) consequence of my action, be prepared for me to assert that you’re (almost certainly) wrong.
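For concreteness, here is a minimal expected-value sketch of that last rephrasing (the dollar disvalue of burning to death, V_DEATH, is a number I'm making up for illustration; the problem doesn't supply one). It just weighs the two cases and compares the two constant commitments:

```python
# Rough sketch: compare "always take Left" vs "always take Right" as policies
# committed to in advance, under the "p on the fritz, (1-p) accurate" framing.
# ASSUMPTION: V_DEATH is a made-up finite disvalue for burning to death.

V_DEATH = 1_000_000  # hypothetical; not part of the problem statement

def expected_value(p_fritz, policy):
    """Ex-ante value of committing to always choose `policy` ('Left' or 'Right').

    Fritz case: the bomb is present regardless of the policy.
    Accurate case: the predictor anticipates the policy, so the bomb is
    present iff the policy is 'Right' (and absent iff it is 'Left').
    """
    fritz = -V_DEATH if policy == "Left" else -100
    accurate = 0 if policy == "Left" else -100
    return p_fritz * fritz + (1 - p_fritz) * accurate

for p in (0.01, 1e-24):
    print(f"p_fritz={p}: always-Left EV = {expected_value(p, 'Left'):.6g}, "
          f"always-Right EV = {expected_value(p, 'Right'):.6g}")
```

Under this made-up V_DEATH, the always-Right commitment wins at a 1% fritz rate, and the always-Left commitment wins at the stated one-in-a-trillion-trillion rate (and keeps winning there for any V_DEATH below roughly $10^26). Which of those situations you're actually in is what the case analysis is for.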
You can certainly say “this scenario has very low probability”. That is reasonable. What you can’t say is “this scenario is logically impossible”, or any such thing. There’s no impossibility or incoherence here.
There’s impossibility here precisely insofar as the predictor is accurate.
One intuition pump is: You say “but if-counterfactually you see the bomb on the left, and you take it, you die”. I protest “no, if-counterfactually I see the bomb on the left, then I die insofar as the predictor was on the fritz, and I get a puppy insofar as the predictor was accurate, by the principle of explosion. So I 1-in-a-hundred-trillion-trillion% die, and almost certainly get a puppy. Win.” Like, from my perspective, trying to analyze outcomes under assumptions that depend on my own actions can lead to nonsense.
(My model of a critic says “but surely you can see the intuition that, when you’re actually faced with the bomb, you actually shouldn’t take it?”. And ofc I have that intuition, as a human, I just think it lost in a neutral and fair arbitration process, but that’s a story for a different day. Speaking from the intuition that won, I’d say that it sure would be sad to take the bomb in real life, and that in real life you should spare no expense searching for the evidence that the predictor is on the fritz, which almost certainly would exist and let you avoid the bomb iff the predictor is on the fritz in real life, and if you run your real life right you should never once actually eat a bomb, but also yes, the correct intuition is that you take the bomb knowing it kills you, and that you roll your eyes at the person saying it was a bad move for roughly the same reason you roll your eyes at the person saying that ‘middle’ is never your best option in the high/low/extremeball game—such is the fate of people who are bound to submit a single action to be played against a mixture of games.)
Separately, I note that if you think an agent should behave very differently when a possibility is logically impossible, vs when they assess a probability on the order of 1 in a trillion trillion, then I suspect you’re doing something wrong. (Not least that you’re likely wrong about how much confidence a human can get in any particular logical fact.)
To check, do we have a disagreement about what to do insofar as the predictor is accurate? Like, in the case of the transparent Newcomb’s problem with a literally perfect predictor, would you one-box even if the box were empty?
It’s not complete enough to determine what I do when I don’t see a bomb.
I don’t understand this objection. The given scenario is that you do see a bomb. The question is: what do you do in the given scenario?
You are welcome to imagine any other scenarios you like, or talk about counterfactuals or what have you. But the scenario, as given, tells you that you know certain things and observe certain things. The scenario does not appear to be in any way impossible. “What do I do when I don’t see a bomb” seems irrelevant to the question, which posits that you do see a bomb.
… flatly asserting consequences of my actions as if they’re facts …
Er, what? If you take the bomb, you burn to death. Given the scenario, that’s a fact. How can it not be a fact? (Except if the bomb happens to malfunction, or some such thing, which I assume is not what you mean…?)
(If this still isn’t clear, try working out what the predictor does to the agent that takes the bomb [Left] if it’s present, but pays the $100 [Right] if it isn’t.)
Well, let’s see. The problem says:
If the predictor predicted that you would choose Right, then she put a bomb in Left. If the predictor predicted that you would choose Left, then she did not put a bomb in Left, and the box is empty.
So, if the predictor predicts that I will choose Right, she will put a bomb in Left, in which case I will choose Left. If she predicts that I will choose Left, then she puts no bomb in Left, in which case I will choose Right. This appears to be paradoxical, but that seems to me to be the predictor’s fault (for making an unconditional prediction of the behavior of an agent that will certainly condition its behavior on the prediction), and thus the predictor’s problem.
I… don’t see what bearing this has on the disagreement, though.
Nitpick: you seem to be trying to say something like “let us sidestep questions of predictor mechanics and assume that your only observations were the game-history”. But that doesn’t work. If I wake up in a room with a bomb about to go off in the left exit and a $100 fee to use the right exit, I pay the $100 fee and use the right exit. I have to see a host of other specific observations to start hypothesizing the existence of a predictor who would stop placing bombs if-counterfactually I took them, before I believe I’m in this problem.
I’m happy to leave unspecified the prior knowledge that causes us to believe there’s a predictor, but I object to the claim that we’ve fully specified my epistemic state by letting me see the problem history.
What I am saying is that we don’t have access to “questions of predictor mechanics”, only to the agent’s knowledge of “predictor mechanics”. In other words, we’ve fully specified your epistemic state by specifying your epistemic state—that’s all. I don’t know what you mean by calling it “the problem history”. There’s nothing odd about knowing (to some degree of certainty) that certain things have happened. You know there’s a (supposed) predictor, you know that she has (apparently) made such-and-such predictions, this many times, with these-and-such outcomes, etc. What are her “mechanics”? Well, you’re welcome to draw any conclusions about that from what you know about what’s gone before. Again, there is nothing unusual going on here.
I’m not going to ask whether the predictor is perfect, no. I’m going to assume she isn’t, and then do some calculations, and then assume she is, and do some other calculations. Because it’s often useful to solve decision problems using case-by-case analysis.
And in the second case, your problem statement is revealed to be a lie.
I’m sorry, but this really makes very little sense to me. We know the predictor is not quite perfect. That’s one of the things we’re told as a known fact. She’s pretty close to perfect, but not quite. There’s just the one “case” here, and that’s it…
Indeed, the entire “your problem statement is revealed to be a lie” approach seems to me to be nonsensical. This is the given scenario. If you assume that the predictor is perfect and, from that, conclude that the scenario couldn’t’ve come to pass, doesn’t that entail that, by construction, the predictor isn’t perfect (proof by contradiction, as it were)? But then… we already know she isn’t perfect, so what was the point of assuming otherwise?
There’s impossibility here precisely insofar as the predictor is accurate.
Well, we know she’s at least slightly inaccurate… obviously, because she got it wrong this time! We know she can get it wrong, because, by construction, she did get it wrong. (Also, of course, we know she’s gotten it wrong before, but that merely overdetermines the conclusion that she’s not quite perfect.)
One intuition pump is: You say “but if-counterfactually you see the bomb on the left, and you take it, you die”.
Well, no. In the given scenario, you do see the bomb on the left. There’s no counterfactuals involved.
… you roll your eyes at the person saying it was a bad move for roughly the same reason you roll your eyes at the person saying that ‘middle’ is never your best option in the high/low/extremeball game—such is the fate of people who are bound to submit a single action to be played against a mixture of games.
But you’re not bound to submit a single action to be played against a mixture of games. In the given scenario, you can choose your action knowing which game you’re playing!
… in real life you should spare no expense searching for the evidence that the predictor is on the fritz, which almost certainly would exist and let you avoid the bomb iff the predictor is on the fritz in real life …
… or, you could just… choose Right. That seems to me to be a clear win.
Separately, I note that if you think an agent should behave differently when a possibility is logically impossible, vs when they assess a probability on the order of 1 in a googolplex, then I suspect you’re doing something wrong. (Not least that you’re likely wrong about how much confidence a human can get in any particular logical fact.)
If an agent finds themselves in a scenario that they think is logically impossible, then they’re obviously wrong about that scenario being logically impossible, as demonstrated by the fact that actually, it did come to pass. So finding oneself in such a scenario should cause one to immediately re-examine one’s beliefs on what is, and is not, logically impossible, and/or one’s beliefs about what scenario one finds oneself in…
To check, do we have a disagreement about what to do insofar as the predictor is accurate? Like, in the case of the transparent Newcomb’s problem with a literally perfect predictor, would you one-box even if the box were empty?
How would I come to the belief that the predictor is literally perfect (or even mostly perfect)? Wouldn’t that just look like a string of observations of cases where either (a) there’s money in the one box and the agent takes the one box full of money, or (b) there’s no money in the one box and the agent takes the two boxes (and gets only the lesser quantity)? How exactly would I distinguish between the “perfect predictor” hypothesis and the “people take the one box if it has a million dollars, two boxes otherwise” hypothesis, as an explanation for those observations? (Or, to put it another way, what do I think happens to people who take an empty one box? Well, I can’t have any information about that, right? [Beyond the common sense of “obviously, they get zilch”…] Because if that had ever happened before, well, there goes the “perfect predictor” hypothesis, yes?)
The scenario does not appear to be in any way impossible.
The scenario says “the predictor is likely to be accurate” and then makes an assertion that is (for me, at least) false insofar as the predictor is accurate. You can’t have it both ways. The problem statement (at least partially) contradicts itself. You and I have a disagreement about how to evaluate counterfactuals in cases where the problem statement is partly-self-contradictory.
This appears to be paradoxical, but that seems to me to be the predictor’s fault
Sure, it’s the predictor’s problem, and the behavior that I expect of the predictor in the case that I force them to notice they have a problem has a direct effect on what I do if I don’t see a bomb. In particular, if they reward me for showing them their problem, then I’d go right when I see no bomb, whereas if they’d smite me, then I wouldn’t. But you’re right that this is inconsequential when I do see the bomb.
In the given scenario, you do see the bomb on the left. There’s no counterfactuals involved.
Well for one thing, I just looked and there’s no bomb on the left, so the whole discussion is counter-to-fact. And for another, if I pretend I’m in the scenario, then I choose my action by visualizing the (counterfactual) consequences of taking the bomb, and visualizing the (counterfactual) consequences of refraining. So there are plenty of counterfactuals involved.
I assert that the (counterfactual) consequences of taking the bomb include (almost certainly) rendering the whole scenario impossible, and rendering some other hypothetical (that I don’t need to pay $100 to leave) possible instead. And so according to me, the correct response to someone saying “assume you see the bomb” is to say “no, I shall assume that I see no bomb instead”, because that’s the consequence I visualize of (counterfactually) taking the bomb.
You’re welcome to test it empirically (well, maybe after adding at least $1k to all outcomes to incentivise me to play), if you have an all-but-one-in-a-trillion-trillion accurate predictor-of-me lying around. (I expect your empiricism will prove me right, in that the counterfactuals where it shows me a bomb are all in fact rendered impossible, and what happens in reality instead is that I get to leave with $900).
...Also, just to be clear, you’re aware that these are two different internally-consistent but contradictory theories of how to evaluate counterfactuals? Like, we can be pretty confident that there’s no argument a CDT agent can hear that causes them to wish to adopt FDT counterfactuals, and vice versa. Humans come equipped with both intuitions (I can give you other problems that pump the other intuitions, if you’d like), and we have to find some other way to arbitrate the conflict.
(Currently I’m in a mode of attempting to convey the second set of intuitions to you in this one problem, in light of your self-professed difficulty grasping them. Feel free to demonstrate understanding of the second set of intuitions and request that we instead switch to discussing why I think the latter wins in arbitration.)
How would I come to the belief that the predictor is literally perfect (or even mostly perfect)?
I’m asking you to do a case-analysis. For concreteness, suppose you get to inspect the predictor’s computing substrate and source code, and you do a bunch of pondering and thinking, and you’re like “hmm, yes, after lots of careful consideration, I now think that I am 90% likely to be facing a transparent Newcomb’s problem, and 10% likely to be in some other situation, eg, where the code doesn’t work how I think it does, or cosmic rays hit the computer, or what-have-you”. Now we agree that one possible thing that could have happened, is that the predictor accurately predicted you in the past, and we can freely discuss what you should do insofar as that’s the case.
Like, presumably when I present you with the high/low/extremeball game, you have no problem granting statements like “insofar as the die came up ‘highball’, I should choose ‘high’”. Saying “but how could I possibly have come to the epistemic state where I was 100% certain that the die came up ‘highball’???” is a non sequitur. Your analysis in the face of uncertainty can be decomposed into an aggregation of your analyses in each scenario that might obtain. And so I ask again, insofar as the predictor accurately predicted you—which is surely one of many possible cases to analyze, after examining the mind of the entity that sure looks like it predicted you—what action would you prescribe?
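Or, in symbols (just the standard decomposition, stated loosely, and deliberately agnostic here about which flavor of conditional the within-case evaluations use):

$$\mathrm{EU}(a) \;=\; \sum_{c} P(c)\, U(a, c),$$

where U(a, c) is the value of taking action a in case c. Each U(a, c) can be worked out without any commitment about the weights P(c); the weights only enter when the cases are aggregated at the end.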
The scenario says “the predictor is likely to be accurate”
Well… no, the scenario says “the predictor has predicted correctly 1 trillion trillion minus one times, and incorrectly one time”. Does that make it “likely to be accurate”? You tell me, I guess, but that seems like an unnecessarily vague characterization of a precise description.
and then makes an assertion that is (for me, at least) false insofar as the predictor is accurate. You can’t have it both ways. The problem statement (at least partially) contradicts itself.
What do you mean by this? What’s contradictory about the predictor making a mistake? Clearly, it’s not perfect. We know this because it made at least one mistake in the past, and then another mistake just now.
Is the predictor “accurate”? Well, it’s approximately as accurate as it takes to guess 1 trillion trillion times and only be wrong once…
I assert that the (counterfactual) consequences of taking the bomb include (almost certainly) rendering the whole scenario impossible, and rendering some other hypothetical (that I don’t need to pay $100 to leave) possible instead. And so according to me, the correct response to someone saying “assume you see the bomb” is to say “no, I shall assume that I see no bomb instead”, because that’s the consequence I visualize of (counterfactually) taking the bomb.
I confess that this reads like moon logic to me. It’s possible that there’s something fundamental I don’t understand about what you’re saying.
...Also, just to be clear, you’re aware that these are two different internally-consistent but contradictory theories of how to evaluate counterfactuals? Like, we can be pretty confident that there’s no argument a CDT agent can hear that causes them to wish to adopt FDT counterfactuals, and vice versa. Humans come equipped with both intuitions (I can give you other problems that pump the other intuitions, if you’d like), and we have to find some other way to arbitrate the conflict.
I am not familiar with this, no. If you have explanatory material / intuition pumps / etc. to illustrate this, I’d certainly appreciate it!
Saying “but how could I possibly have come to the epistemic state where I was 100% certain that the die came up ‘highball’???” is a non sequitur.
I am not asking how I could come to believe the “literally perfect predictor” thing with 100% certainty; I am asking how I could come to believe it at all (with, let’s say, > 50% certainty).
Now we agree that one possible thing that could have happened, is that the predictor accurately predicted you in the past, and we can freely discuss what you should do insofar as that’s the case.
Hold on, hold on. Are we talking about repeated plays of the same game? Where I face the same situation repeatedly? Or are we talking about observing (or learning about) the predictor playing the game with other people before me?
The “Bomb” scenario described in the OP says nothing about repeated play. If that’s an assumption you’re introducing, I think it needs to be made explicit…
that seems like an unnecessarily vague characterization of a precise description
I deny that we have a precise description. If you listed out a specific trillion trillion observations that I allegedly made, then we could talk about whether those particular observations justify thinking that we’re in the game with the bomb. (If those trillion trillion observations were all from me waking up in a strange room and interacting with it, with no other context, then as noted above, I would have no reason to believe I’m in this game as opposed to any variety of other games consistent with those observations.) The scenario vaguely alleges that we think we’re facing an accurate predictor, and then alleges that their observed failure rate (on an unspecified history against unspecified players) is 1 per trillion-trillion. It does not say how or why we got into the epistemic state of thinking that there’s an accurate predictor there; we assume this by fiat.
(To be clear, I’m fine with assuming this by fiat. I’m simply arguing that your reluctance to analyze the problem by cases seems strange and likely erroneous to me.)
I am not familiar with this, no. If you have explanatory material / intuition pumps / etc. to illustrate this, I’d certainly appreciate it!
The canonical explanatory text is the FDT paper (PDF warning) (that the OP is responding to a critique of, iirc), and there’s a bunch of literature on LW (maybe start at the wiki page on UDT? Hopefully we have one of those) exploring various intuitions. If you’re not familiar with this style of logic, I recommend starting there (ah look we do have a UDT wiki page). I might write up some fresh intuition pumps later, to try to improve the exposition. (We’ve sure got a lot of exposition if you dig through the archives, but I think there are still a bunch of gaps.)
I am not asking how I could come to believe the “literally perfect predictor” thing with 100% certainty; I am asking how I could come to believe it at all (with, let’s say, > 50% certainty).
Uh, I mean, we could play a version of this game where the sums were positive (such that you’d want to play) and smaller (such that I could fund the prizes), and then I could say “yo Said Achmiz, I decided to act as a predictor of you in this game, I’m about to show you either a box with $10 and a box with $1, or an empty box and a box with $1, depending on how I predicted you’ll behave”. And then I think that at least the other readers on this post would predict that I have >50% probability of predicting your behavior correctly. You haven’t exactly been subtle about how you make decisions.
Predicting someone accurately with >50% likelihood isn’t all that hard. A coin predicts you “perfectly accurately” 50% of the time, you’ve only gotta do slightly better than that.
I suspect you’re confused here in putting “perfect prediction” on some big pedestal. The predictor is trying to reason about you, and they either reason basically-correctly, or their reasoning includes some mistake. And insofar as they reason validly to a conclusion about the right person, they’re going to get the answer right.
And again, note that you’re not exactly a tricky case yourself. You’re kinda broadcasting your decision-making procedure all over the internet. You don’t have to be a supermind studying a brainscan to figure out that Said Achmiz takes the extra $1.
My girlfriend often reasons accurately about whether I’m getting dinner tonight. Sometimes she reasons wrongly about it (eg, she neglects that I had a late lunch) but gets the right answer by chance anyway. Sometimes she reasons wrongly about it and gets the wrong answer by chance. But often she reasons correctly about it, and gets the right answer for the right reasons. And if I’m standing in the supermarket wondering whether she already ate or whether I should buy enough cheese to make her some food too, I have no trouble thinking “well, insofar as she got the right answer for the right reasons tonight, she knew I’d be hungry when I got home, and so she hasn’t eaten yet”. None of this case-analysis requires me to believe she’s a supermind who studied a scan of my brain for twelve thousand years. Believing that she was right about me for the right reasons is just not a very difficult epistemic state to enter. My reasons for doing most of what I do just aren’t all that complicated. It’s often not all that tricky to draw the right conclusions about what people do for the right reasons.
Furthermore, I note that in the high/low/extremeball game, I have no trouble answering questions about what I should do insofar as the game is highball, even though I have <50% (and in fact ~33%) probability on that being the case. In fact, I could analyze payouts insofar as the game is highball, even if the game was only 0.1% likely to be highball! In fact, the probability of me actually believing I face a given decision problem is almost irrelevant to my ability to analyze the payouts of different actions. As evidenced in part by the fact that I am prescribing actions in this incredibly implausible scenario with bombs and predictors.
And so in this wild hypothetical situation where we’ve somehow (by fiat) become convinced that there’s an ancient predictor predicting our actions (which is way weirder than my girlfriend predicting my action, tbh), presumably one of our hypotheses for what’s going on is that the predictor correctly deduced what we would do, for the correct reasons. Like, that’s one of many hypotheses that could explain our past observations. And so, assuming that hypothesis, for the purpose of analysis—and recalling that I can analyze which action is best assuming the game is highball, without making any claim about how easy or hard it is to figure out whether the game is highball—assuming that hypothesis, what action would you prescribe?
Well… no, the scenario says “the predictor has predicted correctly 1 trillion trillion minus one times, and incorrectly one time”. Does that make it “likely to be accurate”? You tell me, I guess, but that seems like an unnecessarily vague characterization of a precise description.
Let’s be more precise, then, and speak in terms of “correctness” rather than “accuracy”. There are then two possibilities in the “bomb” scenario as stipulated:
The predictor thought I would take the right box, and was correct.
The predictor thought I would take the right box, and was incorrect.
Now, note the following interesting property of the above two possibilities: I get to choose which of them is realized. I cannot change what the predictor thought, nor can I change its actions conditional on its prediction (which in the stipulated case involves placing a bomb in the left box), but I can choose to make its prediction correct or incorrect, depending on whether I take the box it predicted I would take.
Observe now the following interesting corollary of the above argument: it implies the existence of a certain strategy, which we might call “ObeyBot”, which always chooses the action that confirms the predictor’s prediction, and does so regardless of payoffs: by hypothesis, ObeyBot does not care about burning to death, nor does it care about losing $100; it cares only for making the predictor right as often as it can.
Now, suppose I were to tell you that the predictor’s stipulated track record in this game (10^24 − 1 successes, and 1 failure) were achieved against a population consisting (almost) entirely of ObeyBots. What would you make of this claim?
It should be quite obvious that there is only one conclusion you can draw: the predictor’s track record establishes absolutely nothing about its predictive capabilities in general. When faced with a population of agents that always do whatever the predictor says they will do, any “predictor”—from the superintelligent Omega to a random number generator—will achieve an accuracy of 100%.
This is an extremely important observation! What it shows is that the predictor’s claimed track record can mean very different things depending on what kind of agents it was playing against… which in turn means that the problem as stated is underdetermined: knowledge of the predictor’s track record does not by itself suffice to pin down the predictor’s actual behavior, unless we are also given information about what kind of agents the predictor was playing against.
Now, obviously whoever came up with the “bomb” scenario presumably did not want us to assume that their predictor is only accurate against ObeyBots and no one else. But to capture this notion requires more than merely stating the predictor’s track record; it requires the further supposition that the predictor’s accuracy is a persistent feature of the predictor, rather than of the agents it was playing against. It requires, in short, the very thing you criticized as an “unnecessarily vague characterization”: the statement that the predictor is, in fact, “likely to be accurate”.
With this as a prerequisite, we are now equipped to address your next question:
What do you mean by this? What’s contradictory about the predictor making a mistake? Clearly, it’s not perfect. We know this because it made at least one mistake in the past, and then another mistake just now.
(First, a brief recap: we are asked to assume the existence of a predictor with an extremely high level of predictive accuracy. Moreover, although the problem statement did not actually specify, it is likely we are meant to assume that this predictor’s accuracy persists across a large reference class of possible agent designs; specifically—and crucially—this reference class must be large enough to include our own decision algorithm, whatever it may be.)
Now, with this in mind, your (Said’s) question is: what could possibly be contradictory about this scenario? What could be contradictory about a scenario that asks us to assume the existence of a predictor that can accurately predict the behavior of a large reference class of agents, including ourselves?
At this point, it should be quite obvious where the potential contradiction lies: it arises directly from the fact that an agent who finds itself in the described scenario can choose to make the predictor right or wrong. Earlier, I exploited this property to construct a strategy (ObeyBot) which makes the predictor right no matter what; but now I will do the opposite, and construct a strategy that always chooses whichever action falsifies the predictor’s prediction. For thematic appropriateness, let us call this strategy “SpiteBot”.
I assert that it is impossible for any predictor to achieve any accuracy against SpiteBot higher than 0%. Therefore, if we assume that I myself am a SpiteBot, the contradiction becomes immediately clear: it is impossible for the predictor to achieve the stipulated predictive accuracy across any reference class of agents broad enough to include SpiteBot, and yet in order for the predictor’s reference class to include me, it must include SpiteBot, since I am a SpiteBot. Contradiction!
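A throwaway simulation makes the asymmetry vivid (the names are the ones introduced above; the "predictor" here is deliberately the dumbest one available, a coin flip):

```python
# Sketch: any predictor (even a coin flip) scores 100% against ObeyBot and
# 0% against SpiteBot, because both strategies act *as a function of* the
# prediction, which the bomb placement makes legible to the agent.
import random

def coin_flip_predictor():
    """A maximally unimpressive predictor: guesses 'Left' or 'Right' at random."""
    return random.choice(["Left", "Right"])

def obeybot(prediction):
    """Always does whatever was predicted, payoffs be damned."""
    return prediction

def spitebot(prediction):
    """Always does the opposite of whatever was predicted."""
    return "Left" if prediction == "Right" else "Right"

def track_record(agent, trials=100_000):
    correct = 0
    for _ in range(trials):
        prediction = coin_flip_predictor()
        action = agent(prediction)  # the agent infers the prediction from the bomb placement
        correct += (action == prediction)
    return correct / trials

print("coin flip vs ObeyBot :", track_record(obeybot))   # 1.0
print("coin flip vs SpiteBot:", track_record(spitebot))  # 0.0
```

The track record, by itself, tells you almost nothing until you also say whom it was earned against.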
Of course, this only establishes that the problem statement is contradictory if the reader happens to be a SpiteBot. And while that might be somewhat amusing to imagine, it is unlikely that any of us (the intended audience of the problem) are SpiteBots, or use any decision theory akin to SpiteBot. So let us ask: are there any other decision theories that manage to achieve a similar effect—any other decision theories that, under at least some subset of circumstances, turn into SpiteBot, thereby rendering the problem formulation contradictory?
Asked that way, the question answers itself: any decision theory that behaves like SpiteBot at least some of the time will generate contradictions with the problem statement at least some of the time. And (here is the key point) if those contradictions are to be avoided, the hypothesized predictor must avoid whatever set of circumstances triggers the SpiteBot-like behavior, lest it falsify its own alleged predictive accuracy… which implies the following corollaries:
The predictor cannot show its face (or its boxes) before SpiteBot. Any native SpiteBots in this universe will therefore never find themselves in this particular “bomb” scenario, nor any relatives.
Any strategies that behave like SpiteBot some of the time, will not encounter any “bomb”-like scenarios during that time. For example, the strategy “Choose your actions normally except on Wednesdays; behave like SpiteBot on Wednesdays” will not encounter any bomb-like scenarios on Wednesday.
If you don’t like the idea of ending up in a “bomb”-like scenario, you should precommit to behaving like SpiteBot in the face of such a scenario. If you actually do this properly, you will never see a real predictor give you a bomb-like scenario, which means that the only time you will be faced with any kind of bomb-like scenario will be in your imagination, perhaps at the prompting of some inquisitive philosopher. At this point you are free to ask the philosopher how his stipulated predictor reacts to SpiteBot and SpiteBot-like strategies, and—if he fails to give you a convincing answer—you are free to dismiss his premise as contradictory, and his scenario as something you don’t have to worry about.
(The general case of “behaving like SpiteBot when confronted with situations you’d really rather not have encountered to begin with” is called “fuck-you decision theory”, or FDT for short.)
Yes, it was this exact objection that I addressed in my previous replies, which relied upon a misreading of the problem. I missed that the boxes were open and thought that the only clue to the prediction was the note that was left.
The only solution was to assume that the predictor does not always leave a note, and this solution also works for the stated scenario. You see that the boxes are open, and the left one contains a bomb, but did everyone else? Did anyone else? The problem setup doesn’t say.
This sort of vagueness leaves holes big enough to drive a truck through. The stated FDT support for picking Left depends absolutely critically on the subjunctive dependency odds being at least many millions to one, and the stated evidence is nowhere near strong enough to support that. Failing that, FDT recommends picking Right.
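To spell that threshold out: write V for however much you disvalue burning to death (the problem supplies no number), and ε for the probability that the bomb is there anyway despite your being a committed Left-taker (predictor error, or no real subjunctive dependence at all). Committing to Left then costs ε·V in expectation, while committing to Right costs a flat $100, so Left only comes out ahead if

$$\varepsilon \cdot V < \$100 \quad\Longleftrightarrow\quad \varepsilon < \frac{\$100}{V}.$$

If you price V at $100 million, say, Left needs ε below one in a million, i.e. reliability odds of better than a million to one, and proportionally longer odds for larger V; and the point is that the stated evidence doesn't pin ε down anywhere near that tightly.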
So the whole scenario is pointless. It doesn’t explore what it was intended to explore.
You can modify the problem to say that the predictor really is that reliable for every agent, but doesn’t always leave the boxes open for you or write a note. This doesn’t mean that the predictor is perfectly reliable, so a SpiteBot can still face this scenario but is just extremely unlikely to.
It’s not complete enough to determine what I do when I don’t see a bomb. And so when the problem statement is corrected to stop flatly asserting consequences of my actions as if they’re facts, you’ll find that my behavior in the corrected problem is underdefined. (If this still isn’t clear, try working out what the predictor does to the agent that takes the bomb if it’s present, but pays the $100 if it isn’t.)
And if we’re really technical, it’s not actually quite complete enough to determine what I do when I see the bomb. That depends on what the predictor does when there are two consistent possible outcomes. Like, if I would go left when there was no bomb and right when there was a bomb, what would the predictor do? If they only place the bomb if I insist on going right when there’s no bomb, then I have no incentive to go left upon seeing a bomb insofar as they’re accurate. To force me to go left, the predictor has to be trigger-happy, dropping the bomb given the slightest opportunity.
Nitpick: you seem to be trying to say something like “let us sidestep questions of predictor mechanics and assume that your only observations were the game-history”. But that doesn’t work. If I wake up in a room with a bomb about to go off in the left exit and a $100 fee to use the right exit, I pay the $100 fee and use the right exit. I have to see a host of other specific observations to start hypothesizing the existence of a predictor who would stop placing bombs if-counterfactually I took them, before I believe I’m in this problem.
I’m happy to leave unspecified the prior knowledge that causes us to believe there’s a predictor, but I object to the claim that we’ve fully specified my epistemic state by letting me see the problem history.
I’m not going to ask whether the predictor is perfect, no. I’m going to assume she isn’t, and then do some calculations, and then assume she is, and do some other calculations. Because it’s often useful to solve decision problems using case-by-case analysis.
And in the second case, your problem statement is revealed to be a lie.
Now, it’s perfectly possible for you to rephrase the problem such that my analysis does not, in some cases, reveal it to be a lie. You could say “the predictor is on the fritz, and you see the bomb” (in which case I’ll go right, thanks). Or you could say “the predictor is 1% on the fritz and 99% accurate, and in the 1% case you see a bomb and in the 99% case you may or may not see a bomb depending what you do”. That’s also fine. But insofar as you flatly postulate a (almost certain) consequence of my action, be prepared for me to assert that you’re (almost certainly) wrong.
There’s impossibliity here precisely insofar as the predictor is accurate.
One intuition pump is: You say “but if-counterfactually you see the bomb on the left, and you take it, you die”. I protest “no, if-counterfactually I see the bomb on the left, then I die insofar as the predictor was on the fritz, and I get a puppy insofar as the predictor was accurate, by the principle of explosion. So I 1-in-a-hundred-trillion-trillion% die, and almost certanily get a puppy. Win.” Like, from my perspective, trying to analyze outcomes under assumptions that depend on my own actions can lead to nonsense.
(My model of a critic says “but surely you can see the intuition that, when you’re actually faced with the bomb, you actually shouldn’t take it?”. And ofc I have that intuition, as a human, I just think it lost in a neutral and fair arbitration process, but that’s a story for a different day. Speaking from the intuition that won, I’d say that it sure would be sad to take the bomb in real life, and that in real life you should spare no expense searching for the evidence that the predictor is on the fritz, which almost certainly would exist and let you avoid the bomb iff the predictor is on the fritz in real life, and if you run your real life right you should never once actually eat a bomb, but also yes, the correct intuition is that you take the bomb knowing it kills you, and that you roll your eyes at the person saying it was a bad move for roughly the same reason you roll your eyes at the person saying that ‘middle’ is never your best option in the high/low/extremeball game—such is the fate of people who are bound to submit a single action to be played against a mixture of games.)
Separately, I note that if you think an agent should behave very differently when a possibility is logically impossible, vs when they assess a probability on the order of 1 in a trillion trillion, then I suspect you’re doing something wrong. (Not least that you’re likely wrong about how much confidence a human can get in any particular logical fact.)
To check, do we have a disagreement about what to do insofar as the predictor is accurate? Like, in the case of the transparent Newcomb’s problem with a literally perfect predictor, would you one-box even if the box were empty?
I don’t understand this objection. The given scenario is that you do see a bomb. The question is: what do you do in the given scenario?
You are welcome to imagine any other scenarios you like, or talk about counterfactuals or what have you. But the scenario, as given, tells you that you know certain things and observe certain things. The scenario does not appear to be in any way impossible. “What do I do when I don’t see a bomb” seems irrelevant to the question, which posits that you do see a bomb.
Er, what? If you take the bomb, you burn to death. Given the scenario, that’s a fact. How can it not be a fact? (Except if the bomb happens to malfunction, or some such thing, which I assume is not what you mean…?)
Well, let’s see. The problem says:
So, if the predictor predicts that I will choose Right, she will put a bomb in Left, in which case I will choose Left. If she predicts that I will choose Left, then she puts no bomb in Left, in which case I will choose Right. This appears to be paradoxical, but that seems to me to be the predictor’s fault (for making an unconditional prediction of the behavior of an agent that will certainly condition its behavior on the prediction), and thus the predictor’s problem.
I… don’t see what bearing this has on the disagreement, though.
What I am saying is that we don’t have access to “questions of predictor mechanics”, only to the agent’s knowledge of “predictor mechanics”. In other words, we’ve fully specified your epistemic state by specifying your epistemic state—that’s all. I don’t know what you mean by calling it “the problem history”. There’s nothing odd about knowing (to some degree of certainty) that certain things have happened. You know there’s a (supposed) predictor, you know that she has (apparently) made such-and-such predictions, this many times, with these-and-such outcomes, etc. What are her “mechanics”? Well, you’re welcome to draw any conclusions about that from what you know about what’s gone before. Again, there is nothing unusual going on here.
I’m sorry, but this really makes very little sense to me. We know the predictor is not quite perfect. That’s one of the things we’re told as a known fact. She’s pretty close to perfect, but not quite. There’s just the one “case” here, and that’s it…
Indeed, the entire “your problem statement is revealed to be a lie” approach seems to me to be nonsensical. This is the given scenario. If you assume that the predictor is perfect and, from that, conclude that the scenario couldn’t’ve come to pass, doesn’t that entail that, by construction, the predictor isn’t perfect (proof by contradiction, as it were)? But then… we already know she isn’t perfect, so what was the point of assuming otherwise?
Well, we know she’s at least slightly inaccurate… obviously, because she got it wrong this time! We know she can get it wrong, because, by construction, she did get it wrong. (Also, of course, we know she’s gotten it wrong before, but that merely overdetermines the conclusion that she’s not quite perfect.)
Well, no. In the given scenario, you do see the bomb on the left. There’s no counterfactuals involved.
But you’re not bound to submit a single action to be played against a mixture of games. In the given scenario, you can choose your action knowing which game you’re playing!
… or, you could just… choose Right. That seems to me to be a clear win.
If an agent finds themselves in a scenario that they think is logically impossible, then they’re obviously wrong about that scenario being logically impossible, as demonstrated by the fact that actually, it did come to pass. So finding oneself in such a scenario should cause one to immediately re-examine one’s beliefs on what is, and is not, logically impossible, and/or one’s beliefs about what scenario one finds oneself in…
How would I come to the belief that the predictor is literally perfect (or even mostly perfect)? Wouldn’t that just look like a string of observations of cases where either (a) there’s money in the one box and the agent takes the one box full of money, or (b) there’s no money in the one box and the agent takes the two boxes (and gets only the lesser quantity)? How exactly would I distinguish between the “perfect predictor” hypothesis and the “people take the one box if it has a million dollars, two boxes otherwise” hypothesis, as an explanation for those observations? (Or, to put it another way, what do I think happens to people who take an empty one box? Well, I can’t have any information about that, right? [Beyond the common sense of “obviously, they get zilch”…] Because if that had ever happened before, well, there goes the “perfect predictor” hypothesis, yes?)
The scenario says “the predictor is likely to be accurate” and then makes an assertion that is (for me, at least) false insofar as the predictor is accurate. You can’t have it both ways. The problem statement (at least partially) contradicts itself. You and I have a disagreement about how to evaluate counterfactuals in cases where the problem statement is partly-self-contradictory.
Sure, it’s the predictor’s problem, and the behavior that I expect of the predictor in the case that I force them to notice they have a problem has a direct effect on what I do if I don’t see a bomb. In particular, if they reward me for showing them their problem, then I’d go right when I see no bomb, whereas if they’d smite me, then I wouldn’t. But you’re right that this is inconsequential when I do see the bomb.
Well for one thing, I just looked and there’s no bomb on the left, so the whole discussion is counter-to-fact. And for another, if I pretend I’m in the scenario, then I choose my action by visualizing the (counterfactual) consequences of taking the bomb, and visualizing the (counterfactual) consequences of refraining. So there are plenty of counterfactuals involved.
I assert that the (counterfactual) consequences of taking the bomb include (almost certainly) rendering the whole scenario impossible, and rendering some other hypothetical (that I don’t need to pay $100 to leave) possible instead. And so according to me, the correct response to someone saying “assume you see the bomb” is to say “no, I shall assume that I see no bomb instead”, because that’s the consequence I visualize of (counterfactually) taking the bomb.
You’re welcome to test it empirically (well, maybe after adding at least $1k to all outcomes to incentivise me to play), if you have an all-but-one-in-a-trillion-trillion accurate predictor-of-me lying around. (I expect your empericism will prove me right, in that the counterfactuals where it shows me a bomb are all in fact rendered impossible, and what happens in reality instead is that I get to leave with $900).
...Also, just to be clear, you’re aware that these are two different internally-consistent but contradictory theories of how to evaluate counterfactuals? Like, we can be pretty confident that there’s no argument a CDT agent can hear that causes them to wish to adopt FDT counterfactuals, and vice versa. Humans come equipped with both intuitions (I can give you other problems that pump the other intuitions, if you’d like), and we have to find some other way to arbitrate the conflict.
(Currently I’m in a mode of attempting to convey the second set of intuitions to you in this one problem, in light of your self-professed difficulty grasping them. Feel free to demonstrate understanding of the second set of intuitions and request that we instead switch to discussing why I think the latter wins in arbitration.)
I’m asking you to do a case-analysis. For concreteness, suppose you get to inspect the predictor’s computing substrate and source code, and you do a bunch of pondering and thinking, and you’re like “hmm, yes, after lots of careful consideration, I now think that I am 90% likely to be facing a transparent Newcomb’s problem, and 10% likely to be in some other situation, eg, where the code doesn’t work how I think it does, or cosmic rays hit the computer, or what-have-you”. Now we agree that one possible thing that could have happened, is that the predictor accurately predicted you in the past, and we can freely discuss what you should do insofar as that’s the case.
Like, presumably when I present you with the high/low/extremeball game, you have no problem granting statements like “insofar as the die came up ‘highball’, I should choose ‘high’”. Saying “but how could I possibly have come to the epistemic state where I was 100% certain that the die came up ‘highball’???” is a nonsequitur. Your analysis in the face of uncertainty can be decomposed into an aggregation of your analyses in each scenario that might obtain. And so I ask again, insofar as the predictor accurately predicted you—which is surely one of many possible cases to analyze, after examining the mind of the entity that sure looks like it predicted you—what action would you prescribe?
Well… no, the scenario says “the predictor has predicted correctly 1 trillion trillion minus one times, and incorrectly one time”. Does that make it “likely to be accurate”? You tell me, I guess, but that seems like an unnecessarily vague characterization of a precise description.
What do you mean by this? What’s contradictory about the predictor making a mistake? Clearly, it’s not perfect. We know this because it made at least one mistake in the past, and then another mistake just now.
Is the predictor “accurate”? Well, it’s approximately as accurate as it takes to guess 1 trillion trillion times and only be wrong once…
I confess that this reads like moon logic to me. It’s possible that there’s something fundamental I don’t understand about what you’re saying.
I am not familiar with this, no. If you have explanatory material / intuition pumps / etc. to illustrate this, I’d certainly appreciate it!
I am not asking how I could come to believe the “literally perfect predictor” thing with 100% certainty; I am asking how I could come to believe it at all (with, let’s say, > 50% certainty).
Hold on, hold on. Are we talking about repeated plays of the same game? Where I face the same situation repeatedly? Or are we talking about observing (or learning about) the predictor playing the game with other people before me?
The “Bomb” scenario described in the OP says nothing about repeated play. If that’s an assumption you’re introducing, I think it needs to be made explicit…
I deny that we have a precise description. If you listed out a specific trillion trillion observations that I allegedly made, then we could talk about whether those particular observations justify thinking that we’re in the game with the bomb. (If those trillion trillion observations were all from me waking up in a strange room and interacting with it, with no other context, then as noted above, I would have no reason to believe I’m in this game as opposed to any variety of other games consistent with those observations.) The scenario vaguely alleges that we think we’re facing an accurate predictor, and then alleges that their observed failure rate (on an unspecified history against unspecified players) is 1 per trillion-trillion. It does not say how or why we got into the epistemic state of thinking that there’s an accurate predictor there; we assume this by fiat.
(To be clear, I’m fine with assuming this by fiat. I’m simply arguing that your reluctance to analyze the problem by cases seems strange and likely erroneous to me.)
The canonical explanatory text is the FDT paper (PDF warning) (that the OP is responding to a critique of, iirc), and there’s a bunch of literature on LW (maybe start at the wiki page on UDT? Hopefully we have one of those) exploring various intuitions. If you’re not familiar with this style of logic, I recommend starting there (ah look we do have a UDT wiki page). I might write up some fresh intuition pumps later, to try to improve the exposition. (We’ve sure got a lot of exposition if you dig through the archives, but I think there are still a bunch of gaps.)
Uh, I mean, we could play a version of this game where the sums were positive (such that you’d want to play) and smaller (such that I could fund the prizes), and then I could say “yo Said Achmiz, I decided to act as a predictor of you in this game, I’m about to show you either a box with $10 and a box with $1, or an empty box and a box with $1, depending on how I predicted you’ll behave”. And then I think that at least the other readers on this post would predict that I have >50% probability of predicting your behavior correctly. You haven’t exactly been subtle about how you make decisions.
Predicting someone accurately with >50% likelihood isn’t all that hard. A coin predicts you “perfectly accurately” 50% of the time, you’ve only gotta do slightly better than that.
I suspect you’re confused here in putting “perfect prediction” on some big pedestal. the predictor is trying to reason about you, and they either reason basically-correctly, or their reasoning includes some mistake. And insofar as they reason validly to a conclusion about the right person, they’re going to get the answer right.
And again, note that you’re not exactly a tricky case yourself. You’re kinda broadcasting your decision-making procedure all over the internet. You don’t have to be a supermind studying a brainscan to figure out that Said Achmiz takes the extra $1.
My girlfriend often reasons accurately about whether I’m getting dinner tonight. Sometimes she reasons wrongly about it (eg, she neglects that I had a late lunch) but gets the right answer by chance anyway. Sometimes she reasons wrongly about it and gets the wrong answer by chance. But often she reasons correctly about it, and gets the right answer for the right reasons. And if I’m standing in the supermarket wondering whether she already ate or whether I should buy enough cheese to make her some food too, I have no trouble thinking “well, insofar as she got the right answer for the right reasons tonight, she knew I’d be hungry when I got home, and so she hasn’t eaten yet”. None of this case-analysis requires me to believe she’s a supermind who studied a scan of my brain for twelve thousand years. Believing that she was right about me for the right reasons is just not a very difficult epistemic state to enter. My reasons for doing most of what I do just aren’t all that complicated. It’s often not all that tricky to draw the right conclusions about what people do for the right reasons.
Furthermore, I note that in the high/low/extremeball game, I have no trouble answering questions about what I should do insofar as the game is highball, even though I have <50% (and in fact ~33%) probability on that being the case. In fact, I could analyze payouts insofar as the game is highball, even if the game was only 0.1% likely to be highball! In fact, the probability of me actually believing I face a given decision problem is almost irrelevant to my ability to analyze the payouts of different actions. As evidenced in part by the fact that I am prescribing actions in this incredibly implausible scenario with bombs and predictors.
And so in this wild hypothetical situation where we’ve somehow (by fiat) become convinced that there’s an ancient predictor predicting our actions (which is way weirder than my girlfriend predicting my action, tbh), presumably one of our hypotheses for what’s going on is that the predictor correctly deduced what we would do, for the correct reasons. Like, that’s one of many hypotheses that could explain our past observations. And so, assuming that hypothesis, for the purpose of analysis—and recalling that I can analyze which action is best assuming the game is highball, without making any claim about how easy or hard it is to figure out whether the game is highball—assuming that hypothesis, what action would you prescribe?
Let’s be more precise, then, and speak in terms of “correctness” rather than “accuracy”. There are then two possibilities in the “bomb” scenario as stipulated:
The predictor thought I would take the right box, and was correct.
The predictor thought I would take the right box, and was incorrect.
Now, note the following interesting property of the above two possibilities: I get to choose which of them is realized. I cannot change what the predictor thought, nor can I change its actions conditional on its prediction (which in the stipulated case involves placing a bomb in the left box), but I can choose to make its prediction correct or incorrect, depending on whether I take the box it predicted I would take.
Observe now the following interesting corollary of the above argument: it implies the existence of a certain strategy, which we might call “ObeyBot”, which always chooses the action that confirms the predictor’s prediction, and does so regardless of payoffs: by hypothesis, ObeyBot does not care about burning to death, nor does it care about losing $100; it cares only for making the predictor right as often as it can.
Now, suppose I were to tell you that the predictor’s stipulated track record in this game (10^24 − 1 successes, and 1 failure) was achieved against a population consisting (almost) entirely of ObeyBots. What would you make of this claim?
It should be quite obvious that there is only one conclusion you can draw: the predictor’s track record establishes absolutely nothing about its predictive capabilities in general. When faced with a population of agents that always do whatever the predictor says they will do, any “predictor”—from the superintelligent Omega to a random number generator—will achieve an accuracy of 100%.
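(A toy simulation of my own, with a coin flip standing in for the “predictor”, in case the point needs hammering home:)

```python
import random

# Illustrative sketch: ObeyBot does whatever was predicted, so even a
# coin-flipping "predictor" racks up a perfect record against it.
def coin_flip_predictor():
    return random.choice(["left", "right"])

def obeybot(prediction):
    return prediction  # confirm the prediction, payoffs be damned

trials = 10_000
correct = sum(obeybot(p) == p for p in (coin_flip_predictor() for _ in range(trials)))
print(correct / trials)  # 1.0: a flawless track record, no predictive skill required
```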
This is an extremely important observation! What it shows is that the predictor’s claimed track record can mean very different things depending on what kind of agents it was playing against… which in turn means that the problem as stated is underdetermined: knowledge of the predictor’s track record does not by itself suffice to pin down the predictor’s actual behavior, unless we are also given information about what kind of agents the predictor was playing against.
Now, whoever came up with the “bomb” scenario presumably did not want us to assume that their predictor is accurate only against ObeyBots and no one else. But to capture this notion requires more than merely stating the predictor’s track record; it requires the further supposition that the predictor’s accuracy is a persistent feature of the predictor, rather than of the agents it was playing against. It requires, in short, the very thing you criticized as an “unnecessarily vague characterization”: the statement that the predictor is, in fact, “likely to be accurate”.
With this as a prerequisite, we are now equipped to address your next question:
(First, a brief recap: we are asked to assume the existence of a predictor with an extremely high level of predictive accuracy. Moreover, although the problem statement did not actually specify, it is likely we are meant to assume that this predictor’s accuracy persists across a large reference class of possible agent designs; specifically—and crucially—this reference class must be large enough to include our own decision algorithm, whatever it may be.)
Now, with this in mind, your (Said’s) question is: what could possibly be contradictory about this scenario? What could be contradictory about a scenario that asks us to assume the existence of a predictor that can accurately predict the behavior of a large reference class of agents, including ourselves?
At this point, it should be quite obvious where the potential contradiction lies: it arises directly from the fact that an agent who finds itself in the described scenario can choose to make the predictor right or wrong. Earlier, I exploited this property to construct a strategy (ObeyBot) which makes the predictor right no matter what; but now I will do the opposite, and construct a strategy that always chooses whichever action falsifies the predictor’s prediction. For thematic appropriateness, let us call this strategy “SpiteBot”.
I assert that it is impossible for any predictor to achieve any accuracy against SpiteBot higher than 0%. Therefore, if we assume that I myself am a SpiteBot, the contradiction becomes immediately clear: it is impossible for the predictor to achieve the stipulated predictive accuracy across any reference class of agents broad enough to include SpiteBot, and yet in order for the predictor’s reference class to include me, it must include SpiteBot, since I am a SpiteBot. Contradiction!
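(The mirror-image sketch, again my own toy code, just to make the 0% claim vivid:)

```python
import random

# Illustrative sketch: SpiteBot falsifies whatever was predicted, so no
# predictor, superintelligent or otherwise, scores above 0% against it.
def spitebot(prediction):
    return "right" if prediction == "left" else "left"

def some_predictor():
    return random.choice(["left", "right"])  # stand-in for any predictor at all

trials = 10_000
correct = sum(spitebot(p) == p for p in (some_predictor() for _ in range(trials)))
print(correct / trials)  # 0.0
```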
Of course, this only establishes that the problem statement is contradictory if the reader happens to be a SpiteBot. And while that might be somewhat amusing to imagine, it is unlikely that any of us (the intended audience of the problem) are SpiteBots, or use any decision theory akin to SpiteBot. So let us ask: are there any other decision theories that manage to achieve a similar effect—any other decision theories that, under at least some subset of circumstances, turn into SpiteBot, thereby rendering the problem formulation contradictory?
Asked that way, the question answers itself: any decision theory that behaves like SpiteBot at least some of the time will generate contradictions with the problem statement at least some of the time. And (here is the key point) if those contradictions are to be avoided, the hypothesized predictor must avoid whatever set of circumstances triggers the SpiteBot-like behavior, lest it falsify its own alleged predictive accuracy… which implies the following corollaries:
The predictor cannot show its face (or its boxes) before SpiteBot. Any native SpiteBots in this universe will therefore never find themselves in this particular “bomb” scenario, nor in any of its relatives.
Any strategies that behave like SpiteBot some of the time will not encounter any “bomb”-like scenarios during that time. For example, the strategy “Choose your actions normally except on Wednesdays; behave like SpiteBot on Wednesdays” will never encounter a bomb-like scenario on a Wednesday.
If you don’t like the idea of ending up in a “bomb”-like scenario, you should precommit to behaving like SpiteBot in the face of such a scenario. If you actually do this properly, you will never see a real predictor give you a bomb-like scenario, which means that the only time you will be faced with any kind of bomb-like scenario will be in your imagination, perhaps at the prompting of some inquisitive philosopher. At this point you are free to ask the philosopher how his stipulated predictor reacts to SpiteBot and SpiteBot-like strategies, and—if he fails to give you a convincing answer—you are free to dismiss his premise as contradictory, and his scenario as something you don’t have to worry about.
(The general case of “behaving like SpiteBot when confronted with situations you’d really rather not have encountered to begin with” is called “fuck-you decision theory”, or FDT for short.)
Yes, this is exactly the objection I addressed in my previous replies, which relied upon a misreading of the problem: I missed that the boxes were open, and thought that the only clue to the prediction was the note that was left.
The only solution was to assume that the predictor does not always leave a note, and this solution also works for the stated scenario. You see that the boxes are open, and the left one contains a bomb, but did everyone else? Did anyone else? The problem setup doesn’t say.
This sort of vagueness leaves holes big enough to drive a truck through. The stated FDT support for picking Left depends absolutely critically on the subjunctive dependency odds being at least many millions to one, and the stated evidence is nowhere near strong enough to support that. Failing that, FDT recommends picking Right.
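To put rough numbers on that (the valuation is mine, not the problem’s):

```python
# Back-of-the-envelope arithmetic with a hypothetical valuation of your own
# life: taking Left only beats paying the $100 if the chance that the bomb
# fails to subjunctively depend on your choice is correspondingly tiny.
value_of_life = 1e9    # hypothetical: a billion dollars; substitute your own figure
cost_of_right = 100    # the fee for taking the right box

# Left is preferable only when p_burn * value_of_life < cost_of_right:
break_even_p = cost_of_right / value_of_life
print(break_even_p)           # 1e-07
print(int(1 / break_even_p))  # 10000000, i.e. the dependency odds must be
                              # roughly ten-million-to-one or better
```

At anything much worse than those odds, the $100 is the cheaper loss.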
So the whole scenario is pointless. It doesn’t explore what it was intended to explore.
You can modify the problem to say that the predictor really is that reliable for every agent, but doesn’t always leave the boxes open for you or write a note. Even so, the predictor isn’t perfectly reliable, so a SpiteBot can still face this scenario; it’s just extremely unlikely to.