The scenario says “the predictor is likely to be accurate”
Well… no, the scenario says “the predictor has predicted correctly 1 trillion trillion minus one times, and incorrectly one time”. Does that make it “likely to be accurate”? You tell me, I guess, but that seems like an unnecessarily vague characterization of a precise description.
and then makes an assertion that is (for me, at least) false insofar as the predictor is accurate. You can’t have it both ways. The problem statement (at least partially) contradicts itself.
What do you mean by this? What’s contradictory about the predictor making a mistake? Clearly, it’s not perfect. We know this because it made at least one mistake in the past, and then another mistake just now.
Is the predictor “accurate”? Well, it’s approximately as accurate as it takes to guess 1 trillion trillion times and only be wrong once…
I assert that the (counterfactual) consequences of taking the bomb include (almost certainly) rendering the whole scenario impossible, and rendering some other hypothetical (that I don’t need to pay $100 to leave) possible instead. And so according to me, the correct response to someone saying “assume you see the bomb” is to say “no, I shall assume that I see no bomb instead”, because that’s the consequence I visualize of (counterfactually) taking the bomb.
I confess that this reads like moon logic to me. It’s possible that there’s something fundamental I don’t understand about what you’re saying.
...Also, just to be clear, you’re aware that these are two different internally-consistent but contradictory theories of how to evaluate counterfactuals? Like, we can be pretty confident that there’s no argument a CDT agent can hear that causes them to wish to adopt FDT counterfactuals, and vice versa. Humans come equipped with both intuitions (I can give you other problems that pump the other intuitions, if you’d like), and we have to find some other way to arbitrate the conflict.
I am not familiar with this, no. If you have explanatory material / intuition pumps / etc. to illustrate this, I’d certainly appreciate it!
Saying “but how could I possibly have come to the epistemic state where I was 100% certain that the die came up ‘highball’???” is a non sequitur.
I am not asking how I could come to believe the “literally perfect predictor” thing with 100% certainty; I am asking how I could come to believe it at all (with, let’s say, > 50% certainty).
Now we agree that one possible thing that could have happened, is that the predictor accurately predicted you in the past, and we can freely discuss what you should do insofar as that’s the case.
Hold on, hold on. Are we talking about repeated plays of the same game? Where I face the same situation repeatedly? Or are we talking about observing (or learning about) the predictor playing the game with other people before me?
The “Bomb” scenario described in the OP says nothing about repeated play. If that’s an assumption you’re introducing, I think it needs to be made explicit…
that seems like an unnecessarily vague characterization of a precise description
I deny that we have a precise description. If you listed out a specific trillion trillion observations that I allegedly made, then we could talk about whether those particular observations justify thinking that we’re in the game with the bomb. (If those trillion trillion observations were all from me waking up in a strange room and interacting with it, with no other context, then as noted above, I would have no reason to believe I’m in this game as opposed to any variety of other games consistent with those observations.) The scenario vaguely alleges that we think we’re facing an accurate predictor, and then alleges that their observed failure rate (on an unspecified history against unspecified players) is 1 per trillion-trillion. It does not say how or why we got into the epistemic state of thinking that there’s an accurate predictor there; we assume this by fiat.
(To be clear, I’m fine with assuming this by fiat. I’m simply arguing that your reluctance to analyze the problem by cases seems strange and likely erroneous to me.)
I am not familiar with this, no. If you have explanatory material / intuition pumps / etc. to illustrate this, I’d certainly appreciate it!
The canonical explanatory text is the FDT paper (PDF warning) (that the OP is responding to a critique of, iirc), and there’s a bunch of literature on LW (maybe start at the wiki page on UDT? Hopefully we have one of those) exploring various intuitions. If you’re not familiar with this style of logic, I recommend starting there (ah look we do have a UDT wiki page). I might write up some fresh intuition pumps later, to try to improve the exposition. (We’ve sure got a lot of exposition if you dig through the archives, but I think there are still a bunch of gaps.)
I am not asking how I could come to believe the “literally perfect predictor” thing with 100% certainty; I am asking how I could come to believe it at all (with, let’s say, > 50% certainty).
Uh, I mean, we could play a version of this game where the sums were positive (such that you’d want to play) and smaller (such that I could fund the prizes), and then I could say “yo Said Achmiz, I decided to act as a predictor of you in this game, I’m about to show you either a box with $10 and a box with $1, or an empty box and a box with $1, depending on how I predicted you’ll behave”. And then I think that at least the other readers on this post would predict that I have >50% probability of predicting your behavior correctly. You haven’t exactly been subtle about how you make decisions.
Predicting someone accurately with >50% likelihood isn’t all that hard. A coin predicts you “perfectly accurately” 50% of the time; you’ve only gotta do slightly better than that.
I suspect you’re confused here in putting “perfect prediction” on some big pedestal. The predictor is trying to reason about you, and they either reason basically-correctly, or their reasoning includes some mistake. And insofar as they reason validly to a conclusion about the right person, they’re going to get the answer right.
And again, note that you’re not exactly a tricky case yourself. You’re kinda broadcasting your decision-making procedure all over the internet. You don’t have to be a supermind studying a brainscan to figure out that Said Achmiz takes the extra $1.
My girlfriend often reasons accurately about whether I’m getting dinner tonight. Sometimes she reasons wrongly about it (eg, she neglects that I had a late lunch) but gets the right answer by chance anyway. Sometimes she reasons wrongly about it and gets the wrong answer by chance. But often she reasons correctly about it, and gets the right answer for the right reasons. And if I’m standing in the supermarket wondering whether she already ate or whether I should buy enough cheese to make her some food too, I have no trouble thinking “well, insofar as she got the right answer for the right reasons tonight, she knew I’d be hungry when I got home, and so she hasn’t eaten yet”. None of this case-analysis requires me to believe she’s a supermind who studied a scan of my brain for twelve thousand years. Believing that she was right about me for the right reasons is just not a very difficult epistemic state to enter. My reasons for doing most of what I do just aren’t all that complicated. It’s often not all that tricky to draw the right conclusions about what people do for the right reasons.
Furthermore, I note that in the high/low/extremeball game, I have no trouble answering questions about what I should do insofar as the game is highball, even though I have <50% (and in fact ~33%) probability on that being the case. In fact, I could analyze payouts insofar as the game is highball, even if the game was only 0.1% likely to be highball! In fact, the probability of me actually believing I face a given decision problem is almost irrelevant to my ability to analyze the payouts of different actions. As evidenced in part by the fact that I am prescribing actions in this incredibly implausible scenario with bombs and predictors.
And so in this wild hypothetical situation where we’ve somehow (by fiat) become convinced that there’s an ancient predictor predicting our actions (which is way weirder than my girlfriend predicting my action, tbh), presumably one of our hypotheses for what’s going on is that the predictor correctly deduced what we would do, for the correct reasons. Like, that’s one of many hypotheses that could explain our past observations. And so, assuming that hypothesis, for the purpose of analysis—and recalling that I can analyze which action is best assuming the game is highball, without making any claim about how easy or hard it is to figure out whether the game is highball—assuming that hypothesis, what action would you prescribe?
Well… no, the scenario says “the predictor has predicted correctly 1 trillion trillion minus one times, and incorrectly one time”. Does that make it “likely to be accurate”? You tell me, I guess, but that seems like an unnecessarily vague characterization of a precise description.
Let’s be more precise, then, and speak in terms of “correctness” rather than “accuracy”. There are then two possibilities in the “bomb” scenario as stipulated:
The predictor thought I would take the right box, and was correct.
The predictor thought I would take the right box, and was incorrect.
Now, note the following interesting property of the above two possibilities: I get to choose which of them is realized. I cannot change what the predictor thought, nor can I change its actions conditional on its prediction (which in the stipulated case involves placing a bomb in the left box), but I can choose to make its prediction correct or incorrect, depending on whether I take the box it predicted I would take.
Observe now the following interesting corollary of the above argument: it implies the existence of a certain strategy, which we might call “ObeyBot”, which always chooses the action that confirms the predictor’s prediction, and does so regardless of payoffs: by hypothesis, ObeyBot does not care about burning to death, nor does it care about losing $100; it cares only for making the predictor right as often as it can.
Now, suppose I were to tell you that the predictor’s stipulated track record in this game (10^24 − 1 successes, and 1 failure) were achieved against a population consisting (almost) entirely of ObeyBots. What would you make of this claim?
It should be quite obvious that there is only one conclusion you can draw: the predictor’s track record establishes absolutely nothing about its predictive capabilities in general. When faced with a population of agents that always do whatever the predictor says they will do, any “predictor”—from the superintelligent Omega to a random number generator—will achieve an accuracy of 100%.
This is an extremely important observation! What it shows is that the predictor’s claimed track record can mean very different things depending on what kind of agents it was playing against… which in turn means that the problem as stated is underdetermined: knowledge of the predictor’s track record does not by itself suffice to pin down the predictor’s actual behavior, unless we are also given information about what kind of agents the predictor was playing against.
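To make that concrete, here is a minimal sketch (Python, purely illustrative; all the names are invented for this example) of why the track record proves nothing against a population of ObeyBots: even a predictor that flips a coin scores 100%, because ObeyBot simply conforms to whatever prediction it is shown.

```python
import random

def random_predictor():
    # A "predictor" with no insight into the agent at all -- it just guesses.
    return random.choice(["Left", "Right"])

def obeybot(revealed_prediction):
    # ObeyBot always takes whichever box the predictor said it would take.
    return revealed_prediction

def accuracy(predictor, agent, trials=1_000_000):
    correct = 0
    for _ in range(trials):
        prediction = predictor()
        action = agent(prediction)  # the prediction is revealed before the agent acts
        correct += (action == prediction)
    return correct / trials

print(accuracy(random_predictor, obeybot))  # 1.0 -- a coin flip "predicts" ObeyBot perfectly
```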
Now, obviously whoever came up with the “bomb” scenario presumably did not want us to assume that their predictor is only accurate against ObeyBots and no one else. But to capture this notion requires more than merely stating the predictor’s track record; it requires the further supposition that the predictor’s accuracy is a persistent feature of the predictor, rather than of the agents it was playing against. It requires, in short, the very thing you criticized as an “unnecessarily vague characterization”: the statement that the predictor is, in fact, “likely to be accurate”.
With this as a prerequisite, we are now equipped to address your next question:
What do you mean by this? What’s contradictory about the predictor making a mistake? Clearly, it’s not perfect. We know this because it made at least one mistake in the past, and then another mistake just now.
(First, a brief recap: we are asked to assume the existence of a predictor with an extremely high level of predictive accuracy. Moreover, although the problem statement did not actually specify, it is likely we are meant to assume that this predictor’s accuracy persists across a large reference class of possible agent designs; specifically—and crucially—this reference class must be large enough to include our own decision algorithm, whatever it may be.)
Now, with this in mind, your (Said’s) question is: what could possibly be contradictory about this scenario? What could be contradictory about a scenario that asks us to assume the existence of a predictor that can accurately predict the behavior of a large reference class of agents, including ourselves?
At this point, it should be quite obvious where the potential contradiction lies: it arises directly from the fact that an agent who finds itself in the described scenario can choose to make the predictor right or wrong. Earlier, I exploited this property to construct a strategy (ObeyBot) which makes the predictor right no matter what; but now I will do the opposite, and construct a strategy that always chooses whichever action falsifies the predictor’s prediction. For thematic appropriateness, let us call this strategy “SpiteBot”.
I assert that it is impossible for any predictor to achieve an accuracy higher than 0% against SpiteBot (since, in this scenario, the prediction is revealed to the agent before it acts). Therefore, if we assume that I myself am a SpiteBot, the contradiction becomes immediately clear: it is impossible for the predictor to achieve the stipulated predictive accuracy across any reference class of agents broad enough to include SpiteBot, and yet in order for the predictor’s reference class to include me, it must include SpiteBot, since I am a SpiteBot. Contradiction!
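A mirror-image sketch (same purely illustrative setup as the ObeyBot example, invented names) shows why no predictor whose prediction is revealed in advance can score above 0% against SpiteBot:

```python
import random

def spitebot(revealed_prediction):
    # SpiteBot always takes whichever box falsifies the prediction it was shown.
    return "Right" if revealed_prediction == "Left" else "Left"

trials, correct = 1_000_000, 0
for _ in range(trials):
    prediction = random.choice(["Left", "Right"])  # stands in for *any* predictor's output
    action = spitebot(prediction)                  # the prediction is revealed before the choice
    correct += (action == prediction)

print(correct / trials)  # 0.0 -- SpiteBot falsifies every revealed prediction
```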
Of course, this only establishes that the problem statement is contradictory if the reader happens to be a SpiteBot. And while that might be somewhat amusing to imagine, it is unlikely that any of us (the intended audience of the problem) are SpiteBots, or use any decision theory akin to SpiteBot. So let us ask: are there any other decision theories that manage to achieve a similar effect—any other decision theories that, under at least some subset of circumstances, turn into SpiteBot, thereby rendering the problem formulation contradictory?
Asked that way, the question answers itself: any decision theory that behaves like SpiteBot at least some of the time, will generate contradictions with the problem statement at least some of the time. And (here is the key point) if those contradictions are to be avoided, the hypothesized predictor must avoid whatever set of circumstances triggers the SpiteBot-like behavior, lest it falsify its own alleged predictive accuracy… which implies the following corollaries:
The predictor cannot show its face (or its boxes) before SpiteBot. Any native SpiteBots in this universe will therefore never find themselves in this particular “bomb” scenario, nor any relatives.
Any strategies that behave like SpiteBot some of the time, will not encounter any “bomb”-like scenarios during that time. For example, the strategy “Choose your actions normally except on Wednesdays; behave like SpiteBot on Wednesdays” will not encounter any bomb-like scenarios on Wednesdays.
If you don’t like the idea of ending up in a “bomb”-like scenario, you should precommit to behaving like SpiteBot in the face of such a scenario. If you actually do this properly, you will never see a real predictor give you a bomb-like scenario, which means that the only time you will be faced with any kind of bomb-like scenario will be in your imagination, perhaps at the prompting of some inquisitive philosopher. At this point you are free to ask the philosopher how his stipulated predictor reacts to SpiteBot and SpiteBot-like strategies, and—if he fails to give you a convincing answer—you are free to dismiss his premise as contradictory, and his scenario as something you don’t have to worry about.
(The general case of “behaving like SpiteBot when confronted with situations you’d really rather not have encountered to begin with” is called “fuck-you decision theory”, or FDT for short.)
Yes, this is exactly the objection I addressed in my previous replies, which relied upon a misreading of the problem. I missed that the boxes were open and thought that the only clue to the prediction was the note that was left.
The only solution was to assume that the predictor does not always leave a note, and this solution also works for the stated scenario. You see that the boxes are open, and the left one contains a bomb, but did everyone else? Did anyone else? The problem setup doesn’t say.
This sort of vagueness leaves holes big enough to drive a truck through. The stated FDT support for picking Left depends absolutely critically on the subjunctive dependency odds being at least many millions to one, and the stated evidence is nowhere near strong enough to support that. Failing that, FDT recommends picking Right.
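For what it’s worth, here is a back-of-the-envelope version of that break-even claim. Only the $100 fee comes from the scenario; the dollar figure placed on burning to death is a made-up assumption, used purely to illustrate where “many millions to one” comes from.

```python
# Hypothetical numbers -- only the $100 fee comes from the scenario itself.
value_of_not_burning = 1_000_000_000  # assumed disvalue of burning to death, in dollars
cost_of_right = 100                   # the stipulated fee for taking the Right box

# If p is the probability that the prediction does NOT track your actual policy
# (so that committing to Left still leaves you facing a live bomb), then roughly:
#   expected cost of the Left policy  ~= p * value_of_not_burning
#   expected cost of the Right policy  = cost_of_right
# Left is only favored when p * value_of_not_burning < cost_of_right.
break_even_p = cost_of_right / value_of_not_burning
print(f"Left is favored only if the failure probability is below {break_even_p:.0e}")
print(f"i.e. subjunctive-dependency odds of roughly {value_of_not_burning // cost_of_right:,} to 1 or better")
```

With these assumed numbers the required odds come out to about ten million to one; the point above is that the stated evidence about this particular predictor comes nowhere near establishing that.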
So the whole scenario is pointless. It doesn’t explore what it was intended to explore.
You can modify the problem to say that the predictor really is that reliable for every agent, but doesn’t always leave the boxes open for you or write a note. This doesn’t mean that the predictor is perfectly reliable, so a SpiteBot can still face this scenario but is just extremely unlikely to.