I think you mean “a spurious counterfactual where the conditional utilities view the agent one-boxing as evidence that the predictor’s axioms must be inconsistent”? That is, the agent correctly believes that the predictor’s axioms are likely to be consistent, but also thinks that they would be inconsistent if it one-boxed, so it two-boxes?
[Edit: this isn’t actually a spurious counterfactual.] The agent might reason, “If I two-box, then either it’s because I do something stupid (we can’t rule this out for Löbian reasons, but we should be able to assign it arbitrarily low probability), or, much more likely, the predictor’s reasoning is inconsistent. An inconsistent predictor would put $1M in box B no matter what my action is, so I can get $1,001,000 by two-boxing in this scenario. I am sufficiently confident in this model that my expected payoff conditional on me two-boxing is greater than $1M, whereas I can’t possibly get more than $1M if I one-box. Therefore I should two-box.” (This only happens if the predictor is implemented in such a way that it puts $1M in box B when its axioms are inconsistent, of course.) If the agent reasons this way, it would be wrong to trust itself with high probability, but we’d want the agent to be able to trust itself with high probability without being wrong.
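To make the expected-value arithmetic concrete, here is a minimal sketch of the comparison the agent is running; the specific conditional probabilities are my own illustrative assumptions, not something fixed by the setup:

```python
# Sketch of the agent's conditional expected-utility comparison.
# The probabilities below are illustrative assumptions: conditional on two-boxing,
# the agent treats "I made a stupid mistake" as arbitrarily unlikely and
# "the predictor's axioms are inconsistent" as the dominant explanation.
p_mistake = 0.0001              # P(stupid mistake | I two-box), assumed arbitrarily low
p_inconsistent = 1 - p_mistake  # P(predictor inconsistent | I two-box)

# Payoffs: a consistent predictor that foresaw the two-boxing leaves box B empty,
# so two-boxing yields only $1,000; an inconsistent predictor fills box B anyway,
# so two-boxing yields $1,001,000. One-boxing can never yield more than $1,000,000.
ev_two_box = p_mistake * 1_000 + p_inconsistent * 1_001_000
ev_one_box = 1_000_000

print(ev_two_box)               # ~1,000,900
print(ev_two_box > ev_one_box)  # True, so this agent two-boxes
```

On these numbers the comparison tips toward two-boxing as soon as the agent puts more than 999/1000 of the conditional probability mass on the inconsistency branch, which is exactly the kind of high self-trust the last sentence is about.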