To entertain that possibility, suppose you’re X% confident that your best “fool the predictor into thinking I’ll one-box, and then two-box” plan will work, and Y% confident that your “actually one-box, in a way the predictor can predict” plan will work. If X ≥ Y, you’ve got no incentive to actually one-box, only to try to pretend you will. But once your belief that the predictor might see through your deception rises above some threshold, it makes sense to actually be honest.
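To make the threshold concrete, here is a rough sketch of the comparison in Python. The payoffs are an assumption on my part (the usual Newcomb numbers: $1,000,000 in the opaque box, $1,000 in the transparent one), and the function names are made up for illustration. X and Y are written as probabilities between 0 and 1 rather than percentages.

```python
# Assumed payoffs: the standard Newcomb amounts, not something fixed by the argument above.
BIG = 1_000_000    # opaque box, filled only if the predictor expects you to one-box
SMALL = 1_000      # transparent box, always there


def ev_deceive(x: float) -> float:
    """Expected payoff of 'look like a one-boxer, then two-box', which works with probability x."""
    # If it works: the opaque box is full and you take both boxes.
    # If it fails: the opaque box is empty and you still take both boxes.
    return x * (BIG + SMALL) + (1 - x) * SMALL


def ev_honest(y: float) -> float:
    """Expected payoff of 'really one-box, predictably', which works with probability y."""
    # If it works: the opaque box is full and you take only it.
    # If it fails: the predictor wrongly expected two-boxing, so you open an empty box.
    return y * BIG


def honesty_wins(x: float, y: float) -> bool:
    """True when the honest plan has the higher expected payoff."""
    return ev_honest(y) > ev_deceive(x)


if __name__ == "__main__":
    for x, y in [(0.5, 0.5), (0.9, 0.5), (0.5, 0.99), (0.98, 0.99)]:
        print(f"X={x:.2f} Y={y:.2f} honest plan better: {honesty_wins(x, y)}")
```

Under these assumed payoffs the comparison reduces to Y × BIG > X × BIG + SMALL, so honesty comes out ahead once Y exceeds X by more than SMALL / BIG (0.001 here). With X = Y or X > Y the deceptive plan always has the higher expected payoff, which matches the claim above.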