The distinction between simulation and reasoning/analysis is relevant here. You really can attempt to figure out whether a version of you is being simulated by having a working `is_counterfactual_evaluated`, which should be a model of how the simulator decides what to simulate: you can figure out if you are being simulated by reasoning about your alleged simulator and checking whether it's going to simulate you in particular. Intervening on the details of your `is_counterfactual_evaluated` makes the resulting computation not a faithful simulation, but it could still be some sort of reasoning. Making a useful counterfactual incentivizes the simulator to intervene on `is_counterfactual_evaluated` to see how you would behave in the real world, where you might predict that you are not being simulated (though more carefully, these options are not mutually exclusive).
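As a minimal sketch of this setup (all names here are illustrative, not from any real framework): the agent consults a model of its alleged simulator to guess whether it is being counterfactually evaluated, and the simulator can intervene on that check, producing a run that is no longer a faithful simulation but still a kind of reasoning about the agent.

```python
def is_counterfactual_evaluated(simulator_model, agent_id):
    """Reason about the alleged simulator: would it simulate this agent?"""
    return agent_id in simulator_model["agents_it_would_simulate"]

def agent(simulator_model, agent_id, override=None):
    """Behave differently depending on whether we conclude we're simulated.

    `override` models the simulator intervening on the check; the resulting
    run is not a faithful simulation of the agent, but it still reveals how
    the agent would behave if it predicted it were real.
    """
    if override is not None:
        simulated = override  # intervened: not a faithful simulation
    else:
        simulated = is_counterfactual_evaluated(simulator_model, agent_id)
    return "act-as-hypothetical" if simulated else "act-as-real"

# A simulator model that would in fact simulate this agent:
model = {"agents_it_would_simulate": {"me"}}

print(agent(model, "me"))                  # faithful run: act-as-hypothetical
print(agent(model, "me", override=False))  # intervened run: act-as-real
```

The intervened call is the move the essay describes: the simulator forces the check to come out "not simulated" in order to observe the agent's real-world behavior.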
So there is potentially an adversarial game between the counterfactual, which wants to figure out whether it's real and should behave as if it's real, or whether it's a hypothetical and so should behave in a way that optimally misleads those who would attempt to run a counterfactual; and the predictor, who wants to prevent the counterfactual from finding out that it's counterfactual without affecting it in any other way. But this doesn't necessarily settle into an adversarial equilibrium, as both parties might want to form good models of each other in order to coordinate and bargain more effectively.