And I guess that’s where decision-theoretic questions arise: if the basement inductors are willing to wait for enough frames, then we (the manipulators) can’t do anything, and so we won’t bother. We wouldn’t have enough simplicity to keep faking observations indefinitely, right? If we did, we would just be the intended model.
If the basement people ever need to make some potentially-catastrophic decisions (ones for which a single misprediction is catastrophic), then the manipulators can wait until those decisions to break the predictor. Waiting is probably cheaper for the manipulators than gathering more data is for us (or at best the two costs scale proportionally, which doesn’t change the calculus at all).
If you are in something like the low-stakes setting, then there’s no opportunity for a manipulator to do too much damage: every time they do some harm they lose a bit of probability mass, and so there’s a reasonable bound on the total harm they can do.
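That bound can be made concrete with a toy two-hypothesis Bayesian mixture (my own sketch, not from the original discussion; the hypothesis names and the `p_harm` parameter are illustrative assumptions). Whenever the malign hypothesis tries to push a bad prediction, it has to assign low probability to the true outcome, which multiplies its posterior weight down; so the number of rounds on which it can matter is bounded by its prior.

```python
import math

def posterior_weights(prior_malign, harm_rounds, total_rounds, p_harm=0.25):
    """Track the malign hypothesis's posterior weight in a two-hypothesis mixture.

    Assumptions of this sketch: the honest hypothesis predicts the true
    outcome with probability 1 every round. On a "harm" round, the malign
    hypothesis assigns only `p_harm` to the true outcome (it is spending
    probability mass to push a bad prediction); otherwise it also predicts
    honestly.
    """
    w_malign = prior_malign
    w_honest = 1.0 - prior_malign
    history = []
    for t in range(total_rounds):
        likelihood = p_harm if t in harm_rounds else 1.0
        w_malign *= likelihood   # malign hypothesis pays for each manipulation
        # honest likelihood is always 1.0, so w_honest is unchanged
        history.append(w_malign / (w_malign + w_honest))
    return history

# Four manipulation attempts: the malign weight decays geometrically and
# then stays flat once the manipulator stops (or gives up).
weights = posterior_weights(prior_malign=0.5,
                            harm_rounds={0, 1, 2, 3},
                            total_rounds=8)
```

Each harm round multiplies the malign weight by `p_harm`, so with prior weight w it gets at most about log(1/w) / log(1/p_harm) confidently-manipulated rounds before it is negligible in the mixture; that is the sense in which total harm is bounded in the low-stakes setting.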