Maybe you’re right and logical induction can quickly become confident that the coin is hard. In that case, encoding “updatelessness” seems easy. The agent should spend a small fixed amount of time choosing a successor program with high expected utility according to LI, and then run that program. That seems to solve both betting on a logical coin (where it’s best for the successor to compute the coin) and counterfactual mugging (where it’s best for the successor to pay up). That said, we don’t know what shape the successor will take in general, and you could say that figuring this out is the task of decision theory...
Another problem with this version of “updatelessness” is that if you only run the logical inductor for a short amount of time before selecting the policy, the chosen policy could be terrible, since the inductor’s early belief state could be quite bad. It seems as if there’s at least something to be said about what it means to make a “good” trade-off between running too long before choosing the policy (and so not being updateless enough) and running too briefly (and so not knowing enough to choose policies wisely).
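To make the procedure concrete, here is a toy sketch (all names and numbers are hypothetical, not from the original comment) of an agent that runs a crude stand-in for a logical inductor for a fixed number of steps and then commits to whichever successor policy has the highest LI-expected utility, in a counterfactual-mugging setup where paying costs $100 on tails but would have earned $10000 on heads:

```python
def li_credence_coin_heads(steps: int) -> float:
    """Stand-in for a logical inductor's credence that the logical coin
    is 'heads'. For a genuinely hard coin, a good inductor stays near
    0.5 even after many steps, which is the case assumed above; a real
    LI would be a full belief state, not a single number."""
    return 0.5

# Two candidate successor programs, scored by payoff given the coin's
# true outcome (heads=True). Payoffs are illustrative.
POLICIES = {
    "pay_up": lambda heads: 10000 if heads else -100,
    "refuse": lambda heads: 0,
}

def choose_successor(steps: int) -> str:
    """Run the inductor for a fixed budget, then pick the policy with
    the highest expected utility under its current beliefs."""
    p = li_credence_coin_heads(steps)
    def expected_utility(name: str) -> float:
        return p * POLICIES[name](True) + (1 - p) * POLICIES[name](False)
    return max(POLICIES, key=expected_utility)
```

With credence near 0.5, paying has expected utility 0.5·10000 + 0.5·(−100) = 4950, which beats refusing, so the committed successor pays up; conversely, if the inductor had already become confident in the coin's actual outcome before the policy was chosen, the agent would no longer be updateless in the relevant sense.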