Another problem with this version of “updatelessness” is that if you only run the logical inductor a short amount of time before selecting the policy, the chosen policy could be terrible, since the early belief state of the inductor could be quite bad. It seems as if there’s at least something to be said about what it means to make a “good” trade-off between running too long before choosing the policy (so not being updateless enough) and running too short (so not knowing enough to choose policies wisely).
Another problem with this version of “updatelessness” is that if you only run the logical inductor a short amount of time before selecting the policy, the chosen policy could be terrible, since the early belief state of the inductor could be quite bad. It seems as if there’s at least something to be said about what it means to make a “good” trade-off between running too long before choosing the policy (so not being updateless enough) and running too short (so not knowing enough to choose policies wisely).