This makes me think of Thompson sampling. There, on each round/episode you sample one hypothesis from your current belief state and then follow the optimal action/policy for that hypothesis. In fact, Thompson sampling seems like one of the most natural computationally efficient algorithms for approximating Bayes-optimal decision making, so perhaps it is not surprising if it's useful for real-life decision making too.
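As a concrete illustration of the "sample one hypothesis, then act optimally for it" loop, here is a minimal sketch of Thompson sampling on a Bernoulli bandit with Beta(1, 1) priors (the setting, arm probabilities, and function name are my own choices for illustration, not anything from the text above):

```python
import random

def thompson_sampling(true_probs, n_rounds=2000, seed=0):
    """Thompson sampling for a Bernoulli bandit with Beta(1, 1) priors."""
    rng = random.Random(seed)
    k = len(true_probs)
    alphas = [1] * k  # posterior Beta alpha (prior successes + 1)
    betas = [1] * k   # posterior Beta beta (prior failures + 1)
    total_reward = 0
    for _ in range(n_rounds):
        # Sample one hypothesis from the current belief state:
        # a success probability for each arm, drawn from its posterior.
        samples = [rng.betavariate(alphas[i], betas[i]) for i in range(k)]
        # Follow the optimal action for that sampled hypothesis.
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        total_reward += reward
        # Update the belief state with the observed outcome.
        alphas[arm] += reward
        betas[arm] += 1 - reward
    return total_reward, alphas, betas

reward, alphas, betas = thompson_sampling([0.3, 0.5, 0.7])
```

With enough rounds the posterior sampling concentrates pulls on the best arm, which is exactly the exploration/exploitation balance that makes the algorithm a cheap approximation to Bayes-optimal behavior.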