One of the problems with imprecise Bayesianism is that no one has come up with a good update rule; it turns out to be much trickier than it looks. You can’t just update all the distributions in the set, because [reasons I am forgetting]. Part of the reason infra-Bayesianism generalizes imprecise Bayesianism is to fix this problem.
The reason you can’t just update all the distributions in the set is that the result wouldn’t be dynamically consistent. That is, planning ahead what to do in every contingency, versus updating and then acting accordingly, would produce different policies.
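To make the inconsistency concrete, here is a toy Ellsberg-style calculation (my own hypothetical construction, not from this discussion): maximin preferences over a credal set reverse when each prior is conditioned separately.

```python
# Ellsberg-style toy: an urn is red with probability 1/3, and the blue /
# yellow split is unknown. Credal set: P(red) = 1/3, P(blue) = p,
# P(yellow) = 2/3 - p, for p in [0, 2/3].
import numpy as np

ps = np.linspace(0.0, 2/3, 1001)   # sweep over the credal set

# Two acts, as payoff vectors over (red, blue, yellow). Both pay 1 on
# yellow, so ex-ante they only differ on the event we will observe.
f = np.array([1.0, 0.0, 1.0])      # pays on red or yellow
g = np.array([0.0, 1.0, 1.0])      # pays on blue or yellow

def ex_ante(act):
    # worst-case (maximin) expected payoff over the whole credal set
    return (act[0] * (1/3) + act[1] * ps + act[2] * (2/3 - ps)).min()

print(ex_ante(f))   # 1/3: worst case at p = 2/3
print(ex_ante(g))   # 2/3: constant in p        -> plan ahead: choose g

# Now observe "not yellow" and naively condition every prior in the set.
print(((1/3) / (1/3 + ps)).min())  # f pays on red:  worst case 1/3
print((ps / (1/3 + ps)).min())     # g pays on blue: worst case 0 -> now choose f
# The preference flips from g to f: dynamic inconsistency.
```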
The correct update rule actually does appear in the literature (Gilboa and Schmeidler 1993). They don’t introduce any of our dual formalisms of a-measures and nonlinear functionals, instead just viewing beliefs as preference orders over actions, but the result is equivalent. So, our main novelty is really combining imprecise probability with reinforcement learning theory (plus consequences such as FDT-like behavior and extensions such as physicalism) rather than the update rule (even though our formulation of the update rule has some advantages).
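To sketch why the consistent rule avoids the reversal (a toy rendering of the idea as I understand it, not the formal a-measure machinery): instead of conditioning each prior and throwing away the unobserved branch, condition the measure but keep its expected payoff on the off-event as an additive constant, which is roughly the role of the +b term of an a-measure. Rerunning the example above with that rule:

```python
# Hypothetical continuation of the toy example above. The "update" keeps
# each prior's off-event payoff as an additive constant instead of
# renormalizing it away, so the updated maximin agrees with the ex-ante plan.
import numpy as np

ps = np.linspace(0.0, 2/3, 1001)
f = np.array([1.0, 0.0, 1.0])               # payoffs over (red, blue, yellow)
g = np.array([0.0, 1.0, 1.0])

def updated_value(act):
    on_event  = act[0] * (1/3) + act[1] * ps    # restricted to {red, blue}
    off_event = act[2] * (2/3 - ps)             # remembered yellow-branch payoff
    return (on_event + off_event).min()         # maximin over the credal set

print(updated_value(f))   # 1/3
print(updated_value(g))   # 2/3 -> g is still preferred, matching the ex-ante plan
```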
This allows us to get good guarantees against non-computable worlds, if they have some computable regularities. Generalizing imprecise probabilities to the point where there’s a nice update rule was necessary to make this work.
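As a cartoon of the kind of guarantee in question (my own toy illustration; the adversarial odd bits stand in for the non-computable part of the world):

```python
# A world is a bit string whose even-indexed bits obey a known law
# (always 0), while the odd bits are arbitrary / adversarial. A crisp
# "belief" here is just the set of all worlds consistent with the law,
# and policies are scored by their worst case over that set.
from itertools import product

N = 6  # horizon

def reward(world, policy):
    # bet "0" on every bit the policy selects: +1 if right, -1 if wrong
    return sum(1 if world[i] == 0 else -1 for i in range(N) if policy(i))

worlds = [w for w in product([0, 1], repeat=N)
          if all(w[i] == 0 for i in range(0, N, 2))]

bet_even = lambda i: i % 2 == 0   # exploit only the computable regularity
bet_all  = lambda i: True         # also bet on the lawless bits

print(min(reward(w, bet_even) for w in worlds))  # 3: a guaranteed payoff
print(min(reward(w, bet_all)  for w in worlds))  # 0: adversarial bits erase the gain
```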
I’m not sure the part about the “update rule was necessary” is true. Having a nice update rule is nice, but in practice it seems more important to have nice learning algorithms. Learning algorithms are something I have only begun to work on[1]. As for what kind of infradistributions we actually need (on the spectrum between crisp and fully general), it’s not clear. Physicalism seems to work better with cohomogeneous than with crisp infradistributions, but the inroads into learning suggest affine infradistributions, which are an even narrower class than crisp. In infra-Bayesian logic, each has different advantages (cohomogeneous admits continuous conjunction; affine might admit efficient algorithms). Maybe some synthesis is possible, but at present I don’t know.
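For orientation, the narrowest of the standard classes has a simple characterization: a crisp infradistribution over $X$ is a closed convex set $C$ of probability distributions on $X$, with expectations taken worst-case over the set,

$$\underline{\mathrm{E}}_C[f] \;=\; \min_{\mu \in C} \mathrm{E}_\mu[f].$$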
See this for some initial observations. Since then I have arrived at regret bounds for stochastic linear affine bandits (Õ(√n) for the general case and Õ(log n) for the gap case, given an appropriate definition of “gap”) with a UCB-type algorithm. In addition, there is Tian et al. 2020, which is framed as studying zero-sum games but can be viewed as giving a regret bound for infra-MDPs.
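For readers who haven’t seen one: the classical template behind “UCB-type” algorithms, in the plain stochastic bandit setting, looks like this (standard UCB1, not the affine infra-bandit algorithm referred to above):

```python
# Standard UCB1 for a K-armed stochastic bandit. "Optimism in the face of
# uncertainty": play the arm with the highest mean-plus-confidence-radius.
# With a fixed gap between the best and second-best arm, the regret of
# this template grows like O(log n).
import math, random

def ucb1(pull, K, n):
    counts = [0] * K
    means  = [0.0] * K
    for t in range(1, n + 1):
        if t <= K:
            arm = t - 1  # initialization: play each arm once
        else:
            arm = max(range(K),
                      key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # running average
    return means, counts

# Usage: two Bernoulli arms with success probabilities 0.4 and 0.6.
arms = [0.4, 0.6]
means, counts = ucb1(lambda a: float(random.random() < arms[a]), K=2, n=10000)
print(counts)  # the 0.6 arm should receive the vast majority of pulls
```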