Linear infra-Bayesian Bandits

Link post

Linked is my MSc thesis, where I do regret analysis for an infra-Bayesian[1] generalization of stochastic linear bandits.

The main significance that I see in this work is:

  • Expanding our understanding of infra-Bayesian regret bounds, and solidifying our confidence that infra-Bayesianism is a viable approach. Previously, the most interesting IB regret analysis we had was Tian et al., which deals (essentially) with episodic infra-MDPs. My work here doesn’t supersede Tian et al. because it only talks about bandits (i.e. stateless infra-Bayesian laws), but it complements it because it deals with a parametric hypothesis space (i.e. fits into the general theme in learning theory that generalization bounds should scale with the dimension of the hypothesis class).

  • Discovering some surprising features of infra-Bayesian learning that have no analogues in classical theory. In particular, it turns out that affine credal sets (i.e. sets that are closed w.r.t. arbitrary affine combinations of distributions and not just convex combinations) have better learning-theoretic properties, and the regret bound depends on additional parameters that don’t appear in classical theory (the “generalized sine” and the “generalized condition number”). Credal sets defined using conditional probabilities (related to Armstrong’s “model splinters”) turn out to be well-behaved in terms of these parameters.
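To make the convex-versus-affine distinction concrete, here is a minimal Python sketch. It is only an illustration, not code from the thesis: the helper names `combine` and `is_distribution` and the example vectors are made up for this purpose. It assumes the reading that an affine credal set contains every affine combination of its members that is still a genuine probability distribution.

```python
def combine(p, q, t):
    """Return t*p + (1-t)*q component-wise (hypothetical helper)."""
    return [t * a + (1 - t) * b for a, b in zip(p, q)]


def is_distribution(v, tol=1e-9):
    """Check that v is nonnegative and sums to 1, up to a tolerance."""
    return all(x >= -tol for x in v) and abs(sum(v) - 1.0) <= tol


# Two illustrative distributions over three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.3, 0.4, 0.3]

# Convex combination (t in [0, 1]): always a distribution.
convex = combine(p, q, 0.5)

# Affine combination with t outside [0, 1] that stays nonnegative:
# a convex credal set containing p and q need not contain it,
# but an affine credal set must.
affine_inside = combine(p, q, 1.5)

# Affine combination that leaves the simplex (one entry goes negative):
# this is a signed distribution, so no credal set contains it.
affine_outside = combine(p, q, 4.0)

print(is_distribution(convex))
print(is_distribution(affine_inside))
print(is_distribution(affine_outside))
```

The second case is the interesting one: the coefficients 1.5 and -0.5 sum to 1 but are not both nonnegative, so closure under such combinations is a strictly stronger requirement than convexity.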

In addition to the open questions in the “summary” section, there is also a natural open question of extending these results to non-crisp infradistributions. (I didn’t mention it in the thesis because it requires too much additional context to motivate.)

  1. ^

    I use the word “imprecise” rather than “infra-Bayesian” in the title of the thesis, because the proposed algorithm achieves a regret bound which is worst-case over the hypothesis class, so it’s not “Bayesian” in any non-trivial sense.

  2. ^

    In particular, I suspect that there’s a flavor of homogeneous ultradistributions for which the parameter becomes unnecessary. Specifically, an affine ultradistribution can be thought of as the result of “take an affine subspace of the affine space of signed distributions, intersect it with the space of actual (positive) distributions, then take downwards closure into contributions to make it into a homogeneous ultradistribution”. But we can also consider the alternative “take an affine subspace of the affine space of signed distributions, take downwards closure into signed contributions and then intersect it with the space of actual (positive) contributions”. The order matters!