Ooh! Shiny! I forgot that the InfraBayes sequence existed, but when I went back I saw that I “read” the first four of them before “bouncing off” as you say. Just now I tried to dip back in to The Many Faces of Infra-Beliefs (hoping to get a summary) and it is so big! And it is not a summary <3
That post has a section titled “Deconfusing the Cosmic Ray Problem” which could be an entire post… or maybe even an entire sequence of its own if the target audience was like “bright high school students with some calc and trig and stats” and you have to explain and motivate the Cosmic Ray Problem and explain all the things that don’t work first, before you explain inframeasures in a small and practical enough way to actually apply it slowly, and then turn the crank, and then see how inframeasure math definitely “gives the intuitive right answer” in a way that some definite alternative math does not.
Reading and googling and thinking… The sequence there now seems like it is aimed at securing intellectual priority? Like, I tried to go find the people who invented it, and wrote a textbook on it, and presented about it at conferences by searching for [inframeasure theory] on Google Videos and there were… literally zero videos?
This caused me to loop back to the first post of the sequence and realize that this was all original research, explained from scratch for something-like-the-first-time-ever in that sequence, not a book report on something invented by Kolmogorov or Schmidhuber or whoever, and known to be maybe finally sort of mature, and suspected to be useful.
So in terms of education, my priors are… assuming that this is as important as causal graphs, this will take decades to come into general awareness, sorta like Pearl’s stuff existed back in the 1980s but widespread understanding of even this simple thing lagged for a really really long time.
Honestly, trying to figure it out, I still don’t know what an inframeasure actually is in words.
If I had to guess, I’d wonder if it was maybe just a plain old bayesian model, but instead of a bayesian model with an event space that’s small and cute and easy to update on, maybe it is an event space over potential infinities of bayesian agents (with incompatible priors?) updating on different ways of having conceptual simplifications of potentially infinitely complicated underlying generic event spaces? Maybe?
If this is even what it is, then I’d be tempted to say that the point was “to put some bayes in your bayes so you can update on your updates”. Then maybe link it, conceptually, to Meta MCMC stuff? But my hunch is that that’s not exactly what’s going on here, and it might be very very very different from Meta MCMC stuff.
Infradistributions are a generalization of sets of probability distributions. Sets of probability distributions are used in “imprecise bayesianism” to represent the idea that we haven’t quite pinned down the probability distribution. The most common idea about what to do when you haven’t quite pinned down the probability distribution is to reason in a worst-case way about what that probability distribution is. Infrabayesianism agrees with this idea.
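To make the "reason in a worst-case way" part concrete, here is a minimal sketch of maximin choice over a finite set of probability distributions. This is plain imprecise-probability decision making, not the full infradistribution machinery, and the outcomes, payoffs and names (`CREDAL_SET`, `PAYOFFS`, `maximin_action`) are made up for illustration.

```python
from fractions import Fraction as F

# Hypothetical toy setup: outcomes are (rain, sun).
# We have not pinned down P(rain); we only know it is one of these.
CREDAL_SET = [
    (F(1, 5), F(4, 5)),
    (F(3, 5), F(2, 5)),
]

# Hypothetical payoffs for each action under (rain, sun).
PAYOFFS = {
    "umbrella": (3, 2),
    "no_umbrella": (0, 4),
}

def expected_value(dist, payoff):
    """Ordinary expected payoff under one probability distribution."""
    return sum(p * u for p, u in zip(dist, payoff))

def worst_case_value(credal_set, payoff):
    """Value of an action = its expected payoff under the least favorable distribution."""
    return min(expected_value(d, payoff) for d in credal_set)

def maximin_action(credal_set, payoffs):
    """Pick the action whose worst-case expected payoff is largest."""
    return max(payoffs, key=lambda a: worst_case_value(credal_set, payoffs[a]))

best = maximin_action(CREDAL_SET, PAYOFFS)
```

In this toy, an ordinary Bayesian who committed to the first distribution would skip the umbrella (expected payoff 16/5 versus 11/5), while the worst-case reasoner takes it (worst case 11/5 versus 8/5).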
One of the problems with imprecise bayesianism is that they haven’t come up with a good update rule—turns out it’s much trickier than it looks. You can’t just update all the distributions in the set, because [reasons i am forgetting]. Part of the reason infrabayes generalizes imprecise bayes is to fix this problem.
So you can think of an infradistribution mostly as a generalization of “sets of probability distributions” which has a good update rule, unlike “sets of probability distributions”.
Why is this great?
Mainly because “sets of probability distributions” are actually a pretty great idea for decision theory. Regular Bayes has the “realizability” problem: in order to prove good loss bounds, you need to assume the prior is “realizable”, which means that one of the hypotheses in the prior is true. For example, with Solomonoff, this amounts to assuming the universe is computable.
Using sets instead, you don’t need to have the correct hypothesis in your prior; you only need to have an imprecise hypothesis which includes the correct hypothesis, and “few enough” other hypotheses that you get a reasonably tight bound on loss.
Unpacking that a little more: if the learnability condition is met, then if the true environment is within one of the imprecise hypotheses in the prior, then we can eventually do as well as an agent who just assumed that particular imprecise hypothesis from the beginning (because we eventually learn that the true world is within that imprecise hypothesis).
This allows us to get good guarantees against non-computable worlds, if they have some computable regularities. Generalizing imprecise probabilities to the point where there’s a nice update rule was necessary to make this work.
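A toy version of "computable regularities in an otherwise arbitrary world", as a sketch only: assume every even-indexed bit is 0 and the odd-indexed bits can be anything. The imprecise hypothesis "even bits are 0" never says what the odd bits are, but it still guarantees a score. The environments below are stand-ins I made up (a seeded random generator is of course computable; it is only playing the role of "no pattern we can exploit").

```python
import random

def predict(t):
    # Hypothetical policy: always guess 0. Only the even-step guesses are
    # backed by the imprecise hypothesis; the odd-step guesses are a shrug.
    return 0

def score(odd_bit, n_steps):
    """Count correct guesses when even bits are 0 and odd bits come from odd_bit(t)."""
    correct = 0
    for t in range(n_steps):
        bit = 0 if t % 2 == 0 else odd_bit(t)
        correct += int(predict(t) == bit)
    return correct

rng = random.Random(0)
environments = {
    "adversarial": lambda t: 1,              # odd bits always defeat the guess
    "random": lambda t: rng.randint(0, 1),   # odd bits carry no usable pattern
    "friendly": lambda t: 0,                 # odd bits happen to match the guess
}

N = 100
scores = {name: score(env, N) for name, env in environments.items()}
guarantee = N // 2  # number of even steps, which the hypothesis covers
```

Whatever the odd bits do, the score never drops below `guarantee`. That is the flavor of bound you can get without the true environment being a precise hypothesis in the prior.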
There is currently no corresponding result for logical induction. (I think something might be possible, but there are some onerous obstacles in the way.)
One of the problems with imprecise bayesianism is that they haven’t come up with a good update rule—turns out it’s much trickier than it looks. You can’t just update all the distributions in the set, because [reasons i am forgetting]. Part of the reason infrabayes generalizes imprecise bayes is to fix this problem.
The reason you can’t just update all the distributions in the set is, it wouldn’t be dynamically consistent. That is, planning ahead what to do in every contingency versus updating and acting accordingly would produce different policies.
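Here is a small worked example of that inconsistency, using an Ellsberg-style urn that is my own choice of illustration rather than anything from the sequence. The urn has 1/3 red balls and 2/3 black-or-yellow in unknown proportion. If the ball is yellow you get 1 regardless; if you are told it is not yellow, you must bet on red or on black. The sketch compares the worst-case value of each bet as a plan made in advance with its worst-case value after naively conditioning every distribution in the set.

```python
from fractions import Fraction as F

OUTCOMES = ("R", "B", "Y")

# Two extreme points of the set: P(R) = 1/3, P(B) anywhere in [0, 2/3].
CREDAL_SET = [
    {"R": F(1, 3), "B": F(0), "Y": F(2, 3)},
    {"R": F(1, 3), "B": F(2, 3), "Y": F(0)},
]

# Payoff of each plan; the choice only matters when the ball is not yellow.
PAYOFF = {
    "bet_R": {"R": 1, "B": 0, "Y": 1},
    "bet_B": {"R": 0, "B": 1, "Y": 1},
}

EVENT = {"R", "B"}  # the observation "not yellow"

def expectation(dist, payoff):
    return sum(dist[o] * payoff[o] for o in OUTCOMES)

def condition(dist, event):
    """Ordinary Bayesian conditioning of a single distribution on an event."""
    mass = sum(dist[o] for o in event)
    return {o: (dist[o] / mass if o in event else F(0)) for o in OUTCOMES}

def worst_case(dists, payoff):
    return min(expectation(d, payoff) for d in dists)

# Planning ahead: evaluate each full plan against the original set.
ex_ante = {a: worst_case(CREDAL_SET, PAYOFF[a]) for a in PAYOFF}

# Naive rule: condition every distribution on the event, then re-evaluate.
updated_set = [condition(d, EVENT) for d in CREDAL_SET]
ex_post = {a: worst_case(updated_set, PAYOFF[a]) for a in PAYOFF}

plan = max(ex_ante, key=ex_ante.get)
choice = max(ex_post, key=ex_post.get)
```

In advance, betting on black is worth 2/3 in the worst case and betting on red only 1/3, so the plan is black. After the naive update, red is worth 1/3 and black is worth 0, so the agent switches to red. The sketch only exhibits the failure of the naive rule; it does not implement the corrected update.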
The correct update rule actually does appear in the literature (Gilboa and Schmeidler 1993). They don’t introduce any of our dual formalisms of a-measures and nonlinear functionals, instead just viewing beliefs as orders on actions, but the result is equivalent. So, our main novelty is really combining imprecise probability with reinforcement learning theory (plus consequences such as FDT-like behavior and extensions such as physicalism) rather than the update rule (even though our formulation of the update rule has some advantages).
This allows us to get good guarantees against non-computable worlds, if they have some computable regularities. Generalizing imprecise probabilities to the point where there’s a nice update rule was necessary to make this work.
I’m not sure the part about “update rule was necessary” is true. Having a nice update rule is nice, but in practice it seems more important to have nice learning algorithms. Learning algorithms are something I have only begun to work on[1]. As to what kind of infradistributions we actually need (on the range between crisp and fully general), it’s not clear. Physicalism seems to work better with cohomogeneous compared to crisp, but the inroads in learning suggest affine infradistributions, which are an even narrower class than crisp. In infra-Bayesian logic, cohomogeneous and affine have different advantages (cohomogeneous admits continuous conjunction, affine might admit efficient algorithms). Maybe some synthesis is possible, but at present I don’t know.
[1] See this for some initial observations. Since then I arrived at regret bounds for stochastic linear affine bandits (both ~O(√n) for the general case and ~O(log n) for the gap case, given an appropriate definition of “gap”) with a UCB-type algorithm. In addition, there is Tian et al 2020 which is stated as studying zero-sum games but can be viewed as a regret bound for infra-MDPs.
Are beta and gamma distributions infradistributions in the sense that they are different sets of probability distributions whose behavior is parameterized? Or multivariate beta distributions?