So, first off, I should probably say that a lot of the formalism overhead involved in this post in particular feels like the sort of thing that will get a whole lot more elegant as we work more things out. That said, “Basic Inframeasure Theory” still looks pretty good at this point and is worth reading, and the basic results (the ability to translate from pseudocausal to causal, dynamic consistency, capturing most of UDT, and the definition of learning) will still hold up.
Yes, your current understanding is correct: it’s rebuilding probability theory in more generality so it’s suitable for RL in nonrealizable environments, capturing a much broader range of decision-theoretic problems, as well as whatever spin-off applications may come from having the basic theory worked out, like our infradistribution logic stuff.
It copes with unrealizability because its hypotheses are not probability distributions but sets of probability distributions (actually something more general than that, but that’s a good mental starting point), corresponding to properties that reality may have without fully specifying everything. In particular, if an agent learns a class of belief functions (read: properties the environment may fulfill), then for every property in that class that the true environment fulfills (you don’t know the true environment exactly), the infra-Bayesian agent will match or exceed the expected-utility lower bound that could be guaranteed if you knew reality has that property (in the low-time-discount limit).
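To make the “set of distributions as a hypothesis” picture concrete, here’s a tiny toy sketch (entirely my own construction, not anything from the post; the names `credal_set` and `utility` are made up). It shows the basic evaluation rule: an infra-Bayesian-style agent scores a hypothesis by the worst-case expected utility over the set, so the guarantee holds for any reality satisfying the property, not just one fully-specified environment.

```python
# Toy sketch: an "infra-hypothesis" as a set of probability distributions
# over three outcomes, evaluated by worst-case expected utility.
# A hypothesis that only pins down a *property* ("outcome 2 has probability
# exactly 0.1") corresponds to a set of distributions satisfying it; here we
# just list a few distributions from that set.
credal_set = [
    (0.90, 0.00, 0.1),
    (0.00, 0.90, 0.1),
    (0.45, 0.45, 0.1),
]

utility = (1.0, 0.0, 5.0)  # utility assigned to each outcome

def expected_utility(dist, u):
    return sum(p * ui for p, ui in zip(dist, u))

# The agent's guaranteed score is the worst case over the whole set:
guaranteed = min(expected_utility(d, utility) for d in credal_set)
print(guaranteed)  # 0.5 (the second distribution is the worst case)
```

The “match or exceed the lower bound” claim above is about this worst-case number: whichever distribution in the set reality actually follows, the realized expected utility is at least `guaranteed`.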
There’s another key consideration which Vanessa was telling me to put in which I’ll post in another comment once I fully work it out again.
Also, thank you for noticing that it took a lot of work to write all this up, the proofs took a while. n_n
So let’s say I’m estimating the position of a train on a straight section of track as a single real number, and I want to do an update each time I receive a noisy measurement of the train’s position. Under the theory you’re laying out here I might have, say, three Gaussians N(0, 1), N(1, 10), and N(4, 6), and rather than updating a single pdf over the position of the train, I’m updating the measures associated with each of these three pdfs. Is that roughly correct?
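Concretely, the kind of update I’m imagining looks something like the following (my own toy code, just to pin down what I mean by “updating measures associated with each pdf”; the noise variance and all weights are made-up numbers, and a full treatment would presumably also update each Gaussian’s mean and variance, Kalman-style):

```python
# Toy sketch: three Gaussian pdfs over the train's position, each carrying a
# measure (weight). On a noisy measurement, scale each weight by the marginal
# likelihood of that measurement under the corresponding Gaussian.
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# (mean, variance, current measure) for each hypothesis pdf
hypotheses = [
    {"mean": 0.0, "var": 1.0,  "weight": 1.0},
    {"mean": 1.0, "var": 10.0, "weight": 1.0},
    {"mean": 4.0, "var": 6.0,  "weight": 1.0},
]

noise_var = 0.5  # assumed variance of the measurement noise

def update(hypotheses, measurement):
    # Likelihood of the measurement under each hypothesis: the position prior
    # convolved with the measurement noise, i.e. variances add.
    for h in hypotheses:
        h["weight"] *= gaussian_pdf(measurement, h["mean"], h["var"] + noise_var)
    return hypotheses

update(hypotheses, measurement=0.2)
```

After a measurement near 0, the N(0, 1) hypothesis ends up with the largest remaining measure, as I’d expect.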
(I realize this isn’t exactly a great example of how to use this theory since train positions are perfectly realizable, but I just wanted to start somewhere familiar to me.)
Do you by chance have any worked examples where you go through the update procedure for some concrete prior and observation? If not, do you have any suggestions for what would be a good toy problem where I could work through an update at a very concrete level?
I’m not sure I understood the question, but the infra-Bayesian update is not equivalent to updating every distribution in the convex set of distributions. In fact, updating a crisp infra-distribution (i.e. one that can be described as a convex set of distributions) in general produces an infra-distribution that is not crisp (i.e. you need sa-measures to describe it or use the Legendre dual view).
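Here is a toy numerical illustration of that point (my own made-up example, reflecting my reading of the update rule for crisp sets, with the off-branch utility set to 0): naively conditioning every distribution in the set and taking the worst case gives a different answer than an update that keeps each measure scaled by the probability it assigned to the observation, which is exactly why the updated object is no longer a plain set of normalized distributions.

```python
# Two distributions over (observation, outcome) pairs, summarized by the
# probability they assign to observing obs=1 and their conditional expected
# utility given obs=1:  (P(obs=1), E[utility | obs=1])
dists = {
    "A": (0.9, 0.2),
    "B": (0.1, 0.5),
}

# Naive "condition every distribution, then take the worst case":
naive = min(e_given_obs for (_, e_given_obs) in dists.values())

# Likelihood-scaled update on obs=1 (off-branch utility taken to be 0):
# each distribution contributes P(obs=1) * E[utility | obs=1], i.e. an
# *unnormalized* measure scaled by how likely it thought the observation was.
infra = min(p * e for (p, e) in dists.values())

print(naive)  # 0.2  -- distribution A is the naive worst case
print(infra)  # 0.05 -- distribution B is the worst case once likelihood-scaled
```

The two procedures disagree both on the value and on which distribution is the worst case, because naive conditioning throws away the information that B considered the observation very unlikely. That likelihood scaling is what the sa-measure (or Legendre dual) formalism is keeping track of.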
Ah this is helpful, thank you.