Nice things about the universal distribution underlying AIXI include:
It is a single (lower semi-)computable probabilistic model that dominates, in the measure-theoretic sense, all other (lower semi-)computable probabilistic models (see the formulas after this list). Such a universal element cannot be constructed for most natural computability levels, so it's neat that it works here.
It unites compression and prediction through the coding theorem, though the connection is slightly weaker in the sequential case.
It has two very natural characterizations, either as feeding random bits to a UTM or as an explicit mixture of lower semi-computable environments.
With the full AIXI model, Professor Hutter was able to formally extend the probabilistic model to interactive environments without damaging the computability level. Conditioning and planning do damage the computability level, but this is fairly well understood and not too bad.
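To make the first and third points concrete, here is a sketch of the standard statements (notation roughly as in Hutter's book, so treat the constants as schematic): for every lower semi-computable semimeasure $\nu$ there is a constant $c_\nu > 0$ with

\[
M(x) \;\ge\; c_\nu\,\nu(x) \quad\text{for all finite strings } x, \qquad c_\nu = 2^{-K(\nu)} \text{ works},
\]

and the two characterizations are

\[
M(x) \;=\; \sum_{p\,:\,U(p)=x*} 2^{-|p|} \;\;\overset{\times}{=}\;\; \sum_i 2^{-K(\nu_i)}\,\nu_i(x),
\]

where $U$ is a monotone universal Turing machine, $U(p)=x*$ means that on (random coin-flip) input $p$ the machine outputs some extension of $x$, $(\nu_i)$ is an effective enumeration of the lower semi-computable semimeasures, and $\overset{\times}{=}$ is equality up to multiplicative constants.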
Thanks a lot!
A few follow-up questions:
By computability level, do you mean Turing degree?
Why can't the universal distribution be constructed for most levels?
What exactly is the coding theorem?
What do you mean by conditioning and planning damaging the computability level, and why is it not so bad?
Technically, the connection between the computability levels of AIT (estimability, lower/upper semi-computability, approximability) and the Turing degrees has not been worked out properly. See chapter 6 of Leike's thesis, though there is a small error in the inequalities of section 6.1.2. It is necessary to connect the computability of real-valued functions (the type-two theory of effectivity) to the arithmetic hierarchy; as far as I know this hasn't been done, but maybe I'll share some notes in a few months.
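For reference, here is roughly what I mean by those four levels, for a real-valued function $f$ (these definitions follow Leike's thesis, modulo my wording):

\begin{align*}
\text{estimable:} &\quad \exists\ \text{computable}\ \phi(x,k)\ \text{with}\ |\phi(x,k)-f(x)| < 2^{-k},\\
\text{lower semi-computable:} &\quad \exists\ \text{computable}\ \phi(x,k) \nearrow f(x)\ \text{as}\ k \to \infty,\\
\text{upper semi-computable:} &\quad \exists\ \text{computable}\ \phi(x,k) \searrow f(x)\ \text{as}\ k \to \infty,\\
\text{approximable:} &\quad \exists\ \text{computable}\ \phi(x,k) \to f(x)\ \text{with no error bound or monotonicity.}
\end{align*}

The open question is how precisely these levels line up with $\Sigma^0_1$, $\Pi^0_1$, $\Delta^0_2$, and so on once you pass from sets to real-valued functions.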
Roughly, most classes don't have a universal distribution because the class is not computably enumerable; for example, you cannot computably enumerate the class of computable measures, which is (roughly) why there is no computable measure dominating all computable measures and you have to move up to lower semi-computable semimeasures to get universality. The exact obstruction can vary from class to class, though. There's a nice table in Marcus Hutter's original book, page 50.
It says that the negative log of the universal probability coincides with Kolmogorov complexity: in the discrete case with prefix complexity up to an additive constant (equivalently, up to a constant multiple on the probabilities), and in the sequential case with monotone complexity in a somewhat weaker sense. Basically, the Bayesian prediction is closely connected to the shortest explanation. See Li and Vitányi's "An Introduction to Kolmogorov Complexity and Its Applications."
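Written out (this is the discrete form from Li and Vitányi, where $m$ is the discrete universal semimeasure and $K$ the prefix complexity):

\[
-\log_2 m(x) \;=\; K(x) + O(1),
\]

i.e. $m(x)$ and $2^{-K(x)}$ agree up to a constant multiple. For sequences the analogous statement relates $-\log_2 M(x)$ to the monotone complexity $Km(x)$, but, as far as I recall, only up to a non-constant (though slowly growing) error term, which is the sense in which the sequential case is weaker.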
The last question is a longer story, I guess. Basically, the conditionals of the universal distribution are not lower semi-computable, and it gets even worse when you have to compare the expected values of different actions because of tie-breaking. But a good approximation of AIXI can still be computed in the limit.
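To spell out the first part: the conditional is a ratio of lower semi-computable quantities,

\[
M(y \mid x) \;=\; \frac{M(xy)}{M(x)},
\]

and although you can still compute a sequence converging to this ratio (so it remains approximable, i.e. computable in the limit), the natural approximations are no longer monotone from below, because the denominator is itself only approximated from below. Conditioning therefore already costs a level, and the argmax over actions in the planning step can cost more when two actions have exactly equal value and the tie can never be decided in finite time.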