The tricky part for me was proving the equivalence of two views of the Solomonoff prior: as a probability distribution over programs that print bitstrings and never make mistakes, and as a mixture of all “lower semi-computable” probability distributions over finite prefixes of bitstrings that are allowed to make mistakes. Once you derive the second view from the first, the conclusion is indeed trivial.
I started from the view of multiplicative weights over computable predictors, and then when I tried to describe it as an improvement on Solomonoff induction realized it was the same thing. If you write down what the two algorithms actually do, the equivalence to within a constant is clear.
The universal a priori probability m has many remarkable properties. A theorem of L.A. Levin states that among the lower semi-computable semi-measures it is the largest such function in the following sense: If p is a lower semi-computable semi-measure, then there is a constant c such that cm(x) >= p(x) for all x.
When you say “The tricky part for me was proving the equivalence of two views of the Solomonoff prior” do you mean you tried to re-derive this result?
For the discrete version, doesn’t this boil down to converting a program that semi-computes a semi-measure into a program that “semi-samples from” a semi-measure?
Which is pretty straightforward:
We interpret the remainder of our (infinite) input as a real number y in [0,1]. Every time the measure that we’re semi-computing adds an amount p of probability to one of its possible values v, we increment a variable x (which was initialised to 0) by p. If and when x finally exceeds y, we return the value v whose probability was just incremented.
(The continuous version, which is the one we actually need, looks harder, but I guess the same general idea should work. Perhaps we just repeat the algorithm above once for every bit of the output?)
The tricky part for me was proving the equivalence of two views of the Solomonoff prior: as a probability distribution over programs that print bitstrings and never make mistakes, and as a mixture of all “lower semi-computable” probability distributions over finite prefixes of bitstrings that are allowed to make mistakes. Once you derive the second view from the first, the conclusion is indeed trivial.
I started from the view of multiplicative weights over computable predictors, and then when I tried to describe it as an improvement on Solomonoff induction realized it was the same thing. If you write down what the two algorithms actually do, the equivalence to within a constant is clear.
I mentioned to you a few days ago (in chat) that Scholarpedia stated:
When you say “The tricky part for me was proving the equivalence of two views of the Solomonoff prior” do you mean you tried to re-derive this result?
For the discrete version, doesn’t this boil down to converting a program that semi-computes a semi-measure into a program that “semi-samples from” a semi-measure?
Which is pretty straightforward:
We interpret the remainder of our (infinite) input as a real number y in [0,1]. Every time the measure that we’re semi-computing adds an amount p of probability to one of its possible values v, we increment a variable x (which was initialised to 0) by p. If and when x finally exceeds y, we return the value v whose probability was just incremented.
(The continuous version, which is the one we actually need, looks harder, but I guess the same general idea should work. Perhaps we just repeat the algorithm above once for every bit of the output?)
Yes. I eventually succeeded, but not without a tiny bit of help from the texts.