There are straightforward proofs of this fact, I think. For example, observe that Solomonoff induction does at most a constant worse (in cumulative log loss) than multiplicative weights over the space of computable predictors. Thinking about it this way seems more productive, unless I am missing something important.
The tricky part for me was proving the equivalence of two views of the Solomonoff prior: as a probability distribution over programs that print bitstrings and never make mistakes, and as a mixture of all “lower semi-computable” probability distributions over finite prefixes of bitstrings that are allowed to make mistakes. Once you derive the second view from the first, the conclusion is indeed trivial.
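For concreteness, the two views can be written out roughly as follows (standard textbook notation, not necessarily exactly how you phrased it: U is a universal monotone machine, and the ν_i enumerate the lower semi-computable semimeasures):

```latex
% View 1: a prior over programs.  The sum ranges over (minimal) programs p
% on which U produces an output that begins with the finite string x.
\[ M(x) \;=\; \sum_{p \,:\, U(p) = x\ast} 2^{-|p|} \]

% View 2: a mixture of all lower semi-computable semimeasures \nu_1, \nu_2, \dots,
% with positive weights summing to at most 1 (e.g. w_i = 2^{-K(\nu_i)}).
\[ M'(x) \;=\; \sum_i w_i \, \nu_i(x) \]

% The "equivalence" is multiplicative: there are constants c_1, c_2 > 0 with
% c_1 M'(x) \le M(x) \le c_2 M'(x) for all finite strings x.
```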
I started from the view of multiplicative weights over computable predictors, and then, when I tried to describe it as an improvement on Solomonoff induction, I realized it was the same thing. If you write down what the two algorithms actually do, the equivalence to within a constant is clear.
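To make "write down what the two algorithms actually do" concrete, here is a minimal sketch of the mixture / multiplicative-weights predictor over a toy finite class (the class and the predictor names are purely illustrative, not the class of all computable predictors). With log loss, the multiplicative update is exactly the Bayesian posterior update, and the standard bound says the mixture's cumulative log loss exceeds the best predictor's by at most log(1/w_best), a constant; Solomonoff induction is, up to constants, the same construction with the class of all lower semi-computable semimeasures and weights roughly 2^(-program length).

```python
import math

def mixture_predict(predictors, weights, history):
    """Probability the weighted mixture assigns to the next bit being 1."""
    total = sum(weights)
    return sum(w * p(history) for w, p in zip(weights, predictors)) / total

def update_weights(predictors, weights, history, bit):
    """Multiplicative-weights / Bayes update: scale each weight by the
    probability that predictor assigned to the bit we actually observed."""
    updated = []
    for w, p in zip(weights, predictors):
        q = p(history)                      # predictor's P(next bit = 1)
        updated.append(w * (q if bit == 1 else 1.0 - q))
    return updated

# Toy class of predictors: each maps the history (list of bits) to P(next = 1).
predictors = [
    lambda h: 0.5,                                 # fair coin
    lambda h: 0.9,                                 # biased towards 1
    lambda h: 0.99 if len(h) % 2 == 1 else 0.01,   # "alternating bits", softened
]
weights = [1.0 / 3] * 3                            # uniform prior over the toy class

data = [0, 1, 0, 1, 0, 1, 0, 1]
loss_mixture, loss_each = 0.0, [0.0, 0.0, 0.0]
history = []
for bit in data:
    p1 = mixture_predict(predictors, weights, history)
    loss_mixture += -math.log(p1 if bit == 1 else 1.0 - p1)
    for i, p in enumerate(predictors):
        q = p(history)
        loss_each[i] += -math.log(q if bit == 1 else 1.0 - q)
    weights = update_weights(predictors, weights, history, bit)
    history.append(bit)

# Standard bound: loss_mixture <= min(loss_each) + log(1 / prior weight of the best),
# i.e. the mixture is at most an additive constant (here log 3) worse.
print(f"mixture: {loss_mixture:.3f}  best: {min(loss_each):.3f}  log(3): {math.log(3):.3f}")
```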
I mentioned to you a few days ago (in chat) that Scholarpedia stated:

“The universal a priori probability m has many remarkable properties. A theorem of L.A. Levin states that among the lower semi-computable semi-measures it is the largest such function in the following sense: If p is a lower semi-computable semi-measure, then there is a constant c such that cm(x) >= p(x) for all x.”
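Written out, with the usual definition of m via a universal prefix machine U (and writing P for the dominated semimeasure to avoid clashing with programs), I believe the statement is:

```latex
% Discrete universal a priori probability: sum over halting programs q with output x.
\[ m(x) \;=\; \sum_{q \,:\, U(q) = x} 2^{-|q|} \]

% Levin's theorem (multiplicative dominance): for every lower semi-computable
% semimeasure P there is a constant c_P > 0 such that
\[ c_P \, m(x) \;\ge\; P(x) \quad \text{for all } x. \]

% One can take c_P = 2^{K(P) + O(1)}, where K(P) is the length of the shortest
% program that lower semi-computes P.
```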
When you say “The tricky part for me was proving the equivalence of two views of the Solomonoff prior” do you mean you tried to re-derive this result?
For the discrete version, doesn’t this boil down to converting a program that semi-computes a semi-measure into a program that “semi-samples from” a semi-measure?
Which is pretty straightforward:
We interpret the remainder of our (infinite) input as a real number y in [0,1]. Every time the measure that we’re semi-computing adds an amount p of probability to one of its possible values v, we increment a variable x (which was initialised to 0) by p. If and when x finally exceeds y, we return the value v whose probability was just incremented.
(The continuous version, which is the one we actually need, looks harder, but I guess the same general idea should work. Perhaps we just repeat the algorithm above once for every bit of the output?)
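Here is a rough sketch of the discrete conversion described above. I am modelling the “semi-computer” as a generator that yields (value, mass-increment) pairs, and cheating by drawing y as a float rather than reading bits of the input tape lazily; a genuinely computable version would compare x against y using only finitely many bits of y at a time.

```python
import random

def semi_sample(semi_computer, rng=random):
    """Turn 'semi-compute a semimeasure' into 'semi-sample from it'.

    semi_computer: an iterator yielding (value, delta) pairs, meaning "add
        probability mass `delta` to outcome `value`"; the per-value running
        totals converge from below to the semimeasure being semi-computed.

    We draw y uniformly from [0, 1] (standing in for the remainder of the
    infinite input tape read as a real number), keep a running total x of all
    mass enumerated so far, and return the value whose increment pushes x past
    y. If the enumerated mass never reaches y, we never return -- which is
    exactly the "semi" in semi-sampling.
    """
    y = rng.random()   # cheat: a float stands in for the lazily-read real y
    x = 0.0            # total probability mass enumerated so far
    for value, delta in semi_computer:
        x += delta
        if x > y:
            return value
    return None        # only reachable if the enumeration happens to be finite

# Toy semi-computer: enumerates mass in small increments, converging to
# P("a") = 0.5 and P("b") = 0.25 (total mass 0.75 < 1, so it is a semimeasure).
def toy_semi_computer():
    for _ in range(50):
        yield "a", 0.5 / 50
        yield "b", 0.25 / 50

# Returns "a" with probability 0.5, "b" with probability 0.25,
# and None (i.e. "no output") with the missing 0.25.
print(semi_sample(toy_semi_computer()))
```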
Yes. I eventually succeeded, but not without a tiny bit of help from the texts.