cousin_it comments on Solomonoff Induction, by Shane Legg

cousin_it 21 Feb 2011 8:31 UTC
0 points
The tricky part for me was proving the equivalence of two views of the Solomonoff prior: as a probability distribution over programs that print bitstrings and never make mistakes, and as a mixture of all “lower semi-computable” probability distributions over finite prefixes of bitstrings that are allowed to make mistakes. Once you derive the second view from the first, the conclusion is indeed trivial.
- paulfchristiano 21 Feb 2011 20:05 UTC
  0 points
  Parent
  I started from the view of multiplicative weights over computable predictors, and then when I tried to describe it as an improvement on Solomonoff induction realized it was the same thing. If you write down what the two algorithms actually do, the equivalence to within a constant is clear.
- Wei Dai 21 Feb 2011 9:18 UTC
  0 points
  Parent
  I mentioned to you a few days ago (in chat) that Scholarpedia stated:
  
  The universal a priori probability m has many remarkable properties. A theorem of L.A. Levin states that among the lower semi-computable semi-measures it is the largest such function in the following sense: If p is a lower semi-computable semi-measure, then there is a constant c such that cm(x) >= p(x) for all x.
  
  When you say “The tricky part for me was proving the equivalence of two views of the Solomonoff prior” do you mean you tried to re-derive this result?
  - AlephNeil 22 Feb 2011 1:12 UTC
    0 points
    Parent
    For the discrete version, doesn’t this boil down to converting a program that semi-computes a semi-measure into a program that “semi-samples from” a semi-measure?
    
    Which is pretty straightforward:
    
    We interpret the remainder of our (infinite) input as a real number y in [0,1]. Every time the measure that we’re semi-computing adds an amount p of probability to one of its possible values v, we increment a variable x (which was initialised to 0) by p. If and when x finally exceeds y, we return the value v whose probability was just incremented.
    
    (The continuous version, which is the one we actually need, looks harder, but I guess the same general idea should work. Perhaps we just repeat the algorithm above once for every bit of the output?)
  - cousin_it 21 Feb 2011 12:36 UTC
    0 points
    Parent
    Yes. I eventually succeeded, but not without a tiny bit of help from the texts.