But a universal prior is doing something stronger than that: it gives (in a particular way) higher probabilities to simpler programs as well.
The only programs allowed in the Solomonoff distribution are ones that don’t have any extended versions that produce the same output observed so far. So it’s not that the longer programs are given lower probability—it’s that they are given undefined probability, and are entirely “represented” by the most truncated version.
I learned this by reading Anatoly Vorobey’s forthcoming article!
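A minimal sketch of that reading, in Python: only a program with no shorter prefix accounting for the data survives the filter, and its padded extensions are dropped rather than down-weighted. (The identity machine run(p) = p and the prefix-based minimality test are illustrative assumptions, not anything from Vorobey's article.)

```python
from itertools import product

def minimal_programs(programs, observed, run):
    # Keep programs whose output is consistent with the observed data
    # and that are not extensions of a shorter such program: each class
    # is represented only by its most truncated member.
    producing = {p for p in programs if run(p).startswith(observed)}
    return {p for p in producing
            if not any(p[:k] in producing for k in range(len(p)))}

# Toy demo on a hypothetical identity machine, run(p) = p:
programs = {''.join(bits) for n in range(1, 6)
            for bits in product('01', repeat=n)}
print(sorted(minimal_programs(programs, '01', lambda p: p)))
# -> ['01']; the padded versions '010', '011', '0100', ... are excluded
```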
How do you define an extended version of a program?
I agree with what you’re saying only as long as an “extended version” of a program is the same program, just padded with meaningless bits at the end.
I don’t agree that it’s true regarding longer programs in general. A “longer version” of a program is any program that produces the same output, including programs that use different algorithms, programs that are padded at the beginning, padded in the middle, etc.
To get the mathematical expression of the universal prior, we count all programs, and here the extended versions come into play. Intuitively, a program padded with a single bit is counted twice (as it may be padded with 0 or 1), a program padded with 2 bits is counted 4 times, etc. So a program of length L+k is 2^k times less likely than a program of length L. That’s where the 2^{-length(p)} probability comes from.
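That counting argument checks out numerically. Under the naive weighting that gives each length-n program weight 2^{-n}, the 2^k paddings of any length-L program jointly carry exactly the single weight 2^{-L}:

```python
# 2^k padded variants of a length-L program, each naively weighted
# 2^-(L+k), always sum to the weight 2^-L of the unpadded program.
L = 7
for k in range(6):
    assert (2 ** k) * 2.0 ** -(L + k) == 2.0 ** -L
print(2.0 ** -L)  # 0.0078125
```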
The only programs allowed in the Solomonoff distribution are ones that don’t have any extended versions that produce the same output observed so far.
Did not know that! It seems like that would leave some probability mass unassigned; how do you rebalance? Even if you succeed, it seems likely that (for large enough outputs) there’ll be lots of programs that differ only by epsilon, programs that are basically the same for all practical purposes.
The only programs allowed in the Solomonoff distribution are ones that don’t have any extended versions that produce the same output observed so far. So it’s not that the longer programs are given lower probability—it’s that they are given undefined probability, and are entirely “represented” by the most truncated version.
That depends on how one defines “Solomonoff induction”.
Did not know that! It seems like that would leave some probability mass unassigned; how do you rebalance? Even if you succeed, it seems likely that (for large enough outputs) there’ll be lots of programs that differ only by epsilon, programs that are basically the same for all practical purposes.
Normalize!
Solomonoff induction is defined only over binary data. Differences are a minimum of 1 bit, which is enough.
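To make “Normalize!” concrete: one standard rebalancing (due to Solomonoff) rescales the semimeasure bit by bit so that the two continuations of every prefix sum to the prefix’s mass. A sketch, with M as a hypothetical stand-in (the real universal semimeasure is only lower-semicomputable, so this is illustrative, not an implementation):

```python
def normalize(M, x: str) -> float:
    # Turn a semimeasure M on binary strings into a proper measure:
    # at each bit, redistribute the leaked mass so that the two one-bit
    # continuations of every prefix sum to the prefix's probability.
    p = 1.0
    for i, bit in enumerate(x):
        prefix = x[:i]
        z = M(prefix + '0') + M(prefix + '1')  # mass M actually assigns
        p *= M(prefix + bit) / z               # rebalanced step
    return p

# Toy deficient semimeasure: M(x0) + M(x1) = M(x)/2, so half the mass
# leaks at every bit; normalization recovers the uniform measure.
toy_M = lambda x: 2.0 ** (-2 * len(x))
print(normalize(toy_M, '0110'))  # 0.0625 == (1/2)**4
```

Since everything is over binary strings, the rebalancing happens one bit at a time, which fits the minimum-1-bit-difference point above.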