cousin_it comments on Solomonoff induction on a random string

cousin_it 9 Apr 2014 14:40 UTC
3 points
Maybe we’re using different definitions of SI? The version that I’m thinking of (which I believe is the original version) quantifies over all possible deterministic programs that output a sequence of bits, using an arbitrary encoding of programs. Then it sums up the weights of all programs that are consistent with the inputs seen so far, and outputs a probability distribution. That turns out to be equivalent to a weighted mixture of all possible computable distributions, though the proof isn’t obvious. If we need to adapt this version of SI to output a single guess at each step, we just take the guess with the highest probability.

Does that sound right? If yes, then this page from Li and Vitanyi’s book basically says that SI’s probability of the next bit being 1 converges to ⁵⁄₆ with probability 1, unless I’m missing something.
- Strilanc 10 Apr 2014 4:03 UTC
  1 point
  Parent
  That’s not the same definition I was using.
  
  I said the programs have a single output, instead of a probability distribution. It matches the sequence so far or it doesn’t, maybe programs are 100% eliminated or 100% not eliminated. The probabilistic nature comes entirely from the summing-over-all-programs part.
  
  If programs can output a probability distribution then clearly the program “return {0:1/6, 1:5/6}” will do very well and cause the results to converge appropriately.
  - cousin_it 10 Apr 2014 8:50 UTC
    1 point
    Parent
    
    I said the programs have a single output, instead of a probability distribution. It matches the sequence so far or it doesn’t, maybe programs are 100% eliminated or 100% not eliminated. The probabilistic nature comes entirely from the summing-over-all-programs part.
    
    Yes, I’m using the same definition. The “deterministic programs” mentioned in my comment are programs that output a sequence of bits, not programs that output a probability distribution.
    
    That definition is equivalent to a mixture of all possible computable distributions. I suppose that equivalence is an important and surprising fact about SI: how come it makes no difference whether you quantify over programs that output a sequence of bits, or programs that output a probability distribution? But yes, it is true.
- christopherj 9 Apr 2014 15:44 UTC
  −2 points
  Parent
  He has repeatedly said that he’s talking about an SI that outputs a specific prediction instead of a probability distribution of them, and you even quoted him saying so.