MrMind comments on Solomonoff induction on a random string

MrMind 10 Apr 2014 14:55 UTC
1 point
It is important here to distinguish between two models of SI: one regards the universal prior as a probability distribution over programs that each generate a definite output, and the other regards the universal prior as a mixture over a set of computable distributions (which must contain the correct one).
The first SI, given a random (that is, incompressible) string, will generate hypotheses about as long as the string itself, since it constantly prunes the hypotheses that do not exactly match the given input.
The second SI, given a random string (that is, drawn from a uniform distribution), will with probability 1 assign a very high probability to the uniform distribution.
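
The second model can be illustrated with a toy Bayesian mixture. This is only a sketch under loud assumptions: the real mixture ranges over all computable distributions and weights them by complexity, whereas here the hypothesis class is a hypothetical five-element grid of Bernoulli parameters (`THETAS`) with equal prior weights, and the function names are made up for illustration.

```python
import math
import random

# Hypothetical finite hypothesis class: Bernoulli(theta) sources.
# Assumption: a real universal mixture is not restricted to such a grid.
THETAS = [0.1, 0.25, 0.5, 0.75, 0.9]


def mixture_posterior(bits, thetas, prior=None):
    """Return posterior weights over Bernoulli(theta) hypotheses given bits."""
    if prior is None:
        # Assumption: equal prior weights instead of complexity-based ones.
        prior = [1.0 / len(thetas)] * len(thetas)
    ones = sum(bits)
    zeros = len(bits) - ones
    # Work in log space so long inputs do not underflow to zero.
    log_w = [
        math.log(w) + ones * math.log(t) + zeros * math.log(1.0 - t)
        for w, t in zip(prior, thetas)
    ]
    top = max(log_w)
    unnorm = [math.exp(lw - top) for lw in log_w]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def prob_next_is_one(posterior, thetas):
    """Posterior-weighted probability that the next bit equals 1."""
    return sum(w * t for w, t in zip(posterior, thetas))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    bits = [rng.randint(0, 1) for _ in range(2000)]  # fair coin flips
    post = mixture_posterior(bits, THETAS)
    for t, w in zip(THETAS, post):
        print(f"theta={t:.2f} posterior={w:.4f}")
    print("P(next bit = 1) =", round(prob_next_is_one(post, THETAS), 4))
```

On input with roughly equal numbers of 0s and 1s, nearly all posterior weight should land on the `0.5` hypothesis, which is the behaviour described for the second SI.
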
- pivo 10 Apr 2014 19:56 UTC
  3 points
  For posterity, the convention is to call the two models Universal/Solomonoff prior M and Universal/Levin mixture ξ, respectively.
  - cousin_it 11 Apr 2014 7:08 UTC
    2 points
    I’m not sure why we need to make that distinction. The Solomonoff and Levin constructions are equivalent. The prior built from all deterministic programs that output bit strings, and the prior built from all computable probability distributions, turn out to be the same prior. See e.g. here for proofs and references.
    - pivo 11 Apr 2014 8:16 UTC
      1 point
      They’re equivalent from the point of view of a consumer of the prediction, but not from the point of view of an implementation. And since this is a discussion about how it works, the distinction is useful.
    - MrMind 11 Apr 2014 8:56 UTC
      0 points
      Then I’m confused, because the two would seem to produce two very different answers on the same string.
      Since a string with very high Kolmogorov complexity can clearly be produced by a uniform distribution, the Solomonoff prior would converge to a very high complexity hypothesis, while the Levin mixture would just assign 0.5 to 0 and 0.5 to 1.
      What am I missing here?
      - cousin_it 11 Apr 2014 12:23 UTC
        4 points
        The Solomonoff prior would have many surviving hypotheses at each step, and the total weight of those that predict a 0 for the next bit would be about equal to the total weight of those that predict a 1. If the input distribution is biased, e.g. 0 with probability ⁵⁄₆ and 1 with probability ¹⁄₆, then the Solomonoff prior will converge on that as well. That works for any computable input distribution, with probability 1 according to the input distribution.
        V_V 11 Apr 2014 13:10 UTC
        1 point
        Nitpick: the prior does not converge. The prior is what you have before you start observing data; after that, it is a posterior.
        MrMind 11 Apr 2014 12:39 UTC
        1 point
        Many thanks, I get it now.
      - V_V 11 Apr 2014 10:06 UTC
        3 points
        What matters is the probability that they assign to the next bit being equal to one.
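
For reference, the two objects discussed in this thread, and the next-bit probability that the last comment points to, are often written as below. This is a sketch in one common notation, not something stated in the thread: it assumes a universal monotone machine $U$, program length $\ell(p)$, the notation $U(p) = x*$ for "the output of $p$ starts with $x$", and a class $\mathcal{M}$ of lower semicomputable semimeasures with complexity $K(\nu)$. Only the names $M$ and $\xi$ come from the comments above.

```latex
\begin{align*}
M(x) &= \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)}, &
\xi(x) &= \sum_{\nu \in \mathcal{M}} 2^{-K(\nu)} \, \nu(x), \\
M(1 \mid x) &= \frac{M(x1)}{M(x)}, &
\xi(1 \mid x) &= \frac{\xi(x1)}{\xi(x)}.
\end{align*}
```

Under these assumptions, the equivalence mentioned earlier in the thread is usually stated as each of $M$ and $\xi$ bounding the other up to a multiplicative constant.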