I’m not sure why we need to make that distinction. The Solomonoff and Levin constructions are equivalent. The prior built from all deterministic programs that output bit strings, and the prior built from all computable probability distributions, turn out to be the same prior. See e.g. here for proofs and references.
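To make the equivalence concrete, here is a rough sketch of the two constructions in standard notation (a monotone universal machine U for the program-based prior, and an enumeration of lower-semicomputable semimeasures ν_i with prior weights for the mixture; the weight choice 2^{-K(\nu_i)} is one common convention):

M(x) = \sum_{p \,:\, U(p) = x*} 2^{-|p|}, \qquad \xi(x) = \sum_i 2^{-K(\nu_i)}\, \nu_i(x)

The "same prior" statement is the standard result that each dominates the other up to a multiplicative constant: c_1 \xi(x) \le M(x) \le c_2 \xi(x) for some constants c_1, c_2 > 0 independent of x.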
They’re equivalent from the point of view of a consumer of the prediction, but not from the point of view of an implementation. And since this is a discussion about how it works, the distinction is useful.
Then I’m confused, because the two would seem to produce very different answers on the same string. Since a string with very high Kolmogorov complexity can clearly be produced by a uniform distribution, the Solomonoff prior would converge to a very-high-complexity hypothesis, while the Levin mixture would just assign 0.5 to 0 and 0.5 to 1. What am I missing here?
The Solomonoff prior would have many surviving hypotheses at each step, and the total weight of those that predict a 0 for the next bit would be about equal to the total weight of those that predict a 1. If the input distribution is biased, e.g. 0 with probability 5⁄6 and 1 with probability 1⁄6, then the Solomonoff prior will converge on that as well. That works for any computable input distribution, with probability 1 according to the input distribution.
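To see the mechanism, here is a toy sketch, not the real (uncomputable) construction: a finite Bayesian mixture over Bernoulli hypotheses, fed bits that are 0 with probability 5⁄6. The hypothesis set, uniform prior weights, and random seed are illustrative choices; the point is only that the posterior-weighted next-bit prediction converges to the input bias, as described above.

import random

# Toy stand-in for the full (uncomputable) mixture: a finite set of
# Bernoulli hypotheses "bit is 0 with probability theta", each with a
# prior weight. The real priors M and xi sum over all programs /
# computable semimeasures; this only illustrates the mechanism.
random.seed(0)
thetas = [i / 12 for i in range(1, 12)]          # candidate values of P(bit = 0)
weights = [1.0 / len(thetas)] * len(thetas)      # uniform prior weights

TRUE_P0 = 5 / 6                                  # input: 0 w.p. 5/6, 1 w.p. 1/6

for t in range(1, 2001):
    bit = 0 if random.random() < TRUE_P0 else 1
    # Bayesian update: each hypothesis is reweighted by the probability
    # it assigned to the observed bit.
    likelihoods = [(th if bit == 0 else 1 - th) for th in thetas]
    weights = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(weights)
    weights = [w / total for w in weights]
    if t % 500 == 0:
        # Posterior-weighted probability that the *next* bit is 0.
        p0_next = sum(w * th for w, th in zip(weights, thetas))
        print(f"after {t:4d} bits: predicted P(next bit = 0) = {p0_next:.3f}")

Run as-is, the printed prediction settles near 0.833 ≈ 5⁄6, because the posterior weight concentrates on the hypotheses closest to the input bias.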
For posterity, the convention is to call the two models the universal (Solomonoff) prior M and the universal (Levin) mixture ξ, respectively.
Nitpick: the prior does not converge. The prior is what you have before you start observing data; after that it is a posterior.
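In symbols (same notation as above), what observation does is turn the prior weight w_\nu of each hypothesis into a posterior weight

w_\nu \;\longrightarrow\; \frac{w_\nu\, \nu(x_{1:t})}{\xi(x_{1:t})},

and it is this posterior, not the prior itself, that concentrates on hypotheses consistent with the data.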
Many thanks, I get it now.
What matters is the probability that each of them assigns to the next bit being equal to one.
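Concretely, in the notation above, the quantity a consumer of predictions sees is the conditional (semi)measure on the next bit,

M(1 \mid x_{1:t}) = \frac{M(x_{1:t}1)}{M(x_{1:t})}, \qquad \xi(1 \mid x_{1:t}) = \frac{\xi(x_{1:t}1)}{\xi(x_{1:t})},

and Solomonoff's convergence result says both of these approach the true conditional probability with probability 1 whenever the bits are drawn from a computable distribution.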