The universal prior implies you should say “substantially less than 1 million”.
This is a tad confused.
A very simple measure on the binary strings is the uniform measure, so Solomonoff Induction will converge on it with high probability. This is easiest to see from the Solomonoff-Levin definition of the universal prior, where you take a mixture of measures weighted according to their complexity; a simple measure like the uniform one therefore gets a very high prior probability under the universal distribution. Note that this is about the simplicity of the measure, not of the data: the individual bit sequences it produces are complex, because the bits are random. The confusing thing is when you define the universal prior the other way, by sampling programs (both definitions are written out below); it’s not at all obvious that things work out the same, and indeed I’d say it’s quite surprising that they do.
I’d suggest reading the second chapter of “Machine Super Intelligence”; I think it’s clearer there than in my old master’s thesis, as I do more explaining and give fewer proofs.
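For reference, here is a minimal sketch of the two definitions being contrasted, in roughly the notation of the Solomonoff/Levin literature (constants and the exact class of semimeasures are glossed over; this is an illustration, not the precise form used in the book or thesis):

```latex
% 1. Solomonoff-Levin mixture: weight each enumerable (semi)measure \nu
%    by roughly 2^{-K(\nu)}, where K is prefix Kolmogorov complexity.
\xi(x) \;=\; \sum_{\nu \in \mathcal{M}} 2^{-K(\nu)} \, \nu(x)

% 2. Program sampling: the probability that a universal prefix machine U,
%    fed uniformly random bits as its program p, outputs something
%    beginning with x.
M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)}

% The two agree up to a multiplicative constant, which is the
% non-obvious equivalence referred to above.
```

Under the first form the uniform measure, having a short description, gets a large weight directly; under the second form the same fact only falls out of the equivalence.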
Can you explain why? What’s the result saying that the Solomonoff distribution “as a whole” often converges on uniform?
It’s not clear to me from this description whether the SI predictor is also conditioned. Anyway, if the universal prior is not conditioned, then convergence is easy, as the uniform distribution has very low complexity (a toy numerical illustration follows this comment). If it is conditioned, then over your life you will no doubt have observed many processes that are well modelled by a uniform distribution; flipping a coin is a good example. So the estimated probability of encountering a uniform distribution in a new situation won’t be all that low.
Indeed, with so much data SI will have built a model of language and of how language maps to mathematics and distributions, and in particular there is a good chance it will have seen a description of quantum mechanics. So if it has also been provided with the information that these will be quantum coin flips, it should predict basically perfectly, including modelling the probability that you’re lying or have simply set up the experiment wrong.
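As a rough numerical illustration of the unconditioned case, here is a minimal sketch in Python (toy code with made-up complexity values, not anything from the book or thesis): a Bayesian mixture over three stand-in measures, with prior weights proportional to 2^(-complexity), updated on i.i.d. fair-coin bits. The posterior mass concentrates on the uniform measure and the predicted probability of the next bit approaches 1/2.

```python
import random

# Toy stand-ins for a few simple measures on binary sequences. Each one
# returns P(next bit = 1 | history). The "complexity" numbers are made up
# purely for illustration; they play the role of description lengths.
def uniform(history):
    return 0.5                        # fair coin

def biased_90(history):
    return 0.9                        # coin with P(1) = 0.9

def almost_all_ones(history):
    return 1.0 - 1e-12                # essentially deterministic 1s

measures = [
    ("uniform",   uniform,         2),   # assumed very short description
    ("biased_90", biased_90,       6),
    ("all_ones",  almost_all_ones, 3),
]

# Prior weights proportional to 2^(-complexity), as in the mixture definition.
weights = {name: 2.0 ** (-k) for name, _, k in measures}

random.seed(0)
history = []
for _ in range(500):                  # 500 i.i.d. fair-coin bits
    bit = random.randint(0, 1)
    for name, p_one, _ in measures:
        p = p_one(history)
        # Bayes update; a serious implementation would work in log space.
        weights[name] *= p if bit == 1 else (1.0 - p)
    history.append(bit)

total = sum(weights.values())
posterior = {name: w / total for name, w in weights.items()}
pred_one = sum(posterior[name] * p_one(history) for name, p_one, _ in measures)

print(posterior)   # essentially all mass on "uniform"
print(pred_one)    # close to 0.5
```

With only a handful of hypotheses this is just ordinary Bayesian updating, but it shows the mechanism: the uniform measure loses a constant factor up front for its (small) complexity and nothing thereafter, while the other hypotheses keep losing probability mass on every bit of fair-coin data.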
I think what mathemajician means is that if the stream of data is random (in that the bits are independent random variables, each with probability 1/2 of being 1), then Solomonoff induction converges on the uniform measure with high probability (probability 1, in fact; the standard argument is sketched below).
I’m sure you knew that already, but you don’t seem to realize that it undercuts the logic behind your claim that the universal prior implies you should say “substantially less than 1 million”.
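A minimal sketch of the standard argument behind that convergence claim, assuming the usual dominance property of the universal prior over computable measures (constants glossed over, and written for a general computable measure \mu; in this thread \mu is the uniform measure, so K(\mu) is small):

```latex
% Dominance: the universal prior M multiplicatively dominates every
% computable measure \mu, with constant roughly 2^{-K(\mu)}:
M(x_{1:n}) \;\geq\; 2^{-K(\mu)} \, \mu(x_{1:n}) \quad \text{for all } n .

% Taking the expected log-ratio under \mu gives a cumulative
% KL-divergence bound that is finite and independent of n:
\mathbb{E}_{\mu}\!\left[ \ln \frac{\mu(x_{1:n})}{M(x_{1:n})} \right] \;\leq\; K(\mu) \ln 2 .

% A finite total KL divergence forces the conditional predictions
% M(x_t \mid x_{<t}) to converge to \mu(x_t \mid x_{<t}) with
% \mu-probability 1, which is the "probability 1" statement above.
```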