The point of SI is that the bias inherent with the choice of the universal Turing machine is washed out as more data are observed.
Not in the variant I described… which probably means it violates some precondition of the optimality proof and that I shouldn’t call it a Solomonoff Inductor. It still makes predictions by weighting and eliminating programs, but in too simplistic a way.
I don’t think so.
Let X be the sequence of bits so far and y be the next bit to predict. SI (in the specific version we are discussing) looks for short programs which output Xy and then halt.
If X is empty then the output is just the initial bias of the inductor. In your example it will presumably output a 0 because, for any program length, there are more programs which output 0 and halt than programs which output 1 and halt (assuming that if you removed the “2” opcode it would get a roughly half split).
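This weighting-and-elimination mechanism can be sketched in a few lines. Note this toy machine is my own invention, not the instruction set from your example: a hypothetical three-symbol language where ‘0’ and ‘1’ emit a bit and a final ‘H’ halts, with every halting program weighted by 2^−length. Since this toy language is symmetric in 0 and 1 (no “2”-like opcode), its initial bias comes out even:

```python
from itertools import product

def run(prog):
    """Output of a program: its bits up to a final 'H'. Programs that
    don't end in exactly one 'H' never halt (return None)."""
    if prog.count('H') != 1 or prog[-1] != 'H':
        return None
    return ''.join(prog[:-1])

def predict(X, max_len=12):
    """Brute-force Solomonoff-style posterior for the next bit: weight
    every halting program by 2**-len(prog) and collect the weight of
    those whose output extends X by at least one bit."""
    w = {'0': 0.0, '1': 0.0}
    for L in range(1, max_len + 1):
        for prog in product('01H', repeat=L):
            out = run(prog)
            if out is not None and out.startswith(X) and len(out) > len(X):
                w[out[len(X)]] += 2.0 ** (-L)
    total = w['0'] + w['1']
    return w['0'] / total, w['1'] / total

p0, p1 = predict('0101')
print(round(p0, 3), round(p1, 3))  # → 0.5 0.5 (the language is symmetric)
```

Flipping the predicted bit gives a length-preserving bijection between the programs contributing to each side, so the split is exactly even here; an asymmetric opcode set would tilt it, which is the bias under discussion.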
If X is not empty, there are programs made of a prefix P, which just computes X, followed by some suffix S which computes y. If we restrict the inductor to such programs it will just keep outputting whatever its initial bias was, and never learn anything.
But the Solomonoff inductor is not restricted to such programs.
Any finite X of length n is the prefix of some (actually, infinitely many) computable infinite sequences. Let Z_X be a non-halting program which generates one of these infinite sequences. We can therefore generate X by the program:
“Run Z_X in an interpreter until it has emitted n bits, then halt”
And we can extend this to also generate the next bit y as:
“Run Z_X in an interpreter until it has emitted n+1 bits, then halt”
Note that the length difference between these two programs is less than one bit on average, because encoding n takes about log(n) + log(log(n)) bits, so going from n to n+1 costs almost nothing. In the previous case, on the other hand, the length difference between P and PS is always at least one bit.
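The length arithmetic is easy to check numerically. The two wrapper programs differ only in the encoded count (n versus n+1), so their length difference is just the difference in the count’s prefix-code length. Using the Elias gamma code for simplicity (2·log(n) + 1 bits, a cruder bound than the log(n) + log(log(n)) of more efficient codes, but with the same vanishing increment):

```python
from math import floor, log2

def gamma_len(n):
    """Bit length of the Elias gamma code for n >= 1:
    2*floor(log2(n)) + 1."""
    return 2 * floor(log2(n)) + 1

# Extra length of "emit n+1 bits then halt" over "emit n bits then halt":
# zero for most n, and vanishingly small on average.
diffs = [gamma_len(n + 1) - gamma_len(n) for n in range(1, 10001)]
print(sum(diffs) / len(diffs))  # → 0.0026
```

The sum telescopes to gamma_len(10001) − gamma_len(1), so the average cost of asking for one more bit shrinks toward zero as n grows, while the P-plus-suffix programs pay at least one full bit every time.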
Therefore, as n increases, the Solomonoff inductor increasingly gives preference to the second type of programs, which notably don’t contain your dreaded “2” opcode.