Second, given that the probability is halved with every added bit of algorithm length, wouldn't that imply that algorithms 1 bit long are the most likely, each with probability 1/2? In fact, I doubt that any algorithm at all is describable with 1 bit.
Keep in mind that Solomonoff induction is defined only up to the choice of a prefix-free universal Turing machine (UTM). Some of these UTMs can indeed represent a valid program with 1 bit, but this isn't particularly relevant: each UTM leads to a different Solomonoff prior, and Solomonoff induction is universal only in an asymptotic sense. As you accumulate more and more bits of evidence, the posterior distributions converge with high probability.
Probability estimates dominated by short programs are biased by the choice of the UTM. As more evidence accumulates, the likelihood of a short program being consistent with all of it decreases (intuitively, by the pigeonhole principle), and the probability estimate will be dominated by longer and longer programs, “washing out” the initial bias.
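To make that "washing out" dynamic concrete, here is a toy, computable sketch. Real Solomonoff induction is uncomputable, and the "machine" below, where a program simply outputs its own bits repeated forever, is invented purely for illustration. Programs get prior weight 2^-length, programs inconsistent with the observed bits are discarded, and the posterior mean program length creeps upward as evidence accumulates.

```python
# Toy caricature of the Solomonoff prior (the real thing is uncomputable).
# A "program" is a bit string that "outputs" the stream obtained by
# repeating itself; its prior weight is 2**-length. Conditioning on
# observed bits eliminates inconsistent programs, and the surviving
# posterior mass drifts toward longer programs.

from itertools import product

MAX_LEN = 14  # enumerate all programs up to this length


def all_programs(max_len=MAX_LEN):
    """Every bit string of length 1..max_len."""
    for length in range(1, max_len + 1):
        for bits in product("01", repeat=length):
            yield "".join(bits)


def consistent(program, data):
    """True if repeating the program reproduces the observed data prefix."""
    reps = len(data) // len(program) + 1
    return (program * reps)[:len(data)] == data


def posterior(data):
    """Normalized 2**-length weights over programs still consistent with data."""
    weights = {p: 2.0 ** -len(p) for p in all_programs() if consistent(p, data)}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}


if __name__ == "__main__":
    evidence = "011010011100"  # an arbitrary, incompressible-looking bit stream
    for n in (2, 4, 8, 12):
        post = posterior(evidence[:n])
        mean_len = sum(len(p) * w for p, w in post.items())
        print(f"after {n:2d} bits: posterior mean program length = {mean_len:.2f}")
```

Running it shows the posterior mean program length growing with the number of evidence bits: short programs that happened to fit the first few bits get knocked out, exactly the pigeonhole intuition above.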
Why do they need to be ranked? Why not assign them all probability 1/N, where N = 2^(n+1) − 2 is the number of algorithms of length up to and including n?
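(That count is just the number of nonempty bit strings of length at most n:

$$N = \sum_{k=1}^{n} 2^k = 2^{n+1} - 2.)$$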
In order to normalize properly, you have to divide the raw Solomonoff measure by Chaitin's omega for that particular UTM, that is, the probability that a randomly generated program halts.
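A minimal sketch of what that normalization looks like, assuming a made-up prefix-free machine with a hand-picked set of halting programs (Chaitin's omega for a real UTM is uncomputable, so this is purely illustrative):

```python
# Toy illustration of normalizing the raw Solomonoff measure by omega.
# The prefix-free set of "halting programs" below is invented; for a real
# UTM, Chaitin's omega cannot be computed.

# A made-up prefix-free set: no program is a prefix of another.
halting_programs = ["0", "10", "110", "1110"]

# Raw Solomonoff measure: 2**-len(p) for each halting program p.
raw_measure = {p: 2.0 ** -len(p) for p in halting_programs}

# Omega for this toy machine: total probability that a random program halts.
omega = sum(raw_measure.values())  # 0.9375 here; the missing mass never halts

# Dividing by omega turns the raw measure into a proper distribution.
normalized = {p: w / omega for p, w in raw_measure.items()}

print(f"omega = {omega}")
print(f"normalized probabilities sum to {sum(normalized.values()):.3f}")
```

The raw weights sum to omega < 1 because some programs (here, anything starting with "1111") never halt; dividing by omega is what makes the prior a genuine probability distribution over halting programs.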