Just a pointer, your description of the prior here is not correct when using Kolmogorov complexity (or Solomonoff induction).
For each sequence of bits A, you need to take as prior P(A) = 1 / (2 ^ K(A)) where K(A) is the length of the shortest binary program whose output is the sequence A.
On a technical note: the programs need to form a “prefix-free” set, or this prior will not give you a converging sum of probabilities. One way to do that is to encode a special “END” symbol at the end of a program, so that no program is a prefix of any other. Another approach is to declare total length of the program at the start of the program, so that the compiler (or interpreter) knows where to stop.
Solomonoff induction is not quite the same, but is very similar. In this case, we imagine giving a computer completely random program and data, with each bit of inout generated independently and with equal probability of being 0 or 1, and consider the probability that it will halt with output string “A”; then this is the prior Ps(A). These priors can be shown to be equivalent up to a small multiplicative factor. It’s fairly easy to see why the Solomonoff prior Ps(A) is always a bit bigger than the Kolmogorov complexity prior P(A) (since the computer might have been given the very shortest program for generating A as random input). It takes more effort to show that it is not very much bigger, and that the difference doesn’t matter in practice.
Following that technical note, I’ll address your question below about why use it.
Just a pointer, your description of the prior here is not correct when using Kolmogorov complexity (or Solomonoff induction).
For each sequence of bits A, you need to take as prior P(A) = 1 / (2 ^ K(A)) where K(A) is the length of the shortest binary program whose output is the sequence A.
On a technical note: the programs need to form a “prefix-free” set, or this prior will not give you a converging sum of probabilities. One way to do that is to encode a special “END” symbol at the end of a program, so that no program is a prefix of any other. Another approach is to declare total length of the program at the start of the program, so that the compiler (or interpreter) knows where to stop.
Solomonoff induction is not quite the same, but is very similar. In this case, we imagine giving a computer completely random program and data, with each bit of inout generated independently and with equal probability of being 0 or 1, and consider the probability that it will halt with output string “A”; then this is the prior Ps(A). These priors can be shown to be equivalent up to a small multiplicative factor. It’s fairly easy to see why the Solomonoff prior Ps(A) is always a bit bigger than the Kolmogorov complexity prior P(A) (since the computer might have been given the very shortest program for generating A as random input). It takes more effort to show that it is not very much bigger, and that the difference doesn’t matter in practice.
Following that technical note, I’ll address your question below about why use it.