In order for the prior probabilities you assign to programs to be well-formed, they must go to 0 as the length of the program goes to infinity. Otherwise the probabilities won’t add up to 1: there are infinitely many programs, so if infinitely many of them kept probability above some fixed ε, the total would diverge.
No matter what prior you choose, you’ll always end up with a variant of Occam’s Razor: beyond some particular length, all programs are assigned very low probability.
You don’t necessarily have to decrease proportionally to 2^(-length) like the Solomonoff prior does; you could instead use, say, 3^(-length) or 1/Ackermann(length). However, since each additional bit lets you identify a value from a space that’s twice as big (i.e. the precision grows like 2^length), penalizing by exactly that factor is a natural choice.
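A quick way to see the normalization constraint is to add up the mass a given penalty assigns at each program length. The sketch below is my own illustration, assuming a plain binary program alphabet (2^n programs of length n) rather than any particular universal machine; it shows which decay rates are normalizable on their own, and why 2^(-length) needs the extra prefix-free restriction (Kraft’s inequality) that the standard Solomonoff prior relies on.

```python
# A minimal sketch of the normalization constraint: with 2**n binary
# programs of length n, a penalty of r**(-n) per program contributes
# (2/r)**n total mass at length n.  The prior is only well-formed if the
# sum over all lengths converges (it can then be rescaled to 1).  For
# r == 2 the raw sum diverges, which is why the Solomonoff prior restricts
# itself to prefix-free programs, where Kraft's inequality bounds the sum.

def total_mass(r: float, max_len: int = 60) -> float:
    """Sum of r**(-n) over all 2**n binary strings of length 1..max_len."""
    return sum((2.0 / r) ** n for n in range(1, max_len + 1))

for r in (2.0, 3.0, 4.0):
    print(f"penalty {r}^(-length): mass up to length 60 = {total_mass(r):.4f}")
# 2^(-length) keeps adding 1 per length (diverges without prefix-freeness);
# 3^(-length) and 4^(-length) converge to 2 and 1 respectively.
```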
(Personally, I prefer a prior that also lightly penalizes running time. That removes the computability issues and makes the induction approximable by dovetailing.)
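Here is a minimal toy sketch of what I mean by dovetailing, under made-up assumptions: the “programs” are hypothetical stand-ins (generators that run for a fixed number of steps and then halt with an output), their bit-lengths are invented for illustration, and the weight 2^-(length + log2(steps)) is a Levin/speed-prior style penalty rather than anyone’s specific construction. The point is that because the weight shrinks with running time, cutting the dovetailer off after a finite budget leaves only a small, bounded amount of unaccounted-for mass.

```python
import math

def toy_program(output, steps):
    """Hypothetical program: 'computes' for `steps` steps, then halts with `output`."""
    def run():
        for _ in range(steps - 1):
            yield None          # still running, no output yet
        yield output            # halts with its output
    return run

# (length in bits, program) pairs -- lengths are made up for illustration.
programs = [(3, toy_program("A", 5)),
            (5, toy_program("B", 200)),
            (8, toy_program("A", 20))]

def dovetail(programs, budget):
    """Interleave execution; accumulate weight per output within `budget` total steps."""
    running = [(length, prog(), 0) for length, prog in programs]
    weights = {}
    spent = 0
    while running and spent < budget:
        still_running = []
        for length, gen, t in running:
            out = next(gen, StopIteration)
            spent, t = spent + 1, t + 1
            if out is None:
                still_running.append((length, gen, t))   # give it another step later
            elif out is not StopIteration:
                # program halted with an output after t steps:
                # weight it by 2**-(length + log2(t)), so slow programs count for little
                weights[out] = weights.get(out, 0.0) + 2.0 ** -(length + math.log2(t))
        running = still_running
    return weights

print(dovetail(programs, budget=1000))
```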