Interesting question. If you have a countable infinity of mutually exclusive explanations (e.g. they are all finite strings using letters from some finite alphabet), then your only constraint is that the infinite sum of all their prior probabilities must converge to 1. Otherwise you’re free to choose. You could make the convergence really fast (say, by making the prior of a hypothesis inversely proportional to the exponent of the exponent of its length), or slower if you wish to. A very natural and popular choice is restricting the hypotheses to form a “prefix-free set” (no hypothesis can begin with another shorter hypothesis) and then assigning every hypothesis of N bits a prior of 2^-N, which makes the sum converge by Kraft’s inequality.
Apart from giving a simple formula for the prior, it comes in handy in other theoretical constructions. For example, if you have a “universal Turing machine” (a computer than can execute arbitrary programs) and feed it an infinite input stream of bits, perhaps coming from a random source because you intend to “execute a random program”… then it needs to know where the program ends. You could introduce an end-of-program marker, but a more general solution is to make valid programs form a prefix-free set, so that when the machine has finished reading a valid program, it knows that reading more bits won’t result in a longer but still valid program. (Note that adding an end-of-program marker is one of the ways to make your set of programs prefix-free!)
Overall this is a nice example of an idea that “just smells good” to a mathematician’s intuition.
Ah! I must have had a brain-stnank—this makes total sense in math / theoretical CS terms, I was substituting an incorrect interpretation of “hypothesis” when reading the comment out of context. Thanks :)
Interesting question. If you have a countable infinity of mutually exclusive explanations (e.g. they are all finite strings using letters from some finite alphabet), then your only constraint is that the infinite sum of all their prior probabilities must converge to 1. Otherwise you’re free to choose. You could make the convergence really fast (say, by making the prior of a hypothesis inversely proportional to the exponent of the exponent of its length), or slower if you wish to. A very natural and popular choice is restricting the hypotheses to form a “prefix-free set” (no hypothesis can begin with another shorter hypothesis) and then assigning every hypothesis of N bits a prior of 2^-N, which makes the sum converge by Kraft’s inequality.
What is the reasoning behind using a prefix-free set?
Apart from giving a simple formula for the prior, it comes in handy in other theoretical constructions. For example, if you have a “universal Turing machine” (a computer than can execute arbitrary programs) and feed it an infinite input stream of bits, perhaps coming from a random source because you intend to “execute a random program”… then it needs to know where the program ends. You could introduce an end-of-program marker, but a more general solution is to make valid programs form a prefix-free set, so that when the machine has finished reading a valid program, it knows that reading more bits won’t result in a longer but still valid program. (Note that adding an end-of-program marker is one of the ways to make your set of programs prefix-free!)
Overall this is a nice example of an idea that “just smells good” to a mathematician’s intuition.
Ah! I must have had a brain-stnank—this makes total sense in math / theoretical CS terms, I was substituting an incorrect interpretation of “hypothesis” when reading the comment out of context. Thanks :)