Yes, there is a connection between “the length of the shortest program that generates exactly the message coming from the environment” (the Kolmogorov Complexity of this message ) and “unknown but computable probabililty distribution”.
“unknown but computable” means there is a program, but we don’t know anything about it, including its size. There is an infinite number of possible programs, and an infinite number of lengths of such programs. But we know that the sum of all the probabilities from “the probability distribution” of all programs that could generate the environment must be exactly equal to one, as per Math’s definition of probability.
Let me quote Shane Legg about it:
“Over a denumerable (infinite but countable) space you mathematically can’t put a non-zero uniform prior because it will always sum to more than one. Thus, some elements have to have higher prior probability than others. Now if you order these elements according to some complexity measure (any total ordering will do, so the choice of complexity function makes no difference), then the boundedness of the probability sum means that the supremium of the sequence converges to zero. Thus, for very general learning systems, and for any complexity measure you like, you can only violate Occam’s razor locally. Eventually, as things get more complex, their probabilities must decrease.
Yes, there is a connection between “the length of the shortest program that generates exactly the message coming from the environment” (the Kolmogorov Complexity of this message ) and “unknown but computable probabililty distribution”.
“unknown but computable” means there is a program, but we don’t know anything about it, including its size. There is an infinite number of possible programs, and an infinite number of lengths of such programs. But we know that the sum of all the probabilities from “the probability distribution” of all programs that could generate the environment must be exactly equal to one, as per Math’s definition of probability.
Let me quote Shane Legg about it:
“Over a denumerable (infinite but countable) space you mathematically can’t put a non-zero uniform prior because it will always sum to more than one. Thus, some elements have to have higher prior probability than others. Now if you order these elements according to some complexity measure (any total ordering will do, so the choice of complexity function makes no difference), then the boundedness of the probability sum means that the supremium of the sequence converges to zero. Thus, for very general learning systems, and for any complexity measure you like, you can only violate Occam’s razor locally. Eventually, as things get more complex, their probabilities must decrease.
In summary: if you want to do pure Bayesian learning over very large spaces you must set a prior that globally respects a kind of Occam’s razor. It’s not a matter of belief, it’s a mathematical necessity.” http://www.vetta.org/2011/06/treatise-on-universal-induction/comment-page-1/#comment-21408
thanks for the elaboration and the link.