Tim:
What is the rationale for considering some machines and not others?
Because we want to measure the information content of the string, not of some crazy complex reference machine. That’s why a tiny reference machine is used. In terms of inductive inference, saying that the bound is infinitely large amounts to saying that you don’t believe in Occam’s razor, in which case the whole Bayesian system can get weird. For example, if you have an arbitrarily strong prior belief that most of the world is full of purple chickens from the Andromeda galaxy, well, Bayes’ rule is not going to help you much. What you want is an uninformative prior distribution or, equivalently when working over computable distributions, a very simple reference machine.
Thanks to the rapid convergence of the posterior under a universal prior, that 2^100 factor becomes negligible for any moderate amount of data. Just look at the bound equation.
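To make that concrete, here is a small sketch of why a constant factor like 2^100 washes out. Under a Bayes mixture, a hypothesis whose prior weight is a factor 2^c smaller pays at most an extra c·ln(2) nats of *total* cumulative log-loss, a constant independent of the amount of data, so the per-symbol overhead shrinks as 1/n. (The choice of c = 100 bits and the sample sizes below are illustrative assumptions, not values from the original discussion.)

```python
import math

# A prior penalty of c bits (i.e. a prior weight smaller by 2^c) costs at
# most c * ln(2) nats of extra cumulative log-loss in the Bayes mixture.
# That is a fixed constant, so per observed symbol it decays like 1/n.
c_bits = 100  # the 2^100 factor: a 100-bit reference-machine penalty

for n in [1_000, 1_000_000, 1_000_000_000]:
    per_symbol_overhead = c_bits * math.log(2) / n
    print(f"n = {n:>13,}: extra log-loss per symbol = {per_symbol_overhead:.3e} nats")
```

Running this shows the overhead dropping from roughly 0.07 nats per symbol at a thousand observations to an utterly negligible level at a billion, which is the sense in which the 2^100 factor is "small for any moderate amount of data".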
These things are not glossed over. Read the mathematical literature on the subject, it’s all there.