How is it that Solomonoff induction, and by extension Occam’s Razor, is justified in the first place? Why is it that hypotheses with higher Kolmogorov complexity are less likely to be true than those with lower Kolmogorov complexity? If it is justified by the fact that it has “worked” in the past, does that not require Solomonoff induction to justify the claim that it has worked, in the sense that you need to verify that your memories are true, and thus involve circular reasoning?
See: “You only need faith in two things” and the comment on the binomial monkey prior (a theory which says that the ‘past’ does not predict the ‘future’).
You could argue that there exists a more fundamental assumption, hidden in the supposed rules of probability, about the validity of the evidence you’re updating on. Here I can only reply that we’re trying to explain the data regardless of whether or not it “is true,” and point to the fact that you’re clearly willing to act like this endeavor has value.
There are more hypotheses with high complexity than with low complexity, so if you want your probabilities to sum to 1, it is mathematically necessary (broadly speaking and in general; you can of course make particular exceptions) to assign lower probabilities to high-complexity hypotheses than to low-complexity ones. You are summing an infinite series, and for it to converge to a limit, its terms must be generally decreasing.
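Here is a toy numerical sketch of that counting argument (with a hypothetical choice of prior weights, chosen only for illustration, not the actual Solomonoff prior): treat hypotheses as binary strings, take complexity to be string length, and note that there are 2^n hypotheses of length n, so any normalizable prior has to push the per-hypothesis weights down as n grows.

```python
# Toy model: a "hypothesis" is a binary string (think: a program for some
# fixed universal machine), and its "complexity" is just its length n.
# There are 2**n hypotheses of complexity n, so the number of hypotheses
# grows exponentially with complexity.

def hypotheses_of_complexity(n):
    """Number of distinct binary strings of length n."""
    return 2 ** n

# Hypothetical prior weights, chosen only for illustration (not the exact
# Solomonoff prior): each hypothesis of complexity n gets weight 4**-n.
# Total mass at complexity n is then (2**n) * (4**-n) = 2**-n, and the
# sum of 2**-n over n converges to 1, so the prior normalizes.
def mass_at_complexity(n):
    return hypotheses_of_complexity(n) * 4 ** -n

total = sum(mass_at_complexity(n) for n in range(1, 60))
print(total)  # ~1.0: per-hypothesis weights shrink fast enough to converge

# By contrast, giving every hypothesis some fixed weight c > 0 puts mass
# c * 2**n at complexity n, which diverges: you cannot spread a total of 1
# over exponentially many hypotheses without the weights going to zero.
```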
But in the infinite series of possibilities summing to 1, why should the hypotheses with the highest probability be the ones with the lowest complexity, rather than each successive hypothesis having an arbitrary complexity level?
Almost all hypotheses have high complexity. Therefore most high-complexity hypotheses must have low probability.
(To put it differently: let p(n) be the total probability of all hypotheses with complexity n, where I assume we’ve defined complexity in some way that makes it always a positive integer. Then the sum of the p(n) converges, which implies that the p(n) tend to 0. So for large n the total probability of all hypotheses of complexity n must be small, never mind the probability of any particular one.)
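To make that step explicit (still treating complexity as a positive integer and the hypothesis space as countable):

$$
\sum_{n=1}^{\infty} p(n) \le 1 < \infty \;\Longrightarrow\; p(n) \to 0 \text{ as } n \to \infty,
\qquad P(h) \le p(n) \text{ for any single hypothesis } h \text{ of complexity } n.
$$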
Note: all this tells you only about what happens in the limit. It’s all consistent with there being some particular high-complexity hypotheses with high probability.
But why should the probability for lower-complexity hypotheses be any lower?
It shouldn’t; it should be higher.
If you just meant “… be any higher?” then the answer is that if the probabilities of the higher-complexity hypotheses tend to zero, then for any particular low-complexity hypothesis H, all but finitely many of the higher-complexity hypotheses have lower probability. (That’s just part of what “tending to zero” means.)
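Spelling out what “tending to zero” buys you here (a sketch, assuming P(H) > 0 and that each complexity level contains only finitely many hypotheses, as it does when complexity is description length):

$$
p(n) \to 0 \;\Longrightarrow\; \exists N \text{ such that } p(n) < P(H) \text{ for all } n > N,
$$

so every hypothesis of complexity greater than N has probability below P(H), and only the finitely many hypotheses of complexity at most N can match or exceed it.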
gjm’s explanation is correct.