On average you should choose the simpler of two hypotheses when they are ranked by simplicity, and likewise, on average you should choose the less flubby of two hypotheses when they are ranked by flubbiness.
Or to put this in more concrete terms: the Razor will be true if it is understood to mean that you should choose the hypothesis that can be described by the shorter program, and it will equally be true if it is understood to mean that you should choose the hypothesis that has the shorter English description.
Yes, these can on particular occasions be opposed to one another, and you would have to ask which rule is better: choose the shorter English description, or choose the shorter program? My proof does not answer this question, but it doesn’t have to, because both rules measure some kind of complexity, and the Razor is true whether it is taken in one way or the other.
Flubbity may not have much to do with complexity. In fact it can be opposed to complexity, except in the limit for extremely complex/flubby hypotheses. For example, you may say that flubbity = 1,000,000 - complexity for complexity < 1,000,000, and flubbity = complexity elsewhere. Your proof will go through just fine, but in our world (which probably doesn’t need such huge hypotheses) it will lead to the opposite of Occam’s Razor. You don’t always have the luxury of letting your parameter go to infinity.
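Here is a minimal sketch of that construction (Python; the threshold and the sample complexities are arbitrary stand-ins):

```python
THRESHOLD = 1_000_000  # the cutoff from the definition above

def flubbity(complexity: int) -> int:
    # Anti-complexity below the threshold, plain complexity at and above it.
    if complexity < THRESHOLD:
        return THRESHOLD - complexity
    return complexity

# Two hypotheses of the modest sizes we actually meet in practice:
print(flubbity(10))   # 999990: the simpler hypothesis is flubbier,
print(flubbity(500))  # 999500: so "prefer low flubbity" picks the
                      # MORE complex one, the opposite of the Razor.
```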
By Occam’s razor?
The proof follows on average. Naturally you can construct artificial examples that make fun of it, but there can be no proof of the Razor which is not based on averages, since in fact there are occasions when the more complex hypothesis, not the simpler one, turns out to be correct.
I don’t object to the formal correctness of the proof, but the statement it proves is way too weak. Ideally we’d want something that works for complexity but not flubbity. For any Occamian prior you care to build, I can take the first few hypotheses that comprise 99% of its weight, build a new prior that assigns them a weight of 1e-20 combined, and claim it’s just as good as yours by Occamian lights.
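A toy version of the construction (Python; the geometric base prior and the cutoff of seven are arbitrary stand-ins for whatever Occamian prior you hand me):

```python
def occamian(n: int) -> float:
    # Hypothesis n, in order of increasing complexity, gets weight 2^-(n+1).
    return 2.0 ** -(n + 1)

HEAD = 7  # the first 7 hypotheses carry about 99.2% of the mass
head_mass = sum(occamian(n) for n in range(HEAD))

def adversarial(n: int) -> float:
    # Crush the head to a combined weight of 1e-20 and renormalize the tail.
    if n < HEAD:
        return 1e-20 * occamian(n) / head_mass
    return (1.0 - 1e-20) * occamian(n) / (1.0 - head_mass)

# The new prior still sums to 1 and still tends to 0 as n grows, so it
# satisfies the "on average" Razor while being wildly anti-Occamian for
# every hypothesis we would consider in practice.
```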
If we removed the words “on average” from the formulation of your theorem, we’d have a stronger and more useful statement. Kelly’s work shows an approach to proving it not just “on average”, but for all possible hypothesis lengths.
ETA: I take back my words about not objecting to the formal side of things. I just read the proof once again and failed to understand what it even means by “on average”.
I started reading some of Kelly’s work, and it isn’t trying to prove that the less complex hypothesis is more likely to be true, but that by starting from it you converge on the truth more quickly. I’m sure this is right but it isn’t what I was looking for.
Yes, the statement is weak. But this is partly because I wanted a proof which would be 1) valid in all possible worlds; 2) valid according to every logically consistent assignment of priors. It may be that even with these conditions a stronger proof is possible. But I’m skeptical that a much stronger proof is possible, because it seems to be logically consistent for someone to assign a probability of 99% to a hypothesis that has a complexity of 1,000,000, and to distribute the remaining 1% among the remaining hypotheses.
This is also why I said “on average.” I couldn’t remove the words “on average” and assert that a more complex statement is always less probable without imposing a condition on the choice of prior which does not seem to be logically necessary. The meaning of “on average” in the statement of the Razor is that in the limit, as the complexity tends to infinity, the probability necessarily tends to zero: given any probability x, say 0.000001, there will be some complexity value z such that all statements of complexity z or greater have a probability less than x.
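Written out (assuming countably many hypotheses, with finitely many at each complexity level), what I mean is:

```latex
\forall x > 0 \;\exists z \;\forall h :\quad
  \operatorname{complexity}(h) \ge z \;\Longrightarrow\; P(h) < x
```

This has to hold because the probabilities sum to at most 1: at most 1/x hypotheses can have probability x or greater, and those finitely many all sit below some complexity bound z.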
I will read the article you linked to.
Why do you want the theorem to hold for every logically consistent prior? This looks backwards. Occamian reasoning should show why some prior distributions work better than others, not say they’re all equally good. For example, the Solomonoff prior is one possible formalization of Occam’s Razor.
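(For concreteness: the Solomonoff-style universal prior weights each hypothesis by its shortest description, roughly

```latex
P(h) \;\propto\; 2^{-K(h)}, \qquad
K(h) = \text{length of the shortest program that prints } h,
```

so description length alone fixes the ordering, rather than leaving it to an arbitrary consistent assignment.)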
Because for every logically consistent prior, there should be a logically possible world where that prior works well. If there isn’t, and you can prove this to me, then I would exclude priors that don’t work well in any possible world.
I want it to apply to every possible world because if we understand the Razor in such a way that it doesn’t apply in every possible world, then the fact that the Razor works well is a contingent fact. If this is the case, there can’t be any conclusive proof of it, nor does it seem that there can be any ultimate reason why the Razor works well, except “we happen to be in one of the possible worlds where it works well.” Yes, there could be many interpretations which are more practical in our actual world, but I was more interested in an interpretation which is necessary in principle.
This is even more backwards. There are logically possible worlds where an overseer god punishes everyone who uses Bayesian updating. Does this mean we should stop doing science? Looking for “non-contingent” facts and “ultimate” reasons strikes me as a very unfruitful area of research.
Different people have different interests.
How do you know when that happens?
My point is that if someone has a higher prior for the more complex hypothesis, which turns out to be correct, you cannot object to his prior by asking “How did you know that you should use a higher prior?”, since people do not justify their priors. Otherwise they wouldn’t be priors.
A major use (if not the whole point) of Occam’s razor is to have a rational basis for priors.
If people don’t have to justify their priors, then why have a process for generating them at all?
If I create an encoding with ‘God’ as a low-complexity explanation, would you say I am being rational?
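A toy version of what I mean (Python; both codebooks are hypothetical):

```python
# Complexity is relative to the encoding: a gerrymandered prefix-free
# code can hand any pet hypothesis the shortest codeword.
standard = {"gravity": "0", "God": "11010111"}       # hypothetical code A
gerrymandered = {"God": "0", "gravity": "11010111"}  # hypothetical code B

for name, code in (("A", standard), ("B", gerrymandered)):
    shortest = min(code, key=lambda h: len(code[h]))
    print(f"code {name} calls this hypothesis simplest: {shortest}")

# The invariance theorem only bounds the disagreement between two codes
# by a constant, which says little about small, human-scale hypotheses.
```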
But the point of my question above was that you find out that the more complex hypothesis is correct when you get evidence for it. Juggling your priors is not the way to do it. (In fact it probably invites accidentally counting evidence twice.)