You move from simpler hypotheses to more complex hypotheses for the same reason that you count from small numbers to big numbers.
Try imagining what counting the natural numbers “in the opposite order” would look like.
Of course, you can have large wiggles. For example, you might alternate jumping up to the next power of two and counting backwards from it. But using different representations for hypotheses leads to the same sort of wiggles in Occam’s Razor.
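For concreteness, here is a minimal sketch of one such wiggly ordering, under my reading of “alternate jumping up to the next power of two and counting backwards” (my own illustration; the function name wiggly_naturals is made up for the example):

```python
from itertools import islice

def wiggly_naturals():
    """Emit every natural number exactly once, in a 'wiggly' order:
    jump up to the next power of two, then count backwards."""
    yield 1
    power = 1
    while True:
        power *= 2                              # jump up to the next power of two...
        for n in range(power, power // 2, -1):  # ...then count back down
            yield n

print(list(islice(wiggly_naturals(), 12)))  # [1, 2, 4, 3, 8, 7, 6, 5, 16, 15, 14, 13]
```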
The best justification I’ve heard for believing simple hypotheses is an argument from probability.
Consider some event caused by a certain block. We know the block’s color must be either red, yellow, blue, or green; its shape must be either square, round, or triangular; its material must be either wood or metal.
We come up with two theories about the event. Both theories explain the event adequately:
Theory 1: The event was caused by the block being made of wood.
Theory 2: The event was caused by the block being blue, triangular, and made of metal.
Before the event happens, there are twenty-four possible configurations of the block: four colors times three shapes times two materials. “Made of wood” is true of twelve of those configurations; “blue, triangular, and made of metal” is true of exactly one.
After the event, we dismiss all configurations except the thirteen under which we believe the event was possible, and we assume those thirteen are equally likely. There’s therefore a 12⁄13 chance that the block is made of wood and a 1⁄13 chance that it is blue, triangular, and made of metal.
So Theory 1 is twelve times as likely as Theory 2.
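To make the bookkeeping explicit, here is a minimal sketch that simply enumerates the configurations and counts (my own illustration, not from the post I’m half-remembering; the helper names is_wood and is_blue_triangular_metal are made up for the example):

```python
from itertools import product
from fractions import Fraction

colors = ["red", "yellow", "blue", "green"]
shapes = ["square", "round", "triangular"]
materials = ["wood", "metal"]

# All 4 x 3 x 2 = 24 possible blocks, before we know anything about the event.
configs = list(product(colors, shapes, materials))

def is_wood(block):                      # Theory 1's precondition
    return block[2] == "wood"

def is_blue_triangular_metal(block):     # Theory 2's precondition
    return block == ("blue", "triangular", "metal")

# Keep only the 13 configurations under which the event was possible.
survivors = [b for b in configs if is_wood(b) or is_blue_triangular_metal(b)]

print(len(configs), len(survivors))                                  # 24 13
print(Fraction(sum(map(is_wood, survivors)), len(survivors)))        # 12/13
print(Fraction(sum(map(is_blue_triangular_metal, survivors)), len(survivors)))  # 1/13
```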
The same principle is at work any time a simple theory competes with a more complex one. Because the complicated theory has more preconditions that all have to be just right, it starts with a lower prior probability than the simple theory, and since the occurrence of the event adjusts the probabilities of both theories equally, it ends with a lower posterior probability as well.
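In Bayesian terms (a standard identity, not anything specific to the original explanation): if both theories predict the event equally well, the likelihoods cancel and the posterior ratio equals the prior ratio,

\[
\frac{P(T_1 \mid E)}{P(T_2 \mid E)}
  = \frac{P(E \mid T_1)\,P(T_1)}{P(E \mid T_2)\,P(T_2)}
  = \frac{P(T_1)}{P(T_2)}
  \qquad \text{whenever } P(E \mid T_1) = P(E \mid T_2).
\]

In the block example, P(T1) = 12⁄24 and P(T2) = 1⁄24, so the ratio is twelve to one before the event and stays twelve to one after it.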
I know I first read this explanation in a discussion of Kolmogorov complexity on someone’s rationality blog, but I can’t remember whose it was or what the link was. If I stole your explanation, please step up and take credit.