I don’t see why “fewer postulates” makes something “more likely”. Occam’s Razor is not a natural law, it’s a convenient heuristic for human minds.
“For every complex problem there is an answer that is clear, simple, and wrong.”—H. L. Mencken
If I explain a phenomenon using 10 postulates (of some fixed length) and you explain it using 10,000,000,000, your theory gets demoted (even if we don’t know anything else about the two theories) because it has more ways to go wrong. If you accept that this is true in a big way in the extreme case, you should accept that it is true in a small way in more mild cases (e.g., 10 postulates vs. 20, or 10 postulates vs. 100).
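Just to put numbers on the mild cases: suppose, purely as an illustrative assumption, that each postulate independently has a 99% chance of being true.

```python
# Toy calculation: probability that all of a theory's postulates hold,
# assuming (for illustration only) each is independently 99% likely.
for n in [10, 20, 100, 1000]:
    print(f"{n:>4} postulates: P(all true) = {0.99 ** n:.5f}")
```

Even 10 vs. 100 postulates is the difference between roughly 0.90 and 0.37.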
I like to think of it as an extension of the conjunction fallacy: the probability of A and B both being true can’t be higher than the probability of either one alone; adding new conditions can only keep the probability the same or lower it. So a theory with an extra postulate must be at most as probable as the same theory without it. Of course, that assumes the independence of the postulates.
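In symbols, it’s just the conjunction bound, applied once per added postulate:

$$P(A \wedge B) \;\le\; \min\bigl(P(A),\, P(B)\bigr), \qquad P(A_1 \wedge \cdots \wedge A_{n+1}) \;\le\; P(A_1 \wedge \cdots \wedge A_n).$$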
The probability of the postulates all being true goes down as you add postulates. The probability of the theory being correct given the postulates may go up.
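One way to write the distinction, with $T$ the theory and $A_1, \ldots, A_n$ its postulates:

$$P(T \wedge A_1 \wedge \cdots \wedge A_n) \;=\; \underbrace{P(T \mid A_1, \ldots, A_n)}_{\text{can rise with } n} \;\cdot\; \underbrace{P(A_1 \wedge \cdots \wedge A_n)}_{\text{can only fall with } n}.$$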
This assumes the postulates are interdependent such that the theory may be true with all postulates, but false with all postulates save one. In this case, the theories are the same except for the collapse postulate, which may or may not have any real-world consequences, depending on whether you believe decoherence accounts for the appearance of collapse all by itself.
Not only does it assume independence, it also assumes that the two competing theories share exactly the same postulates except for a single extra one. That is typically not how things work in real life.
Er, no it doesn’t. Where are you getting this?
From here:
Among theories that explain the evidence equally well, those with fewer postulates are more probable. This is a strict conclusion of information theory. Further, we can trade explanatory power for theoretical complexity in a well-defined way: minimum message length. Occam’s Razor is not just “a convenient heuristic.”
Could you demonstrate this, please?
The linked Wikipedia page provides a succinct derivation from Shannon and Bayes’ Theorem.
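The gist of the derivation, for anyone who doesn’t follow the link: by Bayes’ theorem, $P(H \mid D) \propto P(H)\,P(D \mid H)$, and by Shannon, an optimal code gives an event of probability $p$ a codeword of length $-\log_2 p$ bits. Taking logs,

$$-\log_2 P(H \mid D) \;=\; \underbrace{-\log_2 P(H)}_{\text{bits to state } H} \;+\; \underbrace{-\log_2 P(D \mid H)}_{\text{bits to state } D \text{ given } H} \;+\; \text{const},$$

so the hypothesis minimizing the two-part message length is exactly the one maximizing the posterior.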
Heh. I think you’re trying to generalize a narrow result way too much. Especially when we are not talking about compression ratios, but things like “explanatory power” which is quite different from getting to the shortest bit string.
Let’s take a real example which was discussed on LW recently: the heliocentrism debates in Renaissance Europe, for example between Copernicus and Kepler, pre-Galileo (see e.g. here). Show me how MML is relevant to this choice between two competing theories.
Kepler’s heliocentric theory is a direct result of Newtonian mechanics and gravitation, equations which can be encoded very simply and require few parameters to achieve accurate predictions for the planetary orbits. Copernicus’ theory improved on Ptolemy’s geocentric theory by using the same basic model for all the planetary orbits (instead of a different model for each) and by naturally handling the appearance of retrograde motion. However, it still required numerous epicycles to make accurate predictions, because Copernicus constrained the theory to use only perfect circular motion. Allowing elliptical motion would have made the basic model slightly more complex, but would have drastically reduced the number of necessary parameters and corrections. That’s exactly the tradeoff described by MML.
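To put toy numbers on that tradeoff, here’s a sketch in Python; every bit count and parameter count below is an illustrative assumption, not a historical fit:

```python
# Toy two-part message-length comparison, MML style.
# Every number below is an illustrative assumption, not a historical fit.

def message_length(model_bits, n_params, bits_per_param, residual_bits):
    """Total message = model description + fitted parameters + residuals."""
    return model_bits + n_params * bits_per_param + residual_bits

# Circles-plus-epicycles: a short basic rule, but many epicycle
# parameters and sizeable leftover error to encode.
copernican = message_length(model_bits=50, n_params=60,
                            bits_per_param=20, residual_bits=400)

# Ellipses: a slightly longer basic rule, far fewer parameters,
# smaller residuals.
keplerian = message_length(model_bits=80, n_params=12,
                           bits_per_param=20, residual_bits=150)

print(f"circles + epicycles: {copernican} bits")  # 1650
print(f"ellipses:            {keplerian} bits")   # 470
```

The ellipse model pays a slightly higher price for its basic rule but wins overall, because it needs far fewer fitted parameters and leaves smaller residuals to encode.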
Not for Kepler, who lived about a century before Newton.
My question was about the Copernicus–Kepler debates, and Newtonian mechanics was quite unknown at that point.
Even Kepler’s theory, expressed as his three separate laws, is much simpler than a theory with dozens of epicycles.
The dozens of epicycles aren’t on a par with Kepler’s laws. “Planets move in circles plus epicycles” is what you have to compare with Kepler’s laws. “Such-and-such a planet moves in such-and-such a circle plus such-and-such epicycles” is parallel not to Kepler’s laws themselves but to “Such-and-such a planet moves in such-and-such an ellipse, apart from such-and-such further corrections”. If some epicycles are needed in the first case, but no corrections in the second, then Kepler wins. If you need to add corrections to the Keplerian model, either might come out ahead.
(Why would you need corrections in the Keplerian model? Inaccurate observations. Gravitational influences of one planet on another—this is how Neptune was discovered.)
I have heard that Copernican astronomy (circles centred on the sun, plus corrections) ended up needing more epicycles than Ptolemaic (circles centred on the earth, plus corrections) for reasons I don’t know. I think Kepler’s system needed much less correction, but don’t know the details.