This is the shock-level problem. If you let T1, T2, … be the competing theories and O be the observations, and you choose a theory by maximizing P(Ti | O), and you do this by choosing the Ti that maximizes P(O | Ti) * P(Ti),
… then P(O | Ti) can be at most 1; but P(Ti), the prior you assign to theory i, can be arbitrarily low.
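A toy sketch of this selection rule, with invented numbers: working in log space, we pick the theory maximizing log P(O | Ti) + log P(Ti). Note how T2, despite fitting the observations far better than either rival, loses to T1 because of its minuscule prior.

```python
import math

# Hypothetical numbers for three competing theories: the log-likelihood
# of the observations under each theory, and the prior assigned to it.
theories = {
    "T0": {"log_lik": -40.0, "prior": 0.90},   # current theory
    "T1": {"log_lik": -35.0, "prior": 0.099},  # modest rival
    "T2": {"log_lik": -20.0, "prior": 1e-12},  # very dissimilar theory
}

def map_choice(theories):
    """Pick the theory maximizing log P(O|Ti) + log P(Ti)."""
    return max(theories,
               key=lambda t: theories[t]["log_lik"] + math.log(theories[t]["prior"]))

print(map_choice(theories))  # prints "T1"
```

T2's likelihood advantage (e^20, about 5 * 10^8) is swamped by its prior disadvantage (about 10^-11 relative to T1), which is exactly the situation described above.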
In theory, this should be OK. In practice, P(O | Ti) is always near zero, because no theory accounts for all of the observations, and because any particular series of observations is extremely unlikely. Our poor little brains have an underflow error. So in place of P(O | Ti) we put an approximation that is scaled so that P(O | T0), where T0 is our current theory, is pretty large. Given that restriction, there’s no way for P(O | Ti) to be large enough to overcome the low prior P(Ti).
This means that there’s a maximum degree of dissimilarity between Ti and your current theory T0, beyond which the prior you assign Ti will be so low that you should dismiss it out of hand. “Truth” may lie farther away than that from T0.
(I don’t think anyone really thinks this way; so the observed shock-level problem must have a non-Bayesian explanation. But one key feature, rescaling so that your current beliefs look reasonable, may be the same.)
So you need to examine the potentially saner theory a piece at a time. If there’s no way to break the new theory up into independent parts, you may be out of luck.
Consider society transitioning from Catholicism in 1200 AD to rationalist materialism. It would have been practically impossible for 1200 AD Catholics to take the better theory one piece at a time and verify it, even if they’d been Bayesians. Even a single key idea of materialism would have shattered their entire worldview. The transition was made only through the noise of the Protestant Reformation, which did not move directly towards the eventual goal, but sideways, in a way that fractured Europe’s religious power structure and shook it out of a local minimum.