One way to evaluate a Bayesian approach to science is to see how it has
fared in other domains where it is already being applied. For instance,
statistical approaches to machine translation have done surprisingly well
compared to rule-based approaches. However, a paper by Franz Josef Och
(one of the founders of statistical machine translation) shows that probabilistic
approaches do not always perform as well as non-probabilistic (but still
statistical) approaches. Basically, training a machine translation system
to maximize likelihood produces results that are significantly worse than
training it to directly minimize error on the evaluation metric. The general
principle is that you should optimize the objective that is closest to the
criterion you actually care about. Maximizing a model’s probability won’t
give you good results if what you really care about is minimizing error.
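To make the contrast concrete, here is a minimal sketch of the two training
objectives, in notation of my own choosing (the paper in question is
presumably Och’s 2003 “Minimum Error Rate Training in Statistical Machine
Translation”; f_s and e_s stand for the source sentences and reference
translations, lambda for the model parameters, and E for an error metric
such as 1 - BLEU):

\[
\hat{\lambda}_{\mathrm{MLE}} = \arg\max_{\lambda} \sum_{s=1}^{S} \log p_{\lambda}(e_s \mid f_s)
\]
\[
\hat{\lambda}_{\mathrm{MERT}} = \arg\min_{\lambda} \sum_{s=1}^{S} E\bigl(\hat{e}(f_s; \lambda),\, e_s\bigr),
\qquad
\hat{e}(f; \lambda) = \arg\max_{e} \, p_{\lambda}(e \mid f)
\]

The first objective rewards parameters that assign high probability to the
reference translations; the second rewards whatever parameters make the
system’s actual outputs score best on the metric, which is exactly the
“optimize what you care about” principle.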
By analogy, maximizing the likelihood of scientific hypotheses may lead to
different results from minimizing error. Currently, science works more like
error minimization: it is always trying to disprove bad hypotheses through
experimentation, and the best hypotheses are the ones left standing. If
science switched to ranking hypotheses by their probability instead, this
might lead to unintended consequences. For instance, it might be easier to
raise the posterior probability of your pet hypothesis by refining your
priors than by seeking experiments that could potentially disprove it.
That’s an interesting notion. I don’t see how Bayesian reasoning is restricted to trying to maximize the likelihood of the ‘best’ theory. One of its crowning achievements is to avoid talking just about the best theory and to use the full ensemble at all times. You’re perfectly free to ask any question of the ensemble, including ‘Which response minimizes some error function?’
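As a minimal sketch of that last point (the hypothesis names, posterior weights, and losses below are invented purely for illustration): given a posterior over rival hypotheses, you can directly ask which action has the lowest posterior-expected loss, and the answer need not be the action tied to the single most probable hypothesis.

```python
# Toy sketch of querying the full ensemble for an error-minimizing answer.
# All numbers are assumed for illustration only.

# Posterior over three rival hypotheses.
posterior = {"H1": 0.5, "H2": 0.3, "H3": 0.2}

# Loss of each candidate action if a given hypothesis turns out to be true.
loss = {
    "act_on_H1": {"H1": 0.0, "H2": 5.0, "H3": 5.0},
    "act_on_H2": {"H1": 1.0, "H2": 0.0, "H3": 1.0},
}

def expected_loss(action):
    """Posterior-weighted (Bayes) risk of taking an action."""
    return sum(posterior[h] * loss[action][h] for h in posterior)

# The ensemble's answer to "which response minimizes the error function?"
best = min(loss, key=expected_loss)
print(best, {a: round(expected_loss(a), 2) for a in loss})
```

Here the answer is act_on_H2 (expected loss 0.7 versus 2.5), even though H1 is the single most probable hypothesis; minimizing error and betting on the likeliest theory come apart.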