The development of mathematics isn’t relevant. The question was (and continues to be) what constitutes a valid and/or useful operationalization of mathematical probability.
It was always clear that the relative frequency of events in some suitably defined “random experiment” obeys the probability axioms (even though those axioms weren’t spelled out until Kolmogorov got around to it). John Venn of Venn diagram fame was the first influential promoter of the idea that probability should be restricted to just relative frequency. I think the notion was to exclude anything unobservable. Prior to that, people had treated mathematical probability as equivalent to the colloquial notion of probability without any particular justification.
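For readers who haven't seen them spelled out, the axioms Kolmogorov eventually formalized fit in a few lines (stated loosely here, with the measure-theoretic machinery omitted):

```latex
% Kolmogorov's axioms for a probability measure P over a space of events:
\begin{align*}
  &\text{(1) } P(A) \ge 0 \text{ for every event } A,\\
  &\text{(2) } P(\Omega) = 1 \text{ for the sure event } \Omega,\\
  &\text{(3) } P\!\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)
     \text{ for pairwise disjoint events } A_1, A_2, \ldots
\end{align*}
```

Relative frequencies over a fixed set of trials clearly satisfy the first two, and additivity over disjoint events as well (in the finite case, at least).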
In the early part of the 20th century, frequentist statistics provided a framework that seemed to permit reasonably well-principled data analysis while excluding subjective or nonsensical prior probability distributions from the scene. The result was a bit of a grab-bag, but practising scientists didn’t have to worry about that—they just consulted with “statisticians”, the newly trained class of professionals whose job it was to know which data analysis recipe should be followed.
Meanwhile, defenders of the “inverse” probability approach (as Bayesian statistics was then known) got busy providing justifications. Bruno de Finetti provided foundations in terms of coherence, which means immunity to Dutch books. Harold Jeffreys took an axiomatic approach. L. J. Savage also took an axiomatic approach, but in contrast to Jeffreys, Savage framed it in terms of rational preferences, mixing rational inference and rational decision making together. Perhaps the cleanest approach (and my personal favorite) was that of R. T. Cox.
Interest in Bayes was revived in the frequentist community by a result in frequentist statistical decision theory known as the complete class theorem. It showed that (subject to some weak conditions) the class of estimators with a certain desirable property called “admissibility” was exactly the class of Bayes estimators. That plus Savage’s work on Bayesian foundations led to a resurgence of interest in Bayesian statistics. But Bayesian statistics only really started to gain ground in the early 90s, when improvements in computing power made practical a class of algorithms collectively called Markov chain Monte Carlo. Suddenly, problems that had never been tractable before became doable—complex, high-dimensional, non-linear models beyond the mathematical reach of frequentist approaches became practical to analyze by Bayesian methods.
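To make the MCMC remark concrete, here is a minimal random-walk Metropolis sampler, the simplest member of that class of algorithms. This is just an illustrative sketch: the function name, the Gaussian proposal, the step size, and the toy target are all choices made for the example, not anything from a particular package.

```python
import numpy as np

def metropolis_hastings(log_post, x0, n_samples=10_000, step=0.5, seed=0):
    """Random-walk Metropolis sampler for an unnormalized log-posterior.

    log_post : callable returning log p(x) up to an additive constant
    x0       : starting point (1-D array-like)
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp = log_post(x)
    samples = np.empty((n_samples, x.size))
    for i in range(n_samples):
        proposal = x + step * rng.standard_normal(x.size)  # symmetric proposal
        lp_prop = log_post(proposal)
        # accept with probability min(1, p(proposal) / p(x))
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = proposal, lp_prop
        samples[i] = x
    return samples

# Toy usage: sample a 2-D standard normal "posterior"
draws = metropolis_hastings(lambda x: -0.5 * np.dot(x, x), x0=np.zeros(2))
print(draws.mean(axis=0), draws.std(axis=0))  # should be roughly 0 and 1
```

The point that mattered historically is that a sampler like this only needs a posterior density it can evaluate point-wise (up to a normalizing constant), which is exactly what complex, high-dimensional non-linear models can usually supply even when no closed-form analysis is available.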