I’ve just started reading the singular learning theory “green book”, a.k.a. Mathematical Theory of Bayesian Statistics by Watanabe. The experience has helped me to articulate the difference between two kinds of textbooks (and viewpoints more generally) on Bayesian statistics. I’ll call one of them “second-language Bayesian”, and the other “native Bayesian”.
Second-language Bayesian texts start from the standard frame of mid-twentieth-century frequentist statistics (which I’ll call “classical” statistics). They view Bayesian inference as a tool/technique for answering basically-similar questions and solving basically-similar problems to classical statistics. In particular, they typically assume that there’s some “true distribution” from which the data is sampled independently and identically. The core question is then “Does our inference technique converge to the true distribution as the number of data points grows?” (or variations thereon, e.g. “Does the estimated mean converge to the true mean?”, asymptotics, etc.). The implicit underlying assumption is that convergence to the true distribution as the number of (IID) data points grows is the main criterion by which inference methods are judged; that’s the main reason to choose one method over another in the first place.
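To make that criterion concrete, here’s a minimal sketch (my own illustration, not from any of the books mentioned): data drawn IID from an assumed “true” Bernoulli(0.7), with a uniform Beta(1, 1) prior, and the posterior tracked as the number of data points grows. The value 0.7 and the sample sizes are just placeholders for the example.

```python
# Minimal sketch of the "convergence to the true distribution" criterion:
# IID data from a hypothetical "true" Bernoulli(0.7), conjugate Beta update.
import numpy as np

rng = np.random.default_rng(0)
true_p = 0.7          # the assumed "true distribution" parameter
a, b = 1.0, 1.0       # Beta(1, 1) prior, i.e. uniform over p

for n in [10, 100, 1000, 10000]:
    data = rng.binomial(1, true_p, size=n)
    heads = data.sum()
    # Conjugate update: posterior is Beta(a + heads, b + n - heads)
    post_mean = (a + heads) / (a + b + n)
    post_sd = np.sqrt(
        (a + heads) * (b + n - heads)
        / ((a + b + n) ** 2 * (a + b + n + 1))
    )
    print(f"n={n:6d}  posterior mean={post_mean:.3f}  sd={post_sd:.4f}")
```

Under the second-language framing, the whole point of an exercise like this is that the posterior mean approaches 0.7 and the posterior spread shrinks; that shrinkage toward the “true” value is the success criterion.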
Watanabe’s book is pretty explicitly second-language Bayesian. I also remember Gelman & co’s Bayesian Data Analysis textbook being second-language Bayesian, although it’s been a while so I could be misremembering. In general, as the name suggests, second-language Bayesianism seems to be the default among people who started with a more traditional background in statistics or learning theory, then picked up Bayesianism later on.
In contrast, native Bayesian texts justify Bayesian inference via Cox’s theorem, Dutch book theorems, or one of the long tail of similar theorems. “Does our inference technique converge to the ‘true distribution’ as the number of data points grows?” is not the main success criterion in the first place (in fact a native Bayesian would raise an eyebrow at the entire concept of a “true distribution”), so mostly the question of convergence just doesn’t come up. Insofar as it does come up, it’s an interesting but not particularly central question, mostly relevant to numerical approximation methods. Instead, native Bayesian work ends up focused mostly on (1) what priors accurately represent various realistic kinds of prior knowledge, and (2) what methods allow efficient calculation/approximation of the Bayesian update.
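As a rough illustration of (1) and (2) (again my own sketch, not drawn from any particular native Bayesian text): a prior chosen to encode the belief “this coin is probably close to fair”, and a simple grid approximation of the resulting Bayesian update. The prior width and the 9-heads-in-12-flips data are made up for the example.

```python
# Sketch: (1) a prior encoding "probably close to fair", and
# (2) a grid approximation of the Bayesian update for it.
import numpy as np

p_grid = np.linspace(0.001, 0.999, 999)

# (1) Prior knowledge: mass concentrated near p = 0.5.
prior = np.exp(-0.5 * ((p_grid - 0.5) / 0.1) ** 2)
prior /= prior.sum()

# Hypothetical observed data: 9 heads in 12 flips.
heads, n = 9, 12
log_like = heads * np.log(p_grid) + (n - heads) * np.log(1 - p_grid)

# (2) Approximate the update: posterior ∝ prior × likelihood on the grid.
log_post = np.log(prior) + log_like
post = np.exp(log_post - log_post.max())
post /= post.sum()

print("posterior mean:", float(np.sum(p_grid * post)))
print("MLE for comparison:", heads / n)
```

Here the question isn’t whether the posterior converges to anything; it’s whether the prior faithfully encodes what we actually believed beforehand, and whether the grid (or MCMC, variational methods, etc.) gives a good enough approximation to the exact update.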
Jaynes’ writing is a good example of native Bayesianism. The native view seems to be more common among people with a background in economics or AI, where they’re more likely to absorb the Bayesian view from the start rather than adopt it later in life.
Is there any “native” textbook that is pragmatic and explains how to use Bayesian methods in practice (perhaps in some narrow domain)?
I don’t know of a good one, but never looked very hard.