To be a Bayesian in the purest sense is very demanding. One must articulate not only a basic model for the structure of the data and the distribution of the errors around that structure (as in a regression model), but also all of one's further uncertainty about each of those parts. If you have some sliver of doubt that the errors might have a slight serial correlation, that doubt has to be expressed as part of your prior before you look at any data. If you think the model for the structure might not be a line, but might be better expressed as an ordinary differential equation with a somewhat exotic expression for dy/dx, then that had better be built in with appropriate prior mass too. And you had better do this not just for the 3 or 4 leading candidate modifications, but for every one to which you assign prior mass, and don't forget uncertainty about that uncertainty, all the way up the hierarchy. Only then can the posterior computation, which is by now rather demanding, deliver your true posterior.
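To make the demand concrete, here is a minimal sketch of what "assigning prior mass to a complication" looks like. The setup is a toy straight-line example of my own, not anything from the post: a simple model M0 with iid errors, a variant M1 whose AR(1) error correlation rho carries its own prior, and prior model mass split between them. All the names, numbers, and priors here are illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.integrate import trapezoid

rng = np.random.default_rng(0)
n = 50
x = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])           # design matrix: intercept + slope
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.3, n)    # toy data (truth: iid errors)

sigma = 0.3                    # error sd, taken as known to keep the sketch short
m0 = np.zeros(2)               # prior mean for (intercept, slope)
V0 = 10.0 * np.eye(2)          # vague prior covariance for the coefficients

def marginal_loglik(Sigma_err):
    """log p(y | model): coefficients beta ~ N(m0, V0) integrated out analytically."""
    return multivariate_normal.logpdf(y, mean=X @ m0, cov=X @ V0 @ X.T + Sigma_err)

# M0: the simple model, independent errors.
log_m0 = marginal_loglik(sigma**2 * np.eye(n))

# M1: stationary AR(1) errors; rho gets its own (uniform) prior, integrated on a grid.
idx = np.arange(n)
rhos = np.linspace(-0.9, 0.9, 181)
logs = np.array([
    marginal_loglik(sigma**2 / (1 - r**2) * r ** np.abs(np.subtract.outer(idx, idx)))
    for r in rhos
])
log_m1 = logs.max() + np.log(
    trapezoid(np.exp(logs - logs.max()), rhos) / (rhos[-1] - rhos[0])
)

# The "sliver of doubt" about serial correlation, expressed as prior model mass.
prior = {"M0": 0.95, "M1": 0.05}
evid = {"M0": log_m0, "M1": log_m1}
post = {m: prior[m] * np.exp(evid[m] - max(evid.values())) for m in prior}
Z = sum(post.values())
for m in post:
    print(m, "posterior probability:", post[m] / Z)
```

The point is only that the sliver of doubt shows up as the 0.05 prior weight and the grid prior over rho, both declared before seeing any data; and this covers just one complication, for one level of the hierarchy.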
Since this is so difficult, practitioners often fall short somewhere. Maybe they compute the posterior from the simple form of their prior, then build in one complication, compute the posterior for that, and compare; if the two look similar enough, they conclude that building in further complications is unnecessary. Or maybe… gasp… they look at residuals. Such behavior is often a violation of the (full) likelihood principle, because the principle demands that all the probability densities be laid out explicitly up front and that we obtain information only from ratios of those densities.
So pragmatic Bayesians will still look at the residuals (Box 1980).
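For contrast, the pragmatic shortcut is a few lines. This is my own illustration of the informal check described above, not a procedure taken from Box (1980): fit the simple model by least squares and eyeball the residuals for leftover structure, say lag-1 serial correlation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = np.linspace(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.3, n)   # toy data, as before

# Fit the simple straight-line model by ordinary least squares.
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

# Lag-1 autocorrelation of the residuals: a value far from 0 hints that the
# iid-error assumption (and hence the simple prior) was too optimistic.
r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
print("lag-1 residual autocorrelation:", r1)
```

The check is cheap and informative, which is exactly why practitioners do it; the question is whether doing it is licensed by the principle.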
As a counterargument to my previous post, if anyone wants an exposition of the likelihood principle, here is a reasonably neutral presentation by Birnbaum (1962). For coherence and Bayesianism, see Lindley (1990).
Edited to add: As Lindley points out (section 2.6), the adequacy of a small model can be tested in a Bayesian way by considering a larger model that includes the smaller one. Fair enough. But is the process of starting with a small model, thinking, and then possibly considering a succession of larger models, some of which reject the smaller one and some of which do not, actually true to the likelihood principle? I don't think so.
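In the toy setup above, Lindley's route can be sketched as follows (again my own illustrative code, not Lindley's): embed the small model as the point rho = 0 inside the larger AR(1) model, compute the posterior for rho, and read off the Savage-Dickey ratio, the posterior density at rho = 0 over the prior density there, which for nested models of this kind equals the Bayes factor in favor of the small model.

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.integrate import trapezoid

rng = np.random.default_rng(0)
n = 50
x = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.3, n)   # same toy data as before

sigma = 0.3
m0, V0 = np.zeros(2), 10.0 * np.eye(2)

def loglik(rho):
    """log p(y | rho) with the regression coefficients integrated out."""
    idx = np.arange(n)
    Sigma = sigma**2 / (1 - rho**2) * rho ** np.abs(np.subtract.outer(idx, idx))
    return multivariate_normal.logpdf(y, mean=X @ m0, cov=X @ V0 @ X.T + Sigma)

rhos = np.linspace(-0.9, 0.9, 181)          # uniform prior on rho over this range
logs = np.array([loglik(r) for r in rhos])
post = np.exp(logs - logs.max())
post /= trapezoid(post, rhos)               # normalized posterior density of rho

prior_at_0 = 1.0 / 1.8                      # uniform density on [-0.9, 0.9]
post_at_0 = post[np.argmin(np.abs(rhos))]   # grid point closest to rho = 0
print("Savage-Dickey BF in favor of the small model:", post_at_0 / prior_at_0)
```

If the posterior for rho piles up away from zero, the larger model has "rejected" the smaller one in the sense described above. Running this once is one thing; looking at the answer and then deciding whether to try a still-larger model is the part that strains the principle.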
Anyone care to elaborate on why a Bayesian is not allowed to look at the residuals?
I got hunches, but don’t feel qualified to explain in detail.