The frequentist/Bayesian dispute is of real import, because ad hoc frequentist statistical methods often break down in extreme cases, throw away useful data, work well only with Gaussian sampling distributions, and so on.
I think you have this backwards. Frequentist techniques typically come with adversarial guarantees (i.e., “as long as the underlying distribution has bounded variance, this method will work”), whereas Bayesian techniques, by choosing a specific prior (such as a Gaussian prior), are making an assumption that will hurt them in extreme cases or when the data is not drawn from the prior. The tradeoff is that frequentist methods tend to be much more conservative as a result (requiring more data to come to the same conclusion).
If you have a reasonable Bayesian generative model, then using it will probably give you better results with less data. But if you really can’t even build the model (i.e., specify a prior that you trust), then frequentist techniques might actually be appropriate. Note that the distinction I’m drawing is between Bayesian and frequentist techniques, as opposed to Bayesian and frequentist interpretations of probability. In the former case, there are actual reasons to use both. In the latter case, I agree with you that the Bayesian interpretation is obviously correct.
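To illustrate the tradeoff with a toy sketch (mine, not from the discussion; all numbers are illustrative): a distribution-free Chebyshev interval for a mean assumes only finite variance, so it keeps its guarantee against any such distribution at the cost of width, while a Bayesian interval built on a specific N(0, 1) prior is tighter when the prior is roughly right and can miss badly when the truth is far from it.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 20, 1.0  # sample size and (assumed known) noise scale

def intervals(true_mu):
    x = rng.normal(true_mu, sigma, n)
    xbar, se = x.mean(), sigma / np.sqrt(n)
    # Frequentist, distribution-free: Chebyshev gives at least 95% coverage
    # for any finite-variance sampling distribution, at the price of width.
    k = 1 / np.sqrt(0.05)
    chebyshev = (xbar - k * se, xbar + k * se)
    # Bayesian with a specific N(0, 1) prior on the mean: much tighter,
    # but the prior shrinks the estimate toward 0.
    post_var = 1 / (1 / 1.0**2 + n / sigma**2)
    post_mean = post_var * (n * xbar / sigma**2)
    bayes = (post_mean - 1.96 * np.sqrt(post_var),
             post_mean + 1.96 * np.sqrt(post_var))
    return chebyshev, bayes

print(intervals(true_mu=0.3))   # prior roughly right: the Bayesian interval is tighter
print(intervals(true_mu=30.0))  # prior badly wrong: the Bayesian interval misses the truth
```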
Bayesian methods with uninformative (possibly improper) priors agree with frequentist methods whenever the latter make sense.
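As a concrete instance of this agreement (a standard textbook case, sketched by me with illustrative numbers): for a Gaussian with known variance and a flat improper prior on the mean, the posterior computed numerically gives a 95% central credible interval that coincides with the frequentist 95% confidence interval.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, sigma, true_mu = 25, 2.0, 3.0
x = rng.normal(true_mu, sigma, n)
xbar, se = x.mean(), sigma / np.sqrt(n)

# Frequentist 95% confidence interval for the mean (sigma known).
freq = (xbar - 1.96 * se, xbar + 1.96 * se)

# Bayesian posterior under a flat (improper) prior: proportional to the
# likelihood, evaluated on a grid and normalized.
grid = np.linspace(xbar - 5 * se, xbar + 5 * se, 20001)
log_post = stats.norm.logpdf(x[:, None], loc=grid, scale=sigma).sum(axis=0)
post = np.exp(log_post - log_post.max())
post /= post.sum()
cdf = np.cumsum(post)
bayes = (grid[np.searchsorted(cdf, 0.025)], grid[np.searchsorted(cdf, 0.975)])

print(freq)   # the two intervals
print(bayes)  # agree to grid resolution
```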
Can you explain further? Speaking casually, I consider results like compressed sensing and multiplicative weights to be examples of frequentist approaches (as do people working in these areas), which achieve their results in adversarial settings where no prior is available. I would be interested in seeing how Bayesian methods with improper priors recommend similar behavior.
I admit I’m not familiar with either of those… Can you give a simple example of an “adversarial setting where no prior is available”?
I let you choose some linear functionals, and then tell you the value of each one on some unknown sparse vector (compressed sensing).
We play an iterated game with unknown payoffs; you observe your payoff in each round, but nothing more, and want to maximize total payoff (multiplicative weights).
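To make both settings concrete, here are two minimal sketches (mine, with illustrative sizes and parameters; neither is from the original exchange). First, compressed sensing: recover an unknown sparse vector from a handful of random linear measurements by basis pursuit (minimizing the l1 norm subject to the measurements), with no prior over the vector.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)
n, k, m = 200, 5, 60            # ambient dimension, sparsity, number of measurements

# Unknown k-sparse vector and random Gaussian measurement functionals.
x_true = np.zeros(n)
support = rng.choice(n, k, replace=False)
x_true[support] = rng.normal(0, 1, k)
A = rng.normal(0, 1, (m, n)) / np.sqrt(m)
b = A @ x_true                   # the only information we are given

# Basis pursuit: minimize ||x||_1 subject to A x = b, written as an LP in (x+, x-).
c = np.ones(2 * n)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=b,
              bounds=[(0, None)] * (2 * n), method="highs")
x_hat = res.x[:n] - res.x[n:]
print("recovery error:", np.linalg.norm(x_hat - x_true))   # essentially zero here
```

Second, multiplicative weights in the bandit setting described above (the EXP3 variant, which sees only the payoff of the action it actually played and still competes with the best fixed action against an arbitrary, even adversarial, payoff sequence):

```python
import numpy as np

def exp3(payoff_fn, n_arms, n_rounds, gamma=0.1, seed=0):
    """Multiplicative weights with bandit feedback (EXP3).
    payoff_fn(t, arm) must return a payoff in [0, 1] and may be adversarial."""
    rng = np.random.default_rng(seed)
    weights = np.ones(n_arms)
    total = 0.0
    for t in range(n_rounds):
        probs = (1 - gamma) * weights / weights.sum() + gamma / n_arms
        arm = rng.choice(n_arms, p=probs)
        x = payoff_fn(t, arm)                  # only this payoff is observed
        total += x
        x_hat = x / probs[arm]                 # importance-weighted estimate
        weights[arm] *= np.exp(gamma * x_hat / n_arms)
        weights /= weights.max()               # renormalize to avoid overflow
    return total

# Toy run: arm 2 is secretly best; EXP3 homes in on it without any prior model.
print(exp3(lambda t, a: [0.2, 0.5, 0.8][a], n_arms=3, n_rounds=5000))
```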
Put even more simply, what is the Bayesian method that plays randomly in rock-paper-scissors against an unknown adversary? Minimax play seems like a canonical example of a frequentist method; if you have any fixed model of your adversary you might as well play deterministically (at least if you are doing consequentialist loss minimization).
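A small sketch of that point (mine): the uniform mixed strategy guarantees expected payoff 0 against every opponent strategy, while the best response to any fixed model of the opponent is a single deterministic action.

```python
import numpy as np

# Rock-paper-scissors payoffs for the row player
# (rows = my action, columns = opponent's action: 0=rock, 1=paper, 2=scissors).
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]])

uniform = np.full(3, 1 / 3)
# The worst case over opponent strategies reduces to the worst pure response.
print("uniform strategy's guaranteed payoff:", (uniform @ A).min())   # 0.0

# Against any fixed model of the opponent, a pure action is optimal.
opponent_model = np.array([0.5, 0.3, 0.2])        # hypothetical fixed model
print("best response to that model:", np.argmax(A @ opponent_model))  # deterministic
```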
The minimax estimator can be related to Bayesian estimation through the concept of a “least-favorable prior”.
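The textbook illustration of that connection (not raised in this exchange, sketched here) is estimating a binomial proportion under squared-error loss: the Bayes estimator under a Beta(√n/2, √n/2) prior has constant risk, which is exactly what makes that prior least favorable and the estimator minimax, whereas the MLE's risk varies with the parameter.

```python
import numpy as np
from scipy import stats

n = 20
a = np.sqrt(n) / 2
xs = np.arange(n + 1)

def risk(estimator, p):
    """Exact squared-error risk of estimator(x) at parameter p, with X ~ Binomial(n, p)."""
    pmf = stats.binom.pmf(xs, n, p)
    return float(np.sum(pmf * (estimator(xs) - p) ** 2))

mle     = lambda x: x / n                    # the usual frequentist estimate
minimax = lambda x: (x + a) / (n + 2 * a)    # Bayes estimator under Beta(a, a)

for p in [0.05, 0.25, 0.5, 0.75, 0.95]:
    print(p, round(risk(mle, p), 5), round(risk(minimax, p), 5))
# The MLE's risk depends on p; the Beta(sqrt(n)/2, sqrt(n)/2) Bayes estimator's
# risk is the constant 1 / (4 * (sqrt(n) + 1)**2) -- the equalizer property that
# marks a minimax rule supported by a least-favorable prior.
```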
Are you referring to the result that every non-dominated decision procedure is either a Bayesian procedure or a limit of Bayesian procedures? If so, one could imagine a frequentist procedure that is strictly dominated by other procedures, but where finding the dominating procedures is computationally infeasible. Alternately, a procedure could be non-dominated, and thus Bayesian for the right choice of prior, but the correct choice of prior could be difficult to find (the only proof I know of the “non-dominated ⇒ Bayesian” result is non-constructive).
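A concrete example of a dominated frequentist procedure (the classical one, not something from this exchange): in three or more dimensions the sample-mean estimate of a multivariate normal mean is inadmissible, since the James-Stein estimator has strictly smaller risk at every parameter value. A quick Monte Carlo check of the two risks:

```python
import numpy as np

rng = np.random.default_rng(2)
d, trials = 10, 20000
theta = rng.normal(0, 2, d)                      # an arbitrary fixed true mean

x = rng.normal(theta, 1.0, size=(trials, d))     # one observation per trial

# Risk of the maximum-likelihood estimate (just x itself).
mle_risk = np.mean(np.sum((x - theta) ** 2, axis=1))

# Positive-part James-Stein estimator shrinks x toward the origin.
shrink = np.maximum(0.0, 1 - (d - 2) / np.sum(x ** 2, axis=1, keepdims=True))
js_risk = np.mean(np.sum((shrink * x - theta) ** 2, axis=1))

print(mle_risk, js_risk)   # the James-Stein risk comes out strictly smaller
```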
Thanks for the clarification.
What I was trying to emphasise is that, pace “potato”, the frequentist/Bayesian dispute isn’t just an argument about words but actually has ramifications for how one is likely to approach statistical inference—so it shouldn’t be compared to the definitional dispute “If a tree falls in a forest and no one hears it, does it make a sound?”
If someone treated frequentist approaches as though they were equivalent to Bayesian methods in general, then he would occasionally be drastically in error. PT:TLoS (Probability Theory: The Logic of Science) offers many examples of this, e.g. the comparison of a Bayesian “psi test” with the chi-squared test on page 300.

My comment about the Gaussian distribution had in mind Jaynes’s discussion of “pre-data and post-data considerations” starting on page 499. There he argues that orthodox practice answers the wrong question: it gives correct answers to “if the hypothesis being tested is in fact true, what is the probability that we shall get data indicating that it is true?”, whereas the real problems of scientific inference concern the question “what is the probability, conditional on the data, that the hypothesis is true?” This mismatch results from frequentist philosophy’s failure to admit the existence of prior and posterior probabilities for a fixed parameter or a hypothesis. He suggests the conflation goes largely unnoticed because, for the commonly encountered Gaussian sampling distribution, the difference is relatively unimportant, but he then examines a case (Cauchy sampling distributions) in which the Bayesian analysis is far superior.
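To put the Cauchy point in runnable form (my sketch of the standard illustration, not Jaynes's exact example): for Cauchy data the sample mean never settles down, since it has the same distribution as a single observation, while the posterior for the location parameter under a flat prior concentrates as data accumulate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
true_loc, n = 5.0, 200
x = stats.cauchy.rvs(loc=true_loc, size=n, random_state=rng)

# The sample mean of Cauchy data is as spread out as one observation,
# so it can sit far from the true location no matter how large n is.
print("sample mean:  ", x.mean())
print("sample median:", np.median(x))     # a sensible frequentist fallback

# Posterior for the location under a flat prior, evaluated on a grid.
center = np.median(x)
grid = np.linspace(center - 20, center + 20, 4001)
log_post = stats.cauchy.logpdf(x[:, None], loc=grid).sum(axis=0)
post = np.exp(log_post - log_post.max())
post /= post.sum()
post_mean = np.sum(grid * post)
post_sd = np.sqrt(np.sum((grid - post_mean) ** 2 * post))
print("posterior mean and sd:", post_mean, post_sd)   # tightly concentrated near 5
```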
On the other hand the interlocutors in the standard definitional dispute have no substantive disagreement, i.e. they actually anticipate the same things, so their disagreement amounts to nothing apart from the fact that they waste their time arguing about words.
I’ll defer to your opinion (which is probably much better informed than mine) on whether frequentist methods work well when their limitations are borne in mind.