This theory that you’ve stated here—that any useful frequentist method will have a Bayesian interpretation—doesn’t serve much in the way of controlled anticipation.
A frequentist tool only works insomuch as it approximates a Bayesian approach. As such, given the domain in which it works well, you can prove that it approximates the Bayesian answer.
For example, if you’re trying to find the probability of a repeatable event ending in success, the frequentist method says to use success/total. The Bayesian approach with a maximum entropy prior gives (success + 0.5)/(total + 1). It can be shown that, with a sufficient number of successes and failures, these will work out similarly. It’s well known that with very few successes or very few failures, the frequentist version doesn’t work very well.
This is false (as explained in the linked-to video). If nothing else, the frequentist answer depends on the loss function (as does the Bayesian answer, although the posterior distribution is a way of summarising the answer simultaneously for all loss functions).
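To make the loss-function point concrete, here is a rough Python sketch. It assumes the quoted formula (success + 0.5)/(total + 1) is the posterior mean under a Beta(1/2, 1/2) prior, which is my reading rather than something stated above. It approximates that posterior on a grid and picks the guess that minimises posterior expected loss. The function names, grid size, and example counts are all made up for illustration.

```python
def posterior_on_grid(successes, total, prior_a=0.5, prior_b=0.5, points=400):
    """Approximate the Beta posterior for the success probability on a grid.

    The Beta(0.5, 0.5) prior is an assumption chosen so that the posterior
    mean matches the quoted (success + 0.5) / (total + 1) formula.
    """
    a = prior_a + successes
    b = prior_b + (total - successes)
    # Midpoints of equal-width cells, so 0 and 1 themselves are never evaluated.
    grid = [(i + 0.5) / points for i in range(points)]
    weights = [p ** (a - 1) * (1 - p) ** (b - 1) for p in grid]
    norm = sum(weights)
    return grid, [w / norm for w in weights]


def best_guess(grid, probs, loss):
    """Return the grid point that minimises posterior expected loss."""
    def expected_loss(guess):
        return sum(w * loss(guess, p) for p, w in zip(grid, probs))
    return min(grid, key=expected_loss)


# Hypothetical data: 1 success in 10 trials.
successes, total = 1, 10
grid, probs = posterior_on_grid(successes, total)

raw_frequency = successes / total
quoted_formula = (successes + 0.5) / (total + 1)
squared_loss_guess = best_guess(grid, probs, lambda g, p: (g - p) ** 2)
absolute_loss_guess = best_guess(grid, probs, lambda g, p: abs(g - p))

print("success/total:            ", raw_frequency)
print("(success+0.5)/(total+1):  ", quoted_formula)
print("Bayes guess, squared loss: ", squared_loss_guess)
print("Bayes guess, absolute loss:", absolute_loss_guess)
```

With this few trials the posterior is skewed, so the squared-loss guess should land near the quoted formula while the absolute-loss guess sits noticeably lower. The same posterior gives different "answers" depending on the loss.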
I think you’re taking the frequentist interpretation of what a probability is and trying to forcibly extend it to the entire frequentist decision theory. As far as the “frequentist interpretation of probability” goes, I have never met a single statistician who even explicitly identified “probabilities as frequencies” as a possible belief to hold, much less claimed to hold it themselves. As far as I can tell, this whole “probabilities as frequencies” thing is unique to LessWrong.
Everyone I’ve ever met who identified as a frequentist meant “not strictly Bayesian”. Whenever a method was identified as frequentist, it either meant “not strictly Bayesian” or else that it was adopting the decision theory described in Michael Jordan’s lecture.
In fact, the frequentist approach (not as you’ve defined it, but as the term is actually used by statisticians) is used precisely because it works extremely well in certain circumstances (for instance, cross-validation). This is, I believe, what Mike is arguing for when he says that a mix of Bayesian and frequentist techniques is necessary.