A prior gives you as much information as the mean of a distribution. So, can’t I by the same token accuse both frequentist and Bayesian statistics of attempting to do probabilistic inference without using a distribution?
I mean, the frequentist uses the U-test to ask whether two data sets could be drawn from the same distribution, without assuming what the mean of the distribution is. The Bayesian would use some other test, assuming a prior or perhaps a mean for the distribution, but not assuming a shape for the distribution. And some other, uninvented, and (by the standards of LW) superior statistical methodology would use another test, assuming a mean and a shape for the distribution.
A prior gives you as much information as the mean of a distribution.
No, not in general. It can give much more or much less; it depends entirely on how detailed you can make your prior. Expanding the distribution out as, e.g., a series of central moments can give you as detailed a shape as you want. It may reduce to knowing only the mean in certain very special inference problems. In other problems, you may know that the distribution is very definitely Cauchy (EDIT: which doesn’t even have a well-defined mean), but not know the parameters, and put some reasonable prior on them: flat for the center over some range, and an approximately (1/x) improper prior for the width, perhaps cut off at physically relevant length scales.
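To make the Cauchy case concrete, here is a minimal sketch of what such an inference could look like on a grid. Everything specific in it is an assumption for illustration, not something from the discussion above: the simulated data, the true parameters, the range of the flat prior on the center, and the cutoffs on the width. The centers are evenly spaced and the widths are log-spaced, so giving every grid cell equal prior weight approximates a flat prior on the center and a (1/x) prior on the width between the cutoffs.

```python
import math
import random

def cauchy_logpdf(x, center, width):
    # Log density of a Cauchy distribution with location `center` and scale `width`.
    z = (x - center) / width
    return -math.log(math.pi * width * (1.0 + z * z))

def grid_posterior(data, centers, widths):
    # Equal prior weight per grid cell: flat in center (evenly spaced grid),
    # proportional to 1/width (log-spaced grid) between the cutoffs.
    log_post = {}
    for c in centers:
        for w in widths:
            log_post[(c, w)] = sum(cauchy_logpdf(x, c, w) for x in data)
    peak = max(log_post.values())  # subtract the peak to avoid underflow
    unnorm = {k: math.exp(v - peak) for k, v in log_post.items()}
    total = sum(unnorm.values())
    return {k: v / total for k, v in unnorm.items()}

# Simulated data (assumed for illustration): Cauchy draws via the inverse CDF.
random.seed(0)
true_center, true_width = 2.0, 1.5
data = [true_center + true_width * math.tan(math.pi * (random.random() - 0.5))
        for _ in range(200)]

centers = [-10.0 + 0.25 * i for i in range(81)]          # flat over [-10, 10]
widths = [0.1 * (100.0 ** (j / 40)) for j in range(41)]  # log-spaced over [0.1, 10]

posterior = grid_posterior(data, centers, widths)
best_center, best_width = max(posterior, key=posterior.get)
print("posterior mode:", best_center, round(best_width, 3))
```

Note that the sample mean never appears anywhere: the prior is over the two parameters of a known shape, and the inference goes through even though the distribution has no well-defined mean.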
The Bayesian would use some other test, assuming a prior or perhaps a mean for the distribution, but not assuming a shape for the distribution.
All that information can be encoded in the prior. The prior covers your probabilities over the space of your hypotheses; it is not a direct probabilistic encoding of what you think one sample will be.
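A small sketch of that distinction, under assumptions chosen purely for illustration: a hypothesis space of just two candidate shapes (a unit normal and a unit Cauchy), equal prior weights, and made-up data. The prior lives on the hypotheses; what you expect for a single sample is derived from it by averaging over them.

```python
import math

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def cauchy_pdf(x, center, width):
    z = (x - center) / width
    return 1.0 / (math.pi * width * (1.0 + z * z))

# Hypothetical hypothesis space: each entry is a complete sampling distribution.
hypotheses = {
    "normal(0, 1)": lambda x: normal_pdf(x, 0.0, 1.0),
    "cauchy(0, 1)": lambda x: cauchy_pdf(x, 0.0, 1.0),
}

# The prior assigns probability to hypotheses, not to sample values.
prior = {"normal(0, 1)": 0.5, "cauchy(0, 1)": 0.5}

def predictive(x, weights):
    # Density for a single sample, obtained by averaging over the hypotheses.
    return sum(weights[h] * hypotheses[h](x) for h in hypotheses)

def update(weights, data):
    # Bayes' rule over hypotheses: weight times likelihood, then normalize.
    unnorm = {h: weights[h] * math.prod(hypotheses[h](x) for x in data)
              for h in hypotheses}
    total = sum(unnorm.values())
    return {h: v / total for h, v in unnorm.items()}

data = [0.3, -1.1, 0.8, 12.0]  # made-up data with one far outlier
posterior = update(prior, data)
print(posterior)
print(predictive(0.0, prior), predictive(0.0, posterior))
```

Both the shape and the location of each candidate are part of the hypothesis, so a prior over this space carries shape information that a bare mean could not.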