Now suppose they each ask the question “what is the probability that, when doing what I did, one will come up with at most the number of tails I actually saw?”
That is throwing away data. The evidence each of them observed is the full sequence of coin flip results; the number of tails in that sequence is only a partial summary of the data. The reason they get different answers is that this summary throws away more data for B than for A. As you say, B already expected to get exactly one tail, so the summary tells him nothing new and he has nothing to update on, while A can recover the number of heads from it and only loses information about the order (which cancels out anyway in the likelihood ratios between theories of independent coin flips). But if you calculate the probability that they each see that exact sequence, you get the same answer for both: p(heads)^9999 * (1 - p(heads)).
That is, the data gathering procedure is needed to interpret a partial summary of the data, but not the complete data.
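To make that concrete, here is a minimal sketch in Python, assuming the setup from earlier in the thread: A flips a fixed 10,000 coins and happens to see a single tail, while B flips until the first tail and happens to need 10,000 flips. The particular numbers and the scipy call are just for illustration.

```python
import math
from scipy.stats import binom

p_heads = 0.5       # hypothesized coin bias
n_flips = 10_000    # A's fixed design; B happened to need the same number of flips
tails_seen = 1      # both observed sequences contain exactly one tail

# Summary-based question from the quote: P(at most the observed number of tails).
# Under A's design the tail count is Binomial(n_flips, 1 - p_heads); under B's
# design every completed run contains exactly one tail, so the answer is trivially 1.
p_summary_A = binom.cdf(tails_seen, n_flips, 1 - p_heads)
p_summary_B = 1.0

# Full-data question: probability of the exact observed sequence, which is
# p(heads)^9999 * (1 - p(heads)) under either design (log space to avoid underflow).
log_p_sequence = ((n_flips - tails_seen) * math.log(p_heads)
                  + tails_seen * math.log(1 - p_heads))

print(p_summary_A, p_summary_B)   # ~0 vs 1: the partial summary gives different answers
print(log_p_sequence)             # identical for A and B: about -6931.5 in natural-log units
```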
Sure, the likelihoods are the same in both cases, since A and B’s probability distributions assign the same probability to any sequence that is in both of their supports. But the distributions are still different, and various functionals of them are still different—e.g., the number of tails, the moments (if we convert heads and tails to numbers), etc.
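For instance, assuming the same setup as above, the distributions of those functionals already diverge: A's tail count is Binomial while B's is always exactly 1, and A's flip count is fixed while B's is geometric. A quick illustrative check (again just a sketch, not anything the argument depends on):

```python
from scipy.stats import binom, geom

p_heads = 0.5
n_flips = 10_000

# Number of tails: Binomial(n, 1 - p) under A's fixed-n design, exactly 1 under B's.
tails_A = binom(n_flips, 1 - p_heads)
print(tails_A.mean(), tails_A.var())   # 5000.0, 2500.0
print(1, 0)                            # B's tail count: mean 1, variance 0

# Number of flips: fixed at n for A, Geometric(1 - p) for B (flip until first tail).
flips_B = geom(1 - p_heads)
print(n_flips, 0)                      # A: mean 10000, variance 0
print(flips_B.mean(), flips_B.var())   # B: 2.0, 2.0
```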
If you’re a Bayesian, you think any hypothesis worth considering can predict a whole probability distribution, so there’s no reason to worry about these functionals when you can just look at the probability of your whole data set given the hypothesis. If (as in actual scientific practice, at present) you often predict functionals but not the whole distribution, then the difference in the functionals matters. (I admit that the coin example is too basic here, because in any theory about a real coin, we really would have a whole distribution.)
My point is just that there are differences between the two cases. Bayesians don’t think these differences could possibly matter to the sort of hypotheses they are interested in testing, but that doesn’t mean that in principle there can be no reason to differentiate between the two.