If you intend to flip the coin six times, then your null-hypothesis prior is “I will get 0 heads with probability 0.5^6, 1 head with probability 6*0.5^6, and so on”. If you intend to flip until you get a tail, the prior is “Probability 0.5 of one flip, 0.25 of two flips”, and so on.
That’s the likelihood under p = 0.5, not the prior. We want to infer something about p, so the prior is a distribution on p, not on the data.
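To make the distinction concrete, here is a minimal sketch that tabulates the two distributions over outcomes quoted above, with p held fixed at 0.5. The variable names are made up for illustration. Nothing in it is a distribution on p, which is the point.

```python
from math import comb

p = 0.5  # the null-hypothesis value of the parameter, held fixed

# Plan A: flip exactly six times; the outcome is the number of heads.
heads_in_six = [comb(6, h) * p**h * (1 - p)**(6 - h) for h in range(7)]

# Plan B: flip until the first tail; the outcome is the total number of flips.
# (Only the first six possible outcomes are listed; the list goes on forever.)
flips_until_tail = [p**(n - 1) * (1 - p) for n in range(1, 7)]

print(heads_in_six[:2])      # probabilities of 0 heads and of 1 head
print(flips_until_tail[:2])  # probabilities of stopping after 1 flip and after 2
```

Both lists describe the data given p = 0.5. A prior would instead assign weights to different values of p.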
Sorry, I was confused. Let me try to rephrase. Given some prior, your state of mind before the experiment affects your prediction of the outcome probabilities, and therefore informs your evaluation of the evidence. I should perhaps have said “affects the posterior” rather than “the prior”.
The exact example you’ve given (binomial versus negative binomial sampling distribution) is actually a counterexample to the above assertion. Those two distributions have the same likelihood function, so the evaluation of the evidence is the same under both scenarios. It’s true that the prior predictive distributions are different, but that doesn’t affect the posterior distribution of the parameter.
Really? I find that counterintuitive; could you show me the calculation?
Suppose that there are two sampling distributions that satisfy (sorry about the lousy math notation) the proportionality relationship,
Pr1(data | parameter) = k * Pr2(data | parameter)
where k may depend on the data but not on the parameter. Then the same proportionality relationship holds for the prior predictive distributions,
Pr1(data) = Integral { Pr1(data | parameter) Pr(parameter) d(parameter) }
Pr1(data) = Integral { k Pr2(data | parameter) Pr(parameter) d(parameter) }
Pr1(data) = k Integral { Pr2(data | parameter) Pr(parameter) d(parameter) }
Pr1(data) = k Pr2(data)
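A rough numerical check of this step, under assumptions that are mine rather than part of the argument: the data are five heads followed by a tail, Pr1 is binomial with six flips, Pr2 is negative binomial (flip until one tail), and the prior is an arbitrary Beta(2, 2) density. The integral is approximated with a simple midpoint rule.

```python
from math import comb

heads, tails = 5, 1  # hypothetical data: five heads, then a tail on the sixth flip

def pr1(p):
    # Binomial: the number of flips (heads + tails) was fixed in advance.
    return comb(heads + tails, heads) * p**heads * (1 - p)**tails

def pr2(p):
    # Negative binomial: flip until `tails` tails have appeared.
    return comb(heads + tails - 1, heads) * p**heads * (1 - p)**tails

def prior(p):
    # Beta(2, 2) density, an arbitrary choice for illustration.
    return 6 * p * (1 - p)

def prior_predictive(sampling, steps=10_000):
    # Midpoint-rule approximation of Integral { Pr(data | p) Pr(p) dp } over [0, 1].
    width = 1 / steps
    midpoints = ((i + 0.5) * width for i in range(steps))
    return sum(sampling(p) * prior(p) for p in midpoints) * width

# k depends on the data (through the counts) but not on p.
k = comb(heads + tails, heads) / comb(heads + tails - 1, heads)
ratio = prior_predictive(pr1) / prior_predictive(pr2)
print(k, ratio)
```

The two printed numbers should agree up to rounding error, and swapping in a different prior should not change that.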
Now write out Bayes’ theorem:
Pr(parameter | data) = Pr(parameter) Pr1(data | parameter) / Pr1(data)
Pr(parameter | data) = Pr(parameter) k Pr2(data | parameter) / (k Pr2(data) )
Pr(parameter | data) = Pr(parameter) Pr2(data | parameter) / Pr2(data)
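The last line is just Bayes’ theorem under Pr2, so the posterior is the same whether the data were sampled according to Pr1 or Pr2. You can check that the binomial and negative binomial distributions satisfy the proportionality condition by looking them up on Wikipedia: both probability mass functions are p^heads * (1 - p)^tails multiplied by a combinatorial factor that does not depend on p.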
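If you would rather see numbers than algebra, here is a small sketch that computes the posterior on a grid of p values under both sampling schemes. The data (five heads, one tail), the uniform prior, and the grid size are all assumptions chosen for illustration.

```python
from math import comb

def binomial_likelihood(p, heads, flips):
    # Fixed number of flips; the data are the head count.
    return comb(flips, heads) * p**heads * (1 - p)**(flips - heads)

def negative_binomial_likelihood(p, heads, tails):
    # Flip until `tails` tails are seen, so the last flip is always a tail.
    return comb(heads + tails - 1, heads) * p**heads * (1 - p)**tails

def grid_posterior(likelihood, prior, grid):
    # Bayes' theorem on a grid: multiply likelihood by prior, then normalize.
    unnormalized = [likelihood(p) * prior(p) for p in grid]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def uniform_prior(p):
    return 1.0

grid = [(i + 0.5) / 1000 for i in range(1000)]
heads, tails = 5, 1  # hypothetical data

post_binomial = grid_posterior(
    lambda p: binomial_likelihood(p, heads, heads + tails), uniform_prior, grid)
post_negbin = grid_posterior(
    lambda p: negative_binomial_likelihood(p, heads, tails), uniform_prior, grid)

# Largest pointwise difference between the two posteriors.
max_gap = max(abs(a - b) for a, b in zip(post_binomial, post_negbin))
print(max_gap)
```

The printed gap should be zero up to floating-point rounding, because the factor k cancels in the normalization.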
Your argument is convincing; I sit corrected.