Hm. Interesting how what looks like a trivially simple situation can become so confusing. Let me try to walk through my reasoning and see what’s going on...
We have a coin and we would like to know whether it’s fair. For convenience, let’s define heads as 1 and tails as 0; one consequence is that we can think of the coin as a bitstring generator. What does it mean for a coin to be fair? It means that the expected value of each bit the coin generates is 0.5, which is the same as saying that the mean of the sample bitstring converges to 0.5.
Can we know for certain that the coin is fair on the basis of examining its bitstring? No, we cannot. Therefore we need to introduce the concept of acceptable certainty: a threshold beyond which we consider the chance of the coin being fair high enough (this plays the same role as the significance level in hypothesis testing). In frequentist statistics we would just run an exact binomial test, but Bayes makes things a bit more complicated.
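For concreteness, here is a minimal sketch of that frequentist route in Python, using scipy’s binomtest; the flip counts are made up purely for illustration:

```python
# A minimal sketch of the frequentist approach: an exact binomial test.
# The counts below are hypothetical, chosen only to illustrate the call.
from scipy.stats import binomtest

n_heads, n_flips = 61, 100
result = binomtest(n_heads, n_flips, p=0.5, alternative="two-sided")
print(result.pvalue)  # a small p-value is evidence against the fair-coin hypothesis
```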
Luckily, Gelman et al. look at exactly this case in Bayesian Data Analysis (2nd ed., pp. 33-34). Assuming a uniform prior on [0, 1], the posterior distribution for θ (which in our case is the probability of the coin coming up heads, i.e., generating a 1) is
p(θ | y) ∝ θ^y · (1 - θ)^(n - y)
where y is the number of heads and n is the number of trials.
After the first flip (say it comes up heads), y = 1 and n = 1, so p(θ | y = 1) ∝ θ.
Aha, this is interesting. Our prior was uniform, so the density was just a horizontal line. After the first toss the density is still a straight line, but it now slopes upward from zero at θ = 0 to its maximum at θ = 1.
So the expected value of our bitstring’s mean used to be 0.5 but is now greater than 0.5 (with this posterior it is exactly 2/3). And that is why I argued that the very first toss changes your expectations: your expected bitstring mean (i.e., your expected probability of the coin coming up heads) is no longer 0.5, so you no longer think the coin is fair (because a fair coin’s expected mean is 0.5).
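A quick numeric check, as a sketch assuming scipy: with a uniform prior, the posterior after y heads in n flips is the Beta(y + 1, n - y + 1) distribution.

```python
# Sketch: with a uniform (Beta(1, 1)) prior, the posterior after y heads
# in n flips is Beta(y + 1, n - y + 1). After one head it is Beta(2, 1),
# whose density is p(theta) = 2 * theta.
from scipy.stats import beta

y, n = 1, 1
posterior = beta(y + 1, n - y + 1)  # Beta(2, 1)
print(posterior.mean())  # 0.666..., no longer 0.5
```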
But that’s only one way of looking at it, and now I see the error of my ways. After the first toss our probability density is still a straight line, and it has pivoted around the 0.5 point. This means that the probability mass in any neighborhood [0.5 - x, 0.5 + x] did not change (integrating the posterior density 2θ over that interval gives (0.5 + x)² - (0.5 - x)² = 2x, exactly the prior mass), and so the probability of the coin being fair remains the same. The change in the expected value is because we now think that if the coin is biased, it’s more likely to be biased towards heads than towards tails.
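That invariance is easy to confirm numerically (again a sketch assuming scipy; x = 0.1 is an arbitrary choice):

```python
# Sketch: the probability mass in [0.5 - x, 0.5 + x] is the same under the
# uniform prior and under the Beta(2, 1) posterior obtained after one head.
from scipy.stats import beta, uniform

x = 0.1
prior = uniform(0, 1)   # density 1 on [0, 1]
posterior = beta(2, 1)  # density 2 * theta

print(prior.cdf(0.5 + x) - prior.cdf(0.5 - x))          # 0.2
print(posterior.cdf(0.5 + x) - posterior.cdf(0.5 - x))  # also 0.2
```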
And yet this works only because we started with a uniform prior, a straight density line. What if we start with a different, “curvier” prior? After the first toss the probability density should still pivot around the 0.5 point, but because it’s not a straight line the probability mass in [0.5 - x, 0.5 + x] will not necessarily remain the same. Hmm… I don’t have time right now to play with it, but it requires some further thought.
“What if we start with a different, ‘curvier’ prior? After the first toss the probability density should still pivot around the 0.5 point but because it’s not a straight line the probability mass in [0.5 - x, 0.5 + x] will not necessarily remain the same.”
Yes: provided the prior is symmetrical, the probability mass in [0.5 - x, 0.5 + x] will remain the same after the first toss, by the argument I sketched above, even though the probability density will no longer be a straight line. On subsequent tosses, of course, that will no longer be true. If you have flipped more heads than tails, your probability distribution will be skewed, so flipping heads again will decrease the probability of the coin being fair, while flipping tails will increase it. If you have flipped the same (nonzero) number of heads as tails so far, your probability distribution will be different from the one you started with, but it will still be symmetrical, so the next flip does not change the probability of the coin being fair; a numeric check is sketched below.
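Here is one such check, a sketch assuming scipy and taking a symmetric Beta(a, a) as the hypothetical “curvier” prior (convenient because the conjugate update after one head is simply Beta(a + 1, a)):

```python
# Sketch: for a symmetric, non-uniform Beta(a, a) prior, the probability
# mass in [0.5 - x, 0.5 + x] is unchanged by the first toss. After one
# head, the Beta(a, a) prior updates to a Beta(a + 1, a) posterior.
from scipy.stats import beta

a, x = 3.0, 0.1  # hypothetical prior shape and interval half-width
prior = beta(a, a)
posterior = beta(a + 1, a)  # conjugate update after one head

print(prior.cdf(0.5 + x) - prior.cdf(0.5 - x))          # ~0.3651
print(posterior.cdf(0.5 + x) - posterior.cdf(0.5 - x))  # same, ~0.3651
```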