If the distribution is symmetrical, then the probability density at 0.5 will be unchanged after a single coin toss.
In the continuous-distribution world the probability density at exactly 0.5 is infinitesimally small. And the probability density at 0.5 plus-minus epsilon will change.
No they don’t.
Yes, they do. We’re talking about expected values of coin tosses now, not about the probabilities of the coin being biased.
the probability mass at 0.5 plus-minus epsilon will change.
(army1987 already addressed density vs mass.) No, for any x, the probability density at 0.5+x goes up by the same amount that the probability density at 0.5-x goes down (assuming a symmetrical prior), so for any x, the probability mass in [0.5-x, 0.5+x] will remain exactly the same.
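To spell out that step, here is a sketch of the update for one observed head, assuming a prior density $p(\theta)$ that is symmetric about 0.5 (so its mean is 0.5). The symbols $p$, $\theta$ and $H$ are my notation, not something fixed earlier in the thread:

```latex
\begin{aligned}
p(\theta \mid H) &= \frac{\theta \, p(\theta)}{\int_0^1 t \, p(t) \, dt} = 2\theta \, p(\theta), \\
p(0.5 + x \mid H) - p(0.5 + x) &= 2x \, p(0.5 + x), \\
p(0.5 - x \mid H) - p(0.5 - x) &= -2x \, p(0.5 - x) = -2x \, p(0.5 + x).
\end{aligned}
```

The two changes are equal and opposite at every x, so they cancel when integrated over [0.5-x, 0.5+x].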
We’re talking about expected values of coin tosses now, not about the probabilities of the coin being biased.
Ok, instead of 1000 flips, think about the next 2 flips. The probability that exactly 1 of them lands heads does not change. This does not contradict the claim that the probability of the next flip being heads increases, because the probability of the next two flips both being heads increases while the probability of the next two flips both being tails decreases by the same amount (assuming you just saw the coin land heads).
You don’t even need to explicitly use Bayes’s theorem and do the math to see this (though you can). It all follows from symmetry and conservation of expected evidence. By symmetry, the probability of any event that is symmetric with respect to heads/tails must change by the same amount whether the first flip comes up heads or tails, and by conservation of expected evidence, those changes (weighted by the equal prior probabilities of heads and tails) must add to 0. Therefore those changes are 0.
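If you do want to check the two-flip claim with numbers, here is a minimal sketch. It assumes a symmetric Beta(a, a) prior purely for convenience (nobody in the thread specified one), and the function names are made up for illustration. It uses the standard facts that observing a head turns Beta(a, b) into Beta(a+1, b), and that E[θ(1-θ)] under Beta(a, b) is ab / ((a+b)(a+b+1)).

```python
from fractions import Fraction


def prob_exactly_one_head_in_two(alpha, beta):
    """P(exactly one head in the next two flips) under a Beta(alpha, beta) belief.

    This is 2 * E[theta * (1 - theta)], which for a Beta distribution equals
    2 * alpha * beta / ((alpha + beta) * (alpha + beta + 1)).
    """
    alpha, beta = Fraction(alpha), Fraction(beta)
    total = alpha + beta
    return 2 * alpha * beta / (total * (total + 1))


def prob_next_head(alpha, beta):
    """P(next flip is heads) = mean of Beta(alpha, beta)."""
    return Fraction(alpha) / (Fraction(alpha) + Fraction(beta))


a = 3  # hypothetical choice: symmetric prior Beta(3, 3)

before = prob_exactly_one_head_in_two(a, a)
after_heads = prob_exactly_one_head_in_two(a + 1, a)  # saw one head
after_tails = prob_exactly_one_head_in_two(a, a + 1)  # saw one tail

print("P(exactly one head in next two):", before, after_heads, after_tails)
print("P(next flip heads):", prob_next_head(a, a), "->", prob_next_head(a + 1, a))
```

The probability of exactly one head stays put while the probability of the next flip being heads moves, which is the same point as above in exact fractions.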
for any x, the probability density at 0.5+x goes up by the same amount that the probability density at 0.5-x goes down (assuming a symmetrical prior)
I don’t think that is true. Imagine that your probability density is a normal distribution. You update in such a way that the mean changes, 0.5 is no longer the peak. This means that your probability density is no longer symmetrical around 0.5 (even if you started with a symmetrical prior) and the probability density line is not a 45 degree straight line—with the result that the density at 0.5+x changes by a different amount than at 0.5-x.
You update in such a way that the mean changes, 0.5 is no longer the peak. This means that your probability density is no longer symmetrical around 0.5 (even if you started with a symmetrical prior)
That is correct. Your probability distribution is no longer symmetrical after the first flip, which means that on the second flip, the symmetry argument I made above no longer holds, and you get information about whether the coin is biased or approximately fair. That doesn’t matter for the first flip though. Did you read the last paragraph in my previous comment? If so, was any part of it unclear?
with the result that the density at 0.5+x changes by a different amount than at 0.5-x.
That does not follow from anything you wrote before it (the 45 degree straight line part is particularly irrelevant).
Hm. Interesting how what looks like a trivially simple situation can become so confusing. Let me try to walk through my reasoning and see what’s going on...
We have a coin and we would like to know whether it’s fair. For convenience let’s define heads as 1 and tails as 0; one consequence is that we can think of the coin as a bitstring generator. What does it mean for a coin to be fair? It means that the expected value of the coin’s bitstring is 0.5. That’s the same thing as saying that the mean of the sample bitstring converges to 0.5.
Can we know for certain that the coin is fair on the basis of examining its bitstring? No, we cannot. Therefore we need to introduce the concept of acceptable certainty, that is, the threshold beyond which we think that the chance of the coin being fair is high enough (that’s the same concept as the p-value). In frequentist statistics we will just run an exact binomial test, but Bayes makes things a bit more complicated.
Luckily, Gelman in Bayesian Data Analysis looks exactly at this case (2nd ed., pp.33-34). Assuming a uniform prior on [0,1] the posterior distribution for theta (which in our case is the probability of the coin coming up heads or generating a 1) is
p( th | y ) is proportional to (th ^ y) * ((1 - th)^(n - y))
where y is the number of heads and n is the number of trials.
After the first flip y=1, n=1 and so p( th | 1) is proportional to ( th )
Aha, this is interesting. Our prior was uniform so the density was just a straight horizontal line. After the first toss the line is still straight but is now sloping up with the minimum at zero and the maximum at 1.
So the expected value of the mean of our bitstring used to be 0.5 but is now greater than 0.5. And that is why I argued that the very first toss changes your expectations: your expected bitstring mean (= expected probability of the coin coming up heads) is now no longer 0.5 and so you don’t think that the coin is fair (because the fair coin’s expected mean is 0.5).
But that’s only one way of looking at it and now I see the error of my ways. After the first toss our probability density is still a straight line and it pivoted around the 0.5 point. This means that the probability mass in some neighborhood of [0.5-x, 0.5+x] did not change and so the probability of the coin being fair remains the same. The change in the expected value is because we think that if the coin is biased, it’s more likely to be biased towards heads than towards tails.
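For the uniform-prior case this is easy to verify by hand. A sketch, using the normalized posterior density 2θ after one head (the normalizing constant 2 is my addition; the formula above only gives proportionality):

```latex
\begin{aligned}
\text{prior mass: } & \int_{0.5-x}^{0.5+x} 1 \, d\theta = 2x, \\
\text{posterior mass after one head: } & \int_{0.5-x}^{0.5+x} 2\theta \, d\theta = (0.5+x)^2 - (0.5-x)^2 = 2x, \\
\text{posterior mean: } & \int_0^1 \theta \cdot 2\theta \, d\theta = \tfrac{2}{3}.
\end{aligned}
```

So the mass near 0.5 is unchanged while the expected value moves from 1/2 to 2/3.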
And yet this works because we started with a uniform prior, a straight density line. What if we start with a different, “curvier” prior? After the first toss the probability density should still pivot around the 0.5 point but because it’s not a straight line the probability mass in [0.5-x, 0.5+x] will not necessarily remain the same. Hmm… I don’t have time right now to play with it, but it requires some further thought.
What if we start with a different, “curvier” prior? After the first toss the probability density should still pivot around the 0.5 point but because it’s not a straight line the probability mass in [0.5-x, 0.5+x] will not necessarily remain the same.
Provided the prior is symmetrical, the probability mass in [0.5-x, 0.5+x] will remain the same after the first toss by the argument I sketched above, even though the probability density will not be a straight line. On subsequent tosses, of course, that will no longer be true. If you have flipped more heads than tails, then your probability distribution will be skewed, so flipping heads again will decrease the probability of the coin being fair, while flipping tails will increase the probability of the coin being fair. If you have flipped the same (nonzero) number of heads as tails so far, then your probability distribution will be different than it was when you started, but it will still be symmetrical, so the next flip does not change the probability of the coin being fair.
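For anyone who wants to play with a "curvier" prior, here is a rough numerical sketch. The prior shape, the function names, the interval half-width x = 0.1 and the grid size are all arbitrary choices of mine for illustration. It integrates the unnormalized posterior on a grid with the midpoint rule, so the results are approximate.

```python
import math


def mass_near_half(prior, heads, tails, x=0.1, n=20000):
    """Approximate posterior mass in [0.5 - x, 0.5 + x] via the midpoint rule.

    The posterior is taken to be proportional to
    prior(theta) * theta**heads * (1 - theta)**tails.
    """
    step = 1.0 / n
    total = 0.0
    inside = 0.0
    for i in range(n):
        theta = (i + 0.5) * step  # midpoint of the i-th grid cell
        weight = prior(theta) * theta**heads * (1.0 - theta) ** tails
        total += weight
        if abs(theta - 0.5) <= x:
            inside += weight
    return inside / total


def curvy_prior(theta):
    # Hypothetical unnormalized prior: symmetric about 0.5 but not a straight line.
    return 1.0 + 0.8 * math.cos(4.0 * math.pi * theta)


# Posterior mass near 0.5 after various (heads, tails) histories.
results = {
    (h, t): mass_near_half(curvy_prior, h, t)
    for (h, t) in [(0, 0), (1, 0), (2, 0), (1, 1), (2, 1), (1, 2)]
}

for (h, t), mass in results.items():
    print(f"heads={h} tails={t} mass near 0.5 = {mass:.6f}")
```

Comparing the (0, 0) and (1, 0) rows checks the first-toss claim, (1, 0) against (2, 0) and (1, 1) checks the skewed case, and (1, 1) against (2, 1) and (1, 2) checks the equal-counts case.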
That’s not what a probability density is. You’re thinking of a probability mass.
Yes, you are right.
Yes.