John, Stuart, let’s do the math:
H1: “the coin will come up heads 95% of the time.”
Whether a given coinflip is evidence for or against H1 depends not only on the value of that coinflip, but on what other hypotheses you are comparing H1 to. So let’s introduce...
H2: “the coin will come up heads 50% of the time.”
By Bayes’ Theorem (odds form), the odds conditional upon the data D are:
p(H1|D) / p(H2|D) = [p(H1) / p(H2)] × [p(D|H1) / p(D|H2)]
So when we see the data, our odds are multiplied by the likelihood ratio p(D|H1)/p(D|H2).
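The same identity with each factor named (just a restatement of the line above, nothing new):

$$\underbrace{\frac{p(H_1 \mid D)}{p(H_2 \mid D)}}_{\text{posterior odds}} \;=\; \underbrace{\frac{p(H_1)}{p(H_2)}}_{\text{prior odds}} \;\times\; \underbrace{\frac{p(D \mid H_1)}{p(D \mid H_2)}}_{\text{likelihood ratio}}$$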
If D = heads, our likelihood ratio is:
p(heads|H1) / p(heads|H2) = .95 / .5 = 1.9.
If D = tails, our likelihood ratio is:
p(tails|H1) / p(tails|H2) = .05 / .5 = 0.1.
If you prefer to measure evidence in decibels, then a result of heads is 10 log10(1.9) ≈ +2.8 dB of evidence and a result of tails is 10 log10(0.1) = −10.0 dB of evidence.
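Here is a minimal Python sketch of the arithmetic so far; the helper names and the 1:1 prior at the end are my own illustrative choices, not part of the setup above.

```python
import math

def likelihood_ratio(p_d_given_h1: float, p_d_given_h2: float) -> float:
    """Likelihood ratio p(D|H1) / p(D|H2) for a single observation."""
    return p_d_given_h1 / p_d_given_h2

def decibels(ratio: float) -> float:
    """Evidence in decibels: 10 * log10(likelihood ratio)."""
    return 10 * math.log10(ratio)

P_HEADS_H1 = 0.95  # H1: the coin comes up heads 95% of the time
P_HEADS_H2 = 0.50  # H2: the coin comes up heads 50% of the time

lr_heads = likelihood_ratio(P_HEADS_H1, P_HEADS_H2)          # 1.9
lr_tails = likelihood_ratio(1 - P_HEADS_H1, 1 - P_HEADS_H2)  # 0.1

print(f"heads: LR = {lr_heads:.2f}, {decibels(lr_heads):+.1f} dB")  # +2.8 dB
print(f"tails: LR = {lr_tails:.2f}, {decibels(lr_tails):+.1f} dB")  # -10.0 dB

# Posterior odds from an assumed 1:1 prior (the prior is purely illustrative):
prior_odds = 1.0
posterior_odds_after_heads = prior_odds * lr_heads  # 1.9 : 1 in favor of H1
```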
The evidence adds up the same way regardless of how you group the coinflips; if you get nothing but heads, that is even stronger evidence for H1 than if you get 95% heads and 5% tails. This holds because we are only comparing H1 to hypothesis H2. If we introduce hypothesis H3:
H3: “the coin will come up heads 99% of the time.”
Then we can also measure the likelihood ratio p(D|H1) / p(D|H3).
Plugging in “heads” or “tails”, we get:
p(heads|H1) / p(heads|H3) = 0.95 / 0.99 = 0.9595…
p(tails|H1) / p(tails|H3) = 0.05 / 0.01 = 5.0
So a result of heads is about −0.18 dB of evidence for H1 over H3, and a result of tails is about +7.0 dB of evidence.
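A similar sketch for the H1-versus-H3 comparison, which also checks the earlier grouping claim against H2; the function name and the 100-flip sequences are made up for illustration.

```python
import math

def sequence_db(n_heads: int, n_tails: int, p_heads_a: float, p_heads_b: float) -> float:
    """Total evidence in decibels for hypothesis A over hypothesis B, given a
    sequence with n_heads heads and n_tails tails. The evidence per flip just
    adds, so the grouping of the flips doesn't matter."""
    per_head = 10 * math.log10(p_heads_a / p_heads_b)
    per_tail = 10 * math.log10((1 - p_heads_a) / (1 - p_heads_b))
    return n_heads * per_head + n_tails * per_tail

# H1 (95% heads) vs H3 (99% heads), single flips:
print(sequence_db(1, 0, 0.95, 0.99))  # heads: about -0.18 dB for H1 over H3
print(sequence_db(0, 1, 0.95, 0.99))  # tails: about +7.0 dB for H1 over H3

# Grouping claim from above: against H2 (50% heads), 100 straight heads
# favors H1 more strongly than 95 heads and 5 tails does.
print(sequence_db(100, 0, 0.95, 0.50))  # about +279 dB
print(sequence_db(95, 5, 0.95, 0.50))   # about +215 dB
```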
If you have a uniform prior on [0, 1] for the frequency of heads, then you can use Laplace’s Rule of Succession.
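Assuming that uniform prior, a minimal sketch of the Rule of Succession: after observing h heads in n flips, the probability that the next flip comes up heads is (h + 1) / (n + 2). The example counts below are made up.

```python
def rule_of_succession(heads: int, flips: int) -> float:
    """Posterior predictive P(next flip is heads) under a uniform prior
    on the heads frequency: (heads + 1) / (flips + 2)."""
    return (heads + 1) / (flips + 2)

print(rule_of_succession(0, 0))     # 0.5    -- no data yet
print(rule_of_succession(95, 100))  # 96/102 ~= 0.941
```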