Imagine you have a coin of unknown bias (with the bias taken to be uniformly distributed on [0,1]).
If you flip this coin and get a heads (an event of initial probability 1⁄2), you update the prior strongly and your probability of heads on the next flip is 2⁄3.
Now suppose instead you have already flipped the coin two million times, and got a million heads and a million tails. The probability of heads on the next flip is still 1⁄2; however, if that flip does come up heads, you will barely update on it, and the probability of another heads after that is barely above 1⁄2[1].
In the first case you have no evidence either way; in the second case you already have strong evidence, and so new observations update things much less.
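As a quick sketch of where these numbers come from (my own illustration, not part of the original argument): with a uniform prior on the bias, the posterior after h heads and t tails is Beta(h+1, t+1), so the predictive probability of heads on the next flip is (h+1)/(h+t+2), Laplace’s rule of succession.

```python
# Predictive probability of heads on the next flip, for a coin whose bias has a
# uniform prior on [0,1], after observing h heads and t tails.
# With this prior the posterior is Beta(h+1, t+1), and the predictive
# probability of heads is (h+1)/(h+t+2) (Laplace's rule of succession).
from fractions import Fraction

def prob_next_heads(h: int, t: int) -> Fraction:
    return Fraction(h + 1, h + t + 2)

print(prob_next_heads(0, 0))                      # 1/2: no flips seen yet
print(prob_next_heads(1, 0))                      # 2/3: after a single heads
print(prob_next_heads(10**6, 10**6))              # 1/2: after a million of each
print(float(prob_next_heads(10**6 + 1, 10**6)))   # ~0.50000025: one more heads barely moves it
```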
In terms of odds ratios, let H be your hypothesis (with negation ¬H), B your past observation, and B’ your future observation.
Then O(H|B’,B) = P(B’|H,B) / P(B’|¬H,B) * O(H|B).
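One way to see where this comes from: by Bayes’ theorem,

P(H|B’,B) = P(B’|H,B) P(H|B) / P(B’|B) and P(¬H|B’,B) = P(B’|¬H,B) P(¬H|B) / P(B’|B),

and dividing the first by the second cancels the common P(B’|B) and gives the formula above.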
The Bayes factor is P(B’|H,B) / P(B’|¬H,B). If you’ve made a lot of observations in B, then this odds ratio might be close to 1. It’s not the same thing as P(B’|H) / P(B’|¬H), which might be very different from 1. Why? Because P(B’|H,B) / P(B’|¬H,B) measures how likely B’ is given H and B, versus how likely it is given ¬H and B. Conditioning on B might completely screen off the effect of H versus ¬H.
In a court case, for example, if you’ve already established that a witness is untrustworthy (B), then their claims (B’) have little weight, and are pretty much independent of guilt or innocence (H vs ¬H) - even though the same claims would carry weight if you didn’t know about the witness’s untrustworthiness.
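To put rough numbers on this screening-off effect, here’s a small numerical sketch (my own setup, reusing the coin from above rather than the court case): take H to be “the coin is biased towards heads” (bias above 1⁄2), ¬H “biased towards tails”, B the flips seen so far, and B’ “the next flip lands heads”. With no past flips the Bayes factor P(B’|H,B) / P(B’|¬H,B) is about 3; after a million heads and a million tails it is essentially 1, even though P(B’|H) / P(B’|¬H) on its own is still 3.

```python
import numpy as np

def bayes_factor(n_heads, n_tails):
    """P(B'|H,B) / P(B'|¬H,B), where B' = "next flip is heads",
    H = "bias > 1/2", ¬H = "bias < 1/2", the bias has a uniform prior on
    [0,1], and B = the n_heads heads and n_tails tails seen so far.
    Integrates numerically over a grid of biases, in log space so that a
    million flips doesn't underflow."""
    p = np.linspace(1e-9, 1 - 1e-9, 1_000_001)               # grid of possible biases
    log_lik = n_heads * np.log(p) + n_tails * np.log1p(-p)   # log P(B | bias = p)
    post = np.exp(log_lik - log_lik.max())                   # unnormalised posterior over the bias

    def p_next_heads(mask):
        # P(next flip heads | hypothesis selecting `mask`, B) = E[bias | mask, B]
        w = post * mask
        return np.sum(w * p) / np.sum(w)

    return p_next_heads(p > 0.5) / p_next_heads(p < 0.5)

print(bayes_factor(0, 0))           # ~3: with no past flips, B' is strong evidence for H
print(bayes_factor(10**6, 10**6))   # ~1: after a million of each, B screens H off from B'
```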
Note that you can still get massive updates if B’ is pretty much independent of B. If someone brings in camera footage of the crime, for instance, it has no connection with the previous witness’s trustworthiness, and can throw the odds strongly in one direction or the other (in equations, independence means that P(B’|H,B) / P(B’|¬H,B) = P(B’|H) / P(B’|¬H)).
So:
At this point, I think I am somewhat below Nate Silver’s 60% odds that the virus escaped from the lab, and put myself at about 40%, but I haven’t looked carefully and this probability is weakly held.
This means that they expect it’s quite likely that there is evidence out there that could change their mind (which makes sense, as they haven’t looked carefully). They would have a strongly held probability if they had looked at all the available evidence and converged on 40% after weighing it all up; in that case, it’s unlikely that there’s anything major they missed, so they wouldn’t expect anything new to change their estimate much.

[1] It’s (10^6+2)/(2×10^6+3) = 0.50000025, I believe.
Note that you can still get massive updates if B’ is pretty much independent of B. If someone brings in camera footage of the crime, for instance, it has no connection with the previous witness’s trustworthiness, and can throw the odds strongly in one direction or the other (in equations, independence means that P(B’|H,B) / P(B’|¬H,B) = P(B’|H) / P(B’|¬H)).
Thanks, I think this is the crucial point for me. I was implicitly operating under the assumption that the evidence is uncorrelated, which is of course not warranted in most cases.
So if we have already updated on a lot of evidence, it is often reasonable to expect that part of what future evidence can tell us is already included in these updates. I think I wouldn’t say that the likelihood ratio is independent of the prior anymore. In most cases, they have a common dependency on past evidence.
Yep, that seems to be right. One minor caveat: instead of
it is often reasonable to expect that part of what future evidence can tell us is already included in these updates.
I’d say something like:
“Past evidence affects how we interpret future evidence, sometimes weakening its impact.”
Thinking of the untrustworthy witness example, I wouldn’t say that “the witness’s testimony is already included in the fact that they are untrustworthy” (=”part of B’ already included in B”), but I would say “the fact they are untrustworthy affects how we interpret their testimony” (=”B affects how we interpret B’ ”).
But that’s a minor caveat.