At this point, I think I am somewhat below Nate Silver’s 60% odds that the virus escaped from the lab, and put myself at about 40%, but I haven’t looked carefully and this probability is weakly held.
Quite off-topic: what does it mean from a Bayesian perspective to hold a probability weakly vs. confidently? Likelihood ratios for updating are independent of the prior, so a weakly-held probability should update exactly like a confidently-held one. Is there a way to quantify the strength with which one holds a probability?
Imagine you have a coin of unknown bias (taken to be uniform on [0,1]).
If you flip this coin and get a heads (an event of initial probability 1⁄2), you update the prior strongly and your probability of heads on the next flip is 2⁄3.
Now suppose instead you have already flipped the coin two million times, and got a million heads and a million tails. The probability of heads on the next flip is still 1⁄2; however, you will barely update on that, and the probability of another heads after that is barely above 1⁄2[1].
In the first case you have no evidence either way, in the second case you have strong evidence either way, and so things update less.
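The coin example can be checked directly with the beta-binomial model it implicitly uses: a uniform prior on the bias is Beta(1, 1), and the predictive probability of heads is the posterior mean. A minimal sketch (the function name is mine):

```python
from fractions import Fraction

def next_heads_prob(heads, tails):
    # Uniform prior on the bias = Beta(1, 1); after seeing `heads` heads
    # and `tails` tails, the posterior is Beta(heads + 1, tails + 1),
    # whose mean is the predictive probability of heads (Laplace's rule).
    return Fraction(heads + 1, heads + tails + 2)

print(next_heads_prob(0, 0))  # 1/2 -- no data, uniform prior
print(next_heads_prob(1, 0))  # 2/3 -- a single heads updates strongly

n = 10**6
print(next_heads_prob(n, n))      # 1/2 -- same probability as at the start...
print(next_heads_prob(n + 1, n))  # ...but one more heads barely moves it:
                                  # (10^6 + 2)/(2*10^6 + 3), about 0.50000025
```

The same 1⁄2 is held very differently in the two cases: after two million flips, one more observation shifts the predictive probability by only about 2.5 × 10⁻⁷.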
In terms of odds ratios, let H be your hypothesis (with negation ¬H), B your past observations, and B’ your future observation.
Then O(H|B’,B) = P(B’|H,B) / P(B’|¬H,B) * O(H|B).
The Bayes factor is P(B’|H,B) / P(B’|¬H,B). If you’ve made a lot of observations in B, then this odds ratio might be close to 1. It’s not the same thing as P(B’|H) / P(B’|¬H), which might be very different from 1. Why? Because P(B’|H,B) / P(B’|¬H,B) measures how likely B’ is, given H and B versus how likely it is, given ¬H and B. The B might completely screen off the effect of H versus ¬H.
In a court case, for example, if you’ve already established a witness is untrustworthy (B), then their claims (B’) have little weight, and are pretty independent of guilt or innocence (H vs ¬H), even if the claims would have carried weight had you not known their trustworthiness.
Note you can still get massive updates if B’ is pretty independent of B. So if someone brings in camera footage of the crime, that has no connection with the previous witness’s trustworthiness, and can throw the odds strongly in one direction or another (in equation, independence means that P(B’|H,B) / P(B’|¬H,B) = P(B’|H) / P(B’|¬H)).
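The screening-off point can be made concrete with a toy joint distribution (the numbers below are my own illustration, not from the thread):

```python
# Toy model: H = "guilty", T = "witness is trustworthy",
# C = "witness claims guilt". T is independent of H; a trustworthy
# witness reports correctly with probability 0.9, an untrustworthy
# one says "guilty" with probability 0.5 regardless of H.
P_TRUSTWORTHY = 0.5

def p_claim(guilty, trustworthy):
    # P(C | H, T)
    if trustworthy:
        return 0.9 if guilty else 0.1
    return 0.5  # untrustworthy: a coin flip, whatever the truth

# B = "we learned the witness is untrustworthy" (T = False).
# Bayes factor given B: P(C | H, not-T) / P(C | not-H, not-T).
bf_given_B = p_claim(True, False) / p_claim(False, False)
print(bf_given_B)  # 1.0 -- B screens off H, the testimony moves nothing

# Without B we must average over T: P(C | H) = sum over T of P(T) * P(C | H, T).
def p_claim_given_guilt(guilty):
    return (P_TRUSTWORTHY * p_claim(guilty, True)
            + (1 - P_TRUSTWORTHY) * p_claim(guilty, False))

bf_no_B = p_claim_given_guilt(True) / p_claim_given_guilt(False)
print(bf_no_B)  # 0.7 / 0.3, about 2.33 -- without B the same claim is real evidence
```

So P(B’|H,B)/P(B’|¬H,B) = 1 even though P(B’|H)/P(B’|¬H) ≈ 2.33: the known untrustworthiness completely screens the testimony off from guilt.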
So:
At this point, I think I am somewhat below Nate Silver’s 60% odds that the virus escaped from the lab, and put myself at about 40%, but I haven’t looked carefully and this probability is weakly held.
This means they expect it’s quite likely that there is evidence out there that could change their mind (which makes sense, as they haven’t looked carefully). They would hold the probability strongly if they had looked at all the available evidence and converged on 40% after weighing it all up; in that case it would be unlikely that they had missed anything major, so they wouldn’t expect anything new to change their estimate much.
Note you can still get massive updates if B’ is pretty independent of B. So if someone brings in camera footage of the crime, that has no connection with the previous witness’s trustworthiness, and can throw the odds strongly in one direction or another (in equation, independence means that P(B’|H,B)/P(B’|¬H,B) = P(B’|H)/P(B’|¬H)).
Thanks, I think this is the crucial point for me. I was implicitly operating under the assumption that the evidence is uncorrelated, which is of course not warranted in most cases.
So if we have already updated on a lot of evidence, it is often reasonable to expect that part of what future evidence can tell us is already included in these updates. I think I wouldn’t say that the likelihood ratio is independent of the prior anymore. In most cases, they have a common dependency on past evidence.
Yep, that seems to be right. One minor caveat: instead of
it is often reasonable to expect that part of what future evidence can tell us is already included in these updates.
I’d say something like:
“Past evidence affects how we interpret future evidence, sometimes weakening its impact.”
Thinking of the untrustworthy witness example, I wouldn’t say that “the witness’s testimony is already included in the fact that they are untrustworthy” (= “part of B’ already included in B”), but I would say “the fact they are untrustworthy affects how we interpret their testimony” (= “B affects how we interpret B’ ”).
But that’s a minor caveat.
Likelihood ratios for updating are independent of the prior
This is kind of technically true, but not in a practical sense. As you learn more about most systems, the likelihood ratio should likely go down for each additional point of evidence. The likelihood ratio a new observation En provides about a hypothesis X is after all P(En|X,E):P(En|¬X,E), where E refers to all the previous observations you’ve made that are now integrated in your prior.
Usually when referring to “updating on En” we use the likelihood ratio
P(En|X, E1,E2,...,En−1) : P(En|¬X, E1,E2,...,En−1),
which makes it clear that this will depend on the order of the different Ei.
As you learn more about most systems, the likelihood ratio should likely go down for each additional point of evidence.
I’d be interested to see the assumptions which go into this. As Stuart has pointed out, it’s got to do with how correlated the evidence is. And for fat-tailed distributions we probably should expect to be surprised at a constant rate.
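One way to see the correlation assumption at work is to compute sequential Bayes factors in the coin model from earlier in the thread. This is my own numerical sketch: the hypothesis is X = “the bias is above 1⁄2”, the data are balanced flips, and the posterior is computed on a grid.

```python
def bayes_factor_next_heads(n, grid=50_000):
    # Bayes factor P(heads | X, data) / P(heads | not-X, data) for the
    # hypothesis X = "bias > 1/2", after seeing n heads and n tails,
    # with a uniform prior on the bias (posterior computed on a grid).
    num_x = den_x = num_nx = den_nx = 0.0
    for i in range(grid):
        theta = (i + 0.5) / grid
        # Unnormalized posterior weight, rescaled so its peak is 1
        # (avoids underflow at the mode for large n).
        w = (4.0 * theta * (1.0 - theta)) ** n
        if theta > 0.5:
            num_x += theta * w
            den_x += w
        else:
            num_nx += theta * w
            den_nx += w
    return (num_x / den_x) / (num_nx / den_nx)

for n in [0, 10, 1000]:
    print(n, bayes_factor_next_heads(n))
# The factor starts at 3 (E[bias | X] = 3/4 vs E[bias | not-X] = 1/4)
# and shrinks toward 1: the balanced flips pin the bias down near 1/2,
# so X vs not-X stops mattering for the next observation.
```

Here each extra heads is strong evidence about X only while X still constrains the predictive probability; once the correlated past data dominate, the per-observation Bayes factor decays toward 1, exactly the “go down” behaviour claimed above.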
[1] It’s (10^6+2)/(2∗10^6+3) ≈ 0.50000025, I believe.