I don’t think it’s necessarily suspicious: a priori, I wouldn’t have a problem with 60 tests all coming back negative even though each of them is only 95% confident.
The reason is that, depending on the nature of the test, the probability of a false negative might indeed be 5% while the probability of a false positive could be tiny. Suppose this is the case, and consider the two possibilities for the true answer: ‘positive’ or ‘negative’.
(A) If the true conclusion is ‘positive’, any given test can still yield a negative with 5% probability. (That test will be reported as a negative with 95% confidence, though one would expect most of the tests to reach the positive conclusion.)
(B) If the true conclusion is ‘negative’, any test that yields a negative will still be reported with only 95% confidence, because of the possibility of case (A). But if we are in case (B), we should not expect any positive conclusions at all, even over 60 tests, because the false-positive rate is so low.
I have no idea whether this lack of symmetry holds for the set of MMR and autism studies. (It probably doesn’t, so I apologize: I am probably accomplishing nothing except making it harder to argue for what is likely a correct intuition.)
But it is easy to think of an example where this asymmetry would apply: suppose you are searching for someone you know well in a crowd, but you are not sure they are there. Take a ‘test’ to be looking for them over a 15-minute period, and estimate that if they are there, you are likely to find them during that period with 95% probability. Suppose they are there but you don’t find them within 15 minutes: that is a false negative, which happens with 5% probability. Suppose they are not there and you do not find them: you again say they are not there, with 95% confidence. But in this case, where they are not there, even with 60 people each looking for 15 minutes, no one will find them, because the probability of a false positive is essentially zero.
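A minimal Monte Carlo sketch of this crowd example, using the numbers assumed above (60 searchers, a 95% chance each of finding the person if they are present, and a false-positive rate taken to be essentially zero); it plays out cases (A) and (B) directly:

```python
import random

# Crowd-search sketch: each "test" is one searcher looking for 15 minutes.
# The numbers are the ones assumed in the comment above, not estimates from
# any real study.
P_FIND_IF_PRESENT = 0.95   # 1 - false-negative rate
P_FIND_IF_ABSENT = 0.0     # false-positive rate assumed to be ~0
N_SEARCHERS = 60
N_TRIALS = 100_000

def anyone_finds(person_present: bool) -> bool:
    """Did at least one of the 60 searchers report a find?"""
    p = P_FIND_IF_PRESENT if person_present else P_FIND_IF_ABSENT
    return any(random.random() < p for _ in range(N_SEARCHERS))

for present in (True, False):
    hits = sum(anyone_finds(present) for _ in range(N_TRIALS))
    print(f"present={present}: fraction of runs with at least one find = {hits / N_TRIALS:.4f}")

# Case (A), person present: essentially every run has at least one find
# (the chance of 60 independent misses is 0.05**60).
# Case (B), person absent: no finds at all, so 60 unanimous negatives are
# exactly what these assumed rates predict.
```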
(I do see that you addressed false positives versus false negatives in several places, so this explanation was not aimed at you specifically, since I know you are familiar with this. But it is not clear from the outset which is which in these studies, and fleshing this out is what will ultimately make the argument harder to state, but more watertight.)
No, that 5% is the probability of a false positive, not the probability of a false negative. Phil has the number he needs and uses it correctly.
Which 5%?
No, “that” 5% is the probability from my cooked-up example, which was the probability of a false negative.
You’re saying (and Phil also says, in several places) that in his example the 5% is the probability of a false positive. I don’t disagree, a priori, but I would like to know: how do we know this? This is a necessary component of the full argument that seems to be missing so far.
Another way of asking my question, perhaps more clearly: how do we know whether the 60 studies under consideration were testing the hypothesis that there was a link or the hypothesis that there was not a link?
There is an asymmetry that makes it implausible that the null hypothesis would be that there is an effect. The null hypothesis has to be a definite value. The null hypothesis can be zero, which is what we think it is, or it could be some specific value, like a 10% increase in autism. But the null hypothesis cannot be “there is some effect of unspecified magnitude.” There is no data that can disprove that hypothesis, because it includes effects arbitrarily close to zero. But that can be the positive hypothesis, because it is possible to disprove the complementary null hypothesis, namely zero.
Another, more symmetric way of phrasing it: we do the study and compute a confidence interval, an interval within which we are 95% confident the effect size lies. That step does not depend on the choice of hypothesis. But what do we do with this interval? We reject every hypothesis not in the interval. If zero is not in the interval, we reject it. If a 10% increase is not in the interval, we can reject that. But we cannot reject all nonzero effect sizes at once.
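A minimal sketch of this confidence-interval phrasing, with invented data (a true effect of zero on an arbitrary scale, a normal approximation for the 95% interval, and ‘a 10% increase’ standing in as one specific alternative value):

```python
import random
import statistics

# Confidence-interval sketch: estimate an effect size, form a 95% interval,
# and reject any *specific* hypothesized value that falls outside it.
# The data and the 0.10 figure are purely illustrative.
random.seed(1)
n = 400
observations = [random.gauss(0.0, 1.0) for _ in range(n)]   # true effect is zero here

estimate = statistics.mean(observations)
std_err = statistics.stdev(observations) / n ** 0.5
ci_low, ci_high = estimate - 1.96 * std_err, estimate + 1.96 * std_err
print(f"95% CI for the effect size: ({ci_low:.3f}, {ci_high:.3f})")

for hypothesis in (0.0, 0.10):          # "zero effect" and "a 10% increase"
    inside = ci_low <= hypothesis <= ci_high
    print(f"effect = {hypothesis:+.2f}: {'not rejected' if inside else 'rejected'}")

# There is no single number we could plug in that stands for "some nonzero
# effect of unspecified magnitude", and any interval of nonzero width still
# contains some nonzero values, so all nonzero effect sizes can never be
# rejected at once. That is the asymmetry described above.
```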
(I realize I’m confused about something and am thinking it through for a moment.)
I see. I was confused for a while, but in the hypothetical examples I was considering, a link between MMR and autism might be missed (a false negative, with 5% probability) but isn’t going to be found unless it is actually there (a low false-positive rate). Then Vaniver explains, above, that the canonical null-hypothesis framework assumes that random chance will make it look like there is an effect with some probability, so the 5% is the false-positive rate built into the test; it is the false-negative rate that a larger sample drives down (a sketch of this follows below).
I marginally understand this. For example, I can’t really zoom out and see why you can’t define your test so that the false positive rate is low instead. That’s OK. I do understand your example and see that it is relevant for the null-hypothesis framework. (My background in statistics is not strong and I do not have much time to dedicate to this right now.)
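A minimal simulation sketch of the asymmetry being described here, with invented numbers (two-group studies, a crude two-sided z-test at the 5% level, and an arbitrary ‘real’ effect of 0.3 standard deviations): the false-positive rate sits near 5% whatever the sample size, while the false-negative rate is what shrinks as the sample grows.

```python
import random
import statistics

def study_finds_effect(n: int, true_effect: float, z_crit: float = 1.96) -> bool:
    """One simulated two-group study; True if it reports a 'significant' effect."""
    control = [random.gauss(0.0, 1.0) for _ in range(n)]
    treated = [random.gauss(true_effect, 1.0) for _ in range(n)]
    std_err = (statistics.pvariance(control) / n + statistics.pvariance(treated) / n) ** 0.5
    z = (statistics.mean(treated) - statistics.mean(control)) / std_err
    return abs(z) > z_crit          # two-sided test at roughly the 5% level

def positive_rate(n: int, true_effect: float, trials: int = 2000) -> float:
    return sum(study_finds_effect(n, true_effect) for _ in range(trials)) / trials

random.seed(0)
for n in (50, 500):
    fp = positive_rate(n, true_effect=0.0)       # no real effect: positives are false
    fn = 1 - positive_rate(n, true_effect=0.3)   # real effect: misses are false negatives
    print(f"n per group = {n:3d}: false-positive rate ~ {fp:.3f}, false-negative rate ~ {fn:.3f}")
```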
As for how we know which hypothesis the 60 studies were testing: I think the answer is “because they’re using NHST.” They say “we couldn’t detect an effect at the level that random chance would give us 5% of the time, thus we are rather confident there is no effect.” But the fact that we don’t see our 5% of false positives suggests that something about the system is odd.
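A back-of-the-envelope version of that oddity, under the assumptions that the 60 studies are independent, each is run at the 5% level, and the null hypothesis of no link really is true:

```python
from math import comb

n, alpha = 60, 0.05   # 60 independent studies, each with a 5% false-positive rate

p_no_false_positives = (1 - alpha) ** n
expected_false_positives = n * alpha
print(f"P(all {n} studies come back negative) = {p_no_false_positives:.3f}")      # about 0.046
print(f"Expected number of false positives    = {expected_false_positives:.1f}")  # 3.0

# Distribution of the number of false positives (binomial):
for k in range(5):
    p_k = comb(n, k) * alpha**k * (1 - alpha)**(n - k)
    print(f"P(exactly {k} false positives) = {p_k:.3f}")
```

Under those assumptions, a clean sweep of 60 negatives is itself roughly a 1-in-20 event, which is the sense in which the missing false positives look odd.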
OK, that sounds straightforward.
How does one know that the 60 studies are of this kind, rather than the other kind (e.g., studies that were designed to show an effect with 95% probability, but failed to do so and thus reported a negative result)?