Another way of asking my question, perhaps more clearly, is: how do we know if the 60 considered studies were testing the hypothesis that there was a link or the hypothesis that there was not a link?
There is an asymmetry that makes it implausible that the null hypothesis would be that there is an effect. The null hypothesis has to be a definite value. The null hypothesis can be zero, which is what we think it is, or it could be some specific value, like a 10% increase in autism. But the null hypothesis cannot be “there is some effect of unspecified magnitude.” There is no data that can disprove that hypothesis, because it includes effects arbitrarily close to zero. But that can be the positive hypothesis, because it is possible to disprove the complementary null hypothesis, namely zero.
Another, more symmetric way of phrasing it is that we do the study and compute a confidence interval: a range that we are 95% confident contains the effect size. That step does not depend on the choice of hypothesis. But what do we do with this interval? We reject every hypothesis not in the interval. If zero is not in the interval, we reject it. If a 10% increase is not in the interval, we can reject that. But we cannot reject all nonzero effect sizes at once, because any interval around zero also contains nonzero values arbitrarily close to zero.
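To make the interval picture concrete, here is a minimal sketch in Python. The counts are made up purely for illustration (they are not from any of the studies discussed), the function names are hypothetical, and the interval is a simple normal-approximation (Wald) interval for a difference in risk, which is an assumption about method rather than what any particular study did.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference_ci(cases_a, n_a, cases_b, n_b, confidence=0.95):
    """Normal-approximation interval for (risk in group A) - (risk in group B)."""
    p_a = cases_a / n_a
    p_b = cases_b / n_b
    diff = p_a - p_b
    # Standard error of the difference between two independent proportions.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff - z * se, diff + z * se


def rejected(hypothesized_difference, interval):
    """A hypothesized effect size is rejected only if it lies outside the interval."""
    low, high = interval
    return not (low <= hypothesized_difference <= high)


# Made-up counts: 105 cases among 10,000 exposed, 100 among 10,000 unexposed.
interval = risk_difference_ci(cases_a=105, n_a=10_000, cases_b=100, n_b=10_000)

for hypothesis in (0.0, 0.001, 0.005):
    print(hypothesis, "rejected" if rejected(hypothesis, interval) else "not rejected")
```

With these illustrative numbers, zero sits inside the interval and so does a small nonzero difference, while a large difference falls outside. That is the asymmetry: the data can rule out specific effect sizes, but not "some nonzero effect" as a whole.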
(I realize I’m confused about something and am thinking it through for a moment.)
I see. I was confused for a while, but in the hypothetical examples I was considering, a link between MMR and autism might be missed (a false negative with 5% probability) but isn’t going to be found unless it is there (a low false positive rate). Then Vanviver explains, above, that the canonical null-hypothesis framework assumes that random chance will make it look like there is an effect with some probability, so it is the false positive rate you can tune with your sample size.
I marginally understand this. For example, I can’t really zoom out and see why you can’t define your test so that the false positive rate is low instead. That’s OK. I do understand your example and see that it is relevant for the null-hypothesis framework. (My background in statistics is not strong and I do not have much time to dedicate to this right now.)
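For what it is worth, here is a rough sketch of how the two error rates behave in the textbook setup. It assumes a two-sided z-test on a standardized effect size, which is a simplification and not necessarily what any of the studies used; the function name and the effect size of 0.2 are hypothetical. In this setup the false positive rate is fixed directly by the chosen significance level, and the sample size mainly changes the chance of missing a real effect.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def rejection_probability(true_effect, n, alpha=0.05):
    """Chance that a two-sided z-test rejects 'effect = 0'.

    true_effect is the real standardized effect size (0.0 means no effect).
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = true_effect * sqrt(n)
    upper_tail = 1 - STD_NORMAL.cdf(z_crit - shift)
    lower_tail = STD_NORMAL.cdf(-z_crit - shift)
    return upper_tail + lower_tail


for n in (25, 100, 400):
    false_positive_rate = rejection_probability(0.0, n)   # no real effect
    detection_rate = rejection_probability(0.2, n)        # small real effect
    print(n, round(false_positive_rate, 3), round(detection_rate, 3))
```

When there is no real effect, the rejection probability stays at alpha whatever the sample size. When there is a real effect, the rejection probability rises with the sample size, so the false negative rate falls.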
I think the answer to this is “because they’re using NHST.” They say “we couldn’t detect an effect at the level that random chance would give us 5% of the time, thus we are rather confident there is no effect.” But that we don’t see our 5% false positives suggests that something about the system is odd.
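The "something is odd" point can be put as a small calculation. This is only a sketch under strong assumptions that the thread does not establish: that there is truly no effect, that the 60 studies are independent, that each runs a single test at the 5% level, and that none of them reported a significant effect. The function name is hypothetical.

```python
from math import comb


def prob_k_false_positives(k, n_studies=60, alpha=0.05):
    """Binomial probability of exactly k false positives among n_studies null studies."""
    return comb(n_studies, k) * alpha**k * (1 - alpha) ** (n_studies - k)


expected_false_positives = 60 * 0.05
prob_none = prob_k_false_positives(0)

print("expected false positives:", expected_false_positives)
print("chance of seeing none at all:", round(prob_none, 3))
```

Under those assumptions you would expect about three spurious "effects" among 60 studies, and seeing none at all would be fairly unlikely. If any assumption fails (for example, studies that share data, or tests run at a stricter level), the number changes.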
OK, that sounds straightforward.
How does one know that the 60 studies are of this kind, rather than the other kind (e.g., studies that were designed to show an effect with 95% probability, but failed to do so and thus got a negative result)?