If all children are made hyperactive, the means will be different, and the study rejects this hypothesis. But if 99% of children are made hyperactive to the same degree, the means will be different by almost the same amount, and the test would also reject this hypothesis, though not as strongly.
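A minimal sketch of the arithmetic behind "almost the same amount", assuming every affected child's score shifts by the same fixed amount and unaffected children do not shift at all. The function name `expected_mean_shift` and the effect size of 1.0 are illustrative assumptions, not values from the study.

```python
def expected_mean_shift(fraction_affected, effect_size):
    """Expected difference in group means when only a fraction of
    children shift by effect_size and the rest shift by zero."""
    return fraction_affected * effect_size

# Compare "all children", "99% of children", and a much smaller subgroup.
for fraction in (1.0, 0.99, 0.15):
    shift = expected_mean_shift(fraction, effect_size=1.0)
    print(f"fraction affected = {fraction:.2f} -> expected mean shift = {shift:.3f}")
```

Under these assumptions the expected difference in means scales linearly with the affected fraction, so the 99% case is very close to the 100% case, while a small affected subgroup produces a much smaller shift in the mean.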
This is math. You can’t say “If 2+2 = 4, then 2+1.9 = 4.” There is no “as strongly” being reported here. There is only accept or reject.
The study rejects a hypothesis using a specific number that was computed using the assumption that the effect is the same in all children. That specific number is not the correct number to reject the hypothesis that the effect is the same in all but one.
It might so happen that the data used in the study would reject that hypothesis, if the correct threshold for it were computed. But the study did not do that, so it cannot claim to have proven that.
The reality in this case is that food dye promotes hyperactivity in around 15% of children. The correct F-value threshold to reject that hypothesis would be much, much lower!
You’re correct in a broader sense that passing the F-test under one set of assumptions is strong evidence that you’ll pass it with a similar set of assumptions. But papers such as this use logic and math in order to say things precisely. What they claimed is supported by, and similar to, what they proved, but it isn’t the same thing, so it’s still an error. In the same way, 3.9 is similar to 4 for most purposes, but it is an error to say that 2 + 1.9 = 4.
The thing is, some such reasoning has to be done in any case to interpret the paper. Even if no logical mistake was made, the F-test can’t possibly disprove a hypothesis such as “the means of these two distributions are different”. There is always room for an epsilon difference in the means to be compatible with the data. A similar objection was stated elsewhere on this thread already:
The failure to reject a null hypothesis is a failure. It doesn’t allow or even encourage you to conclude anything.
And of course it’s legitimate to give up at this step and say “the null hypothesis has not been rejected, so we have nothing to say”. But if we don’t do this, then our only recourse is to say something like: “with 95% certainty, the difference in means is less than X”. In other words, we may be fairly certain that 2 + 1.9 is less than 5, and we’re a bit less certain that 2 + 1.9 is less than 4, as well.
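One way to read a statement like "with 95% certainty, the difference in means is less than X" is as a one-sided upper confidence bound on the difference in means. The sketch below is an illustration under stated assumptions, not anything the study computed: it uses a large-sample normal approximation with unpooled variances, synthetic data, and a hypothetical helper named `upper_bound_on_mean_difference`. Strictly speaking, a confidence bound is a frequentist statement about the procedure rather than a probability about the true difference.

```python
import math
import random
from statistics import NormalDist, mean, variance

def upper_bound_on_mean_difference(treated, control, confidence=0.95):
    """One-sided upper confidence bound on mean(treated) - mean(control).

    Assumption: large-sample normal approximation, unpooled sample variances.
    """
    diff = mean(treated) - mean(control)
    std_err = math.sqrt(variance(treated) / len(treated)
                        + variance(control) / len(control))
    z = NormalDist().inv_cdf(confidence)  # one-sided critical value
    return diff + z * std_err

# Synthetic data, purely illustrative: both groups share the same distribution.
rng = random.Random(0)
treated = [rng.gauss(0.0, 1.0) for _ in range(200)]
control = [rng.gauss(0.0, 1.0) for _ in range(200)]

bound_95 = upper_bound_on_mean_difference(treated, control, 0.95)
bound_80 = upper_bound_on_mean_difference(treated, control, 0.80)
print(f"95% upper bound on the difference in means: {bound_95:.3f}")
print(f"80% upper bound on the difference in means: {bound_80:.3f}")
```

The bound at a lower confidence level is always tighter, which mirrors the point above: we can be fairly sure the difference is below a generous X, and somewhat less sure it is below a smaller one.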
Incidentally, is there some standard statistical test that produces this kind of output?
I don’t think we actually disagree.
Edit: Nor does reality disagree with either of us.