You can only conclude that food dye affects behavior with 84% confidence, rather than the 95% you desired.
Or rather, you can conclude that, if there were no effect of food dye on hyperactivity and we did this test a whole lotta times, then we’d get data like this 16% of the time, rather than under the 5%-of-the-time cutoff you were hoping for.
It’s not so easy to jump from frequentist confidence intervals to confidence for or against a hypothesis. We’d need a bunch of assumptions. I don’t have access to the original article, so I’ll just make shit up. Specifically, if I assume that the 84% confidence figure comes from a two-tailed test on a normally distributed estimate, then the corresponding minimum Bayes factor is 0.37 for the model {mean hyperactivity = baseline} versus the model {mean hyperactivity = baseline + food dye effect}. Getting to an actual confidence level in the hypothesis requires having a prior. Since I’m too ignorant of the subject material to have an intuitive sense of the appropriate prior, I’ll go with my usual here, which is to charge 1 nat per parameter as a complexity penalty. And that weak complexity prior wipes out the evidence from this study.
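Here’s that arithmetic as a quick Python sketch. It only encodes the made-up assumptions above (a two-tailed test on a normally distributed estimate, 1 nat per extra parameter); none of it comes from the study itself.

```python
# Minimum-Bayes-factor arithmetic under the assumptions above: a two-tailed
# test on a normally distributed estimate, plus a 1-nat complexity penalty.
import math
from statistics import NormalDist

p_value = 0.16                                # the study's "84% confidence"
z = NormalDist().inv_cdf(1 - p_value / 2)     # z-score for a two-tailed p-value
evidence_nats = z ** 2 / 2                    # max evidence for the food-dye model
min_bf = math.exp(-evidence_nats)             # minimum Bayes factor, baseline : baseline+effect
penalty_nats = 1.0                            # 1 nat for the one extra parameter

print(f"minimum Bayes factor: {min_bf:.2f}")                          # ~0.37
print(f"evidence for the food-dye model: {evidence_nats:.2f} nats")   # ~0.99
print(f"net log-odds after the penalty: {evidence_nats - penalty_nats:+.2f} nats")  # ~0.00
```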
So given these assumptions, the original article’s claim...
The results of this study indicate that artificial food colorings do not affect the behavior of school-age children who are claimed to be sensitive to these agents
...would be correct.
All you’re saying is that studies should use Bayesian statistics. No medical journal articles use Bayesian statistics.
Given that the frequentist approach behind these tests is “correct”, the article’s claim is incorrect. The authors intended to use frequentist statistics, and so they made an error.
If a weak default complexity prior of 1 nat for 1 extra variable wipes out 84% confidence, that implies that many articles have incorrect conclusions, because 95% confidence might not be enough to account for a one-variable complexity penalty.
In any case, you are still incorrect, because your penalty cannot prove that the null hypothesis is correct. It can only make it harder to prove it’s incorrect. Failure to prove that it is incorrect is not proof that it is correct. Which is a key point of this post.
Nah, they’re welcome to use whichever statistics they like. We might point out interpretation errors, though, if they make any.
Under the assumptions I described, a p-value of 0.16 is about 0.99 nats of evidence, which is essentially canceled by the 1-nat prior. A p-value of 0.05 under the same assumptions would be about 1.92 nats of evidence, so if there’s a lot of published science that matches those assumptions (which is dubious), then those results are merely weak evidence, not necessarily wrong.
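For comparison, under those same made-up assumptions (nothing here comes from the study), here is the conversion for both p-values, plus the break-even p-value at which the 1-nat penalty is exactly canceled:

```python
# Same assumed model as before: evidence is bounded by z**2 / 2 nats,
# where z is the two-tailed normal quantile, against a 1-nat penalty.
import math
from statistics import NormalDist

def max_evidence_nats(p_value):
    z = NormalDist().inv_cdf(1 - p_value / 2)
    return z ** 2 / 2

for p in (0.16, 0.05):
    print(f"p = {p:.2f}: at most {max_evidence_nats(p):.2f} nats of evidence")  # 0.99, 1.92

# The 1-nat penalty is exactly offset when z**2 / 2 = 1, i.e. z = sqrt(2):
p_break_even = 2 * (1 - NormalDist().cdf(math.sqrt(2)))
print(f"break-even p-value: {p_break_even:.3f}")                                # ~0.157
```

By these numbers, the conventional 0.05 threshold does clear a one-parameter penalty, just not by much, which matches the “weak evidence, not necessarily wrong” reading above.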
It’s not the job of the complexity penalty to “prove the null hypothesis is correct”. Proving what’s right and what’s wrong is a job for evidence. The penalty was merely a cheap substitute for an informed prior.