I think part of the problem is that there is a single confidence threshold, usually 90%. Setting the threshold high enough to compensate for random flukes and file-drawer effects causes problems when people start interpreting a result just short of the threshold (threshold minus epsilon) to mean the null hypothesis has been proven. Maybe it would be better to have two thresholds, with results between them interpreted as inconclusive.
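A minimal sketch of that two-threshold idea (the threshold names and values here are illustrative assumptions, not standards): p-values below a strict cutoff reject the null, p-values above a loose cutoff are read as consistent with the null, and everything in between is explicitly inconclusive rather than being rounded off to "proven".

```python
# Hypothetical two-threshold decision rule. alpha_low and alpha_high
# are illustrative values, not conventions from the original discussion.
def interpret(p_value, alpha_low=0.05, alpha_high=0.5):
    if p_value < alpha_low:
        return "reject null"        # strong evidence against the null
    elif p_value > alpha_high:
        return "consistent with null"  # weak evidence for the null
    else:
        return "inconclusive"       # neither proven nor refuted

print(interpret(0.01))  # reject null
print(interpret(0.20))  # inconclusive
print(interpret(0.80))  # consistent with null
```

The point is just that the middle band removes the temptation to treat "failed to reach significance" as "null confirmed".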
That is part of the problem. If it weren’t for using a cutoff, then it would be the case that “proving” “!(forall X: P(X))” with high confidence would be evidence for “for many X, !P(X)”, as several of the comments below are claiming.
But even if they’d used some kind of Bayesian approach, assuming that all children are identical would still mean they were measuring evidence about the claim “X affects all Y”, and that evidence could not be used to conclusively refute the claim that X affects some fraction of Y.
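A toy simulation makes the gap between those two claims concrete (all numbers here are illustrative assumptions): if a treatment shifts outcomes for only 10% of subjects, the average effect across everyone is small, so a test that assumes every subject responds identically can easily come up empty, while the "affects some fraction" claim remains true.

```python
# Toy simulation: effect present in only a fraction of the population.
# affected_fraction and effect_size are made-up illustrative values.
import random

random.seed(0)
n = 1000
affected_fraction = 0.1  # only 10% of subjects respond
effect_size = 2.0        # size of the shift for responders

outcomes = []
for _ in range(n):
    base = random.gauss(0, 1)          # baseline noise
    if random.random() < affected_fraction:
        base += effect_size            # the affected minority
    outcomes.append(base)

mean_effect = sum(outcomes) / n
# The population-average effect is far smaller than the per-responder
# effect, which is why a homogeneity-assuming test can miss it.
print(round(mean_effect, 2))
```

Failing to detect the diluted average effect is evidence against "X affects all Y", but it says almost nothing about "X strongly affects 10% of Y".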
Using a cutoff, though, isn’t an error. It’s a non-Bayesian statistical approach that loses a lot of information, but it can give useful answers. It would be difficult to use a Bayesian approach in any food toxicity study, because setting the priors would be a political problem. They did their statistical analysis correctly.