Another problem with NHST in particular: the choice of a null hypothesis and a null distribution is itself a modeling assumption, but one that is rarely checked. In real-world datasets, it’s entirely possible for the actual null distribution to be much more extreme (wider or shifted) than the theoretical one assumed, in which case the nominal alpha (the false-positive rate conditional on the null) is incorrect & too forgiving: true nulls get rejected more often than advertised. Two links on that:
“Large-Scale Simultaneous Hypothesis Testing: The Choice of a Null Hypothesis”, Efron 2004
“Interpreting observational studies: why empirical calibration is needed to correct p-values”, Schuemie et al 2012 (excerpts)
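To make the point concrete, here is a minimal simulation sketch, not the procedure from either paper. It assumes every test statistic is a true null but drawn from N(0, 1.5²) rather than the textbook N(0, 1), then compares the false-positive rate under the theoretical null against a null whose center and scale are estimated from the data. The median/MAD estimator is a simple stand-in chosen for illustration; the function names and the value 1.5 are hypothetical.

```python
import random
from statistics import NormalDist, median

STD_NORMAL = NormalDist()  # theoretical null: N(0, 1)

def simulate_null_z(n, sd, seed=0):
    """Draw n z-scores that are all true nulls, but with spread sd (assumed)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sd) for _ in range(n)]

def two_sided_p(z, mu=0.0, sigma=1.0):
    """Two-sided p-value of z against a N(mu, sigma^2) null."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs((z - mu) / sigma)))

def empirical_null(zs):
    """Robust estimate of the null's center and scale via median and MAD.

    This is a simplification for illustration, not Efron's estimator.
    """
    mu = median(zs)
    mad = median(abs(z - mu) for z in zs)
    # Rescale the MAD so it estimates the standard deviation of a normal.
    sigma = mad / STD_NORMAL.inv_cdf(0.75)
    return mu, sigma

def rejection_rate(zs, alpha, mu=0.0, sigma=1.0):
    """Fraction of z-scores rejected at level alpha under the given null."""
    return sum(two_sided_p(z, mu, sigma) < alpha for z in zs) / len(zs)

zs = simulate_null_z(20000, sd=1.5)           # overdispersed, but all null
naive = rejection_rate(zs, alpha=0.05)        # assumes N(0, 1)
mu_hat, sigma_hat = empirical_null(zs)
calibrated = rejection_rate(zs, alpha=0.05, mu=mu_hat, sigma=sigma_hat)

print(f"estimated null: mean={mu_hat:.3f}, sd={sigma_hat:.3f}")
print(f"false-positive rate under N(0,1):        {naive:.3f}")
print(f"false-positive rate under empirical null: {calibrated:.3f}")
```

Under these assumptions the theoretical null should reject roughly 19% of true nulls at a nominal 5%, since P(|Z| > 1.96) for Z ~ N(0, 1.5²) is about 0.19, while the data-estimated null should land close to 5%. In real data some statistics are genuinely non-null, which is one reason a robust estimate of the center and scale is used here rather than the plain mean and standard deviation.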