I for one think that 0.05 is way too lax (other than for deciding whether it is worth conducting a bigger study, and other such value-of-information uses), and that 0.05 results require a rather carefully constructed meta-study to interpret correctly. A selection factor of 20 is well within the range attainable by dodgy practices that are almost impossible to prevent, and even in the absence of dodgy practices there is selection due to you being more likely to hear of something interesting.
I can only imagine considering it too strict if I were unaware of those issues or their importance (Bayesianism or not).
This applies even more strongly to weaker forms of information, such as “Here’s a plausible-looking speculation I came up with”. To get anywhere with that kind of stuff, one would need to somehow account for the preference towards specific lines of speculation.
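To put a rough number on the "selection factor of 20" point, here is a minimal sketch. It assumes one particular reading of that phrase: someone runs k independent analyses on pure noise and reports only the best one. The function names and the trial count are made up for illustration, and real dodgy practices are rarely this clean or this independent.

```python
import random

ALPHA = 0.05  # the threshold under discussion

def prob_at_least_one_hit(k, alpha=ALPHA):
    """Chance that at least one of k independent null tests gives p < alpha."""
    return 1.0 - (1.0 - alpha) ** k

def simulate_best_of_k(k, trials=20000, alpha=ALPHA, seed=0):
    """Monte Carlo check. Under the null, p-values are uniform on [0, 1].
    Keep only the smallest of k p-values and count how often it clears alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        best_p = min(rng.random() for _ in range(k))
        if best_p < alpha:
            hits += 1
    return hits / trials

for k in (1, 5, 20):
    print(k, round(prob_at_least_one_hit(k), 3), round(simulate_best_of_k(k), 3))
```

With k = 20 the analytic value is 1 - 0.95^20, about 0.64, so under these assumptions a "significant" result on pure noise is more likely than not.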
edit: plus, effective cures in medicine are the ones supported by very, very strong evidence, on par with particle physics (e.g. penicillin killing bacteria; you have really big sample sizes when you are dealing with bacteria). The weak stuff is things like antidepressants, for which we don’t know if they lower or raise the risk of suicide, and for which we are uncertain whether the effect is an artefact of using, in any way whatsoever, a depression score that includes weight loss and insomnia as symptoms when testing a drug that causes weight gain and sleepiness.
I think it is mostly because priors for finding a strongly effective drug are very low, so when p-values as large as 0.05 are accepted, you can only find low-effect, near-placebo drugs.
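One way to see how a low prior interacts with a lax threshold is plain Bayes' rule applied to "significant" results. This is only a sketch of a related calculation, not the exact claim about effect sizes above: the prior of 0.01 and the power of 0.8 are invented numbers for illustration, and `posterior_real` is a hypothetical helper name.

```python
def posterior_real(prior, power, alpha):
    """P(effect is real | result is significant), by Bayes' rule.

    prior: assumed fraction of tested drugs that really work
    power: assumed chance a real effect reaches significance
    alpha: significance threshold (false-positive rate under the null)
    """
    true_pos = prior * power
    false_pos = (1.0 - prior) * alpha
    return true_pos / (true_pos + false_pos)

# Illustrative numbers only: 1 in 100 candidates works, 80% power.
for alpha in (0.05, 0.001, 1e-6):
    print(alpha, round(posterior_real(0.01, 0.8, alpha), 4))
```

With these made-up inputs, a 0.05 result leaves the posterior at roughly 0.14, so most "hits" would still be noise; tightening the threshold is what moves it close to 1.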
edit2: Another issue is that many studies are plagued by at least some un-blinding, which can modulate the placebo effect. So I think a threshold on the strength of the effect (not just the p-value) is also necessary: results that fall within the potential systematic error margin of the placebo effect may mostly be a result of systematic error.
edit3: By the way, note that for a study of the same size, a stronger effect will result in a much lower p-value, so a higher standard on p-values does not interfere much with the detection of strong effects. When you are testing an antibiotic… well, the chance probability of one bacterium dying in some short timespan may be 0.1, and with the antibiotic at a fairly high concentration, 99.99999…%. Needless to say, a dozen bacteria put you far beyond the standards of particle physics, and a whole poisoned petri dish makes the point moot, with all the remaining uncertainty coming from the possibility that the bacteria were killed in some other way.
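The bacteria arithmetic above can be sketched as follows. Assumptions on my part: the bacteria die independently, all n of them are observed dead in the time window, the chance death probability is the 0.1 from the text, and "the standards of particle physics" is taken to mean the usual one-sided 5-sigma Gaussian tail. The function names are just for illustration.

```python
import math

def p_all_die_by_chance(n, p_chance=0.1):
    """Null-hypothesis probability that all n independent bacteria die anyway."""
    return p_chance ** n

def one_sided_tail(sigma):
    """One-sided Gaussian tail probability beyond `sigma` standard deviations."""
    return 0.5 * math.erfc(sigma / math.sqrt(2.0))

# Assumed stand-in for the particle physics discovery threshold.
five_sigma = one_sided_tail(5.0)

for n in (1, 2, 6, 7, 12):
    p = p_all_die_by_chance(n)
    print(n, p, "beyond 5 sigma" if p < five_sigma else "not yet")
```

Under these assumptions seven dead bacteria already clear the 5-sigma bar, and a dozen give a chance probability of about 1e-12, several orders of magnitude beyond it.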