Sorry, ambiguous wording. 0.05 is too weak, and should be replaced with, say, 0.005. It would be a better scientific investment to do fewer studies with twice as many subjects and have nearly all the reported results be replicable. Unfortunately, this change has to be standardized within a field, because otherwise you’re deliberately handicapping yourself in an arms race. This probably deserves its own post.
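A rough numerical check of that trade-off, as a minimal sketch: using a normal approximation to a two-sided, two-sample test, with an assumed true effect size of d = 0.5 and illustrative group sizes (neither number is from the comment), doubling the subjects roughly compensates for tightening the threshold from 0.05 to 0.005, while the false-positive rate drops tenfold.

```python
# Minimal sketch: approximate power of a two-sided, two-sample z-test.
# The effect size d = 0.5 and the group sizes are illustrative assumptions.
from scipy.stats import norm

def approx_power(d, n_per_group, alpha):
    ncp = d * (n_per_group / 2) ** 0.5   # noncentrality of the z statistic
    z_crit = norm.ppf(1 - alpha / 2)     # two-sided critical value
    return norm.sf(z_crit - ncp) + norm.cdf(-z_crit - ncp)

print(approx_power(0.5, 64, 0.05))    # roughly 0.80: the conventional setup
print(approx_power(0.5, 128, 0.005))  # roughly 0.88: twice the subjects, alpha/10
```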
In my head, I always translate so-called “statistically significant” results into (an often poorly-computed approximation to) a likelihood ratio of 0.05 over the null hypothesis. I believe that experiments should report likelihood ratios.
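To make the "report likelihood ratios" suggestion concrete, here is a toy sketch for normally distributed data with known variance; the point alternative mu = 0.5 and the sample size are assumptions chosen only for illustration. The same data yield both the usual p-value and an explicit likelihood ratio between the alternative and the null.

```python
# Toy sketch: report a likelihood ratio alongside the p-value.
# Model: n iid Normal(mu, 1) observations; H0: mu = 0, point alternative mu = 0.5.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, mu_true = 25, 0.4
x = rng.normal(mu_true, 1.0, size=n)

xbar = x.mean()
z = xbar * np.sqrt(n)                 # test statistic against H0: mu = 0
p_value = 2 * norm.sf(abs(z))         # the usual "statistical significance" number

mu_alt = 0.5                          # assumed point alternative
log_lr = n * (xbar * mu_alt - mu_alt**2 / 2)   # log P(data|H1) - log P(data|H0)
print(p_value, np.exp(log_lr))        # report the evidence ratio, not just p
```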
I am an infinite set atheist—have you ever actually seen an infinite set?
I am a “subjective/objective” Bayesian. If we are ignorant about a phenomenon, this is a fact about our state of mind, not a fact about the phenomenon. Probabilities are in the mind, not in the environment. Nonetheless I follow a correspondence, rather than a coherentist, theory of truth: we are trying to concentrate as much subjective probability mass as possible into (the mental representation that corresponds to) the real state of affairs. See my “The Simple Truth” and “A Technical Explanation of Technical Explanation”.
I’d prefer two studies at p < 0.05 on the same claim by different scientists to one study at p < 0.005. Demonstrating that scientific studies replicate by actually replicating them is better than going for an even lower p value.
I wouldn’t. Two studies opens the door to publication bias concerns and muddles the ‘replication’: rarely do people do a straight replication. From Nickerson in http://lesswrong.com/lw/g13/against_nhst/ :

Experiments that are literal replications of previously published experiments are very seldom published—I do not believe I have ever seen one. Others who have done systematic searches for examples of them confirm that they are rare (Mahoney, 1976; Sterling, 1959). … PhD committees generally expect more from dissertations than the replication of someone else’s findings. Evidence suggests that manuscripts that report only replication experiments are likely to get negative reactions from journal reviewers and editors alike (Neuliep & Crandall, 1990, 1993).

Agreed. It’s much easier for a false effect to garner two ‘statistically significant’ studies with p < .05 than to gain one statistically significant study with p < .005 (though you really want p < .0001).
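A toy simulation of that concern, under illustrative assumptions (each "lab" measures five independent null outcomes and reports only its smallest p-value): mild analytic flexibility produces spurious p < .05 findings roughly an order of magnitude more often than spurious p < .005 findings, so a false effect can accumulate a pair of lenient-threshold "successes" more easily than a single strict-threshold one.

```python
# Toy sketch (illustrative assumptions): each lab measures five null outcomes
# and keeps only its smallest p-value; we tally nominal significance rates.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
n_labs, n_per_group, n_outcomes = 2000, 30, 5

def smallest_p():
    ps = [ttest_ind(rng.normal(size=n_per_group),
                    rng.normal(size=n_per_group)).pvalue
          for _ in range(n_outcomes)]
    return min(ps)

p_mins = np.array([smallest_p() for _ in range(n_labs)])
print("spurious p < .05 :", (p_mins < 0.05).mean())   # roughly 0.23
print("spurious p < .005:", (p_mins < 0.005).mean())  # roughly 0.02
```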
If you put the general significance standard at p < 0.005, you will decrease the number of straight replications even further. We need more straight replication, not less.
A single study can be wrong due to systematic bias. One researcher could engage in fraud and thereby get a p < 0.005 result, or could simply be bad at properly blinding his subjects. There are many ways to get a p < 0.005 result by messing up the underlying science in a way that isn’t visible from reading the paper.
Having a second researcher reproduce the effect is vital for knowing that the first result was not due to some error in the first study’s experimental setup.