According to Wikipedia:

In statistical significance testing, the p-value is the probability of obtaining a test statistic result at least as extreme as the one that was actually observed, assuming that the null hypothesis is true.[1][2] A researcher will often “reject the null hypothesis” when the p-value turns out to be less than a predetermined significance level, often 0.05[3][4] or 0.01.
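To make “at least as extreme” concrete, here is a minimal Python sketch (not part of the quoted passage; the simulated sample, the known sigma = 1, and the one-sided z-test are all assumptions made purely for illustration):

```python
# Illustration only: the p-value is the probability, computed under the null
# hypothesis, of a test statistic at least as extreme as the one actually observed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.3, scale=1.0, size=50)   # hypothetical sample
z = x.mean() * np.sqrt(len(x))                # z-statistic for H0: mean = 0, sigma = 1 known

# One-sided p-value: P(Z >= observed z) under the null's reference distribution.
p_value = 1 - stats.norm.cdf(z)
print(f"observed z = {z:.2f}, p-value = {p_value:.4f}")
```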
Quoting authorities without further commentary is a dick thing to do. I am going to spend more words speculating about the intention of the quote than there are in the quote, let alone than you bothered to type yourself.

I have no idea what you think is relevant about that passage. It says exactly what I said, except transformed from the effect-size scale to the p-value scale. But somehow I doubt that’s why you posted it. The most common problem in the comments on this thread is that people confuse the false positive rate with the false negative rate, so my best guess is that you are making that mistake and thinking the passage supports that error (though I have no idea why you’re telling me). Another possibility, slightly more relevant to this subthread, is that you’re pointing out that some people use other significance thresholds. But in medicine, they don’t: they almost always use a 95% confidence level (a 5% threshold), though sometimes 90%.
My confusion is about “at least” vs. “exactly”. See my answer to Cyan.
You want size, not p-value. The difference is that size is a “pre-data” (or “design”) quantity, while the p-value is post-data, i.e., data-dependent.
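A minimal sketch of that distinction, assuming a one-sample t-test and made-up data (the 0.05 and the simulated numbers are purely illustrative):

```python
# Size (alpha) is fixed at the design stage, before any data exist;
# the p-value can only be computed after the data are in.
import numpy as np
from scipy import stats

alpha = 0.05                                    # pre-data ("design") quantity
rng = np.random.default_rng(1)
data = rng.normal(loc=0.0, scale=1.0, size=40)  # stand-in for the data collected later

t_stat, p_value = stats.ttest_1samp(data, popmean=0.0)  # post-data quantity
print(f"p-value = {p_value:.3f}; reject H0 at size {alpha}: {p_value <= alpha}")
```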
Thanks.
So if I set the size at 5%, collect the data, run the test, and repeat the whole experiment with fresh data many times, should I expect that, if the null hypothesis is true, the test rejects exactly 5% of the time, or at most 5% of the time?
If the null hypothesis is simple (that is, if it picks out a single point in the hypothesis space), and the model assumptions are true blah blah blah, then the test (falsely) rejects the null with exactly 5% probability. If the null is composite (comprises a non-singleton subset of parameter space), and there is no nice reduction to a simple null via mathematical tricks like sufficiency or the availability of a pivot, then the test falsely rejects the null with at most 5% probability.
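For the simple-null case, a quick Monte Carlo check (a sketch under assumed ideal conditions: i.i.d. normal data, known sigma = 1, null mean of 0) illustrates the “exactly 5%” claim:

```python
# Simulate many experiments in which the simple null is true and count how
# often a size-0.05 two-sided z-test (falsely) rejects it.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
alpha, n, n_experiments = 0.05, 30, 100_000
z_crit = stats.norm.ppf(1 - alpha / 2)          # two-sided critical value

rejections = 0
for _ in range(n_experiments):
    x = rng.normal(loc=0.0, scale=1.0, size=n)  # the null hypothesis is true here
    z = x.mean() * np.sqrt(n)                   # z-statistic with sigma = 1 known
    rejections += abs(z) > z_crit

print(f"false rejection rate = {rejections / n_experiments:.4f}")  # close to 0.05
```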
But that’s all very technical; somewhat less technically, almost always, a bootstrap procedure is available that obviates these questions and gets you to “exactly 5%”… asymptotically. Here “asymptotically” means “if the sample size is big enough”. This just throws the question onto “how big is big enough,” and that’s context-dependent. And all of this is about one million times less important than the question of how well each study addresses systematic biases, which is an issue of real, actual study design and implementation rather than mathematical statistical theory.
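As for what such a bootstrap procedure can look like, here is one common recipe sketched in Python (a resampling test of a zero mean after recentering the data; this is just one of several variants, and its 5% size is exact only asymptotically):

```python
# Impose the null by recentering, resample with replacement, and compare the
# observed statistic against the resampled ones.
import numpy as np

def bootstrap_p_value(x, n_boot=10_000, seed=3):
    rng = np.random.default_rng(seed)
    t_obs = abs(x.mean())                        # observed test statistic
    x_null = x - x.mean()                        # shift the sample mean to 0 so H0 holds
    t_boot = np.array([
        abs(rng.choice(x_null, size=len(x), replace=True).mean())
        for _ in range(n_boot)
    ])
    return (t_boot >= t_obs).mean()              # bootstrap p-value

x = np.random.default_rng(4).normal(loc=0.2, scale=1.0, size=200)  # hypothetical data
print(f"bootstrap p-value = {bootstrap_p_value(x):.3f}")
```

The function name `bootstrap_p_value`, the 10,000 resamples, and the example data are all arbitrary choices for the sketch.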