What p-values actually mean:
How likely is it that you’d get a result this impressive just by chance if the effect you’re looking for isn’t actually there?
What they’re commonly taken to mean:
How likely is it, given the impressiveness of the result, that the effect you’re looking for is actually there?
That is, p-values measure Pr(observations | null hypothesis) whereas what you want is more like Pr(alternative hypothesis | observations).
(Actually, what you want is more like a probability distribution for the size of the effect—that’s the “overly binary” thing—but never mind that for now.)
So what are the relevant differences between these?
If your null hypothesis and alternative hypothesis are one another’s negations (as they’re supposed to be) then you’re looking at the relationship between Pr(A|B) and Pr(B|A). These are famously related by Bayes’ theorem, but they are certainly not the same thing. We have Pr(A|B) = Pr(A&B)/Pr(B) and Pr(B|A) = Pr(A&B)/Pr(A) so the ratio between the two is the ratio of probabilities of A and B. So, e.g., suppose you are interested in ESP and you do a study on precognition or something whose result has a p-value of 0.05. If your priors are like mine, your estimate of Pr(precognition) will still be extremely small because precognition is (in advance of the experimental evidence) much more unlikely than just randomly getting however many correct guesses it takes to get a p-value of 0.05.
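The point about priors can be made concrete with a minimal numerical sketch of Bayes’ theorem. All the numbers below (the prior on precognition, the assumed power of the study) are illustrative assumptions, not anything from the discussion above:

```python
# Minimal Bayes'-theorem sketch: how a small prior swamps a p = 0.05 result.
# The prior and power values here are illustrative assumptions.
prior = 1e-6    # assumed prior Pr(precognition is real)
p_value = 0.05  # Pr(result this extreme | no precognition)
power = 0.5     # assumed Pr(result this extreme | precognition is real)

# Pr(real | result) = Pr(result | real) * Pr(real) / Pr(result)
evidence = power * prior + p_value * (1 - prior)
posterior = power * prior / evidence

print(f"posterior Pr(precognition | result) = {posterior:.2e}")
```

With these (assumed) numbers the posterior comes out around 10⁻⁵: the “significant” result moved the needle by a factor of ten or so, but the hypothesis remains overwhelmingly unlikely, exactly because the ratio Pr(A|B)/Pr(B|A) is governed by the ratio of the prior probabilities.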
In practice, the null hypothesis is usually something like “X = Y” or “X ≤ Y”. Then your alternative is “X ≠ Y” or “X > Y”. But in practice what you actually care about is that X and Y are substantially unequal, or X is substantially bigger than Y, and that’s probably the alternative you actually have in mind even if you’re doing statistical tests that just accept or reject the null hypothesis. So a small p-value may come from a very carefully measured difference that’s too small to care about. E.g., suppose that before you do your precognition study you think (for whatever reason) that precognition is about as likely to be real as not. Then after the study results come in, you should in fact think it’s probably real. But if you then think “aha, time to book my flight to Las Vegas” you may be making a terrible mistake even if you’re right about precognition being real. Because maybe your study looked at someone predicting a million die rolls and they got 500 more right than you’d expect by chance; that would be very exciting scientifically but probably useless for casino gambling because it’s not enough to outweigh the house’s advantage.
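The die-roll example can be sketched numerically. The house-edge figure below is an assumption (roughly the edge in craps); the point is just the size comparison between the subject’s measured edge and the casino’s:

```python
# Sketch: statistical detectability vs. practical usefulness.
# Assumed setup: a fair six-sided die, one million guesses,
# 500 more hits than chance predicts. House edge is illustrative.
n = 1_000_000
chance_rate = 1 / 6
extra_hits = 500

expected_hits = n * chance_rate
observed_rate = (expected_hits + extra_hits) / n
edge = observed_rate - chance_rate  # the guesser's edge over pure chance

house_edge = 0.014  # assumed typical casino advantage (~1.4%)

print(f"edge over chance:  {edge:.4%}")
print(f"assumed house edge: {house_edge:.2%}")
print("profitable to gamble on?", edge > house_edge)
```

The measured edge is 0.05 percentage points, while the (assumed) house edge is 1.4 percentage points—nearly thirty times larger—so even a real, carefully measured effect of this size loses money at the casino.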
[EDITED to fix a typo and clarify a bit.]
Thank you—I get it now.