Am I confused about frequentism?

I’m currently learning about hypothesis testing in my statistics class. The idea is that you perform some test and you use the results of that test to calculate:
P(data at least as extreme as your data | Null hypothesis)
This is the p-value. If the p-value is below a certain threshold then you can reject the null hypothesis (which is the complement of the hypothesis that you are trying to test).
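For concreteness, here is a minimal simulation sketch of that quantity (the coin-flip experiment and every number in it are made up):

    # Estimate P(data at least as extreme as observed | null hypothesis)
    # for a made-up experiment: is a coin biased towards heads?
    import random

    n_flips = 100          # flips in the hypothetical experiment
    observed_heads = 62    # made-up observation
    n_sims = 100_000

    # Simulate the experiment repeatedly under the null (fair coin) and
    # count how often the result is at least as extreme as the observed one.
    extreme = 0
    for _ in range(n_sims):
        heads = sum(random.random() < 0.5 for _ in range(n_flips))
        if heads >= observed_heads:
            extreme += 1

    print("estimated p-value:", extreme / n_sims)  # roughly 0.01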
Put another way:
P(data | hypothesis) = 1 - p-value
and if 1 - p-value is high enough then you accept the hypothesis. (My use of “data” is handwaving and not quite correct but it doesn’t matter.)
But it seems more useful to me to calculate P(hypothesis | data). And that’s not quite the same thing.
So what I’m wondering is whether under frequentism P(hypothesis | data) is actually meaningless. The hypothesis is either true or false, and depending on whether it’s true or not the data has a certain propensity of turning out one way or the other. It’s meaningless to ask what the probability of the hypothesis is; you can only ask what the probability of obtaining your data is under certain assumptions.
I’m currently learning about hypothesis testing in my statistics class. The idea is that you perform some test and you use the results of that test to calculate:
P(data at least as extreme as your data | Null hypothesis)
This is the p-value. If the p-value is below a certain threshold then you can reject the null hypothesis.
This is correct.
Put another way:
P(data | hypothesis) = 1 - p-value
and if 1 - p-value is high enough then you accept the hypothesis. (My use of “data” is handwaving and not quite correct but it doesn’t matter.)
This is not correct. You seem to be under the impression that

P(data | null hypothesis) + P(data | complement(null hypothesis)) = 1,

but this is not true because

1. complement(null hypothesis) may not have a well-defined distribution (frequentists might especially object to defining a prior here), and
2. even if complement(null hypothesis) were well defined, the sum could fall anywhere in the closed interval [0, 2] (a quick numeric check follows below).
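A quick numeric check, substituting a single made-up point alternative for complement(null hypothesis) just to show the sum is not pinned to 1:

    # Likelihood of the same data under two hypotheses; the sum is not 1.
    from math import comb

    def binom_pmf(k, n, p):
        # P(exactly k heads in n flips of a coin with heads-probability p)
        return comb(n, k) * p**k * (1 - p)**(n - k)

    p_null = binom_pmf(8, 10, 0.5)   # null: fair coin, ~0.044
    p_alt = binom_pmf(8, 10, 0.8)    # one point alternative, ~0.302

    print(p_null + p_alt)            # ~0.346, nowhere near 1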
More generally, most people (both frequentists and bayesians) would object to “accepting the hypothesis” based on rejecting the null, because rejecting the null means exactly what it says, and no more. You cannot conclude that an alternative hypothesis (such as the complement of the null) has higher likelihood or probability.

Huh? P(X|Y) + P(X|Y’) = P(X) and an event that has already occurred has a probability of one. Am I missing something?
But it seems more useful to me to calculate P(hypothesis | data).
That may be true if you have little influence over what data is available.
Frequentists are mainly interested in situations where they can create experiments that cause P(hypothesis) to approach 0 or 1. The p-value is intended to be good at deciding whether the hypothesis has been adequately tested, not at deciding whether to believe the hypothesis given crappy data.
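As a toy illustration of what “adequately tested” buys you (every number below is made up): with enough data, a test of a false null rejects nearly every time, so the experiment itself pushes you towards the right answer.

    # Rejection frequency of a one-sided test when the null (fair coin)
    # is false, as sample size grows.
    import random
    from math import sqrt

    def one_sided_rejects(n, true_p):
        # One experiment: n flips with true heads-probability true_p,
        # tested against the null p = 0.5 using a normal approximation.
        heads = sum(random.random() < true_p for _ in range(n))
        z = (heads - 0.5 * n) / (0.5 * sqrt(n))
        return z > 1.645  # one-sided 5% critical value

    for n in (20, 100, 1000):
        rejections = sum(one_sided_rejects(n, true_p=0.6) for _ in range(5000))
        print(n, rejections / 5000)  # climbs towards 1 as n grows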
Your conclusion

So what I’m wondering is whether under frequentism P(hypothesis | data) is actually meaningless. The hypothesis is either true or false, and depending on whether it’s true or not the data has a certain propensity of turning out one way or the other. It’s meaningless to ask what the probability of the hypothesis is; you can only ask what the probability of obtaining your data is under certain assumptions.

is correct. Frequentists do indeed claim that P(hypothesis | data) is meaningless for exactly the reasons you gave. However, there are some little details in the rest of your post that are incorrect.
null hypothesis (which is the complement of the hypothesis that you are trying to test).
The hypothesis you are trying to test is typically not the complement of the null hypothesis. For example we could have:
H0: theta = 0
H1: theta > 0

where theta is some variable that we care about. Note that the region theta < 0 isn’t in either hypothesis. If we were instead testing

H1′: theta ≠ 0
then frequentists would suggest a different test. They would use a one-tailed test to test H1 and a two-tailed test to test H1′. See here.
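A minimal sketch of the two computations (the z statistic is made up):

    # One-tailed vs two-tailed p-values for the same test statistic.
    from math import erf, sqrt

    def normal_cdf(z):
        # CDF of the standard normal distribution
        return 0.5 * (1.0 + erf(z / sqrt(2.0)))

    z = 1.9  # made-up observed test statistic

    # H1: theta > 0 -- only large positive z counts as extreme
    p_one_tailed = 1.0 - normal_cdf(z)

    # H1': theta != 0 -- large |z| in either direction counts as extreme
    p_two_tailed = 2.0 * (1.0 - normal_cdf(abs(z)))

    print(p_one_tailed, p_two_tailed)  # ~0.029 vs ~0.057: only the
                                       # one-tailed test rejects at 0.05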
P(data | hypothesis) = 1 - p-value
No. This is just mathematically wrong. P(A|B) is not necessarily equal to 1 − P(A|¬B). Just think about it for a bit and you’ll see why. If that doesn’t work, take A = “sky is blue” and B = “my car is red” and note that P(A|B) = P(A|¬B) ≈ 1.
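If a concrete calculation helps, here is a toy joint distribution (numbers invented) mimicking that example:

    # Joint distribution over (A = "sky is blue", B = "my car is red"),
    # invented so that A is very likely whatever B is.
    joint = {
        (True, True): 0.45,
        (True, False): 0.45,
        (False, True): 0.05,
        (False, False): 0.05,
    }

    def p_a_given(b_value):
        # P(A is true | B == b_value)
        p_b = sum(p for (a, b), p in joint.items() if b == b_value)
        return joint[(True, b_value)] / p_b

    print(p_a_given(True), p_a_given(False))  # 0.9 and 0.9
    print(1 - p_a_given(False))               # 0.1, not P(A|B) = 0.9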
So what I’m wondering is whether under frequentism P(hypothesis | data) is actually meaningless.
It’s not meaningless, but people who follow R. A. Fisher’s ideas for rejecting the null do not use p(hypothesis | data). “Meaningless” would be if frequentists literally did not have p(hypothesis | data) in their language, which is not true because they use probability theory just like everybody else.
Don’t ask lesswrong about what frequentists claim, ask frequentists. Very few people on lesswrong are statisticians.
“Meaningless” would be if frequentists literally did not have p(hypothesis | data) in their language, which is not true because they use probability theory just like everybody else.
Many frequentists do insist that P(hypothesis) is meaningless, despite “using probability theory.”

Could you give me something to read? Who are these frequentists, and where do they insist on this?

Let us take a common phrase from the original comment, “the hypothesis is either true or false”. The first google hit:
There are two misconceptions that you must be aware of, as you will certainly hear these. The first is thinking that we calculate the probability of the null hypothesis being true or false. Whether the null hypothesis is true or false is not subject to chance; it either is true or it is false—there is no probability of one or the other.
So from this statement you conclude that frequentists think P(hypothesis) is meaningless? Bayesians assign degrees of belief to things that are actually true or false also. The coin really is either fair or not fair, but you will never find out with finite trials. This is a map/territory distinction, I am surprised you didn’t get it. This quote has nothing to do with B/F differences.
A Bayesian version of this quote would point out that it is a type error to confuse the truth value of the underlying thing, and the belief about this truth value.
You have successfully explained why it is irrational for frequentists to consider P(hypothesis) meaningless. And yet they do. They would say that probabilities can only be defined as limiting frequencies in repeated experiments, and that for a typical hypothesis there is no experiment you can rerun to get a sample for the truth of the hypothesis.

You guys need to stop assuming frequentists are morons. Here are posts by a frequentist:

http://normaldeviate.wordpress.com/2012/12/04/nate-silver-is-a-frequentist-review-of-the-signal-and-the-noise/
http://normaldeviate.wordpress.com/2012/11/17/what-is-bayesianfrequentist-inference/

Some of the comments are good as well.
Yes, you’re right. Clearly many people who identify as frequentists do hold P(hypothesis) to be meaningful. There are statisticians all over the B/F spectrum as well as not on the spectrum at all. So when I said “frequentists believe …” I could never really be correct because various frequentists believe various different things.
Perhaps we could agree on the following statement: “Probabilities such as P(hypothesis) are never needed to do frequentist analysis.”
For example, the link you gave suggests the following as a characterisation of frequentism:
Goal of Frequentist Inference: Construct procedure with frequency guarantees. (For example, confidence intervals.)
Frequency guarantees are typically of the form “for each possible true value of theta, doing the construction blah on the data will, with probability at least 1-p, yield a result with property blah”. Since this must hold for each theta, the distribution of the true value of theta is irrelevant.
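A simulation sketch of such a guarantee (the normal-mean setting and all parameters are my own toy choices): the textbook 95% confidence interval covers whichever fixed theta generated the data about 95% of the time, with no distribution over theta anywhere.

    # Empirical coverage of a 95% CI for a normal mean, for several
    # fixed values of theta.
    import random
    from math import sqrt

    def ci_covers(theta, n=30, z=1.96):
        # One experiment: draw n points from Normal(theta, 1) and check
        # whether the standard 95% interval for the mean contains theta.
        xs = [random.gauss(theta, 1.0) for _ in range(n)]
        mean = sum(xs) / n
        half_width = z / sqrt(n)
        return mean - half_width <= theta <= mean + half_width

    for theta in (-3.0, 0.0, 17.5):
        hits = sum(ci_covers(theta) for _ in range(10_000))
        print(theta, hits / 10_000)  # ~0.95 for every fixed theta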
I could never really be correct because various frequentists believe various different things.
The interesting questions to me are: (a) “what is the steelman of the frequentist position?” (folks like Larry are useful here), and (b) “are there actually prominent frequentist statisticians who say stupid things?”
By (b) I mean “actually stupid under any reasonable interpretation.”
Clearly many people who identify as frequentists
Quote from the url I linked:
One thing that has harmed statistics — and harmed science — is identity statistics. By this I mean that some people identify themselves as “Bayesians” or “Frequentists.” Once you attach a label to yourself, you have painted yourself in a corner.

When I was a student, I took a seminar course from Art Dempster. He was the one who suggested to me that it was silly to describe a person as being Bayesian or Frequentist. Instead, he suggested that we describe a particular data analysis as being Bayesian or Frequentist. But we shouldn’t label a person that way.

I think Art’s advice was very wise.
“Keep your identity small”—advice familiar to a LW audience.
Perhaps we could agree on the following statement: “Probabilities such as P(hypothesis) are never needed to do frequentist analysis.”
I guess you disagree with Larry’s take: B vs F is about goals, not methods. I could do Bayesian-looking things while having a frequentist interpretation in mind.
In the spirit of collaborative argumentation, can we agree on the following:
We have better things to do than engage in identity politics.
But it seems more useful to me to calculate P(hypothesis | data). And that’s not quite the same thing.
It is not the same thing and knowing P(hypothesis | data) would be very useful. Unfortunately, it is also very hard to estimate because usually the best you can do is calculate the probability, given the data, of a hypothesis out of a fixed set of hypotheses which you know about and for which you can estimate probabilities. If your understanding of the true data-generation process is not so good (which is very common in real life) your P(hypothesis | data) is going to be pretty bad and what’s worse, you have no idea how bad it is.
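For concreteness, here is a minimal sketch of that calculation (the hypothesis set, the prior, and the data are all invented): a posterior over an enumerated set of coin biases. Nothing in the output warns you if the true data-generating process is outside the set.

    # P(hypothesis | data) over a fixed, finite set of hypotheses.
    from math import comb

    def binom_pmf(k, n, p):
        # P(exactly k heads in n flips of a coin with heads-probability p)
        return comb(n, k) * p**k * (1 - p)**(n - k)

    hypotheses = [0.3, 0.5, 0.7]   # the only biases we thought of
    prior = {h: 1.0 / len(hypotheses) for h in hypotheses}
    heads, flips = 8, 10           # made-up data

    # P(hypothesis | data) is proportional to
    # P(data | hypothesis) * P(hypothesis)
    unnormalised = {h: binom_pmf(heads, flips, h) * prior[h]
                    for h in hypotheses}
    total = sum(unnormalised.values())
    posterior = {h: u / total for h, u in unnormalised.items()}

    print(posterior)  # concentrates on 0.7, even if the true bias is 0.85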
Not having a good grasp on the set of all hypotheses does not distinguish bayesians from frequentists and does not seem to me to motivate any difference in their methodologies.
Added: I don’t think it has much to do with the original comment, but testing a model without specific competition is called “model checking.” It is a common frequentist complaint that bayesians don’t do it. I don’t think that this is an accurate complaint, but it is true that it is easier to fit it into a frequentist framework than a bayesian framework.
I have said nothing about the differences between bayesians and frequentists. I just pointed out some issues with trying to estimate P(hypothesis | data).
As far as I can tell, you’re correct.