There are many more problems with NHST and with “frequentist” statistics in general, but the central one is this: NHST does not follow from the axioms (foundational logical rules) of probability theory. It is a grab-bag of techniques that, depending on how those techniques are applied, can lead to different results when analyzing the same data — something that should horrify every mathematician.
The inferential method that solves the problems with frequentism — and, more importantly, follows deductively from the axioms of probability theory — is Bayesian inference.
But two Bayesian inferences from the same data can also give different results. How could this be a non-issue for Bayesian inference while being indicative of a central problem for NHST? (If the answer is that Bayesian inference is rigorously deduced from probability theory’s axioms but NHST is not, then the fact that NHST can give different results for the same data is not a true objection, and you might want to rephrase.)
By a coincidence of dubious humor, I recently read a paper on exactly this topic: how NHST is completely misunderstood and employed wrongly, and what can be improved! I was only reading it for a funny & insightful quote, but Jacob Cohen (as in, ‘Cohen’s d’), on pp. 5-6 of “The Earth Is Round (p < .05)”, tells us that we shouldn’t seek to replace NHST with a “magic alternative” because “it doesn’t exist”. What we should do is focus on understanding the data with graphics and data-mining techniques; report confidence limits on effect sizes, which gives us various things I haven’t looked up; and finally, place far more emphasis on replication than we currently do.
An admirable program; we don’t have to shift all the way to Bayesian reasoning to improve matters. Incidentally, what Bayesian inferences are you talking about? I thought the usual proposals/methods involved principally reporting log odds, to avoid exactly the issue of people having varying priors and updating on trials to get varying posteriors.
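As a minimal sketch of the kind of reporting meant here (the coin-flip numbers below are invented, and the two point hypotheses are just placeholders): the log odds, i.e. the log Bayes factor, involve only the likelihoods, so each reader can combine the reported number with whatever prior odds they personally hold.

```python
from math import log
from scipy.stats import binom

# Invented data: 60 heads in 100 flips.
n, k = 100, 60

# Two simple point hypotheses about the heads probability.
p_fair, p_biased = 0.5, 0.6

# Log Bayes factor: log P(data | biased) - log P(data | fair).
# No prior appears anywhere in this quantity.
log_bf = binom.logpmf(k, n, p_biased) - binom.logpmf(k, n, p_fair)

# A reader with prior odds O for "biased" over "fair" gets
# posterior log odds = log(O) + log_bf, whatever their O is.
for prior_odds in (1 / 19, 1.0, 3.0):  # sceptic, agnostic, believer
    print(prior_odds, log(prior_odds) + log_bf)
```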
I thought the usual proposals/methods involved principally reporting log odds, to avoid exactly the issue of people having varying priors and updating on trials to get varying posteriors.

This only works in extremely simple cases.

Could you give an example of an experiment that would be too complex for log odds to be useful?

Any example where there are more than two potential hypotheses.
Note that, for example, “this coin is unbiased”, “this coin is biased toward heads with p = .61”, and “this coin is biased toward heads with p = .62” count as three different hypotheses for this purpose.
This is fair as a criticism of log odds, but in the example you give, one could avoid the issue of people having varying priors by just reporting the value of the likelihood function. However, reporting the likelihood function stops being a practical summary for massive models with lots of nuisance parameters.
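A rough sketch of what reporting the likelihood function could look like for the coin example above (the data are invented): instead of a single log-odds number between two hypotheses, one reports P(data | p) as a function of p, which covers “unbiased”, “p = .61”, “p = .62”, and every other point hypothesis at once, still without committing to any prior.

```python
import numpy as np
from scipy.stats import binom

# Invented data: 61 heads in 100 flips.
n, k = 100, 61

# The object one would report (as a table or plot): the likelihood of
# the data under every value of the heads probability p on a grid.
p_grid = np.linspace(0.01, 0.99, 99)
likelihood = binom.pmf(k, n, p_grid)

# The three hypotheses mentioned above are just three points on that curve.
for p in (0.50, 0.61, 0.62):
    print(p, binom.pmf(k, n, p))

# Any reader can multiply this curve by their own prior over p and
# normalize to get their own posterior; the reported curve itself is
# prior-free. With many nuisance parameters the full function becomes
# impractical to tabulate, which is the caveat above.
```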
Incidentally, what Bayesian inferences are you talking about? I thought the usual proposals/methods involved principally reporting log odds, to avoid exactly the issue of people having varying priors and updating on trials to get varying posteriors.
I didn’t have any specific examples in mind. But more generally, posteriors are a function of both priors and likelihoods. So even if one avoids using priors entirely by reporting only likelihoods (or some function of the likelihoods, like the log of the likelihood ratio), the resulting implied inferences can change if one’s likelihoods change, which can happen by calculating likelihoods with a different model.
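As one hedged illustration of that last point (the numbers are invented): for the same coin-flip data, “the coin is biased” can be given a likelihood either as a point model (say p = 0.6) or as a composite model whose likelihood averages over a range of biases (here uniformly, purely as a modeling choice). The two choices yield different likelihood ratios against “the coin is fair”, so the implied inference changes even though no priors over the two hypotheses are involved.

```python
from scipy.integrate import quad
from scipy.stats import binom

# Invented data: 60 heads in 100 flips.
n, k = 100, 60

lik_fair = binom.pmf(k, n, 0.5)  # likelihood under "fair coin"

# Model A for "biased": a single point hypothesis p = 0.6.
lik_biased_point = binom.pmf(k, n, 0.6)

# Model B for "biased": average the likelihood over p from 0 to 1
# (a marginal likelihood for the composite hypothesis).
lik_biased_marginal, _ = quad(lambda p: binom.pmf(k, n, p), 0.0, 1.0)

print("LR, point model:    ", lik_biased_point / lik_fair)     # well above 1, favors "biased"
print("LR, composite model:", lik_biased_marginal / lik_fair)  # close to 1, roughly even
```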
depending on how those techniques are applied, can lead to different results when analyzing the same data
But two Bayesian inferences from the same data can also give different results. How could this be a non-issue for Bayesian inference while being indicative of a central problem for NHST?
If the OP is read as holding constant everything not explicitly mentioned as differing, that includes the prior beliefs of the person doing the analysis, as compared with the hypothetical analysis that person did not perform.
Does “two Bayesian inferences” imply it is two different people making those inferences, with two people not possibly having identical prior beliefs? Could a person performing axiom-obeying Bayesian inference reach different conclusions than that same person hypothetically would have had they performed a different axiom-obeying Bayesian inference?
I think my reply to gwern’s comment (sibling of yours) all but answers your two questions already. But to be explicit:
Does “two Bayesian inferences” imply it is two different people making those inferences, with two people not possibly having identical prior beliefs?
Not necessarily, no. It could be two people who have identical prior beliefs but just construct likelihoods differently. It could be the same person calculating two inferences that rely on the same prior but use different likelihoods.
Could a person performing axiom-obeying Bayesian inference reach different conclusions than that same person hypothetically would have had they performed a different axiom-obeying Bayesian inference?
I think so. If I do a Bayesian analysis with some prior and likelihood-generating model, I might get one posterior distribution. But as far as I know there’s nothing in Cox’s theorem or the axioms of probability theory or anything like those that says I had to use that particular prior and that particular likelihood-generating model. I could just as easily have used a different prior and/or a different likelihood model, and gotten a totally different posterior that’s nonetheless legitimate.
But as far as I know there’s nothing in Cox’s theorem or the axioms of probability theory or anything like those that says I had to use that particular prior
The way I interpret hypotheticals in which one person is said to be able to do something other than what they will do, such as “depending on how those techniques are applied,” all of the person’s priors are to be held constant in the hypothetical. This is the most charitable interpretation of the OP because the claim is that, under Bayesian reasoning, results do not depend on how the same data is applied. This seems obviously wrong if the OP is interpreted as discussing results reached after decision processes with identical data but differing priors, so it’s more interesting to talk about agents with other things differing, such as perhaps likelihood-generating models, than it is to talk about agents with different priors.
I could just as easily have used a different...likelihood model, and gotten a totally different posterior that’s nonetheless legitimate.
This is the most charitable interpretation of the OP because the claim is that, under Bayesian reasoning, results do not depend on how the same data is applied. This seems obviously wrong if the OP is interpreted as discussing results reached after decision processes with identical data but differing priors, so it’s more interesting to talk about agents with other things differing, such as perhaps likelihood-generating models, than it is to talk about agents with different priors.
But even if we assume the OP means that data and priors are held constant but not likelihoods, it still seems to me obviously wrong. Moreover, likelihoods are just as fundamental to an application of Bayes’s theorem as priors, so I’m not sure why I would have/ought to have read the OP as implicitly assuming priors were held constant but not likelihoods (or likelihood-generating models).
Can you give an example?
I didn’t have one, but here’s a quick & dirty ESP example I just made up. Suppose that out of the blue, I get a gut feeling that my friend Joe is about to phone me, and a few minutes later Joe does. After we finish talking and I hang up, I realize I can use what just happened as evidence to update my prior probability for my having ESP. I write down:
my evidence: “I correctly predicted Joe would call” (call this E for short)
the hypothesis H0 — that I don’t have ESP — and its prior probability, 95%
the opposing hypothesis H1 — that I have ESP — and its prior probability, 5%
Now let’s think about two hypothetical mes.
The first me guesses at some likelihoods, deciding that P(E | H0) and P(E | H1) were both 10%. Turning the crank, it gets a posterior for H1, P(H1 | E), that’s proportional to P(H1) P(E | H1) = 5% × 10% = 0.5%, and a posterior for H0, P(H0 | E), that’s proportional to P(H0) P(E | H0) = 95% × 10% = 9.5%. Of course its posteriors have to add to 100%, not 10%, so it multiplies both by 10 to normalize them. Unsurprisingly, as the likelihoods were equal, its posteriors come out at 95% for H0 and 5% for H1; the priors are unchanged.
When the second me is about to guess at some likelihoods, its brain is suddenly zapped by a stray gamma ray. The second me therefore decides that P(E | H0) was 2% but that P(E | H1) was 50%. Applying Bayes’s theorem in precisely the same way as the first me, it gets a P(H1 | E) proportional to 5% × 50% = 2.5%, and a P(H0 | E) proportional to 95% × 2% = 1.9%. Normalizing (but this time multiplying by 100/(2.5+1.9)) gives posteriors of P(H0 | E) = 43.2% and P(H1 | E) = 56.8%.
So the first me still strongly doubts it has ESP after updating on the evidence, but the second me ends up believing ESP the more likely hypothesis. Yet both used the same method of inference, the same piece of evidence and the same priors!
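For anyone who wants to check the arithmetic, here it is as a tiny Python sketch, using exactly the priors and guessed likelihoods from the story:

```python
def posterior(prior_h0, prior_h1, lik_e_h0, lik_e_h1):
    """Posterior probabilities of H0 and H1 after observing E (two-hypothesis Bayes)."""
    unnorm_h0 = prior_h0 * lik_e_h0
    unnorm_h1 = prior_h1 * lik_e_h1
    z = unnorm_h0 + unnorm_h1  # normalizing constant
    return unnorm_h0 / z, unnorm_h1 / z

# Shared priors: 95% for "no ESP" (H0), 5% for "ESP" (H1).

# First me: P(E|H0) = P(E|H1) = 10%  -> posteriors stay at 95% / 5%.
print(posterior(0.95, 0.05, 0.10, 0.10))

# Second me: P(E|H0) = 2%, P(E|H1) = 50%  -> roughly 43.2% / 56.8%.
print(posterior(0.95, 0.05, 0.02, 0.50))
```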