Something popped into my mind while I was reading about the example in the very beginning. What about research that sets out to prove one thing but discovers something else?
A group of scientists wants to see if there’s a link between consumption of Coca-Cola and stomach cancer. They put together a huge questionnaire full of dozens of questions and have 1000 people fill it out. Looking at the data, they discover that there is no correlation between Coca-Cola drinking and stomach cancer, but there is a correlation between excessive sneezing and having large ears.
So now we have a group of scientists who set out to test correlation A, but found correlation B in the data instead. Should they publish a paper about correlation B?
Before they publish anything (other than an article on Coca-Cola not being related to stomach cancer), they should first use a different test group to determine that the first result wasn’t a sampling fluke or otherwise biased. (Perhaps sneezing wasn’t causing large ears after all, or large ears were correlated with something that also caused sneezing.)
What brought the hypothesis to your attention in the first place shouldn’t be what proves it.
Testing “if A then B” is a separate experiment from testing “if C then D”, and it should require its own additional proof.
That’s a useful heuristic for combating our tendency to see patterns that aren’t there, but it’s not strictly necessary.
Another way to solve the same problem is to look at only the first 500 questionnaires at first. The scientists then notice that there is a correlation between excessive sneezing and large ears. Now the scientists look at the last 500 questionnaires, which amounts to an independent experiment. If these questionnaires also show the correlation, that too is evidence for the hypothesis, although it’s necessarily weaker than if another 1000-person poll were conducted.
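A minimal sketch of that split-half procedure, assuming the questionnaires live in a pandas DataFrame and using hypothetical column names (“sneezes_per_day”, “ear_size_mm”):

```python
import pandas as pd
from scipy import stats

def split_half_check(df: pd.DataFrame, col_a: str, col_b: str):
    half = len(df) // 2
    explore, confirm = df.iloc[:half], df.iloc[half:]

    # Exploratory half: where the surprising correlation is first noticed.
    r_explore, _ = stats.pearsonr(explore[col_a], explore[col_b])

    # Confirmatory half: an independent test of that one pre-selected hypothesis.
    r_confirm, p_confirm = stats.pearsonr(confirm[col_a], confirm[col_b])
    return r_explore, r_confirm, p_confirm

# Usage with hypothetical data:
# r1, r2, p = split_half_check(questionnaires, "sneezes_per_day", "ear_size_mm")
# Only the second half's p-value counts as evidence for the correlation
# spotted in the first half.
```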
So this shows that a second experiment isn’t necessary if we think ahead. Now the question is, if we’ve already foolishly looked at all 1000 results, is there any way to recover?
It turns out that what can save us is math. There are a number of standard tests for significance when many variables are compared, but the basic idea is the following: we can test whether the correlation between sneezing and large ears is surprisingly high by computing our prior for what sort of correlation the two most closely correlated variables would show.
Note that although our prior for two arbitrary variables might be centered at 0 correlation, our prior for two variables that are selected by choosing the highest correlation should be centered at some positive value. In other words: even if the questions were all about unrelated things, we expect a certain amount of correlation between some of them to happen by chance. But we can figure out how much correlation to expect from this phenomenon! And by doing some math, we might be able to show that the correlation between sneezing and having large ears is too high to be explained in this way.
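A rough simulation of that idea (not any particular named test); the item count, sample size, and observed correlation below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_respondents, n_items, n_sims = 1000, 50, 1000
r_observed = 0.15  # hypothetical sneezing vs. large-ears correlation

# Null distribution of the *largest* pairwise correlation among many
# genuinely unrelated questionnaire items.
max_r_under_null = np.empty(n_sims)
for i in range(n_sims):
    data = rng.standard_normal((n_respondents, n_items))  # independent items
    corr = np.corrcoef(data, rowvar=False)                # item-by-item correlations
    np.fill_diagonal(corr, 0.0)                           # ignore self-correlation
    max_r_under_null[i] = np.abs(corr).max()              # best-looking apparent link

# How often does pure noise produce a "best" correlation at least this strong?
p_value = np.mean(max_r_under_null >= r_observed)
print(f"P(max |r| >= {r_observed} under the null) = {p_value:.3f}")
```

If that probability is tiny, the sneezing/large-ears correlation is too strong to be explained purely by having fished through many variable pairs.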
Okay, that makes tons more sense, I apparently wasn’t thinking too clearly when I wrote the first post. (plus I didn’t know about the standard tests)
Thanks for setting me straight.
There is other information to consider, though. If there really were a correlation, it’s likely others would have noticed it in their studies. The fact that you haven’t heard of it before suggests a lower prior probability.
Eventually someone, just by chance, will stumble upon apparent correlations that aren’t really there. If you only publish when you find a correlation, but not when you don’t, you create publication bias.
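A toy illustration of that filter, with made-up numbers (500 labs, 100 subjects each, a 0.05 publication threshold), showing how a literature can fill up with correlations that don’t exist:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_labs, n_subjects, alpha = 500, 100, 0.05

published = []
for _ in range(n_labs):
    x = rng.standard_normal(n_subjects)
    y = rng.standard_normal(n_subjects)   # truly unrelated to x
    r, p = stats.pearsonr(x, y)
    if p < alpha:                          # only "positive" findings get written up
        published.append(r)

print(f"{len(published)} of {n_labs} labs publish a correlation that isn't there")
```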
I have no idea what’s done in actual statistical practice, but it seems to make sense to do this:
Publish the likelihood ratio for each correlation. The likelihood ratio for the correlation being real and replicable will be very high.
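One way to make “publish the likelihood ratio” concrete, as a rough sketch: use Fisher’s z-transformation, under which atanh(r) is approximately normal with standard error 1/sqrt(n − 3). The alternative effect size rho_alt = 0.3 below is an arbitrary assumption; a fuller Bayesian treatment would average over a prior on effect sizes rather than picking one point.

```python
import numpy as np
from scipy import stats

def correlation_likelihood_ratio(r_observed: float, n: int, rho_alt: float = 0.3) -> float:
    """P(data | real effect of size rho_alt) / P(data | no effect), via Fisher's z."""
    se = 1.0 / np.sqrt(n - 3)
    z = np.arctanh(r_observed)
    like_alt = stats.norm.pdf(z, loc=np.arctanh(rho_alt), scale=se)
    like_null = stats.norm.pdf(z, loc=0.0, scale=se)
    return like_alt / like_null

# Example: a correlation of 0.25 observed in 1000 questionnaires (made-up numbers).
print(correlation_likelihood_ratio(0.25, 1000))
```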
Since they bothered to do the test, you can figure that people in the know have decently sized prior odds for the association being real and replicable. There must have been animal studies or a biochemical argument or something. Consequently, a high likelihood ratio for this hypothesis may have been enough to convince them; that is, when it’s multiplied by the prior odds, the resulting posterior may have been high enough to represent the “I’m convinced” state of knowledge.
But the prior odds for the accidentally discovered correlation being real and replicable are the same tiny prior odds you would have for any equally unsupported correlation. When you combine the likelihood ratio with those prior odds, you do end up with much higher posterior odds for it than for other arbitrary-seeming correlations, but the result is still insignificant.
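In odds form the whole argument is one multiplication; the numbers below are purely illustrative:

```latex
\[
\underbrace{\frac{P(H)}{P(\lnot H)}}_{\text{prior odds}}
\times
\underbrace{\frac{P(D \mid H)}{P(D \mid \lnot H)}}_{\text{likelihood ratio}}
=
\underbrace{\frac{P(H \mid D)}{P(\lnot H \mid D)}}_{\text{posterior odds}}
\]
```

With a likelihood ratio of 100, an insider whose prior odds are 1:10 ends up at 10:1 (convinced), while prior odds of 1:10,000 for an arbitrary unsupported correlation end up at only 1:100.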
The critical thing that distinguishes the two hypotheses is whatever previous evidence led them to attempt the test; that’s why the prior for the association is higher. It’s subjective only in the sense that it depends on what you’ve already seen—it doesn’t depend on your thoughts. Whereas, in what Kindly says is the standard solution, you apply a different test depending upon what the researcher’s intentions were.
(I have no idea how you would calculate the prior odds. I mean, Solomonoff induction with your previous observations is the Carnot engine for doing it, but I have no idea how you would actually do it in practice.)