“So now we have a group of scientists who set out to test correlation A, but found correlation B in the data instead. Should they publish a paper about correlation B?”
Since you are testing multiple hypotheses simultaneously, it is not comparable to Eliezer’s example. Still, it is an interesting question...
Sure. The more papers you publish the better. If you are lucky the correlation may hold in other test populations and you’ve staked your claim on the discovery. Success is largely based on who gets credit.
Should a journal publish papers reporting correlations with relatively high P-values? When thousands of scientists are data mining for genetic correlations to disease, chance correlations will be very common. If the genetic difference occurred in a metabolic pathway known to be relevant to the disease, the correlation might be publishable even with a high P-value. If the scientists are just reporting a correlation with no such prior rationale, they should be held to a much lower P-value.
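To put a rough number on how common chance correlations would be, here is a minimal Python sketch. It assumes every tested correlation is truly null (so each P-value is uniform on [0, 1]) and that the tests are independent; the function names, the count of 10,000 tests, and the 0.05 cutoff are illustrative assumptions, not figures from any real study. The Bonferroni correction shown is one common way to set a stricter cutoff, not the only one.

```python
import random


def count_chance_hits(n_tests, alpha, seed=0):
    """Count true-null tests whose P-value falls below alpha purely by chance."""
    rng = random.Random(seed)
    # Under a true null hypothesis, a P-value is uniformly distributed on [0, 1].
    p_values = [rng.random() for _ in range(n_tests)]
    return sum(p < alpha for p in p_values)


def bonferroni_threshold(alpha, n_tests):
    """Per-test cutoff that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


n_tests = 10_000  # hypothetical number of correlations being mined
alpha = 0.05      # conventional "significant" cutoff

hits = count_chance_hits(n_tests, alpha)
strict_alpha = bonferroni_threshold(alpha, n_tests)
strict_hits = count_chance_hits(n_tests, strict_alpha)

print(f"expected chance hits at alpha={alpha}: {n_tests * alpha:.0f}")
print(f"simulated chance hits at alpha={alpha}: {hits}")
print(f"Bonferroni cutoff: {strict_alpha:.1e}, simulated hits: {strict_hits}")
```

With these assumptions, about one test in twenty clears the 0.05 cutoff with nothing real behind it, which is why an undirected search needs a far lower P-value than a test motivated by a known pathway.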
A better approach might be to replace publication in a journal by some other mechanism. Suppose there were an online, centralized database for hypotheses related to a disease or trait. No single population study would be meaningful, but multiple reports by different researchers in different populations would be significant. Evidence would accumulate and credit would be shared among all those responsible for validating or disproving the hypothesis.
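As a sketch of how evidence might accumulate in such a database, one standard option is Fisher's method for combining P-values from independent studies of the same hypothesis. This is only an assumption about how a database like that could pool reports; the function name and the example P-values are made up for illustration, and the method is valid only if the studies really are independent.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P-values for one hypothesis using Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null, where k is the number of studies.
    For even degrees of freedom the upper tail has a closed form:
        P(chi2_2k > X) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one P-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")

    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2

    # Accumulate the series term by term: (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical example: three populations, none significant alone at 0.05.
reports = [0.08, 0.08, 0.08]
combined = fisher_combined_p(reports)
print(f"combined P-value: {combined:.4f}")
```

In this made-up example the three reports, each too weak to count alone, combine to a P-value of roughly 0.019, which illustrates the point that replication across populations can carry weight that no single study does.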