And we can’t explain away all of this low success rate as the result of illusory correlations being thrown up by the standard statistical problems with findings, such as small n, sampling error (A & B just happened to sync together due to randomness), selection bias, publication bias, etc. I’ve read about those problems at length, and despite knowing about all that, there still seems to be a problem: correlation too often ≠ causation.
I’m pointing out that your list isn’t complete, and not considering this possibility when we see a correlation is irresponsible. There are a lot of apparent correlations, and your three possibilities provide no means to reject false positives.
You are fighting the hypothetical. In the least convenient possible world where no dataset is smaller than a petabyte and no one has ever heard of sampling error, would you magically be able to spin the straw of correlation into the gold of causation? No. Why not? That’s what I am discussing here.
I suggest you move that point closer to the list of 3 possibilities; I too read that list and immediately thought, “...and also coincidence.”
The quote you posted above (“And we can’t explain away...”) is an unsupported assertion—a correct one in my opinion, but it really doesn’t do enough to direct attention away from false positive correlations. I suggest that you make it explicit in the OP that you’re talking about a hypothetical in which random coincidences are excluded from the start. (Upvoted the OP FWIW.)
(Also, if I understand it correctly, Ramsey theory suggests that coincidences are inevitable even in the absence of sampling error.)
I agree with gwern’s decision to separate statistical issues from issues which arise even with infinite samples. Statistical issues are also extremely important and deserve careful study; however, we should divide and conquer complicated subjects.
I also agree—I’m recommending that he make that split clearer to the reader by addressing it up front.
I see. I really didn’t expect this to be such an issue and come up in both the open thread & Main… I’ve tried rewriting the introduction a bit. If people still insist on getting snagged on that, I give up.
It ends with “etc.” for Pete’s sake!
...no it doesn’t?