I think I have no way of assigning numbers to the quantities P(causation|correlation) and P(causation|~correlation) assessed over all examples of pairs of variables. If you do, tell me what numbers you get.
My original question was whether you think the probabilities are equal. This reply does not appear to address that question. Even if you have no way of assigning numbers, that does not imply that the three possibilities (>, =, <) are equally likely. Let’s say we somehow did find those probabilities. Would you be willing to say, right now, that they would turn out to be equal (with high probability)?
I asked why and you have said “intuition”, which means that you don’t know why.
Okay, here’s my reasoning (which I thought was intuitively obvious, hence the talk of “intuition”, but illusion of transparency, I guess):
The presence of a correlation between two variables means (among other things) that those two variables are statistically dependent. There are many ways for variables to be dependent, one of which is causation. When you observe that a correlation is present, you are effectively eliminating the possibility that the variables are independent. With this possibility gone, the remaining possibilities must increase in probability mass, i.e. become more likely, if we still want the total to sum to 1. This includes the possibility of causation. Thus, the probability of some causal link existing is higher after we observe a correlation than before: P(causation|correlation) > P(causation|~correlation).
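To put toy numbers on the mechanics (the hypothesis list and prior masses below are invented purely for illustration; nothing hinges on the specific values):

```python
# Toy illustration of the update: ruling out "independent" renormalizes
# the remaining probability mass, raising P(causation).
# All prior values here are made up for the sake of the example.

priors = {
    "independent": 0.70,  # no statistical dependence
    "causal":      0.10,  # X causes Y, or Y causes X
    "confounded":  0.15,  # common cause only
    "selection":   0.05,  # dependence induced by how the data were gathered
}

print(priors["causal"])  # P(causation) before observing anything: 0.10

# Observing a correlation eliminates "independent" (idealizing away
# sampling noise); the surviving hypotheses are renormalized to sum to 1.
remaining = sum(p for h, p in priors.items() if h != "independent")
print(priors["causal"] / remaining)  # ~0.333 > 0.10
```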
There is no such thing as a correlation not “present in the data”.
If you are using a flawed or unsuitable analysis method, it is entirely possible to (seemingly) get a correlation when in fact no such correlation exists. An example of such a flawed method may be found here: a correlation is found between ratios of quantities even though the underlying quantities are statistically independent, giving the false impression that a correlation is present when it is not.
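The artifact is easy to reproduce in simulation; a minimal sketch (the choice of distributions is arbitrary):

```python
# X, Y, Z are mutually independent, yet X/Z and Y/Z are substantially
# correlated, purely because both ratios share the denominator Z.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.uniform(1, 2, n)
y = rng.uniform(1, 2, n)
z = rng.uniform(1, 2, n)

print(np.corrcoef(x, y)[0, 1])          # ~0: the raw variables are independent
print(np.corrcoef(x / z, y / z)[0, 1])  # ~0.5: the shared denominator does it
```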
What observations would you undertake to determine whether a correlation is, in your terms, a “real” correlation?
As I suggested in my reply to Lumifer, redundancy helps.
Sorry it’s taken me so long to get back to this.

Okay, here’s my reasoning (which I thought was intuitively obvious, hence the talk of “intuition”, but illusion of transparency, I guess):
The illusion of transparency applies not only to explaining things to other people, but to explaining things to oneself.
The presence of a correlation between two variables means (among other things) that those two variables are statistically dependent. There are many ways for variables to be dependent, one of which is causation. When you observe that a correlation is present, you are effectively eliminating the possibility that the variables are independent. With this possibility gone, the remaining possibilities must increase in probability mass, i.e. become more likely, if we still want the total to sum to 1. This includes the possibility of causation. Thus, the probability of some causal link existing is higher after we observe a correlation than before: P(causation|correlation) > P(causation|~correlation).
The argument still does not work. Statistical independence does not imply causal independence. In causal reasoning, the idea that it does is called the assumption, or axiom, of faithfulness, and there are at least two reasons why it may fail. Firstly, the finiteness of sample sizes means that observations can never prove statistical independence, only put likely upper bounds on the magnitude of any dependence. As Andrew Gelman has put it, with enough data, nothing is independent. Secondly, dynamical systems and systems of cyclic causation are capable of producing robust statistical independence between variables that are directly causally related. There may be reasons for expecting faithfulness to hold in a specific situation, but it cannot be regarded as a physical law, true always and everywhere.
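The first point is easy to see in simulation; a sketch with an arbitrarily chosen, negligibly small effect:

```python
# "With enough data, nothing is independent": a dependence far too small
# to matter becomes statistically unmistakable once n is large enough.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
for n in (100, 10_000, 1_000_000):
    x = rng.normal(size=n)
    y = 0.01 * x + rng.normal(size=n)  # true correlation of roughly 0.01
    r, p = pearsonr(x, y)
    print(f"n={n:>9}: r={r:+.4f}, p={p:.3g}")
# At n=100 there is no reason to reject independence; at n=1,000,000
# the p-value is astronomically small for the same tiny effect.
```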
Even when faithfulness does hold, statistical dependence tells you only that either causation or selection is happening somewhere. If your observations are selected on a common effect of the two variables, you may observe correlation when the variables are causally independent. If you have reason to think that selection is absent, you still have to decide whether you are looking at one variable causing the other, both being effects of common causes, or a combination.
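A minimal simulation of this selection effect (the selection rule is an arbitrary stand-in for selecting on a common effect):

```python
# X and Y are causally and statistically independent, but restricting
# the sample to a common effect of both induces a correlation.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
y = rng.normal(size=n)
selected = x + y > 1  # the common effect drives inclusion in the sample

print(np.corrcoef(x, y)[0, 1])                      # ~0 in the full population
print(np.corrcoef(x[selected], y[selected])[0, 1])  # clearly negative among the selected
```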
Given all of these complications, which in any real application of statistics you would have to think about before even collecting data, the argument that correlation is evidence for causation, in the absence of any other information about the variables, has no role to play. The supposed conclusion that P(causation|correlation) > P(causation|~correlation) is useless unless there is reason to think that the difference in probabilities is substantial, which is something you have not addressed, and which would require coming up with something like actual values for those probabilities.
Redundancy helps. Use multiple analysis methods, show someone else your results, etc. If everything turns out the way it’s supposed to, then that’s strong evidence that the correlation is “real”.
This is too vague to be helpful. What multiple analysis methods? The correlation coefficient simply is what it is. There are other statistics you can calculate for statistical dependency in general, but they are subject to the same problem as correlation: none of them imply causation. What does showing someone else your results accomplish? What are you expecting them to do that you did not? What is “the way everything is supposed to turn out”?
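For illustration, here is a case where two standard dependence statistics agree perfectly and neither settles the causal question (the common-cause setup is invented for the example):

```python
# X and Y share a common cause Z but do not cause each other.
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
n = 50_000
z = rng.normal(size=n)      # common cause
x = z + rng.normal(size=n)
y = z + rng.normal(size=n)

print(pearsonr(x, y)[0])   # strongly positive: a perfectly "real" correlation
print(spearmanr(x, y)[0])  # the second method simply agrees
# Both statistics are correct, both are reproducible, and neither says
# whether X causes Y, Y causes X, or (as here) neither does.
```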
What, in concrete terms, would you do to determine the causal efficacy of a medication? You won’t get anywhere trying to publish results with no better argument than “correlation raises the probability of causation”.