There is correlation without causation, but there is also causation without correlation. Why and when does the latter occur? Is there one reason or several, and if several, how can they be classified, and by what criterion? If one of the observables does not change because a controlling observer (prediction + feedback) holds it fixed, there is no way to establish a correlation. I am displeased by Bayesian probability combined with graphs (DAGs); it so obviously lacks a nonlinear activation function. If two independent, unbiased random binary streams feed into an XOR gate, the output is uncorrelated with either stream, even though there is plenty of change to observe and perfect causality.
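The XOR claim is easy to check numerically. The following is a minimal sketch, assuming independent fair coin flips for both streams; the helper `pearson`, the seed, and the sample size are illustrative choices, not part of the original argument.

```python
import math
import random

def pearson(xs, ys):
    """Sample product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000
a = [rng.getrandbits(1) for _ in range(n)]  # first input stream
b = [rng.getrandbits(1) for _ in range(n)]  # second input stream
out = [x ^ y for x, y in zip(a, b)]         # XOR gate output

corr_a = pearson(a, out)
corr_b = pearson(b, out)

# Causality check: flipping `a` while holding `b` fixed always flips the output.
always_flips = all(((1 - x) ^ y) != (x ^ y) for x, y in zip(a, b))

print("corr(a, out):", corr_a)
print("corr(b, out):", corr_b)
print("flipping a always flips out:", always_flips)
```

Both correlations should come out close to zero, while the last line confirms that each input fully determines the output once the other input is held fixed.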
The very purpose of a control system is to pump information out of the relationship between the variable under control and the influences acting on it. This results in zero correlation between variables that are directly causally connected, and large (above 0.99) correlations between variables that are causally connected only via those zero-correlation links. I have a paper on the subject.
This remains true even if correlation (generally meant in the linear product-moment sense) is replaced by the more general concept of statistical dependence (non-zero mutual information).
The graphs describing such systems contain cycles, so the apparatus of causal analysis based on DAGs does not apply.
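A toy simulation can illustrate the pattern described above. This is only a sketch under stated assumptions, not the model from the paper mentioned: the controlled variable is taken to be the sum of the controller output and a disturbance, the controller is a simple integrator, and the disturbance is a slow sinusoid. The names `simulate_control_loop`, `gain`, `reference`, and `period` are hypothetical.

```python
import math

def pearson(xs, ys):
    """Sample product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def simulate_control_loop(steps=5000, gain=0.5, reference=0.0, period=500):
    """Integral controller holding `controlled` near `reference`.

    Assumed structure: controlled = output + disturbance, and the controller
    nudges its output in proportion to the current error.
    """
    disturbance, output, controlled = [], [], []
    o = 0.0
    for t in range(steps):
        d = math.sin(2 * math.pi * t / period)  # slow external influence
        p = o + d                               # direct causal links: o -> p, d -> p
        disturbance.append(d)
        output.append(o)
        controlled.append(p)
        o += gain * (reference - p)             # feedback: p -> next o
    return disturbance, output, controlled

d, o, p = simulate_control_loop()
print("corr(disturbance, controlled):", pearson(d, p))  # direct link
print("corr(output, controlled):     ", pearson(o, p))  # direct link
print("corr(disturbance, output):    ", pearson(d, o))  # linked only via p
```

In this sketch the two directly connected pairs show correlations near zero, while the disturbance and the controller output, which influence each other only through the controlled variable, are almost perfectly anticorrelated.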
Correlation measures only linear association. Many causal functions are non-linear.
Think of medicine. X is the dosage, Y is the improvement in health. If the dose is too low, you will get no response. If the dose is within a good range, health improves. If the dose is too high, you will begin to get even sicker. If data were gathered evenly all along this inverted parabola, the correlation could be zero. But there is still a causal relationship between dosage and health.
Thus you can have causation without correlation.
You can probably think of many such functions with diminishing or negative returns as the input increases, e.g. years of education vs. lifetime earnings.
Whether you see a positive, negative, or null correlation can depend on where you sample from the response function. In the “real world”, data might be bunched up around certain regions of the response function. Thus for the “average person/instance” you can check whether there is a correlation, and then say this is basically the causal effect (for the average person/instance).
But if you want accuracy and precision over concision you will use a more complex model.
Concise models are better memes than complex models, however, and so we are flooded with linear models or binary models.
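The dose-response point can be sketched in a few lines. The inverted parabola below, its optimum at 5.0, and the dose grid from 0 to 10 are assumed purely for illustration; `health_response` is a hypothetical name, not a real pharmacological model.

```python
import math

def pearson(xs, ys):
    """Sample product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def health_response(dose, optimum=5.0):
    """Assumed inverted-parabola response: best at `optimum`, worse on both sides."""
    return -(dose - optimum) ** 2

doses = [i / 10 for i in range(101)]  # 0.0, 0.1, ..., 10.0, symmetric about 5.0
responses = [health_response(x) for x in doses]

low = [(x, y) for x, y in zip(doses, responses) if x <= 5.0]
high = [(x, y) for x, y in zip(doses, responses) if x >= 5.0]

corr_full = pearson(doses, responses)
corr_low = pearson(*zip(*low))
corr_high = pearson(*zip(*high))

print("full range:", corr_full)  # rising and falling halves cancel
print("low doses: ", corr_low)   # strongly positive
print("high doses:", corr_high)  # strongly negative
```

The same causal function yields a null, positive, or negative correlation depending only on which part of the curve was sampled.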
Re “parabola”: that would be a third category, then. No correlation is observed because the aggregated observation cancels out an effect that works in both directions.
No, it’s just causation without correlation; correlation is defined to be the aggregate effect.
Yes, but I am posing the WHY question. In this case it is just an averaging effect, not a feedback controller.
Typically you get causality without correlation when there is some controller that manipulates the causal variable in order to control the variable that it has an effect on.
DAGs only encode the structural relations; they make no inherent claim that the relationships have to be linear. A common model is to allow each node to be an arbitrary function of its parents. The reason this isn’t used much in practice, even though this is what the math is based on, is that it is usually very hard to fit.
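As a rough illustration of "each node is an arbitrary function of its parents", here is a tiny hand-rolled structural model for the XOR case, with an intervention operator. Everything in it (the `model` dictionary layout, `sample`, `mean_c`, the node names) is a hypothetical sketch rather than any particular library's API.

```python
import random

def sample(model, order, rng, do=None):
    """Draw one joint sample, evaluating nodes in topological order.

    `do` maps node names to forced values (an intervention); forced nodes
    ignore their parents and their own mechanism.
    """
    do = do or {}
    values = {}
    for node in order:
        if node in do:
            values[node] = do[node]
        else:
            parents, fn = model[node]
            values[node] = fn(rng, *(values[p] for p in parents))
    return values

# DAG: A -> C <- B. Each node is (parents, function of rng and parent values).
model = {
    "A": ((), lambda rng: rng.getrandbits(1)),
    "B": ((), lambda rng: rng.getrandbits(1)),
    "C": (("A", "B"), lambda rng, a, b: a ^ b),  # nonlinear mechanism
}
order = ["A", "B", "C"]

def mean_c(do, n=20_000, seed=0):
    """Estimate P(C = 1) under the intervention `do`."""
    rng = random.Random(seed)
    return sum(sample(model, order, rng, do)["C"] for _ in range(n)) / n

print("P(C=1 | do(A=0)):     ", mean_c({"A": 0}))
print("P(C=1 | do(A=1)):     ", mean_c({"A": 1}))
print("P(C=1 | do(A=0, B=0)):", mean_c({"A": 0, "B": 0}))
print("P(C=1 | do(A=1, B=0)):", mean_c({"A": 1, "B": 0}))
```

Intervening on A alone leaves the distribution of C at roughly one half either way, yet with B pinned the effect of A on C is total. The graph itself is unchanged; only the node function is nonlinear.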
Correlation is a statistical pattern, so the obvious example of causation without correlation would be some kind of one-shot cause.