I feel like there is a major explaining paragraph missing here, explaining the difference between causality and probability. Something like:
Armed with knowledge of the future, we could know exactly what will happen. (e.g. The other doctor will give Alice medicine, and she will get better.) Given a full probability distribution over events we could make optimal predictions. (e.g. There is a 2⁄3 chance the other doctor will give Alice medicine, 1⁄4 chance of her getting better if he doesn’t and 2⁄3 chance of her getting better if he does.) Causality gives us a way to combine a partial probability distribution with additional knowledge of the world to make predictions about events that are out of distribution. (e.g. Since I understand that the medicine works mostly by placebo, I can intervene and give Alice a placebo when the other doctor doesn’t give her the medicine, raising her chances. Furthermore, if I have a distribution of how effective a placebo is relative to the medicine, I can quantify how helpful my intervention is.)
An intervention is a really important example of out-of-distribution generalization. But if I gave you the full probability distribution over the outcomes that your interventions would achieve, it would no longer be out of distribution (and you’d need to deal with paradoxes involving seemingly not having choices about certain things).
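To make the arithmetic in the Alice example concrete, here is a minimal sketch in Python. The first three probabilities come from the example above; the placebo success rate (`p_better_given_placebo`) is a hypothetical value picked purely for illustration, since the example only says such a distribution would be needed.

```python
from fractions import Fraction

# Numbers taken from the example above.
p_medicine = Fraction(2, 3)               # other doctor gives Alice the medicine
p_better_given_medicine = Fraction(2, 3)  # she recovers if she gets the medicine
p_better_given_nothing = Fraction(1, 4)   # she recovers if she gets nothing

# HYPOTHETICAL: not stated in the example, assumed here for illustration only.
p_better_given_placebo = Fraction(1, 2)


def p_better(p_better_when_no_medicine):
    """P(Alice gets better), given her recovery chance when the doctor gives no medicine."""
    return (p_medicine * p_better_given_medicine
            + (1 - p_medicine) * p_better_when_no_medicine)


# Prediction from the probability distribution alone (no intervention).
observational = p_better(p_better_given_nothing)

# Prediction under the intervention "give a placebo whenever no medicine is given".
with_placebo = p_better(p_better_given_placebo)

# How much the intervention helps, under the assumed placebo rate.
gain = with_placebo - observational

print(observational, with_placebo, gain)  # 19/36 11/18 1/12
```

The second number cannot be read off the observational distribution: it needs the extra causal assumption about how a placebo affects recovery.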
Thanks for the suggestion. We made an effort to be brief, but perhaps we went too far. Our paper "Reasoning about causality in games" has a longer discussion of probabilistic, causal, and structural models (in Section 2), and Pearl’s book "Causal Inference in Statistics: A Primer" also offers a more comprehensive introduction.
I agree with you that causality offers a way to make out-of-distribution predictions (in post number 6, we plan to go much deeper into this). In fact, a causal Bayesian network is equivalent to an exponentially large set of probability distributions, with one joint distribution $P_{\mathrm{do}(X=x)}$ for each possible combination of interventions $X=x$.
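As a rough illustration of that last point, the sketch below enumerates every interventional distribution of a tiny two-variable network. The graph (medicine M pointing to recovery B) and the reuse of the numbers from the Alice example are assumptions made for this illustration, and `joint` is a hypothetical helper, not code from the paper. Each variable is either left alone or set to one of its two values, so n binary variables give 3^n distributions.

```python
from fractions import Fraction
from itertools import product

# Assumed graph: M (medicine given) -> B (Alice gets better), both binary.
P_M = {1: Fraction(2, 3), 0: Fraction(1, 3)}
P_B_GIVEN_M = {
    1: {1: Fraction(2, 3), 0: Fraction(1, 3)},
    0: {1: Fraction(1, 4), 0: Fraction(3, 4)},
}


def joint(do_m=None, do_b=None):
    """Joint distribution over (M, B) under do(M=do_m, B=do_b); None means no intervention.

    An intervened variable's conditional distribution is replaced by a point
    mass on the chosen value (the truncated factorization).
    """
    dist = {}
    for m, b in product((0, 1), repeat=2):
        p_m = P_M[m] if do_m is None else Fraction(int(m == do_m))
        p_b = P_B_GIVEN_M[m][b] if do_b is None else Fraction(int(b == do_b))
        dist[(m, b)] = p_m * p_b
    return dist


# Every combination of interventions: each variable is untouched, set to 0, or set to 1.
interventions = list(product((None, 0, 1), repeat=2))
distributions = {iv: joint(*iv) for iv in interventions}

print(len(distributions))  # 9, i.e. 3**2
```

Here `distributions[(None, None)]` is the ordinary observational joint, while `distributions[(1, None)]` is the joint under do(M=1). A purely probabilistic model would store only the first of these nine.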
We’ll probably at least add some pointers to further reading, per your suggestion. (ETA: also added a short paragraph near the end of the Intervention section.)