This post does a sort of head-to-head comparison of causal models and deep nets. But I view the relationship between them differently—they’re better together! The causal framework gives us the notion of “screening off”, which is missing from the ML/deep learning framework. Screening-off turns out to be useful in analyzing feature importance.
A workflow that 1) uses a complex modern gradient booster or deep net to fit the data, then 2) uses causal math to interpret the features—which are most important, which screen off which—is really nice. [This workflow requires fitting multiple models on different sets of variables, so it’s not just “fit a single model in step 1, analyze it in step 2, done”.]
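Here’s a minimal sketch of what I mean, on synthetic data (the variables and the data-generating process are made up purely for illustration): the booster fit is step 1, and comparing fits across feature subsets is step 2.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 2000
a = rng.normal(size=n)                    # upstream variable
b = a + rng.normal(scale=0.1, size=n)     # b is mostly determined by a
y = b + rng.normal(scale=0.5, size=n)     # y depends on b directly

def fit_score(*features):
    """Cross-validated R^2 of a gradient booster on a given feature subset."""
    X = np.column_stack(features)
    return cross_val_score(GradientBoostingRegressor(), X, y, cv=5).mean()

# If adding a to a model that already has b barely improves the fit,
# that's evidence b screens off a from y.
print("b alone:", fit_score(b))
print("b and a:", fit_score(b, a))
print("a alone:", fit_score(a))
```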
Causal math lacks the ability to auto-fit complex functions, and ML-without-causality lacks the ability to measure things like “which variables screen off which”. Causality tools, paired with modern feature-importance measures like SHAP values, help us interpret black-box models.
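For concreteness, here’s the kind of SHAP computation I have in mind—a sketch on synthetic data, assuming the `shap` package is available:

```python
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

model = GradientBoostingRegressor().fit(X, y)
explainer = shap.TreeExplainer(model)    # exact SHAP values for tree ensembles
shap_values = explainer.shap_values(X)   # shape (n_samples, n_features)

# Mean |SHAP| per feature gives a global importance ranking.
print(np.abs(shap_values).mean(axis=0))
```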
But in order for “screening off” etc. to be interesting, you’d need to know it for interpretable features, no? I wouldn’t care so much that pixel (42, 33) screens off node (7, 1, 54) in the network.
Sure—there are plenty of cases where a particular screening-off relation isn’t interesting. In the ImageNet context, you’ll probably care more about screening-off behavior at more abstract levels of representation.
For example, maybe you find that, in your trained network, a hidden representation that seems to correspond to “trunk” isn’t very predictive of the class “tree”, while one that looks like “leaves” is. It’d be useful to know whether the reason “trunk” isn’t predictive is that “leaves” screens it off. (This could happen if all the tree trunks in your training images come with leaves in the frame.)
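A cheap version of that check is a partial correlation, sketched here on simulated activations (the “trunk” and “leaves” activations and the “tree” score are stand-ins for things you’d extract from a real network):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
leaves = rng.normal(size=n)
trunk = leaves + rng.normal(scale=0.3, size=n)  # trunks co-occur with leaves
tree = leaves + rng.normal(scale=0.5, size=n)   # "tree" score driven by leaves only

def residualize(x, z):
    """Residual of x after linear regression on z (both zero-mean here)."""
    return x - (x @ z / (z @ z)) * z

raw = np.corrcoef(trunk, tree)[0, 1]
partial = np.corrcoef(residualize(trunk, leaves),
                      residualize(tree, leaves))[0, 1]
print(f"corr(trunk, tree)          = {raw:.2f}")      # large: trunk looks predictive
print(f"corr(trunk, tree | leaves) = {partial:.2f}")  # ~0: leaves screens trunk off
```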
Of course, the causality parts of the above analysis don’t address the “how should you assign labels in the first place” problem that the post is most focused on! I’m just saying both the ML parts and the causality parts work well in concert, and are not opposing methods.