Weird thought I had based on a tweet about gradient descent in the brain: it seems like one under-explored perspective on computational graphs is the causal one. That is, we can view propagating gradients through a computational graph as assessing the effect of intervening on a node, as that effect flows through all of the node's children.
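To make the analogy concrete, here's a minimal sketch (assuming JAX; the toy function, the `h_shift` hook, and the values are all illustrative, not anyone's actual method). The gradient at an intermediate node matches, to first order, the effect of a tiny additive intervention do(h := h + ε) on that node:

```python
import jax
import jax.numpy as jnp

def loss(x, h_shift=0.0):
    h = jnp.tanh(3.0 * x)    # intermediate node h
    h = h + h_shift          # hook for an intervention: do(h := h + shift)
    return h ** 2 + 2.0 * h  # downstream "children" of h

x = 0.5

# Backprop view: gradient of the loss w.r.t. an intervention at h.
grad_at_h = jax.grad(loss, argnums=1)(x, 0.0)

# Causal view: actually perform a small intervention and measure its effect.
eps = 1e-4
effect = (loss(x, eps) - loss(x, 0.0)) / eps

print(grad_at_h, effect)  # these agree to first order in eps
```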
Reason to think this might be useful:
*Maybe* this can act as a different lens for examining NN training?
Reasons why this might not be useful:
It’s not obvious that it makes sense to think of nodes in an NN (or any differentiable computational graph) as causally related in the sense we usually mean in causal inference.
A causal interpretation of gradients isn’t obvious because they’re so local: a gradient only captures the effect of an infinitesimal perturbation, whereas most causal inference focuses on the impact of larger, discrete interventions. OTOH, some NN interpretability techniques deliberately make exactly those kinds of non-trivial interventions, so maybe those have better causal interpretations? (A sketch of one is below.)
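As a rough illustration of that last point: activation patching is one real interpretability technique of this flavor (the toy network and values here are made up). Instead of nudging a node infinitesimally, it replaces the whole activation with one cached from a different input, which is a much more intervention-like operation:

```python
import jax.numpy as jnp

def forward(x, patched_h=None):
    h = jnp.tanh(3.0 * x)      # intermediate node h
    if patched_h is not None:  # non-trivial intervention: do(h := cached value)
        h = patched_h
    return h ** 2 + 2.0 * h

x_clean, x_corrupt = 0.5, -0.5

# Cache h from the "clean" run, then patch it into the "corrupt" run.
h_clean = jnp.tanh(3.0 * x_clean)
effect_of_patch = forward(x_corrupt, patched_h=h_clean) - forward(x_corrupt)

print(effect_of_patch)  # how much of the output change node h accounts for
```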