tailcalled comments on Bayesian Networks Aren’t Necessarily Causal

tailcalled 21 May 2023 11:19 UTC
4 points
0
I don’t think this is much better than just linking up variables to each other if they are strongly correlated (at least in ways not explained by existing links)?
- johnswentworth 21 May 2023 18:05 UTC
  2 points
  0
  Parent
  Just that does usually work pretty well for (at least a rough estimate of) the undirected graph structure, but then you don’t know the directions of any arrows.
  - tailcalled 21 May 2023 19:13 UTC
    5 points
    0
    Parent
    I think this approach only gets the direction of the arrows from two structures, which I’ll call colliders and instrumental variables (because that’s what they are usually called).
    Colliders are the case of A → B ← C, which in terms of correlations shows up as A and B being correlated, B and C being correlated, and A and C being independent. This is a distinct pattern of correlations from the A → B → C or A ← B → C structures where all three could be correlated, so it is possible for this method to distinguish the structures (well, sometimes not, but that’s tangential to my point).
    Instrumental variables are the case of A → B → C, where A → B is known but the direction of B—C is unknown. In that case, the fact that C correlates with A suggests that B → C rather than B ← C.
    I think the main advantage larger causal networks give you is that they give you more opportunities to apply these structures?
    But I see two issues with them. First, they don’t see to work very well in nondeterministic cases. They both rely on the correlation between A and C, and they both need to distinguish whether that correlation is 0 or $a b \cdot b c$ (where $a b$ and $b c$ refer to the effects A—B and B—C) respectively. If the effects in your causal network are of order $e$ , then you are basically trying to distinguish something of order 0 from something of order $e^{2}$ , which is likely going to be hard if $e$ is small. (The smaller of a difference you are trying to detect, the more affected you are going to be by model misspecification, unobserved confounders, measurement error, etc..) This is not a problem in Zack’s case because his effects are near-deterministic, but it would be a problem in other cases. (I in particularly have various social science applications in mind.)
    Secondly, Zack’s example had an advantage that multiple root causes of wet sidewalks were measured. This gave him a collider to kick off the inference process. (Though I actually suspect this to be unrealistic—wouldn’t you be less likely to turn on the sprinkler if it’s raining? But that relationship would be much less deterministic, so I suspect that’s OK in this case where it’s less deterministic.) But this seems very much like a luxury that often doesn’t occur in the practical cases where I’ve seen people attempt to apply this. (Again various social science applications.)