[Question] Why do Minimal Bayes Nets often correspond to Causal Models of Reality?
Chapter 2 of Pearl’s Causality book claims you can recover causal models given only the observational data, under very natural assumptions of minimality and stability[1].
In graphical models lingo, Pearl identifies a causal model of the observational distribution with the distribution’s perfect map (if one exists).
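For intuition, here is a minimal numerical sketch (my own illustration in numpy, not from the book) of why observational data can pin down a perfect map only up to its Markov equivalence class: a chain X → Z → Y and a fork X ← Z → Y imply exactly the same independencies (X and Y dependent marginally, independent given Z), so no observational test can tell them apart.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Chain: X -> Z -> Y
x1 = rng.normal(size=n)
z1 = x1 + rng.normal(size=n)
y1 = z1 + rng.normal(size=n)

# Fork: X <- Z -> Y
z2 = rng.normal(size=n)
x2 = z2 + rng.normal(size=n)
y2 = z2 + rng.normal(size=n)

def partial_corr(a, b, c):
    """Correlation of a and b after linearly regressing out c."""
    ra = a - np.polyval(np.polyfit(c, a, 1), c)
    rb = b - np.polyval(np.polyfit(c, b, 1), c)
    return np.corrcoef(ra, rb)[0, 1]

# Both structures show the same pattern: X and Y clearly correlated
# marginally, (near-)zero partial correlation given Z.
print(np.corrcoef(x1, y1)[0, 1], partial_corr(x1, y1, z1))
print(np.corrcoef(x2, y2)[0, 1], partial_corr(x2, y2, z2))
```

So any structure-learning procedure that looks only at independencies can return the equivalence class, never a unique DAG, for these two graphs.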
But I’m confused about a pretty fundamental point: “What does this have to do with causality at all??” More precisely:
“Okay, it’s pretty cool that minimality and stability alone let us narrow down so many of the arrow directions (in the Bayes Net independency sense) of the minimal network. But … what does this have to do with arrow directions in the causal sense, i.e. [independent stable mechanisms of reality that, by virtue of their independence, respond to interventions in a modular way]”?
To be clear, Pearl acknowledges this puzzle in his Temporal Bias Conjecture (2.8.2):
“In most natural phenomena, the physical time coincides with at least one statistical time.”
And Pearl conjectures that this may be because human language is optimized such that our [choice of variables / factorization of reality] makes the Temporal Bias hold.
I … guess that could be an explanation? But honestly I don’t think I understand his point very well, and I find it pretty unsatisfying. I would appreciate any explanation of why it makes sense to identify perfect maps with Causal Models.
Minimality: Choose the network structure that is minimally expressive among those that can express the observational distribution.
This is pretty reasonable imo; it’s basically just Occam’s razor.
Stability: Assume that there exists a network structure that perfectly captures all and only the independencies implied by the observational distribution, i.e. the independencies are structural.
Stability is a reasonable assumption since it would be pretty unlikely for the conditional probability distributions to be fine-tuned so as to cancel each other out and induce an independency not present in the network.
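To make the fine-tuning worry concrete, here is a small numpy sketch (my own, with made-up coefficients) of a stability violation: a linear model in which the direct effect of X on Y exactly cancels the indirect effect through Z, producing an independency that is not structural — and that vanishes under any perturbation of the parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Structure: X -> Z -> Y, plus a direct edge X -> Y.
# Fine-tuned coefficients: the direct effect c exactly cancels the
# indirect effect a*b, since c + a*b = -1.0 + 2.0*0.5 = 0.
a, b, c = 2.0, 0.5, -1.0

x = rng.normal(size=n)
z = a * x + rng.normal(size=n)
y = b * z + c * x + rng.normal(size=n)

# X and Y look independent despite two causal paths between them:
print(np.corrcoef(x, y)[0, 1])   # ~ 0.0 (a non-structural independency)

# Perturb c slightly and the "independency" disappears:
y2 = b * z + (c + 0.5) * x + rng.normal(size=n)
print(np.corrcoef(x, y2)[0, 1])  # clearly nonzero
```

This is exactly the measure-zero coincidence that the stability assumption rules out: the cancellation holds only at one precise parameter setting.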
Your question may only be aimed at people who have studied the relevant part of the book, but to me it is very unclear what you mean here by “recover” and “express” in “recover causal models given only the observational data” or “minimally expressive among those that can express the observational distribution”.