Dear Richard,
I have not forgotten about your paper; I am just extremely busy until early March. Three quick comments, though:
(a) People have viewed cyclic models as defining a stable distribution of an appropriate Markov chain. There are some complications, and it seems that with cyclic models (unlike the DAG case) the graph that predicts what happens after an intervention and the graph that represents the independence structure of the equilibrium distribution are not the same graph (this is another reason to treat the statistical and causal graphical models separately). See Richardson and Lauritzen’s chain graph paper for a simple four-node example of this.
So when we say there is a faithfulness violation, we have to make sure we are talking about the right graph representing the right distribution.
(b) In general I view a derivative not as a node, but as an effect. So e.g. in a linear model:
y = f(x) = ax + e
dy/dx = a = E[y|do(x=1)] - E[y|do(x=0)], which is just the causal effect of x on y on the mean difference scale.
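To make the linear case concrete, here is a minimal simulation sketch (the coefficient value and noise distribution are assumed for illustration, not taken from anything above). Because do(x) cuts the arrow into x while leaving the structural equation y = ax + e intact, the interventional mean difference recovers the slope a exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
a = 2.5            # assumed structural coefficient
n = 100_000

e = rng.normal(0.0, 1.0, n)   # exogenous noise on y

# do(x=1) and do(x=0): set x by hand, reuse the same structural
# equation y = a*x + e (the noise e is unaffected by the intervention).
y_do1 = a * 1.0 + e
y_do0 = a * 0.0 + e

effect = y_do1.mean() - y_do0.mean()   # E[y|do(x=1)] - E[y|do(x=0)]
print(effect)                          # equals a = dy/dx
```

The noise term cancels in the difference, so the estimate is exact here; with separate draws of e for each arm it would only converge to a as n grows.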
In general, the partial derivative of the outcome with respect to some treatment, holding the other treatments constant, is a kind of direct causal effect. Viewed through that lens, it is perhaps not so surprising that x and dy/dx are independent. After all, the direct effect/derivative is a function of p(y|do(x),do(other parents of y)), and we know do(.) cuts incoming arcs to y, so the distribution p(y|do(x),do(other parents of y)) is independent of p(x) by construction.
But this is more an explanation of why derivatives sensibly represent interventional effects than of whether there is something more to this observation (I think there might be). I do feel that Newton’s intuition in developing derivatives was an attempt to formalize a limit of “wiggle the independent variable and see what happens to the dependent variable,” which is precisely the causal effect. He was concerned with physical systems, too, where causality is fairly clear.
In general, of course, p(y) and functions of p(y | do(x)) are not independent.
(c) I think you define a causal model in terms of the Markov factorization, which I disagree with. The Markov factorization
p(x1, …, xn) = ∏_i p(xi | pa(xi))
defines a statistical model. To define a causal model you essentially need to formally state that the parents of every node are that node’s direct causes. Usually people use the truncated factorization (g-formula) to do this. See, e.g., chapter 1 of Pearl’s book.
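The distinction is easy to see numerically. Here is a small sketch on a toy binary DAG z → x, z → y, x → y (all the probability tables below are made-up numbers, chosen only to make the confounding visible): the truncated factorization drops the factor p(x|z) and fixes x, whereas ordinary conditioning reweights z by p(z|x), and the two give different answers:

```python
# Toy binary confounded DAG: z -> x, z -> y, x -> y.
# All CPT numbers are illustrative assumptions.
p_z = {0: 0.6, 1: 0.4}
p_x_given_z = {0: {0: 0.8, 1: 0.2},      # p_x_given_z[z][x]
               1: {0: 0.3, 1: 0.7}}
p_y1_given_xz = {(0, 0): 0.1, (0, 1): 0.5,   # P(y=1 | x, z)
                 (1, 0): 0.4, (1, 1): 0.9}

def p_y1_do_x(x):
    # Truncated factorization (g-formula): drop p(x|z), fix x.
    # p(y=1 | do(x)) = sum_z p(z) * p(y=1 | x, z)
    return sum(p_z[z] * p_y1_given_xz[(x, z)] for z in (0, 1))

def p_y1_given_x(x):
    # Ordinary conditioning: p(y=1 | x) = sum_z p(z|x) * p(y=1 | x, z)
    num = sum(p_z[z] * p_x_given_z[z][x] * p_y1_given_xz[(x, z)] for z in (0, 1))
    den = sum(p_z[z] * p_x_given_z[z][x] for z in (0, 1))
    return num / den

print(p_y1_do_x(1), p_y1_given_x(1))  # 0.60 vs 0.75: confounding by z
```

The same Markov factorization underlies both computations; only the causal reading of the graph licenses the truncated version, which is the point of (c).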