Graphical models are only a “thing” because our brains dedicate a lot of processing to vision, so, for instance, we immediately understand complicated conditional independence statements when they are expressed in the visual form of d-separation. In some sense, the graphs in graphical models do not really add any extra information mathematically that wasn’t already encoded without them.
Given this, I am not sure there really is a context for graphical models separate from the context of “variables and their relationships”. What you are saying above is that we seem to need “something extra” to be able to tell the direction of causality in a two-variable system. (For example, in an additive noise model you can do this:
http://machinelearning.wustl.edu/mlpapers/paper_files/ShimizuHHK06.pdf)
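To make that concrete, here is a toy sketch of the additive-noise idea behind the linked paper, not a reimplementation of its LiNGAM algorithm: with non-Gaussian noise, only the true causal direction yields a regression residual that is independent of the regressor. The crude squared-value dependence score below is my own stand-in for a proper independence test.

```python
import numpy as np
from scipy.stats import pearsonr

# Toy additive-noise setup: x causes y, with deliberately non-Gaussian noise.
rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(-1, 1, n)                # cause (uniform, hence non-Gaussian)
y = 2.0 * x + rng.uniform(-1, 1, n)      # effect = linear function of x + uniform noise

def dependence_score(regressor, target):
    """Fit OLS target ~ regressor, then crudely measure dependence between
    the regressor and the residual via correlation of squared deviations."""
    slope, intercept = np.polyfit(regressor, target, 1)
    resid = target - (slope * regressor + intercept)
    r, _ = pearsonr((regressor - regressor.mean()) ** 2, resid ** 2)
    return abs(r)

print("x -> y score:", dependence_score(x, y))   # near 0: residual looks independent of x
print("y -> x score:", dependence_score(y, x))   # clearly larger: residual depends on y
```

With Gaussian noise the two directions would score the same, which is exactly the sense in which “something extra” (here, non-Gaussianity of the noise) is needed.)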
I think the “no causes in, no causes out” principle is more general than that, though. For example, if we had a three-variable case, with variables A, B, C, where:
A is marginally independent of B, but no other independences hold, then the only faithful graphical explanation for this model is:
A → C ← B
It seems that, unlike the previous case, here there is no causal ambiguity: A points to C, and B points to C. However, since the only information you inserted into the procedure which gave you this graph is the information about conditional independences, all you are getting out is a graphical description of a conditional independence model (that is, a Bayesian network, or a statistical DAG model). In particular, the absence of arrows isn’t telling you about absent causal relationships (that is, whether A would change if I intervene on C), but about absent statistical relationships (that is, whether A is independent of B). The statistical interpretation of the above graph is that it corresponds to a set of densities:
{ p(A,B,C) | A is independent of B }
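Here is a minimal simulation sketch of what that set of densities looks like from the data’s point of view (the binary mechanism for C is invented for the demo): A and B are marginally independent, but become dependent once we condition on the collider C.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Simulate data consistent with the collider A -> C <- B.
rng = np.random.default_rng(0)
n = 100_000

A = rng.integers(0, 2, n)
B = rng.integers(0, 2, n)
C = (A ^ B) ^ (rng.random(n) < 0.1).astype(int)   # C is a noisy XOR of its two parents

def indep_p(x, y):
    """Chi-squared p-value for marginal independence of two binary variables."""
    table = np.zeros((2, 2), dtype=int)
    np.add.at(table, (x, y), 1)
    return chi2_contingency(table)[1]

print("A vs B (marginal):", indep_p(A, B))                    # large p: independent
for c in (0, 1):
    mask = C == c
    print(f"A vs B given C={c}:", indep_p(A[mask], B[mask]))  # tiny p: dependent given C
```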
The same graph can also correspond to a causal model, where we are explicitly talking about interventions, that is:
{ p(A,B,C,C(a,b),B(a)) | C(a,b) is independent of B(a) is independent of A, p(B(a)) = p(B) }
where C(a,b) is just statistics (potential outcome) notation for do(·); that is, p(C(a,b)) = p(C | do(a,b)).
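To make the do(·) / C(a,b) notation concrete, here is a hedged structural-equation sketch of the same collider graph; the mechanisms are invented, and an intervention is implemented by simply overriding a variable’s assignment.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n, do=None):
    """Draw n samples from the collider A -> C <- B; `do` forces values (interventions)."""
    do = do or {}
    A = np.full(n, do["A"]) if "A" in do else rng.integers(0, 2, n)
    B = np.full(n, do["B"]) if "B" in do else rng.integers(0, 2, n)
    C = (A ^ B) ^ (rng.random(n) < 0.1).astype(int)   # C listens only to A, B, and noise
    return A, B, C

# p(C(a,b)) = p(C | do(A=a, B=b)): force A and B, read off C's distribution.
_, _, C_do = sample(100_000, do={"A": 1, "B": 0})
print("P(C=1 | do(A=1, B=0)) ≈", C_do.mean())

# p(B(a)) = p(B): forcing A leaves B's distribution unchanged, which is exactly
# what the missing A -> B arrow asserts under the causal reading of the graph.
_, B_do, _ = sample(100_000, do={"A": 1})
_, B_obs, _ = sample(100_000)
print("P(B=1 | do(A=1)) ≈", B_do.mean(), "   P(B=1) ≈", B_obs.mean())
```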
This causal model is a different object from before, and the interpretation of the arrows is different: the absence of an arrow from A to B now means that intervening on A does not affect B, and so on. The causal model also induces an independence model on the same graph, where the interpretation of the arrows reverts to the statistical one. However, we could imagine a very different causal model on three variables that would also induce the same independence model in which A is marginally independent of B: for example, the set of all densities where the real direction of causality is A → C → B, but the probabilities involved happen to line up in such a way that A is marginally independent of B. In other words, the mapping from causal to statistical models is many-to-one.
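To see that the probabilities really can line up that way, here is a sketch of one such (unfaithful) parameterization; the conditional probability tables are invented purely for the demo. The chain A → C → B is causally real, A and C are dependent, C and B are dependent, yet A comes out marginally independent of B.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical (unfaithful) parameterization of the chain A -> C -> B.
rng = np.random.default_rng(0)
n = 50_000

A = rng.integers(0, 2, n)                      # A ~ Bernoulli(0.5)

# P(C | A): C genuinely depends on A.
p_c_given_a = np.array([[0.5, 0.3, 0.2],       # C's distribution when A = 0
                        [0.2, 0.3, 0.5]])      # C's distribution when A = 1
C = np.array([rng.choice(3, p=p_c_given_a[a]) for a in A])

# P(B = 1 | C): B genuinely depends on C, with weights chosen so that
# P(B = 1 | A = 0) = P(B = 1 | A = 1) = 0.46 after summing over C.
p_b1_given_c = np.array([0.4, 0.6, 0.4])
B = (rng.random(n) < p_b1_given_c[C]).astype(int)

def indep_p(x, y):
    """Chi-squared p-value for marginal independence of two discrete variables."""
    table = np.zeros((x.max() + 1, y.max() + 1), dtype=int)
    np.add.at(table, (x, y), 1)
    return chi2_contingency(table)[1]

print("A vs C:", indep_p(A, C))   # tiny p-value: dependent
print("C vs B:", indep_p(C, B))   # tiny p-value: dependent
print("A vs B:", indep_p(A, B))   # large p-value: no detectable dependence
```

Intervening on A in this model still changes C’s distribution, so the A → C arrow is causally real even though the observational distribution hides any A–B dependence.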
Given this view, it seems pretty clear that going from independences to causal models (even via a very complicated procedure) involves making some sort of assumption that makes the mapping one-to-one. Maybe the prior in Solomonoff induction gives this to you, but my intuitions about what non-computable procedures will do are fairly poor.
It sort of seems like Solomonoff induction operates at a (very low) level of abstraction where interventionist causality isn’t really necessary (because we just figure out what the observable environment as a whole, including action-capable agents and so on, will do), and thus isn’t explicitly represented. This is similar to how Blockhead (http://en.wikipedia.org/wiki/Blockhead_(computer_system)) does not need an explicit internal model of the other participant in the conversation.
I think Solomonoff induction is sort of a boring subject, if one is interested in induction, in the same sense that Blockhead is boring if one is interested in passing the Turing test, and particle physics is boring if one is interested in biology.