I do causal inference in academia. I do not work on causal discovery, and neither do any of my colleagues, but I can share my impressions of it from occasional seminars on the topic.
The few causal discovery seminars I have seen belong to these categories:
Purely theoretical work that does not show actual applications,
Applied work that does not work well,
Applied work that (maybe) worked but required years and a team.
Consider this a skewed but real slice of the field.
My own thoughts on the subject matter:
In practice you don’t have enough information to fully reconstruct the causal relationships, nor to do so with low enough uncertainty that you can pretend you knew the graph, even in cases where the constraints would, in principle, converge to a single graph given infinite i.i.d. data. So an ideal method would give you a list of graphs with a posterior probability for each, and you would then carry out the inference conditional on each graph. This is what Bayes tells you to do.
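A toy sketch of that model-averaging step (the graph names, posterior weights, and per-graph effect estimates below are entirely made up for illustration; a real method would have to compute the posterior over graphs from data):

```python
# Hypothetical posterior weights over candidate causal graphs (made-up numbers)
graph_posterior = {
    "X -> Y": 0.6,
    "X <- Z -> Y": 0.3,
    "no arc between X and Y": 0.1,
}

# Hypothetical estimate of the causal effect of X on Y, computed
# conditional on each candidate graph being the true one
effect_given_graph = {
    "X -> Y": 1.2,
    "X <- Z -> Y": 0.2,
    "no arc between X and Y": 0.0,
}

# Model-averaged effect: sum over graphs of p(G | data) * effect(G)
averaged_effect = sum(
    p * effect_given_graph[g] for g, p in graph_posterior.items()
)
```

The point is just that the inference is carried out conditional on each graph and then averaged with the posterior weights, rather than committing to a single "discovered" graph.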
However, when you try to specify a model, a graph with fewer arcs naturally leads to a lower-dimensional parameter space than one with more arcs. This would suggest that any graph with missing arcs has posterior probability zero. You can try to repair this with delta distributions (i.e., probability mass concentrated on a single point of a continuous space), but does that make sense? As Andrew Gelman sometimes says, everything has a causal effect on everything; it’s just that the effect can be very small. So maybe a model with shrinkage (i.e., keeping all arcs in the graph, but defining a notion of a “small” connection in the model and using a prior distribution that favors simpler graphs) would make more sense.
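A minimal sketch of the contrast, taking a spike-and-slab prior as the "delta distribution" option and a horseshoe-style prior as the shrinkage option (both priors and all scales here are illustrative assumptions, not anything specific from the literature):

```python
import math
import random

random.seed(0)

def spike_and_slab_draw(p_edge=0.5, slab_scale=1.0):
    """Delta mass at exactly zero (arc absent) with probability 1 - p_edge,
    otherwise a Gaussian 'slab' draw for the arc strength."""
    if random.random() > p_edge:
        return 0.0
    return random.gauss(0.0, slab_scale)

def horseshoe_style_draw(global_scale=0.1):
    """Continuous shrinkage: every arc gets a nonzero coefficient,
    but most are pulled close to (not onto) zero."""
    u = random.random()
    local_scale = abs(math.tan(math.pi * (u - 0.5)))  # half-Cauchy(0, 1) draw
    return random.gauss(0.0, global_scale * local_scale)

spike_draws = [spike_and_slab_draw() for _ in range(10)]
shrink_draws = [horseshoe_style_draw() for _ in range(10)]
```

Under the spike-and-slab prior some coefficients are exactly zero, so the graph literally drops those arcs; under the shrinkage prior every coefficient is nonzero but typically tiny, which matches the "everything affects everything, just weakly" view.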
I have not had these doubts answered, either in seminars or by asking directly.
Finally, @IlyaShpitser may know something more.
This matches my impressions relatively well.