The short of it, having read a few of Pearl’s papers and attended a lecture of his, is that you build a causal network that includes every variable you can think of, then use physical assumptions to eliminate some edges from the fully connected (assumption-free) graph.
With this partially connected causal graph, Pearl identifies a number of structures that let you estimate correlations with all identified confounding variables corrected for (which can be interpreted as causation under the assumptions of your graph).
Oftentimes, it seems like these methods only serve to show you just how bad a situation “estimating causation” actually is, but it’s possible to design experiments (or get lucky, or make strong assumptions) so as to turn them into useful tools.
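To make "corrected for" concrete, here is a minimal sketch of one such structure, the back-door adjustment, on a toy model. Everything in it is my own assumption for illustration: the three binary variables (a confounder Z that influences both X and Y), the probability tables, and the function names. It compares the naive difference P(Y=1|X=1) - P(Y=1|X=0) with the adjusted quantity, which averages P(Y=1|X,Z) over the marginal distribution of Z.

```python
from itertools import product

# Hypothetical toy model (all numbers are assumptions): Z -> X, Z -> Y, X -> Y.
P_Z = {0: 0.5, 1: 0.5}            # P(Z = z)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}   # P(X = 1 | Z = z)


def p_y1(x, z):
    """P(Y = 1 | X = x, Z = z); the direct effect of X on Y is 0.2 by construction."""
    return 0.1 + 0.2 * x + 0.5 * z


def joint(x, y, z):
    """Joint probability P(X = x, Y = y, Z = z) implied by the tables above."""
    px = P_X1_GIVEN_Z[z] if x == 1 else 1 - P_X1_GIVEN_Z[z]
    py = p_y1(x, z) if y == 1 else 1 - p_y1(x, z)
    return P_Z[z] * px * py


def cond_y1_given_x(x):
    """Plain conditional P(Y = 1 | X = x), with no adjustment."""
    num = sum(joint(x, 1, z) for z in (0, 1))
    den = sum(joint(x, y, z) for y, z in product((0, 1), repeat=2))
    return num / den


def cond_y1_given_xz(x, z):
    """Conditional P(Y = 1 | X = x, Z = z), recovered from the joint."""
    return joint(x, 1, z) / sum(joint(x, y, z) for y in (0, 1))


def backdoor_y1_do_x(x):
    """Back-door adjustment: sum over z of P(Y = 1 | x, z) * P(z)."""
    p_z = {
        z: sum(joint(xx, y, z) for xx, y in product((0, 1), repeat=2))
        for z in (0, 1)
    }
    return sum(cond_y1_given_xz(x, z) * p_z[z] for z in (0, 1))


naive = cond_y1_given_x(1) - cond_y1_given_x(0)
adjusted = backdoor_y1_do_x(1) - backdoor_y1_do_x(0)
print(f"naive difference:    {naive:.3f}")
print(f"adjusted difference: {adjusted:.3f}")
```

In this toy model the naive difference overstates the effect of X because Z pushes X and Y in the same direction; the adjusted figure recovers the 0.2 that was built in. That only works because the graph says Z is the sole confounder, which is exactly the kind of assumption that has to come from outside the data.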
Clearly one could split a data set using basically any possible variable, but most are obviously wrong. (That is to say, they lack explanatory power, and are actually irrelevant.) To attempt to simplify, then, if you understand a system, or have a good hypothesis, it is frequently easier to pick variables that should be important, and gather further data to confirm.
This is made explicit in removing connections from the graph. The more “obviously” “wrong” connections you sever, the more powerful the graph becomes. This is potentially harmful, though: like assigning 0 probability weight to some outcome, once you sever a connection you lose the machinery to reason about it. If your “obvious” belief proves incorrect, you’ve backed yourself into a room with no escape. Therefore, test your assumptions.
This is actually a huge component of Pearl’s methods, since his belief is that the very mechanism for adding causal reasoning to probability is to include “counterfactual” statements that encode causation into these graphs. Without counterfactuals, you’re sunk. With them, you have a whole new set of concerns but are also made more powerful.
It’s also really, really important to dispute that “one could split a data set using basically any possible variable”. While this is true in principle, Pearl made or confirmed some great discoveries with his causal networks, which helped to show that certain sets of conditioning variables will, when selected together, actively mislead you. Moreover, without using counterfactual information encoded in a causal graph, you cannot discover which variables these are.
Finally, I’d just like to suggest that picking a good hypothesis and coming to understand a system are undoubtedly the hardest parts of knowledge, involving creativity, risk, and some of the most developed probabilistic arguments. Actually comparing competing hypotheses so that you end up with a good model and know what “should be important” is the tough part, fraught with the possibility of failure.
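The zero-probability analogy can be sketched in a few lines. This is only an illustration of the analogy, not of Pearl's machinery: the two hypotheses ("edge_absent", "edge_present"), the likelihood values, and the function name are all assumptions I made up. One reasoner keeps a small prior on the edge being present; the other has "severed" it by assigning probability 0. Both then see the same evidence three times.

```python
def posterior(prior, likelihood):
    """One Bayes update: posterior is proportional to prior times likelihood."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnorm.values())
    return {h: w / total for h, w in unnorm.items()}


# Hypothetical evidence that strongly favors the edge being present.
likelihood = {"edge_absent": 0.01, "edge_present": 0.99}

# Skeptical but open: the edge is considered very unlikely, not impossible.
open_belief = {"edge_absent": 0.99, "edge_present": 0.01}
# Severed: the edge has been ruled out entirely.
closed_belief = {"edge_absent": 1.0, "edge_present": 0.0}

for _ in range(3):
    open_belief = posterior(open_belief, likelihood)
    closed_belief = posterior(closed_belief, likelihood)

print("open prior   ->", round(open_belief["edge_present"], 4))
print("closed prior ->", closed_belief["edge_present"])
```

The open reasoner ends up almost certain the edge exists, while the closed one cannot move off zero no matter how much evidence arrives.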
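One simple instance of a conditioning variable that misleads, as far as I understand it, is a collider: a variable caused by both of the things you are comparing. The sketch below is my own toy construction (two independent fair coins X and Y, and C defined as "X or Y"), not an example taken from Pearl, and it may not be the exact result I was thinking of above. It computes exact probabilities by enumeration.

```python
from itertools import product

OUTCOMES = list(product((0, 1), repeat=3))  # all (x, y, c) triples


def joint(x, y, c):
    """X and Y are independent fair coins; C = X or Y (a collider: X -> C <- Y)."""
    return 0.25 if c == (x | y) else 0.0


def prob(event, given=lambda x, y, c: True):
    """Exact conditional probability P(event | given) by enumeration."""
    num = sum(joint(*o) for o in OUTCOMES if given(*o) and event(*o))
    den = sum(joint(*o) for o in OUTCOMES if given(*o))
    return num / den


def y_is_1(x, y, c):
    return y == 1


# Without conditioning on C, X tells you nothing about Y.
marginal_diff = (
    prob(y_is_1, lambda x, y, c: x == 1) - prob(y_is_1, lambda x, y, c: x == 0)
)

# Within the C = 1 stratum, X and Y suddenly look strongly related.
conditional_diff = (
    prob(y_is_1, lambda x, y, c: x == 1 and c == 1)
    - prob(y_is_1, lambda x, y, c: x == 0 and c == 1)
)

print("difference ignoring C:     ", marginal_diff)
print("difference within C = 1:   ", conditional_diff)
```

Splitting the data on C manufactures a dependence between two variables that are independent by construction. Nothing in the table of numbers flags C as the wrong thing to split on; you only know it from the direction of the arrows.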
I’d really like to see the follow-up on how to decide which data to actually use. Right now, it’s pretty unsatisfactory and I’m left quite confused.
(Unless this was an elaborate plot to get me to read Judea Pearl, whose book I just picked up, in which case, gratz.)
Wholeheartedly agree, but as I said, I’m not confident that I understand Pearl well enough to be the one to write it.