“From your description, you say that Rubin insists on conditioning on all available data, so that includes W. But that doesn’t mean he has to get garbage, that just means he needs the right conditional.”
The right expression for p(y | do(x)) in this example should ignore W; that’s all there is to it. It’s not a notational issue.
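For concreteness, here is what I mean, written out under the standard M-graph (my assumption about the example under discussion; the original doesn’t spell out the graph):

```latex
% Assumed M-graph: X -> Y, with unobserved U1 -> X, U1 -> W and U2 -> Y, U2 -> W,
% and W observed.
\begin{align*}
  p(y \mid \operatorname{do}(x)) &= p(y \mid x)
    && \text{(no open back-door path from $X$ to $Y$, so no adjustment)} \\
  \sum_w p(y \mid x, w)\, p(w) &\neq p(y \mid \operatorname{do}(x))
    && \text{(conditioning on $W$ opens $X \leftarrow U_1 \rightarrow W \leftarrow U_2 \rightarrow Y$)}
\end{align*}
```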
“You can assign probabilities using observational data to create P(X1...XN | Intervention=No). How do I use that model to assign P(X1...XN | Intervention=Yes)?”
Good question! The answer is to use something called the consistency assumption (I think Pearl might call it “composition” in his book). This states, roughly, that Y(X) = Y. (That is, observing Y when there is no intervention is the same as observing Y when X is intervened on to take whatever value it would naturally attain.) This assumption is untestable, but to my knowledge every single paper in causal inference makes it in some form. Without something like it, there is no link between the data we observe and the data after a hypothetical intervention.
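In potential-outcome notation, the assumption I’m describing is usually written as follows (just a restatement of the sentence above):

```latex
% Consistency: the observed outcome equals the potential outcome under the
% treatment value actually received.
\[
  X = x \;\Longrightarrow\; Y(x) = Y, \qquad \text{equivalently } Y = Y(X).
\]
```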
I think the kinds of examples that are drastically biased under Rubin’s “condition on everything” policy are not very common in practical data analysis problems, but it’s certainly easy to construct them. While I have not asked him, I suspect that if I were to put a gun to Rubin’s head and give him the above example, he would admit to not adjusting for W (and then say the situations in the example never happen in practice).
My view: M-bias is a special case of a more general issue where conditioning opens paths (due to how d-separation works in graphs). The way this issue manifests in practice is that people assume they observe all confounders, adjust for them, get an estimate, and call it a day. In practice, their assumption is wrong: adjusting for all observable confounders opens a bunch of non-causal paths due to the inevitable presence of hidden variables, and the estimate they get is biased for this reason. There is, however, some evidence that this bias is sometimes not very big (I think Sander Greenland did some work on this).
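A quick way to see both points (the bias is real, but can be modest) is to simulate the M-graph. This is my own illustrative sketch; the variable names and coefficients are made up, not taken from any particular paper:

```python
# Minimal M-bias simulation (illustrative sketch, not from the original post).
# Graph: U1 -> X, U1 -> W, U2 -> W, U2 -> Y, X -> Y; U1, U2 unobserved.
# The true causal effect of X on Y is 1.0. Adjusting for the collider W
# opens the path X <- U1 -> W <- U2 -> Y and biases the estimate.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

u1 = rng.normal(size=n)           # hidden cause of X and W
u2 = rng.normal(size=n)           # hidden cause of Y and W
w = u1 + u2 + rng.normal(size=n)  # observed collider
x = u1 + rng.normal(size=n)       # treatment
y = x + u2 + rng.normal(size=n)   # outcome; true effect of x is 1.0

# Unadjusted OLS slope of y on x (correct here: no open back-door path).
unadjusted = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)[0][1]

# OLS of y on x and w ("condition on everything"): biased.
adjusted = np.linalg.lstsq(np.column_stack([np.ones(n), x, w]), y, rcond=None)[0][1]

print(f"unadjusted estimate: {unadjusted:.3f}")  # ~1.0
print(f"w-adjusted estimate: {adjusted:.3f}")    # ~0.8, biased downward
```

With these particular coefficients the W-adjusted slope comes out around 0.8 against a true effect of 1.0, so the bias is real but not enormous, which fits the Greenland-style point above; other parameter choices can make it much larger.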