I think that in Pearl’s example, he may even have made his hypothetical data give the opposite of the real-world result.
He introduces a “hypothetical data set,” works through the math, then follows the conclusion that tar deposits protect against cancer with this paragraph:
The data in Table 3.1 are obviously unrealistic and were deliberately crafted so as to support the genotype theory. However, the purpose of this exercise was to demonstrate how reasonable qualitative assumptions about the workings of mechanisms, coupled with nonexperimental data, can produce precise quantitative assessments of causal effects. In reality, we would expect observational studies involving mediating variables to refute the genotype theory by showing, for example, that the mediating consequences of smoking (such as tar deposits) tend to increase, not decrease, the risk of cancer in smokers and nonsmokers alike. The estimand of (3.29) could then be used for quantifying the causal effect of smoking on cancer.
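(For reference, the estimand in (3.29) is, as far as I recall, the front-door adjustment formula. Writing X for smoking, Z for tar deposits, and Y for cancer, which is my own labeling rather than a quotation from the book, it has roughly this form:

```latex
P(y \mid do(x)) \;=\; \sum_{z} P(z \mid x) \sum_{x'} P(y \mid x', z)\, P(x')
```

The inner sum averages the tar-to-cancer relationship over the population's smoking rates, and the outer sum weights it by how strongly smoking produces tar, so every term on the right-hand side can be estimated from nonexperimental data.)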
When I read it, I remember being mildly bothered by the example (why not have a clearly fictional example to match clearly fictional data, or find an actual study and use the real data as an example?) but mostly mollified by his extended disclaimer.
(I feel like pointing out, as another example, the decision analysis class that I took, which had a central example that was repeated and extended throughout the semester. The professor was an active consultant, and could have drawn on a wealth of examples in, say, petroleum exploration. But the example was a girl choosing a location for a party, subject to uncertain weather. Why that? Because it was obviously a toy example. If he had tried to use a petroleum example for petroleum engineers, the petroleum engineers would have been rightly suspicious of any simplified model put in front of them - “you mean this procedure only takes into account two things!?” - and any accurate model would have been far too complicated to teach the methodology. An obviously toy example taught the process, and once the students understood the process, they were willing to apply it to more complicated situations, which, of course, needed much more complicated models.)