Very good examples of perceptions driving self-selection.
It might be useful to discuss direct and indirect effects.
Suppose we want to compare fatality rates if everyone drove a Volvo versus if no one did. If the fatality rate were lower in the former scenario than in the latter, that would indicate that Volvos (causally) decrease fatality rates.
It’s possible that this effect operates entirely through an indirect pathway. For example, the decrease in the fatality rate might be due entirely to behavior changes (maybe when you get in a Volvo you think ‘safety’ and drive slower). On the DAG, we would have arrows from Volvo to behavior to fatality, and no direct arrow from Volvo to fatality.
A total causal effect is much easier to estimate. We would need to assume ignorability (conditional independence of assignment given covariates). And even though safer drivers might tend to self-select into the Volvo group, the self-selection is never uniform: safe drivers who choose other vehicles would carry a lot of weight in the analysis. We would just need good, detailed data on predictors of driver safety.
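To make the self-selection point concrete, here is a minimal simulation sketch (all variables and coefficients are hypothetical, invented for illustration): a measured "safety" covariate drives both Volvo choice and risk, so the naive group comparison is biased, but adjusting for safety — which is exactly what ignorability licenses — recovers the true total effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical measured predictor of driver safety.
safety = rng.normal(size=n)

# Safer drivers self-select into Volvos (confounding), but not uniformly:
# plenty of safe drivers end up in other cars, and vice versa.
volvo = rng.binomial(1, 1.0 / (1.0 + np.exp(-safety)))

# True total effect of Volvo on a continuous risk score: -0.5 (by construction).
risk = 1.0 - 0.5 * volvo - 0.8 * safety + rng.normal(size=n)

# Naive group comparison is biased by self-selection.
naive = risk[volvo == 1].mean() - risk[volvo == 0].mean()

# Adjusting for safety (ignorability holds here by construction):
# least-squares fit of risk ~ 1 + volvo + safety.
X = np.column_stack([np.ones(n), volvo, safety])
beta, *_ = np.linalg.lstsq(X, risk, rcond=None)
adjusted = beta[1]

print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}")
```

The adjusted estimate lands near the true -0.5, while the naive contrast is pulled well below it because safer drivers cluster in the Volvo group. In practice, of course, everything hinges on actually having measured the relevant predictors of driver safety.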
Estimating direct and indirect effects is much harder. Typically it requires assuming ignorability of both the intervention and the mediator(s), and it typically involves indexing counterfactuals by non-manipulable variables.
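In the special case of a linear model with no intermediate confounding (so that sequential ignorability holds by construction), the direct/indirect decomposition reduces to the familiar product-of-coefficients calculation. A toy sketch, with all coefficients invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical linear mediation model: volvo -> behavior -> risk,
# plus a direct volvo -> risk arrow. No confounding, by construction.
volvo = rng.binomial(1, 0.5, size=n)
behavior = 0.7 * volvo + rng.normal(size=n)
risk = -0.3 * volvo - 0.6 * behavior + rng.normal(size=n)

# Path a: effect of volvo on the mediator.
a = np.polyfit(volvo, behavior, 1)[0]

# Fit risk ~ 1 + volvo + behavior to get the direct path and path b.
X = np.column_stack([np.ones(n), volvo, behavior])
beta, *_ = np.linalg.lstsq(X, risk, rcond=None)
b_direct, b_mediator = beta[1], beta[2]

nde = b_direct            # natural direct effect, near -0.3
nie = a * b_mediator      # natural indirect effect, near 0.7 * -0.6 = -0.42
total = nde + nie
print(f"NDE: {nde:.2f}, NIE: {nie:.2f}, total: {total:.2f}")
```

The hard part in real data is not the arithmetic but the assumptions: once there is confounding of the mediator-outcome relationship (e.g., cautious people both drive slower and differ on the outcome), this simple decomposition breaks down.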
As an aside: a machine learning graduate student worked with me last year, and in most simulated data settings that we explored, logistic regression outperformed SVM.
Short nitpick—lots of assumptions other than ignorability can work for identifying direct effects (there is more to life than covariate adjustment). In particular, if we can agree on the causal diagram, then all sorts of crazy identification can become possible.
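A concrete instance of "crazy identification" from an agreed-upon diagram is the front-door criterion: even with an unmeasured confounder of treatment and outcome, the effect is identified if it flows entirely through a mediator that is shielded from the confounder. A toy sketch (the diagram and all coefficients are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# u is an UNMEASURED confounder of x and y.
u = rng.normal(size=n)
x = (u + rng.normal(size=n) > 0).astype(float)
m = 0.8 * x + rng.normal(size=n)             # x -> m, shielded from u
y = 0.5 * m + 1.2 * u + rng.normal(size=n)   # m -> y and u -> y; no direct x -> y

# Naive regression of y on x is badly confounded by u.
naive = np.polyfit(x, y, 1)[0]

# Front-door (linear version): chain the x -> m effect with the
# m -> y effect estimated while adjusting for x.
a = np.polyfit(x, m, 1)[0]
X = np.column_stack([np.ones(n), m, x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b = beta[1]
front_door = a * b                           # near the true 0.8 * 0.5 = 0.4
print(f"naive: {naive:.2f}, front-door: {front_door:.2f}")
```

No amount of covariate adjustment would work here (u is unmeasured), yet the effect is identified once everyone agrees on the diagram. Whether anyone *would* agree on such a diagram is, of course, the real fight.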