One form of more convincing evidence based on observational longitudinal data is using g-computation to adjust for the so-called “time-varying confounders” of lead exposure.
A classic paper on this from way back in 1986 is this: http://www.biostat.harvard.edu/robins/new-approach.pdf
The paper is 120 pages, but the short version is this: in graphical terms, you pretend that you are interested in lead exposure interventions (via do(.)) at every time slice, and identify this causal effect from the observational data you have. The trick is that you can’t adjust for confounders in the usual way, because of this issue:
C → A1 → L → A2 → Y
Say A1 and A2 are exposures to lead at two time slices, C is a set of baseline confounders, L is an intermediate response, and Y is a final response. The issue is that the usual adjustment here:
p[y | do(a1, a2)] = ∫_l ∫_c p[y | a1, a2, c, l] p[c, l] dc dl
is wrong. That’s because C, L and Y are all confounded by things we aren’t observing, and moreover, if you condition on L, you open a path A1 → L ↔ Y via these unobserved confounders which you do not want open. Here L is the “time-varying confounder”: for the purposes of A2 we want to adjust for it, but for the purposes of A1 we do not. This means the above formula is wrong and will bias your estimate of the effect of the early lead exposure A1 on Y.
What we want to do instead is this:
p[y | do(a1, a2)] = ∫_l ∫_c p[y | a1, a2, c, l] p[l | a1, c] p[c] dc dl
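To make the difference concrete, here is a minimal numpy sketch of the idea (my own toy example, not anything from the paper; the variable names, coefficients and sample size are all made up). It simulates the graph above with an unobserved confounder U hitting both L and Y, and a true exposure effect of zero, then compares the naive adjustment with a parametric version of the g-formula:

```python
# Toy illustration: C -> A1 -> L -> A2 -> Y, with an unobserved U confounding
# L and Y. The true causal effect of both exposures on Y is zero by construction.
import numpy as np

rng = np.random.default_rng(0)
n = 500_000

# ----- data generating process -----
C  = rng.normal(size=n)                            # baseline confounder
U  = rng.normal(size=n)                            # unobserved, hits L and Y
A1 = rng.binomial(1, 1.0 / (1.0 + np.exp(-C)))     # early exposure, driven by C
L  = 0.8 * A1 + 0.5 * C + U + rng.normal(size=n)   # intermediate response
A2 = rng.binomial(1, 1.0 / (1.0 + np.exp(-L)))     # later exposure, driven by L
Y  = 0.3 * C + U + rng.normal(size=n)              # outcome: no A1 or A2 term

def ols(covariates, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    X = np.column_stack([np.ones(len(y))] + list(covariates))
    return np.linalg.lstsq(X, y, rcond=None)[0]

# ----- naive adjustment: condition on C and L for both exposures at once -----
b_naive = ols([A1, A2, C, L], Y)
print("naive coefficient on A1:", round(b_naive[1], 3))   # biased away from 0

# ----- parametric g-formula -----
# Fit the two pieces of the formula: the outcome model E[Y | a1, a2, c, l] and
# the intermediate model E[L | a1, c]. Because both models are linear, plugging
# the conditional mean of L into the outcome model gives the same answer as
# integrating the outcome model over p(l | a1, c).
b_y = ols([A1, A2, C, L], Y)
b_l = ols([A1, C], L)

def g_formula_mean(a1, a2):
    l_bar = b_l[0] + b_l[1] * a1 + b_l[2] * C           # E[L | a1, C]
    y_bar = (b_y[0] + b_y[1] * a1 + b_y[2] * a2
             + b_y[3] * C + b_y[4] * l_bar)              # outcome model at (a1, a2, C, l_bar)
    return y_bar.mean()                                  # average over empirical p(c)

print("g-formula effect of A1:", round(g_formula_mean(1, 0) - g_formula_mean(0, 0), 3))
print("g-formula effect of A2:", round(g_formula_mean(0, 1) - g_formula_mean(0, 0), 3))
```

With these made-up coefficients, the naive regression puts a clearly negative coefficient on A1 even though its true effect is zero, while the g-formula estimates come out near zero for both exposures.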
The remaining issue is that you still might not have all the confounders at every time slice. But this kind of evidence is still far better than nothing at all (e.g. reporting correlations across 23 years).
Prediction: if you did this analysis, you would find no statistically significant effect on any scale.