What do the stats experts here think of “adjusting for confounders”?
I notice I often see correlational studies I would expect to be confounded by something (for example, a study showing that people who go to the doctor more often are less likely to get heart attacks might be confounded by income, since rich people can afford more doctor visits and also have different diets). Then the study says “We adjusted for confounders including income, class, race, and education, and the effect remained robust”. Then they do a controlled experiment investigating the same thing and the effect disappears.
Is there any conventional wisdom in the stats community about how far to trust these kinds of adjustments?
This is a good thing to read: http://intersci.ss.uci.edu/wiki/pdf/Pearl/22_Greenland.pdf (chapter 22 in Judea’s Festschrift). In particular the contrast between Fig. 1 and Fig. 2 is relevant.
What is going on here is that what we care about is some causal parameter, for instance the average causal effect (ACE): E[Y | do(A=1)] - E[Y | do(A=0)].
This parameter is sometimes identified, and sometimes not identified.
If it is NOT identified, it is simply not a function of the observed data. So any number you get by massaging the observed data will not equal the ACE. Naturally, if we then randomize (which gets us the ACE directly), we will not reproduce what our observational data massage gave us.
If it IS identified, then it is a matter of which functional of the observed data equals the ACE. Maybe, if we have treatment A, outcome Y, and a set of baseline confounders C, the correct functional is:
\sum_{c} ( E[Y | A=1,c] - E[Y | A=0,c] ) p(c)
This is what “adjusting for confounders” means.
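To make this concrete, here is a minimal Python sketch with an invented structural model (not from any real study): a binary confounder C drives both treatment A and outcome Y, the naive contrast E[Y | A=1] - E[Y | A=0] is biased, and the adjustment formula above recovers the true effect.

```python
import random

random.seed(0)

# Invented simulation: C confounds A and Y.
# Structural model: Y = 2*A + 3*C + noise, so the true ACE is 2.
n = 200_000
data = []
for _ in range(n):
    c = random.random() < 0.5
    a = random.random() < (0.8 if c else 0.2)   # C pushes A up
    y = 2 * a + 3 * c + random.gauss(0, 1)
    data.append((int(a), int(c), y))

def mean(xs):
    return sum(xs) / len(xs)

# Naive contrast E[Y|A=1] - E[Y|A=0]: biased by the open path A <- C -> Y.
naive = (mean([y for a, c, y in data if a == 1])
         - mean([y for a, c, y in data if a == 0]))

# Adjustment: sum_c ( E[Y|A=1,c] - E[Y|A=0,c] ) p(c)
adjusted = 0.0
for cv in (0, 1):
    p_c = mean([1 if c == cv else 0 for _, c, _ in data])
    e1 = mean([y for a, c, y in data if a == 1 and c == cv])
    e0 = mean([y for a, c, y in data if a == 0 and c == cv])
    adjusted += (e1 - e0) * p_c

print(round(naive, 2), round(adjusted, 2))  # naive is inflated; adjusted ≈ 2
```

With these made-up parameters the naive contrast comes out near 3.8, while the adjusted estimate sits near the true value of 2.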
However, maybe that’s not the right functional at all! Maybe you have a mediating variable M between A and Y, and the right functional is:
\sum_{m} \sum_{a'} E[Y | m, a'] p(a') p(m | A=1) - \sum_{m} \sum_{a'} E[Y | m, a'] p(a') p(m | A=0)
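For intuition, here is a hedged Python sketch of that mediation functional (the structural model and all numbers are invented for illustration). The confounder U is unobserved, so adjusting for it is impossible, yet the functional still recovers the causal effect because all of A's influence on Y flows through the mediator M:

```python
import random

random.seed(1)

# Invented setup: U is an unobserved confounder of A and Y, and the entire
# effect of A on Y goes through the mediator M (A -> M -> Y).
# Structural model: Y = 2*M + 3*U + noise, P(M=1|A) = 0.1 + 0.7*A,
# so the true ACE is 2 * 0.7 = 1.4.
n = 200_000
rows = []
for _ in range(n):
    u = random.random() < 0.5
    a = random.random() < (0.3 + 0.4 * u)
    m = random.random() < (0.1 + 0.7 * a)
    y = 2 * m + 3 * u + random.gauss(0, 1)
    rows.append((int(a), int(m), y))

def mean(xs):
    return sum(xs) / len(xs)

def functional(a_star):
    # sum_m sum_{a'} E[Y | m, a'] p(a') p(m | A = a_star)
    total = 0.0
    for mv in (0, 1):
        p_m = mean([1 if m == mv else 0 for a, m, _ in rows if a == a_star])
        for av in (0, 1):
            p_a = mean([1 if a == av else 0 for a, _, _ in rows])
            e_y = mean([y for a, m, y in rows if a == av and m == mv])
            total += e_y * p_a * p_m
    return total

estimate = functional(1) - functional(0)
print(round(estimate, 2))  # ≈ 1.4, despite the unobserved confounder U
```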
How do we tell which functional is right? We have to agree on the right causal graph for our problem, and then consult an algorithm that will either give us the right functional for the ACE given the graph, or tell us the ACE is not identifiable given that graph. This algorithm was part of what my thesis was about.
There is one important historical example of people ignoring graph structure to their peril. In epidemiology people worry about something called the “healthy worker survivor effect.” Say we have workers who work with asbestos, which is harmful, and we want to get an idea of how harmful by running a study. The longer you work with asbestos, the worse your outcome. However, if you are sick, you will probably terminate employment early, which means you will not get more exposure to asbestos. So people who get more asbestos are also healthier, and based on observational data asbestos can appear to have a protective effect on workers, even though we suspect it is very bad for you. This is the “healthy worker survivor effect.”
If we were to draw a simple graph with two time slices for this, we would get:
A1 → H → A2 → D
where A1 and A2 are asbestos exposures, H is health status after A1, and D is death (or not). H and D are confounded by a common cause we do not see: H ← U → D. A1 determines H. If H is bad enough, it will cause the worker to leave, and thus set A2 to 0. A1 and A2 determine D.
What we want here is E[D | do(a1,a2)]. The point is that blindly adjusting for H is incorrect, because of the particular graph structure in which H arises. H is a standard confounder for A2, but is NOT a standard confounder for A1 (H is what is called a “time-varying confounder”). So you need to use a particular form of adjustment called “g-computation”:
\sum_{h} E[D | a1,a2,h] p(h | a1)
If you use the standard adjustment
\sum_{h} E[D | a1,a2,h] p(h)
you will get a biased answer. Jamie Robins wrote a giant 120-page paper in 1986 (that no one ever reads) on, among many, many other things, this precise issue:
http://www.hsph.harvard.edu/james-robins/files/2013/03/new-approach.pdf
(edit: the reason you get bias with standard adjustment is that A1 → H ← U is in your graph. If you condition on the collider H, A1 and U become dependent: this is the so-called “Berkson’s bias,” also known as selection bias, collider stratification bias, or the explaining-away phenomenon. So standard adjustment opens a non-causal path A1 → H ← U → D between the treatment and the outcome, which accounts for part of the magnitude of the effect, and thus creates bias.)
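The whole story can be checked numerically. Below is a toy Python simulation (all parameters invented) of the two-slice graph above: g-computation recovers the true joint effect, while the standard adjustment is biased by exactly this collider path.

```python
import random

random.seed(2)

# Invented two-period simulation of the healthy worker survivor effect.
# Graph: A1 -> H -> A2 -> D, with an unobserved U -> H and U -> D.
# Structural model: D = A1 + A2 + 2*U + noise, so the true contrast
# E[D | do(A1=1, A2=1)] - E[D | do(A1=0, A2=0)] is exactly 2.
n = 200_000
rows = []
for _ in range(n):
    u = int(random.random() < 0.5)
    a1 = int(random.random() < 0.5)
    h = int(random.random() < 0.2 + 0.3 * a1 + 0.4 * u)  # H=1 means poor health
    a2 = int(random.random() < 0.7 - 0.5 * h)            # sick workers tend to leave
    d = a1 + a2 + 2 * u + random.gauss(0, 1)
    rows.append((a1, h, a2, d))

def mean(xs):
    return sum(xs) / len(xs)

def e_d(a1v, a2v, hv):
    # E[D | a1, a2, h] estimated from the simulated data
    return mean([d for a1, h, a2, d in rows if (a1, a2, h) == (a1v, a2v, hv)])

def p_h_given_a1(hv, a1v):
    sub = [h for a1, h, _, _ in rows if a1 == a1v]
    return sum(1 for h in sub if h == hv) / len(sub)

def p_h(hv):
    return sum(1 for _, h, _, _ in rows if h == hv) / n

# g-computation: sum_h E[D | a1, a2, h] p(h | a1)
g_est = (sum(e_d(1, 1, hv) * p_h_given_a1(hv, 1) for hv in (0, 1))
         - sum(e_d(0, 0, hv) * p_h_given_a1(hv, 0) for hv in (0, 1)))

# Standard adjustment: sum_h E[D | a1, a2, h] p(h) -- biased here, because
# conditioning on the collider H opens the path A1 -> H <- U -> D.
std_est = (sum(e_d(1, 1, hv) * p_h(hv) for hv in (0, 1))
           - sum(e_d(0, 0, hv) * p_h(hv) for hv in (0, 1)))

print(round(g_est, 2), round(std_est, 2))  # g-formula ≈ 2; standard adjustment is off
```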
What happens in practice is that if you try to get the ACE from observational data, you will often have too much confounding to get identification by any method (adjustment or anything else, really). So you need some sort of extra “trick.” Maybe you can find a good instrumental variable. Or maybe you have a natural experiment. Or maybe you had really good data collection that really observed most important confounders. Or maybe the treatment variable only has observed parents (this happens in observational longitudinal studies sometimes). If you just blindly use covariate adjustment without thinking about your causal structure, you will generally get garbage.
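As an illustration of the first “trick,” here is a minimal sketch of the classic Wald instrumental-variable estimator on simulated data. Everything here is invented: it assumes a valid instrument Z that is randomized, affects Y only through A, and a constant linear treatment effect.

```python
import random

random.seed(3)

# Invented IV setup: Z -> A -> Y, with an unobserved U confounding A and Y,
# and Z independent of U. Structural model: Y = 2*A + 3*U + noise,
# so the true effect of A on Y is 2.
n = 200_000
rows = []
for _ in range(n):
    u = int(random.random() < 0.5)
    z = int(random.random() < 0.5)
    a = int(random.random() < 0.2 + 0.4 * z + 0.3 * u)
    y = 2 * a + 3 * u + random.gauss(0, 1)
    rows.append((z, a, y))

def mean(xs):
    return sum(xs) / len(xs)

# Wald estimator: (E[Y|Z=1] - E[Y|Z=0]) / (E[A|Z=1] - E[A|Z=0])
num = (mean([y for z, a, y in rows if z == 1])
       - mean([y for z, a, y in rows if z == 0]))
den = (mean([a for z, a, y in rows if z == 1])
       - mean([a for z, a, y in rows if z == 0]))
wald = num / den

# The naive contrast E[Y|A=1] - E[Y|A=0] is confounded by U.
naive = (mean([y for z, a, y in rows if a == 1])
         - mean([y for z, a, y in rows if a == 0]))
print(round(naive, 2), round(wald, 2))  # naive is inflated; Wald ≈ 2
```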
I’d very much like to know more about this too. I distinctly recall at least three separate papers by separate authors in different domains that mentioned “adjusting for confounders” on things that I thought would be confounded, so I thought “Oh, it’s fine then, they did their thinking properly and there is an effect!”. (At times like these I wish I remembered things or kept notes on research papers a fraction as diligently as Yvain or gwern...)
Then I read further, because now I’m very interested in why, and upon finding the details of their adjustments, in small print under one of the many tables of Annex F, I discover that the “adjustment” was that they guessed at an effective rate for the confounders and plugged that percentage into their math. “Oh, there must be about 15% more rich people than poor people who go to the doctor for any given condition, so let’s adjust the results by that amount and see if things work!”
(I’m exaggerating for dramatization, these guess numbers are rarely “hidden” in this way and rarely this important, but even tiny examples of such piss me off and I get angry at the paper for it every time.)
In my experience there’s no general answer other than the observation that if people did NOT adjust for confounders, it’s usually a very bad sign. But if they did, you actually have to go read the paper and form your own opinion on whether their adjustments look reasonable, whether they did them correctly, whether they picked the right confounders (or just grabbed whatever characteristics they had handy), etc.
Typically people don’t adjust properly because it’s against their incentives to do so.
It’s pretty easy to abuse controlling for factors.
As a reader you don’t always know how many different ways the people who made a study tried to adjust for different confounders till they got their result.