Not to mention the inverse correlation between IQ and number of offspring, although humans tend to rely on forms of group selection, in which a smarter minority whose innovations benefit the community as a whole appears to be more sustainable than a smarter majority.
For the extreme and unrealistic assumption of endogamous mating in IQ subgroups, a differential fertility change of 2.5/1.5 to 1.5/2.5 (high IQ/low IQ) causes a maximum shift of four IQ points. For random mating, the shift is less than one IQ point.
Doesn’t that inverse correlation disappear when you control for schooling?
When controlling for education and socioeconomic status, the relationship between intelligence and number of children [...] reduces to statistical insignificance.
Does it make sense to control for education and socioeconomic status when measuring the effect of intelligence?
Not in this case. This is a good example of how you can go wrong by overcontrolling (or maybe we should chalk this up as another example of how correlation != causation?).
Suppose the causal model Genes → Intelligence → Education → Less-Reproduction is true (and there are no other relationships). Then if we regress Less-Reproduction on Intelligence & Education as predictors, we discover that after controlling for Education, Intelligence adds no predictive value, explains no variance, & is uncorrelated with Less-Reproduction. Sure, of course: all Intelligence is good for is predicting Education, but we already know each individual’s Education. This is an interesting and valid result worth further research in our hypothetical world.
Does this mean dysgenics will be false, since the coefficient of Intelligence is estimated at ~0 by our little regression formula? Nope! We can get dysgenics easily: people with high levels of Genes will have high levels of Intelligence, which will cause high levels of Education, which will cause high levels of Less-Reproduction, which means that their genes will be selected against and the next generation will start with lower Genes. Even though it’s all Education’s fault for causing Less-Reproduction, it’s still going to hammer the Genes.
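A minimal simulation sketch of this toy model (the coefficients and variable names below are made up purely for illustration, not drawn from any dataset) shows both halves of the point: the coefficient on Intelligence vanishes once Education is included, yet the high-Genes group still reproduces less and the next generation’s mean genetic value drops:

```python
# Toy simulation of Genes -> Intelligence -> Education -> Less-Reproduction.
# All coefficients are made up; this only illustrates the overcontrolling point.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
genes        = rng.normal(size=n)
intelligence = genes + rng.normal(size=n)             # Genes -> Intelligence
education    = intelligence + rng.normal(size=n)      # Intelligence -> Education
reproduction = -0.5 * education + rng.normal(size=n)  # Education -> Less-Reproduction

def slopes(y, *xs):
    """OLS slopes of y on xs (intercept fitted, then dropped)."""
    X = np.column_stack([np.ones_like(y), *xs])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

print(slopes(reproduction, intelligence))             # clearly negative
print(slopes(reproduction, intelligence, education))  # Intelligence coefficient ~ 0

# Selection still acts on Genes: weighting parents by (shifted, nonnegative)
# reproduction gives the next generation a lower mean genetic value.
weights = reproduction - reproduction.min()
print(genes.mean(), np.average(genes, weights=weights))
```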
I don’t know if this sort of problem has a widely-known name (IlyaShpitser might know one); I’ve seen it described in some papers but without a specific term attached, for example, “Let’s Put Garbage-Can Regressions and Garbage-Can Probits Where They Belong”:
James Ray (2003a, 2003b) has discussed several ways in which this garbage-can approach to research can go wrong, even in the simplest cases. First, researchers may be operating in a multi-equation system, perhaps with a triangular causal structure. For example, we may have three endogenous variables, with this causal order: y1 → y2 → y3. If so, then y1 has an indirect impact on y3 (via y2), but controlled for y2, it has no direct impact...Under the appropriate conditions, the estimated coefficient β̂1 will represent the direct effect of y1, and that estimate will converge to zero in this case, even though the total effect may be substantial. If a researcher foolishly concludes from this vanishing coefficient that the total effect of y1 is zero, then, of course, a statistical error has been committed. It may be worth knowing for descriptive purposes that a variable like y2 is correlated with the dependent variable, but as Ray says, that does not mean that it belongs in a regression as a control factor.
Ray, J. L. 2003a. “Explaining interstate conflict and war: What should be controlled for?” Presidential address to the Peace Science Society, University of Arizona, Tucson, November 2, 2002.
Ray, J. L. 2003b. “Constructing multivariate analyses (of dangerous dyads)”. Paper prepared for the annual meeting of the Peace Science Society, University of Michigan, Ann Arbor, Michigan, November 13.
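Spelled out for the linear case (a generic sketch; the coefficients are mine, not Ray’s or the quoted paper’s): if y2 = β21·y1 + ε2 and y3 = β32·y2 + ε3, then the total effect of y1 on y3 is β32·β21, while the estimated coefficient β̂1 from regressing y3 on both y1 and y2 targets the direct effect, which is exactly 0 in this chain.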
Epidemiology seems to call this “overadjustment”; for example, “Overadjustment Bias and Unnecessary Adjustment in Epidemiologic Studies”:
We define overadjustment bias as control for an intermediate variable (or a descending proxy for an intermediate variable) on a causal path from exposure to outcome. We define unnecessary adjustment as control for a variable that does not affect bias of the causal relation between exposure and outcome but may affect its precision. We use causal diagrams and an empirical example (the effect of maternal smoking on neonatal mortality) to illustrate and clarify the definition of overadjustment bias, and to distinguish overadjustment bias from unnecessary adjustment. Using simulations, we quantify the amount of bias associated with overadjustment. Moreover, we show that this bias is based on a different causal structure from confounding or selection biases.
Hi, sorry I missed this post earlier. Yes, this is sometimes called overadjustment. Their definition of overadjustment is incomplete: they are missing the case where a variable is associated with both exposure and outcome, is not an intermediate variable, and yet adjusting for it increases bias anyway. This case has a different name, M-bias, and occurs for instance in this graph:
A → Y ← H1 → M ← H2 → A
Say we do not observe H1 or H2; A is our exposure (treatment) and Y is our outcome. The right thing to do here is to not adjust for M. It’s called “M-bias” because the part of this graph involving the H variables kind of looks like an M, if you draw it using the standard convention of unobserved confounders on top.
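A quick numerical sketch of that graph (a linear model with made-up coefficients, just to illustrate the collider problem, not taken from the comment above):

```python
# M-bias sketch for the graph A -> Y <- H1 -> M <- H2 -> A, with H1 and H2 unobserved.
# Linear model with made-up coefficients; the true causal effect of A on Y is 1.0.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
h1 = rng.normal(size=n)                 # unobserved
h2 = rng.normal(size=n)                 # unobserved
a  = h2 + rng.normal(size=n)            # H2 -> A
m  = h1 + h2 + rng.normal(size=n)       # H1 -> M <- H2 (collider)
y  = 1.0 * a + h1 + rng.normal(size=n)  # A -> Y <- H1

def slope_on_a(y, *xs):
    """OLS coefficient on the first regressor (A), with an intercept."""
    X = np.column_stack([np.ones_like(y), *xs])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

print(slope_on_a(y, a))     # ~1.0: the unadjusted estimate is unbiased here
print(slope_on_a(y, a, m))  # noticeably off 1.0: adjusting for the collider M
                            # opens the path A <- H2 -> M <- H1 -> Y and adds bias
```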
But there is a wider problem here than this, because sometimes what you are doing is ‘adjusting for confounders,’ but in reality you shouldn’t even be using the formula that adjusting for confounders gives you; you should be using another formula instead. This happens, for example, with longitudinal studies (with a non-genetic treatment that is vulnerable to confounders over time). In such studies you want to use something called the g-computation algorithm instead of adjusting for confounders.
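For concreteness, a sketch of the simplest two-time-point case (the notation is mine, not from this comment): treatment A0, then a covariate L1 that confounds the later treatment A1 but is itself affected by A0, then outcome Y. The g-computation formula averages over L1’s distribution given only the earlier treatment,

\[
E[Y_{a_0,a_1}] \;=\; \sum_{l_1} E[Y \mid A_0=a_0,\, L_1=l_1,\, A_1=a_1]\, P(L_1=l_1 \mid A_0=a_0),
\]

whereas ordinary adjustment for L1 would block the part of A0’s effect that runs through L1, and leaving L1 out would fail to control the confounding of A1.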
I guess if I were to name the resulting bias, it would be “causal model misspecification bias.” That is, you are adjusting for confounders in a particular way because you think the true causal model is a certain way, but you are wrong about that—the model is actually different and the causal effect requires a different approach from what you are using.
I have a paper with Tyler VanderWeele and Jamie Robins that characterizes exactly what has to be true on the graph for adjustment to be valid for causal effects. So you will get bias from adjustment (for a particular set) if and only if the condition in the paper does not hold for your model.
Ah, I stand corrected, then. Unless intelligence correlates enough with education and socioeconomic status to make the above meaningless, but it’d be weird to control for them if that were the case.