How do you use a correlation coefficient to do a Bayesian update?
For instance, the Wikipedia page on the Heritability of IQ reads:
“The mean correlation of IQ scores between monozygotic twins was 0.86, between siblings 0.47, between half-siblings 0.31, and between cousins 0.15.”
I’d like to get an intuitive sense of what those quantities actually mean, “how big” they are, how impressed I should be with them.
I imagine I would do that by working out a series of examples. Examples like...
If I know that Alice has an IQ of 120, what does that tell me about the IQ of her twin sister Beth? (What should my probability distribution for Beth’s IQ be, after I condition on Alice’s 120 IQ and the 0.86 correlation?) And how does that contrast with what I know about her younger brother Carl?
What if instead, Alice has an IQ of 110? How much does that change what I know about Beth and Carl?
How do I do this kind of computation?
[I’m aware that heritability is a very misleading concept, because as defined, it varies with changes in environmental conditions. I’m less interested in the heritability of IQ in particular at the moment, and more in the general conversion from correlation to Bayes.]
In theory, you can use measured correlation to rule out models that predict the measured correlation to be some other number. In practice this is not very useful because the space of all possible models is enormous. So what happens in practice is that we make some enormously strong assumptions that restrict the space of possible models to something manageable.
Such assumptions may include: that measured IQ scores consist of some genetic base plus some noise from other factors including environmental factors and measurement error. We might further assume that the inherited base is linear in contributions from genetic factors with unknown weights, and the noise is independent and normally distributed with zero mean and unknown variance parameter. The strongest assumptions in that list are additivity, linearity, independence, and normality.
You might think that these assumptions are wildly restrictive and unlikely to be true, and you would be correct. Simplified models are almost never true, but they may be useful nonetheless because we have bounded rationality. So there is now a hypothesis A: “The model is adequate for predicting reality”.
Now that you have a model with various parameters, you can do Bayesian updates to update distributions for parameters—that is the hypotheses “A and (specific parameter values)”—and also various alternative “assumption failure” hypotheses. In the given example, we would very quickly find overwhelming evidence for “the noise is not independent”, and consequently employ our limited capacity for evaluation on a different class of (probably more complex) models.
This hasn’t actually answered your original question “what does that tell me about the IQ of her twin sister Beth?”, because in the absence of a model it tells you essentially nothing. There exist joint distributions of twin IQs (I_1, I_2) that have a correlation coefficient of 0.86 and yield any distribution you like for I_1 given I_2 = 120. We can rule most of them out on more or less vague grounds of being “biologically implausible”, but not purely from a mathematical perspective.
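To make that concrete, here is a minimal sketch of one such alternative. It is my own illustrative construction, not something from any study: the second score copies the first exactly with probability 0.86 and is otherwise an independent draw from the population. Both scores are still N(100, 15) and the correlation is still 0.86, but the conditional distribution given a score of 120 is a spike at 120 plus a broad population-wide component, nothing like a normal centered near 117. The function names, seed, and sample size are assumptions chosen for the example.

```python
import math
import random

MEAN, SD, R = 100.0, 15.0, 0.86
N = 100_000  # sample size, chosen arbitrarily for illustration


def sample_copy_mixture(rng, n):
    """Second score copies the first with probability R,
    otherwise it is an independent N(MEAN, SD) draw."""
    pairs = []
    for _ in range(n):
        a = rng.gauss(MEAN, SD)
        b = a if rng.random() < R else rng.gauss(MEAN, SD)
        pairs.append((a, b))
    return pairs


def sample_bivariate_normal(rng, n):
    """Pairs from a bivariate normal with correlation R."""
    c = math.sqrt(1 - R * R)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((MEAN + SD * z1, MEAN + SD * (R * z1 + c * z2)))
    return pairs


def pearson(pairs):
    """Sample Pearson correlation coefficient of a list of (a, b) pairs."""
    n = len(pairs)
    ma = sum(a for a, _ in pairs) / n
    mb = sum(b for _, b in pairs) / n
    cov = sum((a - ma) * (b - mb) for a, b in pairs)
    va = sum((a - ma) ** 2 for a, _ in pairs)
    vb = sum((b - mb) ** 2 for _, b in pairs)
    return cov / math.sqrt(va * vb)


rng = random.Random(0)
mixture = sample_copy_mixture(rng, N)
normal = sample_bivariate_normal(rng, N)

# Both correlations come out close to 0.86 ...
print(round(pearson(mixture), 2), round(pearson(normal), 2))

# ... but the fraction of pairs with exactly equal scores is very different.
exact_mix = sum(a == b for a, b in mixture) / N
exact_norm = sum(a == b for a, b in normal) / N
print(exact_mix, exact_norm)
```

The correlation coefficient alone cannot tell these two worlds apart, which is why the rest of the calculation has to lean on a model.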
But let’s continue anyway.
First, we need to know more about the circumstances in which we arrived at this situation, where we knew Alice’s IQ and not Beth’s. Is this event likely to have been dependent in any significant way upon their IQs, or the ordering thereof? Let’s assume not, because that’s simpler. E.g. we just happened to pick some twin pair out of the world and found out one of their IQs at random but not yet the other.
Then maybe we could use a model like the one I introduced, where the IQs I_1 and I_2 of twins are of the form
I_k = S + e_k,
where S is some shared “predisposition” which is normally distributed, and the noise terms e_k are independent and normally distributed with zero mean and common variance. Common genetics and (usually) common environment would influence S, while individual variations and measurement errors would be considered in the e_k.
Now, this model is almost certainly wrong in important ways. In particular the assumption of independent additivity doesn’t have any experimental evidence for it, and there doesn’t seem to be any reason to expect it to hold (especially for a curve-fitted statistic like IQ). Nonetheless, it’s worth investigating one of the simplest models.
There is some evidence that the distribution of IQ for twins is slightly different from that for the general population, but probably by less than 1 IQ point, so it’s fairly safe to assume that both I_1 and I_2 have mean close to 100 and standard deviation close to 15. In this simple model, the correlation coefficient of the population is just var(S) / 15^2, since cov(I_1, I_2) = var(S) and var(I_k) = var(S) + var(e_k) = 15^2. So if the study was conducted well enough to accurately measure the population correlation coefficient, then we should conclude that standard deviations are near 13.9 for S and 5.6 for e_k.
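As a rough sketch of that arithmetic, assuming the additive model I_k = S + e_k and a marginal standard deviation of 15, the split follows directly from r = var(S) / var(I). The helper names here are hypothetical, and the simulation is only a sanity check that the model reproduces the correlation it was fitted to.

```python
import math
import random


def split_sd(r, total_sd=15.0):
    """Return (sd of S, sd of e_k) for the model I_k = S + e_k,
    given the twin correlation r and the marginal sd of each I_k."""
    var_total = total_sd ** 2
    var_shared = r * var_total          # corr(I_1, I_2) = var(S) / var(I)
    var_noise = var_total - var_shared  # var(I) = var(S) + var(e)
    return math.sqrt(var_shared), math.sqrt(var_noise)


def simulate_twin_correlation(sd_s, sd_e, n=100_000, seed=0):
    """Draw n twin pairs from the model and return their sample correlation."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        s = rng.gauss(100.0, sd_s)            # shared predisposition S
        xs.append(s + rng.gauss(0.0, sd_e))   # I_1 = S + e_1
        ys.append(s + rng.gauss(0.0, sd_e))   # I_2 = S + e_2
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


sd_s, sd_e = split_sd(0.86)
print(f"sd(S) = {sd_s:.1f}, sd(e_k) = {sd_e:.1f}")  # sd(S) = 13.9, sd(e_k) = 5.6

simulated_r = simulate_twin_correlation(sd_s, sd_e)
print(round(simulated_r, 2))
```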
Now we can look at the distribution of (unknown) S and e_1 that could result in I_1 = 120. Each of these is normally distributed, and so the conditional distribution for each component of the sum is also normal, with E[S | I_1 = 120] = 100 + 20 * var(S) / 15^2 and E[e_1 | I_1 = 120] = 20 * var(e_1) / 15^2.
So in this case, the conditional distribution for S will be centered on 117.2. Its offset from the population mean is 0.86 times the difference between I_1 and the mean, and that factor is just the correlation coefficient r. The conditional standard deviation for S is √(1-r) times its unconditional standard deviation, so about 5.2.
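A minimal sketch of that conditioning step, assuming the same additive normal model. It uses the standard result for jointly normal variables: the conditional mean shifts by cov(S, I_1) / var(I_1) times the observed offset, and the conditional variance is var(S) - cov(S, I_1)^2 / var(I_1) = var(S) (1 - r), so the factor √(1 - r) applies to the standard deviation. The function name is hypothetical.

```python
import math


def shared_component_given_score(observed, r, mean=100.0, total_sd=15.0):
    """Conditional mean and sd of S given I_1 = observed, for I_k = S + e_k."""
    var_total = total_sd ** 2
    var_s = r * var_total                       # var(S), also cov(S, I_1)
    weight = var_s / var_total                  # regression weight, equals r
    cond_mean = mean + weight * (observed - mean)
    cond_var = var_s - var_s ** 2 / var_total   # = var(S) * (1 - r)
    return cond_mean, math.sqrt(cond_var)


cond_mean, cond_sd = shared_component_given_score(120, 0.86)
print(f"{cond_mean:.1f} {cond_sd:.1f}")  # 117.2 5.2
```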
Now you have enough information to calculate a conditional distribution for Beth. Her IQ is I_2 = S + e_2, so its conditional variance is the conditional variance of S plus var(e_2). Under this model, the conditional distribution for her IQ would be normal with mean ≅ 117.2 and standard deviation 15 √(1 - r^2) ≅ 7.6.
Therefore, to the extent that you have credence in this model and in the studies estimating those correlations, you could expect about a 70% chance for her IQ to be in the range 110 to 125.
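One way to check that figure, sketched with Python's standard-library statistics.NormalDist and assuming the bivariate normal model above. The function name is hypothetical; the mean and standard deviation formulas are the ones given in the text.

```python
import math
from statistics import NormalDist


def relative_iq_distribution(observed, r, mean=100.0, sd=15.0):
    """Conditional distribution of a relative's IQ given one observed IQ,
    assuming the pair is bivariate normal with correlation r."""
    cond_mean = mean + r * (observed - mean)
    cond_sd = sd * math.sqrt(1 - r * r)
    return NormalDist(cond_mean, cond_sd)


beth = relative_iq_distribution(120, 0.86)
print(f"mean = {beth.mean:.1f}, sd = {beth.stdev:.1f}")

# Probability that Beth's IQ falls between 110 and 125.
p_range = beth.cdf(125) - beth.cdf(110)
print(f"P(110 <= IQ <= 125) = {p_range:.2f}")

# Central 70% interval: cut 15% of the probability off each tail.
low, high = beth.inv_cdf(0.15), beth.inv_cdf(0.85)
print(f"central 70% interval: {low:.1f} to {high:.1f}")
```

The probability for the rounded range 110 to 125 comes out a little under 70%, while the exact central 70% interval is very slightly wider than that range.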
Similar calculations for Carl lead to a lower and wider distribution with a 70% range more like 96 to 123.
The corresponding range for cousin Dominic’s distribution would be 88 to 118, almost the same as you might expect for a completely random person (roughly 85 to 115).
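The same calculation can be repeated for each relation and for both of the scores asked about in the question (120 and 110). This is a sketch under the same bivariate normal assumption; the correlations are the ones quoted from the Wikipedia page, the "unrelated" row with r = 0 is added for comparison, and the names are hypothetical.

```python
import math
from statistics import NormalDist

# Correlations quoted in the question; r = 0 stands in for a random person.
CORRELATIONS = {
    "twin": 0.86,
    "sibling": 0.47,
    "half-sibling": 0.31,
    "cousin": 0.15,
    "unrelated": 0.0,
}


def central_interval(observed, r, coverage=0.70, mean=100.0, sd=15.0):
    """Central interval for a relative's IQ given one observed IQ,
    assuming the pair is bivariate normal with correlation r."""
    dist = NormalDist(mean + r * (observed - mean), sd * math.sqrt(1 - r * r))
    tail = (1 - coverage) / 2
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


for alice_iq in (120, 110):
    print(f"Alice's IQ = {alice_iq}")
    for relation, r in CORRELATIONS.items():
        low, high = central_interval(alice_iq, r)
        print(f"  {relation:<13} r = {r:.2f}  70% range: {low:5.1f} to {high:5.1f}")
```

Moving Alice from 120 to 110 halves the shift in each relative's mean but leaves the width of each range unchanged, because the conditional standard deviation 15 √(1 - r^2) does not depend on the observed score.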