It should be noted that if measured IQ is fat-tailed, this is because there is something wrong with IQ tests. IQ is defined to be normally distributed with a mean of 100 and a standard deviation of either 15 or 16, depending on which definition you’re using. So if measured IQ is fat-tailed, then the tests aren’t calibrated properly (of course, if your test goes all the way up to 160, it is almost inevitably miscalibrated, because there just aren’t enough people to calibrate it with).
You don’t want to force a normal distribution on the data. You’re free to do so if you’d like, e.g. by asking takers millions of questions so as to get very fine levels of granularity, and then mapping people at the 84th percentile of “questions answered correctly” to IQ 115, people at the 98th percentile to IQ 130, etc.
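A minimal sketch of that percentile-to-IQ mapping, using scipy’s normal quantile function (the round numbers in the paragraph above are approximations):

```python
from scipy.stats import norm

def percentile_to_iq(p, mean=100, sd=15):
    """Force scores onto a normal(mean, sd) distribution by definitional
    fiat: map a percentile rank to the corresponding normal quantile."""
    return mean + sd * norm.ppf(p)

print(round(percentile_to_iq(0.84)))  # ~115 (one SD is the ~84.1th percentile)
print(round(percentile_to_iq(0.98)))  # ~131 (two SDs is the ~97.7th percentile)
```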
But what you really want is a situation where you have a (log)-linear relationship between standard deviations and other things that IQ correlates with, and if you force the data to obey a normal distribution, you’ll lose this.
The rationale for using a normal distribution is the central limit theorem, but the classical version of that theorem requires the summands to be independent: assortative mating can induce correlations between, e.g., having gene A that increases IQ and having gene B that increases IQ, which violates that assumption.
But what you really want is a situation where you have a (log)-linear relationship between standard deviations and other things that IQ correlates with, and if you force the data to obey a normal distribution, you’ll lose this.
Could you expand on this point? I am not sure I follow it.
Say that you have a function
f: rawScores ---> percentiles
and you want to compose it with a function
g: percentiles ---> IQ scores
so that log(g(f(x))) correlates as strongly as possible with things that you care about other than IQ (income, the log odds ratio of winning a Fields Medal, etc.).
The default choice for g would be the function that takes a percentile to the associated standard deviation under a normal distribution. I’m claiming that the best choice for g is probably instead a function that takes a percentile to the associated standard deviation under a distribution that has fatter tails than the normal distribution.
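To make the two candidate g’s concrete, here is a sketch comparing the normal quantile map with a fatter-tailed alternative; the Student-t with 5 degrees of freedom is an arbitrary illustrative choice, not something the discussion commits to:

```python
import numpy as np
from scipy.stats import norm, t

DF = 5  # illustrative fat-tailed choice, purely for demonstration

def g_normal(p):
    return 100 + 15 * norm.ppf(p)

def g_fat(p, df=DF):
    # Rescale the t quantile so one unit matches one standard deviation.
    return 100 + 15 * t.ppf(p, df) / np.sqrt(df / (df - 2))

for p in [0.84, 0.98, 0.999, 0.9999]:
    print(p, round(g_normal(p), 1), round(g_fat(p), 1))
# The two maps roughly agree in the bulk of the distribution but diverge
# in the extreme tail, where the fat-tailed g assigns much higher scores.
```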
The intuition is:
Measures of the practical significance of IQ are plausibly best modeled as a weighted average of many individual genes that increase IQ. If people had been mating with randomly selected members of the opposite sex, the probabilities of getting two such genes would be independent. But in practice, people (weakly) tend to marry people of intelligence similar to their own (link), inducing a positive correlation between the respective probabilities of a child getting two different genes that contribute to IQ.
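One crude way to see the mechanism (a sketch under strong simplifying assumptions, not a model of actual genetics): let a shared family-level factor shift the probability of every variant at once, which is one way a positive correlation between variants could arise, and compare the tail of the resulting sum against the independent case:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 500_000, 1_000  # hypothetical: 500k children, 1k IQ-boosting variants

# Random mating: every variant inherited independently with probability 0.5.
indep = rng.binomial(k, 0.5, size=n)

# Assortative mating, crudely: 5% of families share a factor that raises the
# probability of *all* variants at once (the numbers are purely illustrative).
p_family = np.where(rng.random(n) < 0.05, 0.56, 0.5)
corr = rng.binomial(k, p_family)

for name, x in [("independent", indep), ("correlated", corr)]:
    z = (x - x.mean()) / x.std()
    print(name, "P(z > 3):", (z > 3).mean())  # normal benchmark: ~0.00135
```

The independent sum matches the normal benchmark, as the central limit theorem says it should, while the shared factor produces a visibly heavier right tail at the same standard-deviation scale.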
correlates as strongly as possible with things that you care about other than IQ (income, the log odds ratio of winning a Fields Medal, etc.)
First question: do you actually care about correlation (given that it’s a linear metric) or do you mean some tight dependency, not necessarily linear?
Second question: if that is the case, don’t you want your function g to produce a distribution shaped similarly to the “thing you care about”? If that thing-you-care-about is distributed normally, you would prefer g to generate a normal(-looking) distribution, if it’s distributed, say, as chi-squared, you would prefer g to give you a chi-squared shape, etc...?
Measures of the practical significance of IQ are plausibly best modeled as a weighted average of many individual genes that increase IQ.
That’s an iffy approach. Take, say, income (as a measure of the practical significance of IQ) -- are you saying income is best modeled as a weighted average of many IQ-related genes? You need the concept (and the link) of IQ to identify these genes to start with, but then you want to throw IQ out and go straight from genes to “practical” outcomes.
I agree that assortative mating would lead to a fat-tailed distribution, but your original goal was to make IQ correlate with “things you care about” and for that purpose the fat tails are not particularly relevant.
First question: do you actually care about correlation (given that it’s a linear metric) or do you mean some tight dependency, not necessarily linear?
If g(y) is monotonic, then the degree to which there’s a tight dependency is independent of g(y), which is just a change of coordinates. I do want to choose g(y) to maximize the degree to which the dependency is a linear one.
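A quick illustration of the coordinate-change point: rank dependence is unchanged by a monotonic g, while the linear component of the dependence is not. The variables here are synthetic stand-ins:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)            # stand-in for g(f(raw score))
y = x + 0.5 * rng.standard_normal(100_000)  # stand-in for an outcome

for name, g in [("identity", lambda v: v), ("exp (monotonic)", np.exp)]:
    gx = g(x)
    print(name,
          "Pearson:", round(pearsonr(gx, y)[0], 3),    # linearity: changes
          "Spearman:", round(spearmanr(gx, y)[0], 3))  # rank: unchanged
```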
Second question: if that is the case, don’t you want your function g to produce a distribution shaped similarly to the “thing you care about”? If that thing-you-care-about is distributed normally, you would prefer g to generate a normal(-looking) distribution, if it’s distributed, say, as chi-squared, you would prefer g to give you a chi-squared shape, etc...?
Yes, this is true and a good point, though the distribution of “the thing we care about” will vary from thing to thing, and I think that if we have to use a fixed distribution for IQ that’s uniform over all of them, the log of a fat-tailed distribution is probably the best choice.
are you saying income is best modeled as a weighted average of many IQ-related genes?
Here I’m just adopting an Occamistic approach (I don’t have high confidence): I’m using a linear model because it’s the simplest possible function from genes to outcomes that are correlated with IQ. Feel free to suggest an alternative.
I agree that assortative mating would lead to a fat-tailed distribution, but your original goal was to make IQ correlate with “things you care about” and for that purpose the fat tails are not particularly relevant.
Suppose, hypothetically, that human brains were such that IQ was capped at 145 by present-day standards (e.g. because, unbeknownst to us, babies with IQ above that threshold died in childbirth for some reason having to do with IQ genes). Then if we were to choose g(y) to get a normal distribution, it would look like the correlation between IQ and real-world outcomes vanishes after 145, whereas the actual situation would be that the people who scored above 144 have essentially the same genetic composition (with respect to IQ) as the people who scored 144, so that “IQ doesn’t yield returns past 145” would be connotatively misleading.
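A sketch of this hypothetical (all numbers invented for illustration): cap a latent ability at 145, measure it with test noise, force the resulting raw scores onto a normal distribution, and look at the apparent IQ-outcome relationship in the top tail:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 500_000

# Hypothetical: true ability is normal but hard-capped at 145 (3 SD).
ability = np.minimum(100 + 15 * rng.standard_normal(n), 145)

raw_score = ability + 5 * rng.standard_normal(n)   # noisy test of ability
outcome   = ability + 10 * rng.standard_normal(n)  # real-world outcome

# Define IQ by forcing raw scores onto a normal distribution.
percentile = (raw_score.argsort().argsort() + 0.5) / n
iq = 100 + 15 * norm.ppf(percentile)

# Past the cap, extra IQ points reflect only test noise, so the apparent
# IQ-outcome relationship collapses there but not below.
top = iq > 145
print("corr above 145:", round(np.corrcoef(iq[top], outcome[top])[0, 1], 3))
print("corr below 145:", round(np.corrcoef(iq[~top], outcome[~top])[0, 1], 3))
```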
I’m saying that defining IQ so that it’s normally distributed has a similar (though much smaller) connotatively distortionary effect.
Suppose, hypothetically, that human brains were such that IQ was capped at 145 … it would look like the correlation between IQ and real-world outcomes vanishes after 145
In your hypothetical there would be a lot of warning signs—for example all IQs above 145 would be random, that is, re-testing IQs above 145 would produce a random draw from the appropriate distribution tail.
And I suspect that it should be possible to figure out real-world distributions (the fatness of the tails, in particular) by looking at raw, non-normalized test scores.
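As a sketch of that check (`raw_scores` is a hypothetical array of raw, non-normalized scores; excess kurtosis is zero for a normal distribution and positive for fat tails):

```python
import numpy as np
from scipy.stats import kurtosis

raw_scores = np.loadtxt("raw_scores.txt")  # hypothetical data file

z = (raw_scores - raw_scores.mean()) / raw_scores.std()
print("excess kurtosis:", kurtosis(z))                # > 0 suggests fat tails
print("P(|z| > 3):", np.mean(np.abs(z) > 3), "vs ~0.0027 for a normal")
```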
Yes, you and I are on the same page, I was just saying that IQ shouldn’t be defined to be normally distributed.