I’d start with an anecdote from local practice here, with regard to learning math shallowly vs. with an understanding from the ground up:
It is fairly common to derive supposedly ultra-low prevalences of geniuses in populations with lower mean IQs.
For example, an IQ of 160 or more is 5 SDs above a mean of 85, but 4 SDs above a mean of 100, so the rarity is 1 in 3,483,046 vs 1 in 31,560, implying that the population with the mean IQ of 100 has a huge ratio of about 110 times the prevalence of genius.
This is not how it works; the higher means are a result of decreased prevalence of negative contributors—iodine deficiency, perhaps some alleles, etc. For a very extreme example, suppose that you have a population which is like the US baseline, but with 50% prevalence of iodine deficiency. The mean IQ could well be 85, but the ratio at high IQs will still be about 2 rather than increasing exponentially with the deviation. Of course, in practice it won’t be as clear-cut as this; the example is just to illustrate the point.
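A minimal sketch of this arithmetic (assuming scipy is available; the flat 30-point deficit for the affected half is an illustrative assumption, chosen so that the overall mean comes out to 85):

```python
from scipy.stats import norm

# Naive model: shift the whole distribution down to mean 85.
naive_100 = norm.sf(160, loc=100, scale=15)  # ~3.2e-5, about 1 in 31,600
naive_85 = norm.sf(160, loc=85, scale=15)    # ~2.9e-7, about 1 in 3.5 million
print(naive_100 / naive_85)                  # ~110

# Mixture model: half the population unaffected (mean 100), half with a
# deficiency costing 30 points (mean 70). The overall mean is still 85,
# but the affected half contributes almost nothing above 160.
mix_85 = 0.5 * naive_100 + 0.5 * norm.sf(160, loc=70, scale=15)
print(naive_100 / mix_85)                    # ~2, not 110
```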
Figuring things like this out is not so much helped by knowing the concepts as by training and actual practice, and, of course, by being trained to know where things like the Gaussian distribution come from, not merely declaratively but procedurally as well. (I’m mostly speaking from the perspective of applied mathematics here.)
I heard somewhere that IQ scores are normally distributed by definition, because they are calculated by projecting the measured rank onto the normal distribution with mean 100 and stddev 15. Can’t seem to find a reference on Wikipedia though, so maybe that’s not true.
IQ distributions are calibrated based on a reference sample, such that the reference sample has mean 100 and std 15 and follows a normal distribution. I believe the reference sample is generally British nationals or European Americans, so that interracial comparisons are sensible.
That doesn’t mean that the distribution of all test-takers follows a normal distribution with mean 100 and std 15.
Precisely. If you are looking at some third-world nation, well, there are all those kids who have various nutritional deficiencies and whose IQs are impaired as a result. The mean is lowered considerably, but that’s through the introduction of extra variables into the (approximate) sum.
If you don’t take that into account and assume that only the mean of the distribution has changed, you get entirely invalid results at the high range, due to how rapidly the normal distribution falls off far from the mean (as the exponential of a square). For example, if you were to calculate the number of geniuses of some rarity in the reference population (say, 300 million people with a mean of 100 and a standard deviation of 15), and in the world population assuming some lower mean and the same standard deviation, then for a sufficiently rare “genius” you’d get a smaller number of geniuses in the whole world than in that one reference population (which is ridiculous).
edit: which you can see by noting that the ratio of the two normal densities, exp(((x−c)^2 − (x−b)^2) / (2σ^2)), with c smaller than b, grows as x grows (i.e. the ratio of prevalences between the two populations grows with distance from the mean).
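To see the ridiculous conclusion concretely, a quick sketch (the world population of 7 billion and the assumed world mean of 85 are illustrative numbers, not measurements):

```python
from scipy.stats import norm

ref_pop, world_pop = 300e6, 7e9
threshold = 160  # "genius" cutoff; the rarer the cutoff, the worse it gets

ref_geniuses = ref_pop * norm.sf(threshold, loc=100, scale=15)
world_geniuses = world_pop * norm.sf(threshold, loc=85, scale=15)
print(round(ref_geniuses), round(world_geniuses))
# ~9500 vs ~2000: fewer "geniuses" in the entire world than in the
# reference population alone
```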
The example I’d give here is India, where you have lots of mostly distinct ethnic groups, and so it’s reasonable to expect that the true distribution is a mixture of Gaussians. Knowing the Indian average national IQ would totally mislead you on the number of Parsis with IQs of 120 or above, if all you knew about Parsis was that they lived in India.
(It’s not clear to me that malnourishment leads to multiple modes, rather than just decreasing the mean while probably increasing the variance, because I think damage due to malnourishment is linear, and it’s probably the case that many different levels of severity of malnourishment are roughly equally well represented.)
In the limit, the mixture of Gaussians is a Gaussian.
It’s not clear to me that malnourishment leads to multiple modes, rather than just decreasing the mean while probably increasing the variance
Theoretically, malnourishment (given that only a part of the population suffers from it) should lead to a negatively skewed distribution. And yes, with a lower mean and higher variance.
In the limit, the mixture of Gaussians is a Gaussian.
Nope. The sum of Gaussian random variables is a Gaussian random variable, but a mixture Gaussian model is a very different thing. (In particular, mixture Gaussians are useful for modeling because their components are easy to deal with, but if you have infinite mixtures you can faithfully represent an arbitrary distribution.)
Yes, you are correct, I got confused between a sum and a mixture.
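A quick simulation makes the difference concrete, and also shows the negative skew mentioned above (the 80/20 split and the component means of 100 and 70 are arbitrary illustrative numbers):

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
n = 1_000_000

# Sum: adding two independent Gaussians gives another Gaussian.
total = rng.normal(100, 15, n) + rng.normal(0, 15, n)
print(round(skew(total), 3))  # ~0: still symmetric, still normal

# Mixture: 80% of draws from N(100, 15), 20% from a "deficiency"
# component N(70, 15). Each draw comes from exactly one component.
affected = rng.random(n) < 0.2
mixture = np.where(affected, rng.normal(70, 15, n), rng.normal(100, 15, n))
print(round(mixture.mean(), 1), round(mixture.std(), 1), round(skew(mixture), 2))
# mean ~94, sd ~19.2, skew ~ -0.37: lower mean, higher variance,
# negatively skewed, not a normal distribution
```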
Theoretically, malnourishment (given that only a part of the population suffers from it) should lead to a negatively skewed distribution.
Yep, I should have mentioned that also.
(It’s not clear to me that malnourishment leads to multiple modes, rather than just decreasing the mean while probably increasing the variance, because I think damage due to malnourishment is linear, and it’s probably the case that many different levels of severity of malnourishment are roughly equally well represented.)
Not everyone’s malnourished, though—a significant number of people are into diminishing returns, nutrition-wise. It’s very nonlinear in the sense that as long as there’s adequate nutrition, it plateaus—access to more nutrition does not improve anything.
Sorry, is your claim that IQ does not follow a normal distribution in the general population?
It seems likely to me that this is actually the case, especially when you look at the tails, which is what he was discussing. The existence of things like Down’s syndrome means that the lower part of the tail certainly doesn’t look like you would expect from a solely additive model, and that might also be true at the upper end of the distribution.
(It’s also much more likely to be the case if you want to use some other measure of intelligence which is scaled to be linear in predictive ability for some task, rather than designed to be a normal distribution.)
This should be straightforwardly testable by standard statistics.
Given the empirical distribution of IQ scores and given the estimated measurement error (which depends on the score—scores in the tails are much less accurate) one should be able to come up with a probability that the empirical distribution was drawn from a particular normal.
Although I don’t know if I’d want to include cases with clear brain damage (e.g. Downs) into the population for this purpose.
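If one did have such a dataset, the test itself is a one-liner with scipy; the sample below is simulated stand-in data rather than real scores, and this simple version ignores the score-dependent measurement error:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(100, 15, 10_000)  # stand-in for a real empirical sample

# Kolmogorov-Smirnov goodness-of-fit test against N(100, 15):
stat, p = stats.kstest(sample, 'norm', args=(100, 15))
print(stat, p)  # a large p-value means no evidence against normality
                # (expected here, since the stand-in really is normal)
```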
This should be straightforwardly testable by standard statistics.
Agreed.
Given the empirical distribution of IQ scores
If you have a source for one of these, I would love to see it. I haven’t been able to find any, but I also haven’t put on my “I’m affiliated with a research university” hat and emailed people asking for their data, so it might be available.
estimated measurement error (which depends on the score—scores in the tails are much less accurate)
Agreed that this should be the case, but it’s not clear to me how to estimate measurement error besides test-retest variability, which can be corrupted by learning effects unless you wait a significant time between tests. I think Project Talent only tested its subjects once, but unless you have something of that size which tests people during adulthood several times you’re unlikely to get sufficient data to have a good estimate here.
This should be straightforwardly testable by standard statistics
Agreed.
That may require prohibitively large sample sizes, i.e. not be testable.
With regard to measuring g and high IQs, you need to keep in mind regression towards the mean, which becomes fairly huge at the high range, even for fairly strongly correlated variables.
The other, more subtle issue is that proxies generally fare even worse far from the mean than you’d expect from regression alone. E.g., if you use grip strength as a proxy for how quickly someone runs a mile, that’ll obviously work great for your average person, but at the very high range—professional athletes—you could obtain a negative correlation, because the athletes with super strong grip (weightlifters, maybe?) aren’t very good runners, and very good runners do not have extreme grip strength. It’s not very surprising that folks like Chris Langan are at very best mediocre crackpots rather than super-Einsteins.
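To put a rough number on the regression point: for a bivariate normal with an assumed, purely illustrative proxy-criterion correlation of 0.8, a 4-sigma proxy score predicts only a 3.2-sigma criterion value, which changes the implied rarity by a factor of about 22:

```python
from scipy.stats import norm

r = 0.8  # assumed proxy-criterion correlation (illustrative)
x = 4.0  # observed proxy score, in SDs above the mean

expected = r * x  # bivariate normal: E[criterion | proxy = x] = r * x
print(norm.sf(expected) / norm.sf(x))  # ~22: the 4-sigma score is ~22x
                                       # rarer than the 3.2-sigma level
                                       # the proxy actually predicts
```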
That may require prohibitively large sample sizes, i.e. not be testable.
At least for certain populations the sample sizes should be pretty large. Also, a smaller-than-desired sample size doesn’t mean it’s not testable; all it means is that your confidence in the outcome will be lower.
proxies generally fare even worse far from the mean than you’d expect from regression alone
Yes, I agree. The tails are a problem in general, estimation in the tails gets very fuzzy very quickly.
Yes, I agree. The tails are a problem in general, estimation in the tails gets very fuzzy very quickly.
And it seems to me that having studied math, complete with boring exercises, could help with understanding of that somewhat… all too often you see people fail to even ballpark just how much the necessary application of regression towards the mean affects the rarity.
Now that I’ve started to think about it, the estimation of the measurement error might be a problem.
First we need to keep in mind the difference between precision and accuracy. Re-tests will only help with precision, obviously.
Moreover, the quantity we’re trying to measure, g, happens to be unobservable. That makes estimates of accuracy somewhat iffy. Maybe it will help if you define g “originally”, as the first principal component of a variety of IQ tests...
On the other hand, I think our measurement error estimates can afford to be guesstimates and as long as they are in the ballpark we shouldn’t have too many problems.
As to the empirical datasets, I don’t have time atm to go look for them, but didn’t the US Army and such run large studies at some point? Theoretically the results should be in the public domain. We can also look at proxies (of the SAT/GRE/GMAT/LSAT etc. kind), but, of course, these are only imperfect proxies.
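On the “first principal component” definition a couple of paragraphs up, a toy version is easy to write down (the factor loadings and noise level below are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulate 5 test scores that all load on one common factor ("g")
# plus independent noise.
g = rng.normal(0, 1, n)
loadings = np.array([0.8, 0.7, 0.75, 0.6, 0.65])
scores = g[:, None] * loadings + rng.normal(0, 0.6, (n, 5))

# Define g-hat as the first principal component of the correlation matrix.
corr = np.corrcoef(scores, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
first_pc = eigvecs[:, -1]                # vector for the largest one
z = (scores - scores.mean(0)) / scores.std(0)
g_hat = z @ first_pc

print(abs(np.corrcoef(g, g_hat)[0, 1]))  # ~0.93: recovers the common factor
                                         # well near the mean; the tails are
                                         # another matter
```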
In any population other than the one for which the test has been normed to follow a normal distribution with mean of 100 and standard deviation of 15, yes, results need not be normally distributed or to have a standard deviation of 15.
When discussing a population with a mean IQ other than 100, it is automatically implied that it is not the population that the test has been normed for.
Do you have any psychometric lit. pointers on cases where e.g. normal goodness of fit tests fail? Is this just standard knowledge in the field?
So, one of the known things is that standard deviation varies by race. For example, both the African American mean and variance are lower than the European American mean and variance.
To the best of my knowledge, few people have actually applied goodness of fit tests to IQ score distributions to check normality.
I don’t understand why this is relevant.
So, one of the known things is that standard deviation varies by race. For example, both the African American mean and variance are lower than the European American mean and variance.
Hm. When I read the great-grandparent earlier, I got the impression it would be helpful to corroborate this claim in the great-great-grandparent:
In any population other than the one for which the test has been normed to follow a normal distribution with mean of 100 and standard deviation of 15, yes, results need not be normally distributed or to have a standard deviation of 15.
Rereading the great-grandparent now, it’s not clear to me why I got that impression. (I may have been thinking that the “general population,” as it contains distinct subpopulations, will be at best a mixture Gaussian rather than a Gaussian.)
I do agree that private_messaging’s claim (that the ratio we see at the tails doesn’t seem to follow what would be predicted by the normal distribution) hinges on the right tail being fatter than what the normal distribution predicts. (The mixture Gaussian claim is irrelevant if you’ve split the general population up into subpopulations that are normally distributed, unless the low-IQ group itself contains subpopulations and so isn’t normally distributed. There’s some reason to believe this is true for African Americans, for example, if you don’t separate out people by ancestry and recency of immigration.)
The data is sparse enough that I would not be surprised if this were the case, but I don’t think anyone’s directly investigated it, and a few of the investigations that hinge on the thickness of the tails (like Sex Differences in Mathematical Aptitude, which predicts female representation in elite math institutions by looking at the mean and variance of math SAT scores of large populations) seem to have worked well, which is evidence for normality.
Incidentally, is there even any empirical evidence that intelligence is normally distributed in any concrete sense?
I don’t think any existing measure could be Gaussian with any sort of accuracy at the tail ends, because there you need too large a sample size to norm the test, and, generally, the approximate Gaussian you get from many random additive factors deviates from a true Gaussian by huge factors at the tail ends. The bulk of the norming of a test comes from average people.
Ditto for correlations between IQ and anything: the bulk of a reported correlation comes from near the mean.
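One way to see the “huge factors at the tail ends” point: take a score that really is a sum of many additive contributions (here, a toy score made of 40 fair coin flips) and compare its exact tail to the normal approximation:

```python
from scipy.stats import binom, norm

n, p = 40, 0.5
mu, sd = n * p, (n * p * (1 - p)) ** 0.5  # mean 20, sd ~3.16

for k in (26, 30, 34):  # roughly 2, 3, and 4+ SDs above the mean
    exact = binom.sf(k - 1, n, p)      # P(score >= k), exact
    approx = norm.sf(k - 0.5, mu, sd)  # normal, with continuity correction
    print(k, round(exact / approx, 2))
# the ratio drifts from ~0.99 at 2 SDs to ~0.43 past 4 SDs: even an
# honestly additive score stops matching the Gaussian out in the tails
```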