Sorry, is your claim that IQ does not follow a normal distribution in the general population?
It seems likely to me that this is actually the case, especially when you look at the tails, which is what he was discussing. The existence of things like Down’s syndrome means that the lower part of the tail certainly doesn’t look like you would expect from a solely additive model, and that might also be true at the upper end of the distribution.
(It’s also much more likely to be the case if you want to use some other measure of intelligence which is scaled to be linear in predictive ability for some task, rather than designed to be a normal distribution.)
This should be straightforwardly testable by standard statistics.
Given the empirical distribution of IQ scores and given the estimated measurement error (which depends on the score—scores in the tails are much less accurate) one should be able to come up with a probability that the empirical distribution was drawn from a particular normal.
Although I don’t know if I’d want to include cases with clear brain damage (e.g. Downs) into the population for this purpose.
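For concreteness, here is a minimal sketch of the kind of check being proposed, with synthetic scores standing in for the missing empirical data (the sample and the N(100, 15) reference are both placeholders, and measurement error is ignored):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Stand-in for an empirical sample of IQ scores; real data would go here.
scores = rng.normal(loc=100, scale=15, size=10_000)

# Kolmogorov-Smirnov test against one particular normal, N(100, 15).
# (If cases of clear organic impairment were excluded, the reference
# distribution would have to be truncated to match.)
ks_stat, p_value = stats.kstest(scores, "norm", args=(100, 15))
print(f"KS statistic = {ks_stat:.4f}, p-value = {p_value:.3f}")
```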
This should be straightforwardly testable by standard statistics.
Agreed.
Given the empirical distribution of IQ scores
If you have a source for one of these, I would love to see it. I haven’t been able to find any, but I also haven’t put on my “I’m affiliated with a research university” hat and emailed people asking for their data, so it might be available.
estimated measurement error (which depends on the score—scores in the tails are much less accurate)
Agreed that this should be the case, but it’s not clear to me how to estimate measurement error besides test-retest variability, which can be corrupted by learning effects unless you wait a significant time between tests. I think Project Talent only tested its subjects once, but unless you have something of that size which tests people during adulthood several times you’re unlikely to get sufficient data to have a good estimate here.
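For what it's worth, the usual way this gets summarized is the standard error of measurement, computed from a reliability estimate such as a test-retest correlation; a small sketch with made-up numbers (and ignoring the learning effects mentioned above):

```python
import math

sd = 15.0          # sd of the IQ scale
reliability = 0.9  # hypothetical test-retest correlation in adults

sem = sd * math.sqrt(1 - reliability)   # standard error of measurement
print(f"SEM ~= {sem:.1f} IQ points")    # about 4.7 points at these numbers
```

This gives a single population-average figure, though, and doesn't capture the point above that scores in the tails are much less accurate.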
This should be straightforwardly testable by standard statistics
Agreed.
That may require prohibitively large sample sizes, i.e. not be testable.
With regard to measuring g and high IQs, you need to keep in mind regression towards the mean, which becomes fairly huge at the high range, even for fairly strongly correlated variables.
Another, more subtle issue is that proxies generally fare even worse far from the mean than you'd expect from regression alone. E.g. if you use grip strength as a proxy for how quickly someone runs a mile, that'll obviously work great for your average person, but at the very high range (professional athletes) you could obtain a negative correlation, because the athletes with super strong grip (weightlifters, maybe?) aren't very good runners, and very good runners do not have extreme grip strength. It's not very surprising that folks like Chris Langan are at very best mediocre crackpots rather than super-Einsteins.
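A quick simulation of the regression part of that, with made-up numbers (a plain bivariate normal with a 0.8 trait-proxy correlation). It shows the attenuation and the regression towards the mean, though not the outright sign flip of the grip-strength example, which requires the relationship itself to bend at the extremes:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000_000
r = 0.8  # assumed population correlation between the trait and its proxy

trait = rng.standard_normal(n)
proxy = r * trait + np.sqrt(1 - r**2) * rng.standard_normal(n)

overall = np.corrcoef(trait, proxy)[0, 1]

# Condition on being in roughly the top 0.01% of the proxy.
top = proxy > np.quantile(proxy, 0.9999)
within_tail = np.corrcoef(trait[top], proxy[top])[0, 1]

print(f"correlation in the whole population: {overall:.2f}")
print(f"correlation within the proxy's top 0.01%: {within_tail:.2f}")
print(f"mean trait level in that group: {trait[top].mean():.2f} sd, "
      f"vs. a mean proxy level of {proxy[top].mean():.2f} sd")
```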
That may require prohibitively large sample sizes, i.e. not be testable.
At least for certain populations the sample sizes should be pretty large. Also, a smaller-than-desired sample size doesn't mean it's not testable; all it means is that your confidence in the outcome will be lower.
proxies generally fare even worse far from the mean than you’d expect from regression alone
Yes, I agree. The tails are a problem in general, estimation in the tails gets very fuzzy very quickly.
Yes, I agree. The tails are a problem in general, estimation in the tails gets very fuzzy very quickly.
And it seems to me that having studied math, complete with boring exercises, could help somewhat with understanding that… all too often you see people fail to even ballpark by just how much the necessary application of regression towards the mean affects the rarity.
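A worked example of the size of the effect, assuming a bivariate normal model and an arbitrary 0.8 correlation between the test and the underlying trait:

```python
from scipy.stats import norm

r = 0.8          # assumed correlation between the test and the underlying trait
test_z = 4.0     # someone scoring 4 sd above the mean on the test

expected_trait_z = r * test_z  # regression towards the mean: 3.2 sd expected

rarity_test = 1 / norm.sf(test_z)             # ~1 in 31,600 at +4 sd
rarity_trait = 1 / norm.sf(expected_trait_z)  # ~1 in 1,450 at +3.2 sd

print(f"a +{test_z:.0f} sd test score is roughly 1 in {rarity_test:,.0f}")
print(f"the expected underlying trait level, +{expected_trait_z:.1f} sd, "
      f"is roughly 1 in {rarity_trait:,.0f}")
```

That is roughly a factor of twenty in rarity from the regression step alone.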
Now that I’ve started to think about it, the estimation of the measurement error might be a problem.
First we need to keep in mind the difference between precision and accuracy. Re-tests will only help with precision, obviously.
Moreover, what we're trying to measure is g, which happens to be unobservable. That makes estimates of accuracy somewhat iffy. Maybe it will help if you define g the "original" way, as the first principal component of a variety of IQ tests...
On the other hand, I think our measurement error estimates can afford to be guesstimates and as long as they are in the ballpark we shouldn’t have too many problems.
As to the empirical datasets, I don't have time at the moment to go look for them, but didn't the US Army and such run large studies at some point? Theoretically the results should be in the public domain. We can also look at proxies (of the SAT/GRE/GMAT/LSAT etc. kind), but, of course, these are only imperfect proxies.
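Since the "first principal component" definition came up above, here is a minimal sketch of what that looks like in practice, with an entirely made-up battery of five subtests generated from a single latent factor:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Made-up generating process: one latent factor plus subtest-specific noise.
latent = rng.standard_normal(n)
loadings = np.array([0.8, 0.7, 0.6, 0.75, 0.65])   # hypothetical subtests
tests = latent[:, None] * loadings + 0.6 * rng.standard_normal((n, len(loadings)))

# "g" as the first principal component of the standardized subtest scores.
z = (tests - tests.mean(axis=0)) / tests.std(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(z, rowvar=False))
g_scores = z @ eigvecs[:, -1]          # eigenvector with the largest eigenvalue

print("share of variance on PC1:", round(eigvals[-1] / eigvals.sum(), 2))
# The sign of a principal component is arbitrary, hence the abs().
print("|correlation| of PC1 scores with the latent factor:",
      round(abs(np.corrcoef(g_scores, latent)[0, 1]), 2))
```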
In any population other than the one for which the test has been normed to follow a normal distribution with mean of 100 and standard deviation of 15, yes, results need not be normally distributed or to have a standard deviation of 15.
When discussing a population with a mean IQ other than 100, it is automatically implied that it is not the population that the test has been normed for.
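For reference on the norming the two comments above refer to: it is essentially a rank-based transform of raw scores within the norming sample. A minimal sketch with synthetic, deliberately skewed raw scores:

```python
import numpy as np
from scipy.stats import norm, rankdata

rng = np.random.default_rng(0)

# Raw test scores from a hypothetical norming sample (deliberately skewed).
raw = rng.gamma(shape=4.0, scale=10.0, size=10_000)

# Ranks -> percentiles -> normal quantiles -> IQ scale: by construction the
# normed scores have mean ~100 and sd ~15 *in this sample*.
percentiles = (rankdata(raw) - 0.5) / len(raw)
iq = 100 + 15 * norm.ppf(percentiles)

print(f"mean = {iq.mean():.1f}, sd = {iq.std():.1f}")
# Applied to any other population, the same raw->IQ mapping need not give a
# normal distribution, a mean of 100, or an sd of 15.
```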
Do you have any psychometric lit. pointers on cases where e.g. normal goodness of fit tests fail? Is this just standard knowledge in the field?
So, one of the known things is that standard deviation varies by race. For example, both the African American mean and variance are lower than the European American mean and variance.
To the best of my knowledge, few people have actually applied goodness of fit tests to IQ score distributions to check normality.
I don’t understand why this is relevant.
So, one of the known things is that standard deviation varies by race. For example, both the African American mean and variance are lower than the European American mean and variance.
Hm. When I read the great-grandparent earlier, I got the impression it would be helpful to corroborate this claim in the great-great-grandparent:
In any population other than the one for which the test has been normed to follow a normal distribution with mean of 100 and standard deviation of 15, yes, results need not be normally distributed or to have a standard deviation of 15.
Rereading the great-grandparent now, it’s not clear to me why I got that impression. (I may have been thinking that the “general population,” as it contains distinct subpopulations, will be at best a mixture Gaussian rather than a Gaussian.)
I do agree that private_messaging’s claim (that the ratio we see at the tails doesn’t seem to follow what would be predicted by the normal distribution) hinges on the right tail being fatter than what the normal distribution predicts. (The mixture Gaussian claim is irrelevant if you’ve split the general population up into subpopulations that are normally distributed, unless the low IQ group itself contains subpopulations and so isn’t normally distributed. There’s some reason to believe this is true for African Americans, for example, if you don’t separate out people by ancestry and recency of immigration.)
The data is sparse enough that I would not be surprised if this were the case, but I don’t think anyone’s directly investigated it. A few of the investigations that hinge on the thickness of the tails (like Sex Differences in Mathematical Aptitude, which predicts female representation in elite math institutions from the mean and variance of the math SAT scores of large populations) seem to have worked well, which is evidence for normality.
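As a side note on the kind of calculation that paper relies on: under normality, modest group differences in mean and variance turn into large representation ratios far out in the right tail. The numbers below are arbitrary illustrations, not taken from the paper:

```python
from scipy.stats import norm

# Two hypothetical groups of equal size, slightly different mean and sd,
# measured in the overall population's sd units.
mu_a, sd_a = 0.0, 1.0
mu_b, sd_b = 0.1, 1.1   # group B: slightly higher mean, slightly more spread

for cutoff in (2.0, 3.0, 4.0):
    p_a = norm.sf(cutoff, loc=mu_a, scale=sd_a)
    p_b = norm.sf(cutoff, loc=mu_b, scale=sd_b)
    print(f"above +{cutoff:.0f} sd: B/A representation ratio = {p_b / p_a:.1f}")
```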
Incidentally, is there even any empirical evidence that intelligence is normally distributed in any concrete sense?
I don’t think any existing measure could be Gaussian with any sort of accuracy at the tail ends, because there you need too large a sample size to norm the test, and in general the approximately Gaussian distribution you get from many random additive factors deviates by huge factors from a true Gaussian at the tail ends. The bulk of the norming of a test comes from average people.
Ditto for correlations between IQ and anything: the bulk of a reported correlation comes from near the mean.
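A toy illustration of the point about additive factors (forty equal binary factors, which is of course not a serious model of anything): the normal approximation that is excellent near the mean is off by increasingly large factors in the tail; in this bounded, purely additive toy model the exact tail comes out thinner than the fitted normal.

```python
import math
from scipy.stats import binom, norm

# A toy trait built from 40 additive 0/1 factors: Binomial(40, 0.5).
n, p = 40, 0.5
mean, sd = n * p, math.sqrt(n * p * (1 - p))

for z in (2, 3, 4, 5, 6):
    k = math.ceil(mean + z * sd)
    exact = binom.sf(k - 1, n, p)   # P(X >= k), the exact tail
    approx = norm.sf(z)             # tail of the fitted normal
    print(f"+{z} sd: exact = {exact:.2e}, normal approx = {approx:.2e}, "
          f"ratio = {exact / approx:.2f}")
```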