Sorry, is your claim that IQ does not follow a normal distribution in the general population?
It seems likely to me that this is actually the case, especially when you look at the tails, which is what he was discussing. The existence of things like Down’s syndrome means that the lower part of the tail certainly doesn’t look like you would expect from a solely additive model, and that might also be true at the upper end of the distribution.
(It’s also much more likely to be the case if you want to use some other measure of intelligence which is scaled to be linear in predictive ability for some task, rather than designed to be a normal distribution.)
This should be straightforwardly testable by standard statistics.
Given the empirical distribution of IQ scores and given the estimated measurement error (which depends on the score—scores in the tails are much less accurate) one should be able to come up with a probability that the empirical distribution was drawn from a particular normal.
Although I don’t know if I’d want to include cases with clear brain damage (e.g. Down’s) in the population for this purpose.
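For concreteness, here’s a minimal sketch of what that test might look like, using simulated scores in place of a real dataset (the contamination model and all parameters are made up for illustration):

```python
# A minimal sketch of the proposed test, using simulated scores in
# place of a real IQ dataset (which we don't have).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical stand-in data: true ability is normal, plus a small
# low-tail contamination to mimic non-additive effects.
n = 100_000
scores = rng.normal(100, 15, n)
outliers = rng.normal(55, 10, n // 100)   # 1% low-tail contamination
scores = np.concatenate([scores, outliers])

# One-sample Kolmogorov-Smirnov test against the specific normal
# that IQ is scaled to, N(100, 15).
stat, p = stats.kstest(scores, "norm", args=(100, 15))
print(f"KS statistic = {stat:.4f}, p = {p:.3g}")
# A small p-value rejects "drawn from N(100, 15)"; with samples this
# large, even mild tail deviations should be detectable.
```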
This should be straightforwardly testable by standard statistics.
Agreed.
Given the empirical distribution of IQ scores
If you have a source for one of these, I would love to see it. I haven’t been able to find any, but I also haven’t put on my “I’m affiliated with a research university” hat and emailed people asking for their data, so it might be available.
estimated measurement error (which depends on the score—scores in the tails are much less accurate)
Agreed that this should be the case, but it’s not clear to me how to estimate measurement error besides test-retest variability, which can be corrupted by learning effects unless you wait a significant time between tests. I think Project Talent only tested its subjects once, and unless you have something of that size that tests people several times during adulthood, you’re unlikely to get enough data for a good estimate here.
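If test-retest data were available, the standard psychometric move is to convert the retest correlation into a standard error of measurement, SEM = SD·sqrt(1 − r). A quick sketch, with the reliability figure as a pure placeholder:

```python
# Sketch: turning test-retest reliability into a measurement-error
# estimate. The reliability figure below is a placeholder, not from
# any actual dataset.
import numpy as np

sd = 15.0          # IQ scale standard deviation
r_retest = 0.90    # hypothetical test-retest correlation

sem = sd * np.sqrt(1 - r_retest)
print(f"SEM = {sem:.2f} IQ points")   # ~4.74 here

# A rough 95% band around an observed score of 130, ignoring the
# regression-to-the-mean correction a fuller treatment would apply:
lo, hi = 130 - 1.96 * sem, 130 + 1.96 * sem
print(f"observed 130 -> roughly ({lo:.1f}, {hi:.1f})")
```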
This should be straightforwardly testable by standard statistics
Agreed.
That may require prohibitively large sample sizes, i.e. not be testable.
With regard to measuring g and high IQs, you need to keep in mind regression towards the mean, which becomes fairly huge at the high range, even for fairly strongly correlated variables.
Another, more subtle issue is that proxies generally fare even worse far from the mean than you’d expect from regression alone. E.g. if you use grip strength as a proxy for how quickly someone runs a mile, that will obviously work great for your average person, but at the very high range (professional athletes) you could obtain a negative correlation, because the athletes with super strong grip (weightlifters, maybe?) aren’t very good runners, and very good runners do not have extreme grip strength. It’s not very surprising that folks like Chris Langan are at best mediocre crackpots rather than super-Einsteins.
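Even without any specialization effect, a plain bivariate normal already shows how a decent proxy degrades in the tail. A quick simulation (the 0.6 correlation is an arbitrary assumption; since this model is jointly normal there’s no sign flip, just severe attenuation):

```python
# Sketch: proxy degradation in the tail for a plain bivariate normal.
import numpy as np

rng = np.random.default_rng(0)
r = 0.6                          # assumed proxy-target correlation
n = 2_000_000
cov = [[1, r], [r, 1]]
proxy, target = rng.multivariate_normal([0, 0], cov, n).T

overall = np.corrcoef(proxy, target)[0, 1]
top = proxy > np.quantile(proxy, 0.99)       # top 1% on the proxy
within_top = np.corrcoef(proxy[top], target[top])[0, 1]

print(f"correlation overall:      {overall:.2f}")    # ~0.60
print(f"correlation in top 1%:    {within_top:.2f}") # ~0.2 (range restriction)
print(f"mean target among top 1%: {target[top].mean():.2f} SD "
      f"(vs {proxy[top].mean():.2f} SD on the proxy)")
```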
That may require prohibitively large sample sizes, i.e. not be testable.
At least for certain populations the sample sizes should be pretty large. Also, a smaller-than-desired sample size doesn’t mean it’s not testable; all it means is that your confidence in the outcome will be lower.
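That trade-off is easy to make concrete with a quick power simulation, here against the same made-up contaminated model as in the sketch above (all parameters are again illustrative):

```python
# Sketch: rejection rate of the KS test at various sample sizes,
# under an illustrative 1%-contamination model.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def sample(n):
    s = rng.normal(100, 15, n)
    s[: n // 100] = rng.normal(55, 10, n // 100)  # 1% low-tail contamination
    return s

for n in (500, 5_000, 50_000):
    rejections = sum(
        stats.kstest(sample(n), "norm", args=(100, 15)).pvalue < 0.05
        for _ in range(200)
    )
    print(f"n={n:>6}: rejected {rejections/200:.0%} of the time")
```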
proxies generally fare even worse far from the mean than you’d expect from regression alone
Yes, I agree. The tails are a problem in general, estimation in the tails gets very fuzzy very quickly.
Yes, I agree. The tails are a problem in general, estimation in the tails gets very fuzzy very quickly.
And it seems to me that having studied math, complete with boring exercises, could help somewhat with understanding that… all too often you see people who can’t even ballpark just how much the necessary application of regression towards the mean affects the rarity.
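The ballpark itself takes only a few lines, assuming bivariate normality and a made-up proxy-g correlation of 0.8:

```python
# Sketch: how much regression towards the mean deflates the implied
# rarity of an extreme proxy score. The correlation is an assumption.
from scipy.stats import norm

r = 0.8                           # assumed proxy-g correlation
z_observed = norm.isf(1e-6)       # a one-in-a-million proxy score (~4.75 SD)
z_expected = r * z_observed       # expected true score after regression

rarity_naive = 1 / norm.sf(z_observed)
rarity_regressed = 1 / norm.sf(z_expected)   # ~1 in 14,000 here

print(f"observed z = {z_observed:.2f}, expected true z = {z_expected:.2f}")
print(f"naive rarity: 1 in {rarity_naive:,.0f}")
print(f"after regression: 1 in {rarity_regressed:,.0f}")
```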
Now that I’ve started to think about it, the estimation of the measurement error might be a problem.
First we need to keep in mind the difference between precision and accuracy. Re-tests will only help with precision, obviously.
Moreover, the thing we’re actually trying to measure, g, is unobservable, which makes estimates of accuracy somewhat iffy. Maybe it would help to define g in the “original” way, as the first principal component of a variety of IQ tests...
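That definition is mechanical enough to sketch. Here the test battery is simulated from a one-factor model (loadings and sizes are made up); with real data you’d substitute the actual score matrix:

```python
# Sketch: g as the first principal component of a battery of tests,
# on data simulated from a one-factor model.
import numpy as np

rng = np.random.default_rng(0)
n_people, n_tests = 5000, 8

g_true = rng.normal(size=n_people)
loadings = rng.uniform(0.5, 0.9, size=n_tests)   # assumed g-loadings
noise = rng.normal(size=(n_people, n_tests))
scores = g_true[:, None] * loadings + noise * np.sqrt(1 - loadings**2)

# First principal component of the standardized score matrix.
scores_std = (scores - scores.mean(0)) / scores.std(0)
_, _, vt = np.linalg.svd(scores_std, full_matrices=False)
g_hat = scores_std @ vt[0]

print(f"correlation of PC1 with true g: "
      f"{abs(np.corrcoef(g_hat, g_true)[0, 1]):.3f}")
```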
On the other hand, I think our measurement error estimates can afford to be guesstimates and as long as they are in the ballpark we shouldn’t have too many problems.
As to the empirical datasets, I don’t have time at the moment to go look for them, but didn’t the US Army and such run large studies at some point? Theoretically the results should be in the public domain. We can also look at proxies (of the SAT/GRE/GMAT/LSAT/etc. kind), but, of course, these are only imperfect proxies.