What if the people who have taken IQ tests are on average smarter than the people who haven’t? My impression is that people mostly take IQ tests when they’re somewhat extreme: either low and trying to qualify for assistive services or high and trying to get “gifted” treatment. If we figure LessWrong draws mostly from the high end, then we should expect the IQ among test-takers to be higher than what we would get if we tested random people who had not previously been tested.
This sounds plausible, but from looking at the data, I don’t think this is happening in our sample. In particular, if this were the case, then we would expect the SAT scores of those who did not submit IQ data to be different from those who did submit IQ data. I ran an Anderson–Darling test on each of the following pairs of distributions:
SAT out of 2400 for those who submitted IQ data (n = 89) vs SAT out of 2400 for those who did not submit IQ data (n = 230)
SAT out of 1600 for those who submitted IQ data (n = 155) vs SAT out of 1600 for those who did not submit IQ data (n = 217)
The p-values came out as 0.477 and 0.436 respectively, so the Anderson–Darling test could not distinguish the two distributions in either pair at any conventional significance level.
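For anyone who wants to try a comparison like this, here is a minimal Python sketch of a two-sample Anderson–Darling test. It is not the code behind the numbers above. It assumes SciPy's `anderson_ksamp` for the test statistic and gets the p-value by permutation, because SciPy's tabulated p-value is clipped to a limited range. The function name `ad_permutation_pvalue` and the synthetic scores are made up for illustration and are not the survey data.

```python
import warnings

import numpy as np
from scipy import stats


def ad_permutation_pvalue(x, y, n_perm=999, seed=0):
    """Two-sample Anderson-Darling statistic with a permutation p-value."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pooled = np.concatenate([x, y])

    def ad_stat(a, b):
        # SciPy warns when its tabulated p-value is clipped; only the
        # statistic is used here, so those warnings are silenced.
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            return stats.anderson_ksamp([a, b]).statistic

    observed = ad_stat(x, y)
    exceed = 0
    for _ in range(n_perm):
        # Shuffle group labels: under the null both groups share one distribution.
        shuffled = rng.permutation(pooled)
        if ad_stat(shuffled[: len(x)], shuffled[len(x):]) >= observed:
            exceed += 1
    # The +1 terms count the observed statistic as one of the permutations.
    return observed, (exceed + 1) / (n_perm + 1)


# Synthetic stand-ins for "submitted IQ" vs "did not submit IQ" SAT scores.
demo_rng = np.random.default_rng(42)
with_iq = np.round(demo_rng.normal(1400, 100, size=89), -1)
without_iq = np.round(demo_rng.normal(1400, 100, size=230), -1)
stat, p = ad_permutation_pvalue(with_iq, without_iq, n_perm=199)
print(f"AD statistic = {stat:.3f}, permutation p = {p:.3f}")
```

A large p-value here only says the test found no evidence of a difference. It does not show that the two groups are identical.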
As I did for my last plot, I’ve once again computed for each distribution a kernel density estimate with bootstrapped confidence bands from 999 resamples. From visual inspection, I tend to agree that there is no clear difference between the distributions. The plots should be self-explanatory:
(More details about these plots are available in my previous comment.)
Edit: Updated plots. The kernel density estimates are now fixed-bandwidth using the Sheather–Jones method for bandwidth selection. The density near the right edge is bias-corrected using an ad hoc fix described by whuber on stats.SE.
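A rough sketch of the general idea behind these plots, for anyone who wants to reproduce something similar: a fixed-bandwidth Gaussian kernel density estimate with pointwise bootstrap percentile bands. Several things here are assumptions rather than the method actually used. The bandwidth comes from Silverman's rule of thumb as a simple stand-in for Sheather–Jones, the bands are plain pointwise percentiles, the right-edge bias correction is left out entirely, and all function names and the synthetic data are hypothetical.

```python
import numpy as np


def silverman_bandwidth(sample):
    """Rule-of-thumb bandwidth (a stand-in, NOT Sheather-Jones)."""
    sd = np.std(sample, ddof=1)
    iqr = np.subtract(*np.percentile(sample, [75, 25]))
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * len(sample) ** (-0.2)


def gaussian_kde_fixed(sample, grid, h):
    """Gaussian KDE evaluated on `grid` with a fixed bandwidth `h`."""
    z = (grid[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(sample) * h * np.sqrt(2 * np.pi))


def kde_with_bootstrap_band(sample, grid, n_boot=999, level=0.95, seed=0):
    """Return the KDE plus pointwise lower/upper bootstrap percentile bands."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Choose the bandwidth once and reuse it for every resample.
    h = silverman_bandwidth(sample)
    estimate = gaussian_kde_fixed(sample, grid, h)
    boot = np.empty((n_boot, len(grid)))
    for b in range(n_boot):
        resample = rng.choice(sample, size=len(sample), replace=True)
        boot[b] = gaussian_kde_fixed(resample, grid, h)
    alpha = 1.0 - level
    lower, upper = np.quantile(boot, [alpha / 2, 1 - alpha / 2], axis=0)
    return estimate, lower, upper


# Synthetic scores for illustration only (not the survey data).
demo_scores = np.random.default_rng(7).normal(1200, 150, size=300)
demo_grid = np.linspace(400, 2000, 161)
est, lo, hi = kde_with_bootstrap_band(demo_scores, demo_grid, n_boot=199)
print(f"peak density near {demo_grid[np.argmax(est)]:.0f}")
```

Overlaying `est`, `lo`, and `hi` for two groups on the same axes gives the kind of visual comparison described above. Bands that overlap heavily are consistent with no clear difference, though pointwise bands are not a formal test.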
Thanks for digging into this! Looks like the selection bias isn’t significant.