IQ tests are normalized (so they have a median of 100 and standard deviation of 15, but they are not forced to be normally distributed), so I think the distributional properties can be evidence for something.
I think you are mistaken and they simply are forced to be bell curves.
But even if IQ is an affine transformation of the number of questions answered correctly, the simple act of adding up the questions is likely to produce a bell curve, so its appearance is not much evidence.
I confirm that IQ tests are forced to be bell curves; at least those using the methodology I learned at university.
Calibrating the test (giving it to many people) returns information like: “50% of test subjects solve at most 23 of these 50 problems” and “98% of test subjects solve at most 41 of these 50 problems”.
Then the next step is to map these data onto the bell curve, saying: “therefore 23/50 means 0 sigma = 100 IQ” and “therefore 41/50 means +2 sigma = 130 IQ”.
But you can’t assume that this mapping is linear. To explain it simply, let’s assume that the more intelligent person always solves a superset of the problems the less intelligent person solves. Then any person with an IQ between 100 and 130 would solve all 23 “easy” problems, some of the 18 “hard” problems, and none of the 9 “impossible” problems. But exactly how many depends on exactly how difficult those “hard” problems are. Maybe they are relatively easy, and a person with IQ 115 will solve all of them; or maybe they are relatively hard, and a person with IQ 115 will solve none of them. But that is a fact about the test, not about the intelligence distribution of the population. Therefore this fact should be removed by the normalization.
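A minimal sketch, in Python, of the percentile-to-sigma norming step described above. The function, the sample, and the numbers are illustrative assumptions, not taken from any real test manual:

```python
# Sketch: map each raw score to the IQ whose percentile under a standard
# normal matches the score's percentile rank in the calibration sample.
# Everything here (names, sample, scores) is hypothetical.
import numpy as np
from scipy.stats import norm


def build_iq_table(calibration_raw_scores, mean=100.0, sd=15.0):
    """Map every observed raw score to an IQ via its percentile rank."""
    scores = np.sort(np.asarray(calibration_raw_scores))
    n = len(scores)
    table = {}
    for raw in np.unique(scores):
        # Fraction of the norming sample scoring at or below this raw score,
        # nudged away from 0 and 1 so the inverse normal CDF stays finite.
        pct = np.clip(np.searchsorted(scores, raw, side="right") / n,
                      0.5 / n, 1 - 0.5 / n)
        table[int(raw)] = mean + sd * norm.ppf(pct)
    return table
```

If 50% of the norming sample solves at most 23 problems and 98% solves at most 41, this table sends 23 to roughly 100 IQ and 41 to roughly 130 IQ (about +2 sigma), whatever the raw-score distribution between those two points looks like.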
This is NOT forcing the outcome to be a bell curve. This is just normalizing to a given mean and standard deviation, a linear operation that does not change the shape of the distribution.
Consider a hypothetical case where an IQ test consists of 100 questions and 100 people take it. These hundred people all get a different number of questions correct, from 1 to 100: the distribution of the number of correct answers is flat, i.e. uniform over [1 .. 100]. Now you normalize the mean to 100 and the standard deviation to 15, and yet the distribution remains flat and does not magically become a bell curve.
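A quick numerical check of that claim, assuming the same made-up setup of 100 people with raw scores 1 through 100:

```python
# Linear rescaling to mean 100 and SD 15 is an affine map, so a flat
# (uniform) raw-score distribution stays flat. All numbers are invented.
import numpy as np

raw = np.arange(1, 101)                          # 100 people, raw scores 1..100
iq = 100 + 15 * (raw - raw.mean()) / raw.std()

print(round(iq.mean(), 1), round(iq.std(), 1))   # -> 100.0 15.0
print(np.histogram(iq, bins=10)[0])              # -> ten people in every bin
```

Ten people land in each of the ten equal-width bins: still uniform, not bell-shaped.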
That flat distribution of raw scores is a fact about the test.
Maybe it was wrong for me to use the word “normalization” in this context, but no, the distribution of raw scores is not mapped linearly to the distribution of IQs. It is mapped onto the bell curve.
Otherwise every intelligence test would produce a different intelligence curve, because inventing 100 questions that produce the same distribution of raw scores as some other set of 100 questions would be a practically impossible task. (Just try to imagine how you would obtain a set of 100 questions for which the distribution of raw scores is flat. Keep in mind that every calibration run on many real subjects costs a lot of money, and with only a few subjects you won’t get statistical significance.)
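To illustrate that argument with a toy model (the item-response rule, difficulty profiles, and sample size below are all invented): two question sets whose raw-score distributions look completely different still produce the same IQ distribution once each is normed by percentile rank.

```python
# Two hypothetical tests given to the same latent-ability population:
# one with mostly easy items, one with mostly hard items.
import numpy as np
from scipy.stats import norm, rankdata

rng = np.random.default_rng(1)
ability = rng.normal(size=5000)          # latent ability of the norming sample


def raw_scores(ability, difficulties):
    """One point per item; an item is solved if (noisy) ability clears its difficulty."""
    noise = rng.normal(scale=0.3, size=(len(ability), len(difficulties)))
    return ((ability[:, None] + noise) > difficulties).sum(axis=1)


def quantile_norm(raw):
    """Force the scores onto a bell curve with mean 100 and SD 15 via percentile rank."""
    return 100 + 15 * norm.ppf(rankdata(raw) / (len(raw) + 1))


easy_test = raw_scores(ability, np.linspace(-2.0, 0.5, 100))   # mostly easy items
hard_test = raw_scores(ability, np.linspace(-0.5, 2.0, 100))   # mostly hard items

# The raw-score histograms are skewed in opposite directions, yet after
# quantile norming both tests yield approximately the same bell-shaped
# IQ distribution.
iq_easy, iq_hard = quantile_norm(easy_test), quantile_norm(hard_test)
```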
Could you provide links showing this to be the case?
There is a helpful theorem: the central limit theorem.
It assumes that all the variables you’re summing are independent.
Weaker forms of the CLT hold up even if you relax the independence assumption. See Wikipedia for details.
As a practical matter, in IQ testing even with only linear normalization of raw scores you will get something approximately Gaussian.
I wouldn’t count on that more than about one standard deviation away from the mean.
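A small simulation of the disagreement here, under the assumption of fully independent pass/fail items (all probabilities below are made up):

```python
# Sum 50 independent pass/fail items of varying difficulty and compare the
# upper tail of the raw-score distribution to a Gaussian with the same
# mean and SD. Item probabilities are invented for illustration.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
p_solve = np.linspace(0.1, 0.9, 50)                    # per-item solve probabilities
raw = (rng.random((100_000, 50)) < p_solve).sum(axis=1)

mu, sd = raw.mean(), raw.std()
for k in (1, 2, 3):
    empirical = (raw > mu + k * sd).mean()
    gaussian = norm.sf(k)
    print(f"{k} SD above mean: empirical {empirical:.4f} vs Gaussian {gaussian:.4f}")

# Note: these simulated items are independent, matching the theorem's
# assumption. Real test items all load on the same underlying ability, so
# they are correlated, which is one reason the Gaussian approximation can
# degrade further out in the tails.
```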
Not exactly Gaussian—that’s even theoretically impossible because a Gaussian has infinitely long tails—but approximately Gaussian. Bell-shaped, in other words.
Fallacy of grey. Certain approximations are worse than others.
So in this particular example, which approximation is worse than which other approximation and by which metric?
An IQ test whose scores are only normalized linearly gives a worse approximation to a Gaussian distribution than one that is intentionally designed to produce Gaussian-distributed scores.
Well, duh, but I don’t see the point.
Perhaps, but it doesn’t follow that the new normalization should be Gaussian. One test I’d like to see is what happens when you give a test calibrated for one population to a different one.
If the test is normed on a population A and then given to a population B, the results don’t have to be Gaussian. The normalization occurs only once, when the relationship between the raw scores and the IQ values is defined. Afterwards the existing definition is simply reused.
You would get a somewhat different shape depending on whether you a) calibrate the test on population A and then measure population B, or b) calibrate the test on A+B and then measure population B.
Probably the most correct way to compare two populations would be to skip the normalization step and just compare the histograms of raw scores for both populations. (I am not good enough at math to say exactly how.)
Also, I am not sure how much such a comparison would depend on the specific test. Let’s imagine that we have one population with an average IQ of 100 and another population with an average IQ of 120. If we give them a test consisting of IQ-110-hard questions, the two populations will probably seem more different than if we give them a test consisting of a mix of IQ-80-hard and IQ-140-hard questions.
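A rough simulation of that worry, with an invented item-response rule and invented difficulty numbers, comparing how large the standardized gap between the two populations looks on each kind of test:

```python
# Two hypothetical populations (average IQ 100 and 120) take a test of
# uniformly IQ-110-hard items versus a test mixing IQ-80-hard and
# IQ-140-hard items. All modeling choices here are assumptions.
import numpy as np

rng = np.random.default_rng(3)


def raw_scores(pop_mean_iq, item_difficulty_iq, n_people=10_000):
    """A person solves an item if their (noisy) IQ clears the item's difficulty."""
    iq = rng.normal(pop_mean_iq, 15, size=(n_people, 1))
    noise = rng.normal(0, 5, size=(n_people, len(item_difficulty_iq)))
    return ((iq + noise) > item_difficulty_iq).sum(axis=1)


narrow_test = np.full(50, 110.0)                            # all items IQ-110-hard
mixed_test = np.r_[np.full(25, 80.0), np.full(25, 140.0)]   # easy/hard mix

for name, test in [("narrow", narrow_test), ("mixed", mixed_test)]:
    a = raw_scores(100, test)
    b = raw_scores(120, test)
    gap = (b.mean() - a.mean()) / np.concatenate([a, b]).std()
    print(f"{name} test: gap between populations = {gap:.2f} pooled SDs")
```

Under these made-up numbers the reported gap depends noticeably on the difficulty profile of the items, which is exactly the dependence the comment worries about.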
This backs my general notion that for a lot of measurements (especially of people?), we need graphs, not single numbers.
You can compare by looking at which percentile of population B the median of population A corresponds to.
Edit: also, once you’ve compared several populations this way, you can try to see if there is a way to normalize the test such that the distributions for all the populations have similar shapes.
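A minimal sketch of that percentile comparison, on invented raw scores:

```python
# Find which percentile of population B the median raw score of
# population A lands at. The score distributions below are made up.
import numpy as np


def median_a_percentile_in_b(raw_a, raw_b):
    """Percentile rank (0-100) of population A's median within population B."""
    median_a = np.median(raw_a)
    return 100.0 * np.mean(np.asarray(raw_b) <= median_a)


rng = np.random.default_rng(4)
pop_a = rng.normal(23, 6, size=5000).round()
pop_b = rng.normal(29, 6, size=5000).round()
print(median_a_percentile_in_b(pop_a, pop_b))   # well below 50 when B outscores A
```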