Hmm. This interpretation matches the impression I recall getting from reading Jensen’s The g Factor, though it’s possible that I misremember. It may be that he was arguing that IQ tests should aim to measure g, even if they don’t necessarily always do, and held the most g-loaded ones up as the gold standard.
I think it’s important to realize that what g is shifts when you change which subtests your IQ test consists of, and how much “weight” you give to each result. As such, it isn’t itself something that you can easily optimize for.
Like, you always have to define g-loadings with respect to a test battery over which you measure g. And while different test batteries’ g’s are themselves highly correlated with one another, they are not perfectly correlated, and those correlations come apart as you optimize for any one of them.
Like, an IQ test with a single task will trivially yield a single g-factor that explains all the variance in the test results.
As such, we need to define a grounding for IQ tests that is about external validity and predictiveness of life outcomes, or of outcomes on pre-specified tasks. We can then analyze the results of those tests and see whether we can uncover any structure, but the tests themselves have to aim to measure something externally valid.
To make this more concrete, the two biggest sources of IQ-test data we have come from American SAT scores, and from the Norwegian military draft, which has included an IQ-test component for all males above 18 years of age since the mid-20th century.
The goal of the SAT was to measure scholastic aptitude, as well as educational outcomes.
The goal of the Norwegian military draft test was to measure military aptitude, in particular to screen out people below a certain threshold of intelligence who were unfit for military service and would pose a risk to others, or be a net drag on the military.
Neither of these is optimized to measure g. But we found that the results in both of these test batteries are well-explained by a single g-factor. And the fact that whenever we try to measure aptitude on real-life outcomes, we seem to find a common g-factor, is why we think there is something interesting going on with g in the first place.
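To make the battery-relativity point concrete, here is a small simulation sketch (all numbers are made up for illustration): two hypothetical batteries share one latent factor, and extracting “g” from each battery as the first principal component of the correlation matrix (a simple stand-in for a proper factor analysis) gives g estimates that correlate highly, but not perfectly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# A single latent general factor shared by all subtests -- purely illustrative.
latent = rng.normal(size=n)

def battery(loadings):
    """Simulate subtest scores: each subtest = loading * latent + independent noise."""
    return np.column_stack(
        [l * latent + np.sqrt(1 - l**2) * rng.normal(size=n) for l in loadings]
    )

# Two hypothetical batteries with different subtest compositions (invented loadings).
A = battery([0.8, 0.7, 0.6, 0.5])
B = battery([0.6, 0.6, 0.9, 0.4, 0.3])

def first_factor(scores):
    """First principal component of the correlation matrix: the g-loadings
    and g scores this yields depend on which subtests are in the battery."""
    corr = np.corrcoef(scores, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    v = eigvecs[:, -1]                    # top eigenvector ~ g-loadings
    v = v * np.sign(v.sum())              # fix the arbitrary sign convention
    share = eigvals[-1] / eigvals.sum()   # fraction of variance the factor explains
    return scores @ v, share

gA, shareA = first_factor(A)
gB, shareB = first_factor(B)

print(f"variance explained by first factor: A={shareA:.2f}, B={shareB:.2f}")
print(f"correlation of the two batteries' g estimates: {np.corrcoef(gA, gB)[0, 1]:.2f}")
```

In this toy setup the first factor dominates each battery, and the two batteries’ g estimates correlate strongly, yet the correlation stays below 1 because each battery weights the subtests differently.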