The g-factor, or g for short, is the thing that IQ tries to measure.
The name “g factor” comes from the fact that it is a common, general factor which all kinds of intelligence draw upon. For instance, Deary (2001) analyzed an American standardization sample of the WAIS-III intelligence test, and built a model where performance on the 13 subtests was primarily influenced by four group factors, or components of intelligence: verbal comprehension, perceptual organization, working memory, and processing speed. In addition, there was a common g factor that strongly influenced all four.
The model indicated that the variance in g was responsible for 74% of the variance in verbal comprehension, 88% of the variance in perceptual organization, 83% of the variance in working memory, and 61% of the variance in processing speed.
Technically, g is something that is computed from the correlations between various test scores in a given sample, and there’s no such thing as the g of any specific individual. The technique doesn’t even guarantee that g actually corresponds with any physical quantity, as opposed to something that the method just happened to produce by accident.
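To make the "computed from correlations" point concrete, here is a toy sketch (entirely made-up numbers, not any real battery): extracting a g-like factor amounts to finding the dominant factor of the subtests' correlation matrix, e.g. by power iteration. The four "subtests" and their uniform 0.5 intercorrelation are invented for illustration.

```python
# Toy sketch of g extraction: the dominant factor of a correlation matrix,
# found by power iteration. All numbers are invented for illustration.

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def norm(v):
    return sum(x * x for x in v) ** 0.5

def first_factor(R, iters=200):
    """Return (eigenvalue, loadings) of the dominant factor of R."""
    n = len(R)
    v = [1.0] * n
    for _ in range(iters):
        w = matvec(R, v)
        v = [x / norm(w) for x in w]
    eigval = sum(matvec(R, v)[i] * v[i] for i in range(n))  # Rayleigh quotient
    loadings = [x * eigval ** 0.5 for x in v]  # each test's g-loading
    return eigval, loadings

# A "positive manifold": every subtest correlates 0.5 with every other one.
R = [[1.0 if i == j else 0.5 for j in range(4)] for i in range(4)]
eigval, loadings = first_factor(R)
share = eigval / 4  # proportion of total variance the factor explains
print(round(share, 3), [round(l, 3) for l in loadings])
```

Note that the output is a set of loadings per *test*, not a score per *person* — which is exactly why there is no such thing as an individual's g in this construction.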
So when you want to measure someone’s intelligence, you make a lot of people take tests that are known to be strongly g-loaded, meaning that performance on them is strongly correlated with g. Then you take their raw scores and standardize them to produce an IQ score, so that if e.g. only 10% of the test-takers got a raw score of X or higher, then anyone scoring X is assigned an IQ indicating that they’re in the top 10% of the population. And although IQ still doesn’t tell us what an individual’s g score is, it gives us a score that’s closely correlated with g.
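The standardization step above can be sketched in a few lines. This is my own illustration with an invented norming sample (real norming uses large stratified samples); it converts a raw score to its percentile rank and maps that onto the usual deviation-IQ scale with mean 100 and SD 15.

```python
# Hedged sketch of percentile-based IQ standardization; the norming sample
# is a toy stand-in, not real data.
from statistics import NormalDist

def iq_from_raw(raw_score, norm_sample):
    """Convert a raw score to a deviation IQ (mean 100, SD 15) by rank."""
    below = sum(1 for s in norm_sample if s < raw_score)
    ties = sum(1 for s in norm_sample if s == raw_score)
    pct = (below + 0.5 * ties) / len(norm_sample)  # mid-rank percentile
    return 100 + 15 * NormalDist().inv_cdf(pct)

sample = list(range(1, 100))  # toy norming sample of 99 raw scores
print(round(iq_from_raw(50, sample)))  # the median raw score maps to IQ 100
```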
The g-factor, or g for short, is the thing that IQ tries to measure.
See my reply above. I think thinking about IQ tests trying to “measure g” is pretty confusing, and while I used to have this view, I updated pretty strongly against it after reading more of the psychometrics literature.
Hmm. This interpretation was the impression I recall getting from reading Jensen’s The g Factor, though it’s possible that I misremember. Then again, he may have been arguing that IQ tests should aim to measure g even if they don’t always succeed, and held the most g-loaded ones up as the gold standard.
I think it’s important to realize that what g is shifts when you change which subtests your IQ test consists of and how much weight you give to each result. As such, it isn’t itself something that you can easily optimize for.
Like, you always have to define g-loadings with respect to the test battery over which you measure g. And while different test batteries’ g’s are highly correlated with each other, they are not perfectly correlated, and those correlations come apart as you optimize for any particular battery’s g.
Like, an IQ test with a single task will trivially yield a single g-factor that explains all the variance in the test results.
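A quick toy calculation (my own invented numbers) of how battery-dependent the first factor's variance share is: the share is the largest eigenvalue of the correlation matrix divided by the number of tests, which is trivially 1.0 for a single-task battery and drops as soon as a weakly correlated second task is added.

```python
# Toy illustration: the first factor's variance share depends entirely on
# the battery. For these tiny matrices the largest eigenvalue has a closed
# form: R[0][0] for 1x1, and 1 + |r| for a 2x2 correlation matrix.
def first_factor_share(R):
    if len(R) == 1:
        return R[0][0]
    return (1 + abs(R[0][1])) / 2

print(first_factor_share([[1.0]]))                   # 1.0: "g" by construction
print(first_factor_share([[1.0, 0.3], [0.3, 1.0]]))  # 0.65: share collapses
```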
As such, we need to ground IQ tests in external validity: how well they predict life outcomes or performance on pre-specified tasks. We can then analyze the results of those tests and see whether we can uncover any structure, but the tests themselves have to aim to measure something externally valid.
To make this more concrete, the two biggest sources of IQ-test data we have come from American SAT scores and the Norwegian military draft, which has included an IQ-test component for all males above 18 years old since the middle of the 20th century.
The goal of the SAT was to be a measure of scholastic aptitude, as well as a measure of educational outcomes.
The goal of the Norwegian military draft test was to measure military aptitude, in particular to screen out people below a certain threshold of intelligence who were unfit for military service and would pose a risk to others, or be a net drag on the military.
Neither of these is optimized to measure g. But we found that the test results in both of these batteries are well explained by a single g-factor. And the fact that whenever we try to measure aptitude for real-life outcomes we seem to find a common g-factor is why we think there is something interesting going on with g in the first place.
If “X” is something we don’t have a “gears model” of yet, aren’t “tests that highly correlate with X” the only way to measure X? Especially when it’s not physics.
In other words, why go the extra mile to emphasize that Y is merely the best available method to measure X, but not X itself? Is this a standard way of talking about scientific topics, or is it only used for politically sensitive topics?
Here the situation is different in that it’s not just that we don’t know how to measure X, but rather the way in which we have derived X means that directly measuring it is impossible even in principle.
That’s distinct from something like (say) self-esteem, where we might eventually figure out what self-esteem really means, or at least come up with a satisfactory operational definition for it. There’s nothing in the normal definition of self-esteem that would make it impossible to measure on an individual level. Not so with g.
Of course, one could come up with a definition for something like “intelligence”, and then try to measure that directly—which is what people often do, when they say that “intelligence is what intelligence tests measure”. But that’s not the same as measuring g.
This matters because it’s part of what makes e.g. the Flynn effect so hard to interpret—yes, raw scores on IQ tests have gone up, but have people actually gotten smarter? We can’t directly measure g, so a rise alone doesn’t yet tell us anything. On the other hand, if people’s scores on a test of self-esteem went up over time, then it would be much more straightforward to assume that people’s self-esteem has probably actually gone up.
In this case it’s important to emphasize that difference, because a commonly raised hypothesis is that while we can see clear training effects on IQ, none of these effects are on the underlying g-factor, i.e. the gains do not generalize to new tasks. For naive interventions, this has been pretty clearly demonstrated:
IQ scores provide the best general predictor of success in education, job training, and work. However, there are many ways in which IQ scores can be increased, for instance by means of retesting or participation in learning potential training programs. What is the nature of these score gains?
[...]
The meta-analysis of 64 test–retest studies using IQ batteries (total N = 26,990) yielded a correlation between g loadings and score gains of −1.00, meaning there is no g saturation in score gains.
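The check the quoted meta-analysis performs is simple to sketch: correlate each subtest's g-loading with its retest score gain. The four numbers below are invented to illustrate the reported pattern (the more g-loaded the subtest, the smaller the gain, here perfectly so).

```python
# Sketch of the g-loading vs. score-gain correlation; the values are
# hypothetical, chosen to show a perfectly negative relationship.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

g_loadings  = [0.8, 0.7, 0.6, 0.5]   # hypothetical subtest g-loadings
score_gains = [0.1, 0.2, 0.3, 0.4]   # hypothetical retest gains, SD units
print(round(pearson(g_loadings, score_gains), 2))  # -1.0
```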