I am confused. You might be talking about g, not IQ, since we have significant evidence that we can raise IQ by creating proper learning environments: most psychometrics researchers credit widespread education for a large fraction of the Flynn effect, and generally don’t think that genetic changes explain much.
A 2017 survey of 75 experts in the field of intelligence research suggested four key causes of the Flynn effect: Better health, better nutrition, more and better education, and rising standards of living. Genetic changes were seen as not important.[28] The experts’ views agreed with an independently performed[29] meta-analysis on published Flynn effect data, except that the latter found life history speed to be the most important factor.[30]
Yes, I am referring to “IQ” not g because most people do not know what g is. (For other readers, IQ is the measurement, g is the real thing.) I have looked into IQ research a lot and spoken to a few experts. While genetics likely doesn’t play much of a role in the Flynn effect, it plays a huge role in g and IQ. This is established beyond any reasonable doubt. IQ is a very politically sensitive topic and people are not always honest about it. Indeed, some experts admit to other experts that they lie about IQ when discussing it in public (source: my friend and podcasting partner Greg Cochran; the podcast is Future Strategist). We don’t know if the Flynn effect is real; it might just come from measurement errors arising from people becoming more familiar with IQ-like tests, although it could also reflect real gains in g that are being captured by higher IQ scores. There is no good evidence that education raises g. The literature on IQ is so massive, and so poisoned by political correctness (and some would claim racism), that it is not possible to resolve the issues you raise by citing literature. If you ask IQ experts why they disagree with other IQ experts, they will say that the other experts are idiots/liars/racists/cowards. I interviewed a lot of IQ experts when writing my book Singularity Rising.
To be clear, I think it’s very obvious that genetics has a large effect on g. The key question, which you seemed to dismiss above, is whether education, or really any form of training, has an additional effect on g (or, more likely, some complicated dynamic with genetics).
And after looking into this question a lot over the past few years, I think the answer is “maybe, probably a bit”. The big problem is that for population-wide studies, we can’t really get clean data on the effects of education: the Flynn effect adds a pretty clear positive trend, and geographic variance in education levels doesn’t really capture what we would naively think of as the likely contributors to the observed increase in g.
And you can’t do directed interventions, because all IQ tests (even very heavily g-loaded ones) are extremely susceptible to training effects; even just an hour of practice on Raven’s Progressive Matrices seems to result in large gains. As such, you can’t really use IQ tests as any kind of feedback loop, and almost any real gains will be drowned out by the local training effects.
(For other readers, IQ is the measurement, g is the real thing.)
This seems like a misleading summary of what g is.
g is the shared principal component of various subsets of IQ tests. As such, it measures the shared variance between your performance on many different tasks, and so is the thing that we expect to generalize most between different tasks. But in most psychometric contexts I’ve seen, we split g into 3-5 different components, which tends to add significant additional predictive accuracy (at the cost of simplicity, obviously).
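(A minimal sketch of what “shared factor across the subtests” means in practice, with a made-up battery: simulate a handful of subtest scores that all draw on one latent variable, then recover the loadings with scikit-learn’s FactorAnalysis. None of the numbers correspond to a real test.)

```python
# Toy illustration: extract a general factor from simulated subtest scores.
# The battery, loadings, and sample sizes here are all made up.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_people = 5000

# Latent "g" plus subtest-specific noise; loadings are hypothetical.
g = rng.normal(size=n_people)
loadings = np.array([0.8, 0.7, 0.6, 0.75, 0.65, 0.55])
noise = rng.normal(size=(n_people, len(loadings)))
subtests = g[:, None] * loadings + noise * np.sqrt(1 - loadings**2)

# Fit a one-factor model and look at how strongly each subtest loads on it.
# (The sign of an extracted factor is arbitrary.)
fa = FactorAnalysis(n_components=1, random_state=0).fit(subtests)
print("estimated loadings:", fa.components_.ravel().round(2))
print("true loadings:     ", loadings)
```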
To describe it as “the real thing” requires defining what our goal with IQ testing is. Results on IQ tests have predictive power over income and life-outcomes even beyond the variance that is explained by g, and predictive power over outcomes on a large variety of different tasks beyond only g.
The goal of IQ tests is not to measure g; it isn’t even clear whether g is a single thing that can be “measured”. The goal of IQ tests historically has been to assess aptitude for various jobs and roles (such as whether you should be admitted to the military, which is where a large fraction of our IQ-score data comes from). For those purposes, we’ve often found that solely focusing on trying to measure aptitude that generalizes between tasks is a bad idea, since there is still significant task-specific variance that we care about, and which we would have to give up on measuring if we defined g as the ultimate goal of measurement.
We don’t know if the Flynn effect is real; it might just come from measurement errors arising from people becoming more familiar with IQ-like tests, although it could also reflect real gains in g that are being captured by higher IQ scores.
I think the Flynn effect has been pretty solidly established, as well as the fact that it has had a significant effect on g.
I do think a large fraction of the effect on g is most likely explained by the other factors I cited above, namely better nutrition and, more broadly, better health care, resulting in significantly fewer deficiencies.
By “g, not IQ” you mean the difference between genotype and phenotype, or something else?

The g-factor, or g for short, is the thing that IQ tries to measure.
The name “g factor” comes from the fact that it is a common, general factor which all kinds of intelligence draw upon. For instance, Deary (2001) analyzed an American standardization sample of the WAIS-III intelligence test, and built a model where performance on the 13 subtests was primarily influenced by four group factors, or components of intelligence: verbal comprehension, perceptual organization, working memory, and processing speed. In addition, there was a common g factor that strongly influenced all four.
The model indicated that the variance in g was responsible for 74% of the variance in verbal comprehension, 88% of the variance in perceptual organization, 83% of the variance in working memory, and 61% of the variance in processing speed.
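(To unpack those percentages: under the usual higher-order reading of such a model, the share of a group factor’s variance that g explains is the square of g’s loading on that factor, so the implied loadings are just square roots. A quick back-of-the-envelope, assuming that reading:)

```python
# Implied g -> group-factor loadings, assuming a standard higher-order
# model where (variance explained) = loading**2.
variance_explained = {
    "verbal comprehension": 0.74,
    "perceptual organization": 0.88,
    "working memory": 0.83,
    "processing speed": 0.61,
}
for factor, share in variance_explained.items():
    print(f"{factor}: g loading ~ {share ** 0.5:.2f}")
# e.g. verbal comprehension: ~0.86, processing speed: ~0.78
```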
Technically, g is something that is computed from the correlations between various test scores in a given sample, and there’s no such thing as the g of any specific individual. The technique doesn’t even guarantee that g actually corresponds with any physical quantity, as opposed to something that the method just happened to produce by accident.
So when you want to measure someone’s intelligence, you make a lot of people take tests that are known to be strongly g-loaded. That means that performance on the tests is strongly correlated with g. Then you take their raw scores and standardize them to produce an IQ score, so that if e.g. only 10% of the test-takers got a raw score of X or higher, then anyone getting a raw score of X is assigned an IQ indicating that they’re in the top 10% of the population. And although IQ still doesn’t tell us what an individual’s g score is, it gives us a score that’s closely correlated with g.
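(Roughly what that standardization looks like as code, under simplified assumptions: map each raw score to its percentile rank in the norming sample, then to the matching point on a normal curve with mean 100 and SD 15. Real norming procedures involve age bands, smoothing and so on; this just shows the shape of the idea.)

```python
# Sketch of norm-referenced scoring: raw score -> percentile -> IQ-style
# score on a normal scale with mean 100 and SD 15.
import numpy as np
from scipy.stats import norm, rankdata

def raw_to_iq(raw_scores):
    n = len(raw_scores)
    # Percentile rank of each raw score within the norming sample
    # (midranks, shifted so the values stay strictly inside (0, 1)).
    percentiles = (rankdata(raw_scores) - 0.5) / n
    # Map percentiles onto a normal distribution with mean 100, SD 15.
    return 100 + 15 * norm.ppf(percentiles)

raw = np.array([12, 25, 31, 31, 40, 44, 50, 58])
print(raw_to_iq(raw).round(1))
```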
The g-factor, or g for short, is the thing that IQ tries to measure.
See my reply above. I think framing IQ tests as trying to “measure g” is pretty confusing, and while I used to have this view, I updated pretty strongly against it after reading more of the psychometrics literature.
Hmm. This interpretation was the impression that I recall getting from reading Jensen’s The g Factor, though it’s possible that I misremember. Then again, he may have been arguing that IQ tests should be aiming to measure g, even if they don’t necessarily always do, and holding up the most g-loaded ones as the gold standard.
I think it’s important to realize that what g is shifts when you change which subtests your IQ test consists of, and how much “weight” you give to each result. As such, it isn’t itself something that you can easily optimize for.

Like, you always have to define g-loadings with respect to a test battery over which you measure g. And while the g’s extracted from different test batteries are highly correlated with one another, they are not perfectly correlated, and those correlations do come apart as you optimize for any one of them.

Like, an IQ test with a single task will trivially yield a single g-factor that explains all the variance in its results.
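(A small simulation of that battery-dependence, with made-up loadings: build two different six-subtest batteries that tap the same latent variable, extract the first factor from each, and check how well the two extracted “g” scores agree. They correlate highly, but not perfectly.)

```python
# Toy demonstration that "g" is defined relative to a battery: two different
# simulated batteries tapping the same latent factor yield first-factor
# scores that correlate highly, but not perfectly. All loadings are made up.
import numpy as np

rng = np.random.default_rng(1)
n_people = 20_000
latent = rng.normal(size=n_people)

def simulate_battery(loadings):
    loadings = np.asarray(loadings)
    noise = rng.normal(size=(n_people, len(loadings)))
    return latent[:, None] * loadings + noise * np.sqrt(1 - loadings**2)

def first_factor_scores(scores):
    # First principal component of the standardized subtest scores.
    z = (scores - scores.mean(0)) / scores.std(0)
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    pc1 = z @ vt[0]
    # Fix the arbitrary sign of the component so both batteries point the same way.
    return pc1 * np.sign(np.corrcoef(pc1, latent)[0, 1])

battery_a = simulate_battery([0.8, 0.7, 0.6, 0.5, 0.75, 0.65])
battery_b = simulate_battery([0.55, 0.85, 0.4, 0.7, 0.6, 0.5])

g_a = first_factor_scores(battery_a)
g_b = first_factor_scores(battery_b)
print("correlation between the two batteries' g estimates:",
      round(np.corrcoef(g_a, g_b)[0, 1], 3))
```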
As such, we need to define a grounding for IQ tests that is about external validity: predictive power over life outcomes, or over outcomes on pre-specified tasks. We can then analyze the results of those tests and see whether we can uncover any structure, but the tests themselves have to aim to measure something externally valid.
To make this more concrete, the two biggest sources of IQ-test data we have are American SAT scores, and the Norwegian military draft, which has included an IQ-test component for all males above 18 years old since the middle of the 20th century.
The goal of the SAT was to be a measure of scholastic aptitude, as well as a measure of educational outcomes.
The goal of the Norwegian military draft test was to be a measure of military aptitude, in particular to screen out people below a certain threshold of intelligence who were unfit for military service and would pose a risk to others, or be a net drag on the military.
Neither of these is optimized to measure g. But we found that the test results in both of these batteries are well explained by a single g-factor. And the fact that whenever we try to measure aptitude for real-life outcomes, we seem to find a common g-factor, is why we think there is something interesting going on with g in the first place.
If “X” is something we don’t have a “gears model” of yet, aren’t “tests that highly correlate with X” the only way to measure X? Especially when it’s not physics.
In other words, why go the extra mile to emphasize that Y is merely the best available method to measure X, but not X itself? Is this a standard way of talking about scientific topics, or is it only used for politically sensitive topics?
Here the situation is different in that it’s not just that we don’t know how to measure X; rather, the way in which we have derived X means that directly measuring it is impossible even in principle.
That’s distinct from something like (say) self-esteem, where we might eventually figure out what self-esteem really means, or at least come up with a satisfactory instrumental definition for it. There’s nothing in the normal definition of self-esteem that would make it impossible to measure on an individual level. Not so with g.
Of course, one could come up with a definition for something like “intelligence”, and then try to measure that directly—which is what people often do, when they say that “intelligence is what intelligence tests measure”. But that’s not the same as measuring g.
This matters because it’s part of what makes e.g. the Flynn effect so hard to interpret: yes, raw scores on IQ tests have gone up, but have people actually gotten smarter? We can’t directly measure g, so a rise in scores alone doesn’t yet tell us anything. On the other hand, if people’s scores on a test of self-esteem went up over time, then it would be much more straightforward to assume that people’s self-esteem has probably actually gone up.
In this case it’s important to emphasize that difference, because a commonly raised hypothesis is that while we can see clear training effects on IQ, none of these effects are on the underlying g-factor, i.e. the gains do not generalize to new tasks. For naive interventions, this has been pretty clearly demonstrated:
IQ scores provide the best general predictor of success in education, job training, and work. However, there are many ways in which IQ scores can be increased, for instance by means of retesting or participation in learning potential training programs. What is the nature of these score gains?
[...]
The meta-analysis of 64 test–retest studies using IQ batteries (total N = 26,990) yielded a correlation between g loadings and score gains of −1.00, meaning there is no g saturation in score gains.
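(For readers unfamiliar with how a number like that is computed: the method of correlated vectors takes each subtest’s g-loading and its average retest gain, and correlates those two vectors across subtests. A minimal sketch with invented values, just to show the shape of the calculation:)

```python
# Sketch of the "method of correlated vectors" behind results like the one
# quoted above: correlate subtests' g-loadings with their score gains.
# The loadings and gains below are invented purely for illustration.
import numpy as np

g_loadings = np.array([0.82, 0.75, 0.68, 0.60, 0.52, 0.45])
retest_gain = np.array([0.10, 0.18, 0.25, 0.31, 0.40, 0.47])  # in SD units

r = np.corrcoef(g_loadings, retest_gain)[0, 1]
print(f"correlation between g-loadings and gains: {r:.2f}")
# A strongly negative value means the most g-loaded subtests improve least,
# i.e. the gains do not look like gains in g.
```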