Some mathematicians more successful than Tao hold a contrary position.
There aren’t a lot of mathematicians more successful than Tao. I suppose he hasn’t won the Abel Prize yet. (Looking at the list of winners, it looks as if that one tends to go to older mathematicians in recognition of their lifetime’s great achievements. The youngest winner was Gromov: born 1943, Abel Prize in 2009.) Could you name some of the mathematicians you have in mind (and, even better, point us at what they’ve said on the subject)?
There aren’t a lot of mathematicians more successful than Tao.
I was referring to successful research as opposed to success at winning prizes.
The connection between prizes and quality of research comes apart for a variety of reasons: arbitrary age restrictions (in both directions), ceiling effects (many prizes are awarded once a year independently of the quality of research of potential prize recipients), individual idiosyncrasies of the people on the committees that award prizes, etc.
One mathematician more accomplished than Tao is Robert Langlands, known for the so-called “Langlands Program.”
The program provides a long-sought vast generalization of the Artin reciprocity law (giving a conjectural answer to a 40-year-old question). The Artin reciprocity law was in turn a far-reaching generalization of quadratic reciprocity, which Gauss referred to as “theorema aureum” (the golden theorem). Three Fields medals have been awarded for work in the area, to Vladimir Drinfeld, Laurent Lafforgue and Ngô Bảo Châu. One special case (proved by Langlands) was a crucial ingredient in Andrew Wiles’ proof of Fermat’s last theorem. In this essay, Langlands wrote:
I would very much have liked to continue reflecting on renormalization, but to attack it seriously requires not only enormous mathematical strength but also broad, concrete experience in the various domains mentioned, fluid mechanics, statistical mechanics, quantum field theory. The first only God can give; the second requires a lifetime to acquire.
I know this is only a single example – it’s hard to find any public examples of mathematicians writing about the nature of mathematical talent. But I’ll try to provide more later.
I wasn’t meaning to imply that you define success in terms of prizes (and, for that matter, neither do I). I agree that Langlands is a more important mathematician than Tao. But that’s a hell of a bar to clear. (Also, speaking of age effects, I remark that if you define mathematical success in terms of what one has achieved to date and its demonstrated influence in mathematics generally, you’re inevitably going to prefer older mathematicians—Langlands is 78 to Tao’s 40ish—and that’s going to affect the biases shaping their ideas about intelligence, native talent, etc.)
The quotation from Langlands that you give is not affirming the same thing as Tao is denying (though it’s possible that Tao would in fact deny it if asked), in at least two ways.
It refers to “mathematical strength” rather than “intelligence”. The assertion Tao made that you were disagreeing with was that you can be an exceptional mathematician without having exceptional intelligence, which is not the same thing as saying that you can be an exceptional mathematician without having exceptional “mathematical strength”.
It refers to a single particular mathematical/physical problem. It’s perfectly consistent to believe (1) that you can be an exceptional mathematician without exceptional intelligence (or exceptional “mathematical strength”) but (2) that if you’re going to try, you should work on something other than renormalization.
For the avoidance of doubt, I won’t be terribly surprised if it turns out that (say) 75% of world-class mathematicians think top 0.1% IQ is necessary to be a top 0.1% mathematician, but I’m not sure you’ve made much of a case yet. I’d be a little more surprised if it were 75% of world-class mathematicians who have put as much thought into the question as Tao has; I’ve no idea how much Langlands has actually thought about the question, but a throwaway aside in an essay about something else isn’t necessarily the product of deep thought.
I’ll briefly state my own opinions on all this, not that there’s any reason why anyone should care. I’ll use the term “R” for the particular thing that, e.g., Raven’s matrices test.

I think it’s obvious that, all else being equal, more of any cognitive strength is better for mathematical success; more R is always going to be an advantage. So, of course, are other cognitive strengths, and other attributes such as love of mathematics, capacity for hard work, etc. They all doubtless correlate somewhat with R. So the question is something like: at a given overall level of rarity-of-useful-attributes, what things does mathematical success depend on most strongly? If you’re looking at a population for which some particular attribute is extreme, then prima facie you would expect that attribute to drop off in importance by this criterion as it gets more extreme, because of how rarity increases.

I think it’s uncontroversial that to achieve any kind of mathematical success it’s almost essential to have a pretty good R. Beyond a certain level—let’s say roughly corresponding to a measured IQ of 140 or so—I think the importance of different cognitive strengths starts to depend a lot on what kind of mathematics you’re doing. For instance, I would guess that combinatorialists tend to have higher IQ than differential geometers. That doesn’t mean that combinatorialists are smarter than differential geometers, for two reasons. (1) I think equating R with smartness becomes less sensible at the highest levels; extreme R is a specialized talent, and in terms of (e.g.) practical success, giving an immediate impression of great brainpower, etc., I strongly suspect that at a given overall level of rarity you’re going to do better with very high R plus very high other things (e.g., Scott-style verbalish reasoning) than with extreme R optimized for acing IQ tests. (2) The particular bundle of talents needed for great success in differential geometry is probably about as rare, and about as clearly constitutive of “smartness” by any reasonable standard, as the particular bundle needed for great success in combinatorics.

And beyond a certain level—whose rarity maybe corresponds to a measured IQ of 150 or so—at any given level of rarity I bet other attributes that aren’t exactly cognitive strengths matter more than all those cognitive strengths do. So if you stratify prospective mathematicians by IQ (which of course is very similar to R), I would expect a picture like this. IQ ≤ 130: mathematical success very strongly dependent on R. 130 < IQ ≤ 150: mathematical success correlated with R, but other cognitive strengths matter more in many fields of mathematics. IQ > 150: mathematical success mostly determined by other cognitive and not-so-cognitive strengths.
I conjecture that the above is broadly consistent with your evidence. I think it is also consistent (with a little interpretive licence—e.g., his idea of what constitutes exceptional intelligence is probably skewed upwards relative to most people’s) with what Tao has said. I am willing to be convinced that I’m wrong on either point.
Thanks for the detailed comment. I don’t think that exceptional intelligence is either necessary or sufficient to be an exceptional mathematician. Tao’s statement “But an exceptional amount of intelligence has almost no bearing on whether one is an exceptional mathematician.” is a very strong statement: if he had said “plays only a moderate role in whether one is an exceptional mathematician” he would have been on much more solid ground.
I agree that the Langlands quote is by itself not strong evidence against Tao’s assertion for the reasons that you give, but it’s still evidence. I’m relying on many weak arguments. I’ll gradually flesh them out in my sequence.
I share your intuition re: combinatorialists vs. geometers. One of my friends spent a lot of time with Chern, who struck him as being quite ordinary with respect to R, while being exceptional on a number of other dimensions. Grothendieck’s self-assessment suggests that it is in fact possible to be amongst the greatest mathematicians without exceptional R.
A key point that you might be missing (certainly I did for many years) is that there just aren’t many people of exceptional intelligence. Suppose that it were true that IQ is normally distributed: then the number of people of IQ 145+ would be roughly 40x larger than the number of people of IQ 160+. Under this hypothesis, even if only 1 in 20 exceptional mathematicians had IQ 160+, that would mean that people in that range were about twice as likely as their IQ 145+ counterparts to become exceptional mathematicians. It’s been suggested that the distribution of IQ is in fact fat-tailed because of assortative mating, which blunts the force of this argument, but it’s also true that more than 5% of exceptional mathematicians have IQ 160+: I think the actual figure is closer to 50%.
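The tail-ratio arithmetic here can be checked directly, under the assumption of a normal distribution with SD 15 (so IQ 145 and 160 sit at +3σ and +4σ):

```python
from math import erfc, sqrt

def tail(z):
    """P(Z > z) for a standard normal, via the complementary error function."""
    return 0.5 * erfc(z / sqrt(2))

# With SD 15, IQ 145 and IQ 160 sit at +3 and +4 standard deviations.
p145, p160 = tail(3.0), tail(4.0)
ratio = p145 / p160
print(round(ratio, 1))  # ≈ 42.6: the 145+ group is roughly 40x the 160+ group

# If 1 in 20 exceptional mathematicians have IQ 160+ (and essentially all of
# them have IQ 145+), the relative rate for the 160+ group is:
print(round(0.05 * ratio, 1))  # ≈ 2.1
```

The exact figures shift a little if one uses SD 16 instead, but the shape of the argument is unchanged.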
It should be noted that if measured IQ is fat-tailed, this is because there is something wrong with the IQ tests. IQ is defined to be normally distributed with a mean of 100 and a standard deviation of either 15 or 16, depending on which definition you’re using. So if measured IQ is fat-tailed, then the tests aren’t calibrated properly (of course, if your test goes all the way up to 160, it is almost inevitably miscalibrated, because there just aren’t enough people to calibrate it with).
You don’t want to force a normal distribution on the data. You’re free to do so if you’d like, e.g. by asking takers millions of questions so as to get very fine levels of granularity, and then mapping people at the 84th percentile of “questions answered correctly” to IQ 115, people at the 98th percentile to IQ 130, etc.
But what you really want is a situation where you have a (log)-linear relationship between standard deviations and other things that IQ correlates with, and if you force the data to obey a normal distribution, you’ll lose this.
The rationale for using a normal distribution is the central limit theorem, but the classical version of that theorem requires independent summands: assortative mating can induce correlations between, e.g., having gene A that increases IQ and having gene B that increases IQ.
But what you really want is a situation where you have a (log)-linear relationship between standard deviations and other things that IQ correlates with, and if you force the data to obey a normal distribution, you’ll lose this.
Could you expand on this point? I am not sure I follow it.
Say that you have a function
f: rawScores ---> percentiles
and you want to compose it with a function
g: percentiles ---> IQ scores
so that log(g(f(x))) is as correlated as possible with the things you care about other than IQ (income, the log odds ratio of winning a Fields medal, etc.).
The default choice for g would be the function that takes a percentile to the associated standard deviation under a normal distribution. I’m claiming that the best choice for g is probably instead a function that takes a percentile to the associated standard deviation under a distribution that has fatter tails than the normal distribution.
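As a concrete sketch of the two choices of g, here is the standard normal mapping next to a hypothetical fatter-tailed alternative. The logistic quantile function is just an illustrative stand-in for “a distribution with fatter tails than the normal”, and its scale constant is an ad hoc choice made so that the 84th percentile still lands near IQ 115:

```python
from statistics import NormalDist
from math import log

def g_normal(p):
    """Percentile -> IQ under the standard definition: normal, mean 100, SD 15."""
    return 100 + 15 * NormalDist().inv_cdf(p)

def g_logistic(p):
    """A hypothetical fatter-tailed g: the logistic quantile function.
    The scale 9.0 is an assumption, picked so the 84th percentile ~ IQ 115."""
    return 100 + 9.0 * log(p / (1 - p))

# The two mappings agree in the bulk but diverge in the tail.
for p in (0.84, 0.98, 0.999):
    print(p, round(g_normal(p)), round(g_logistic(p)))
```

At the 84th percentile both give roughly 115; by the 99.9th percentile the fat-tailed mapping is assigning scores well above the normal one, which is exactly the sense in which forcing normality compresses the high end.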
The intuition is:
Measures of the practical significance of IQ are plausibly best modeled as a weighted average of many individual genes that increase IQ. If people had been mating with randomly selected members of the opposite sex, the probabilities of getting two such genes would be independent. But in practice, people (weakly) tend to marry people of intelligence similar to their own (link), inducing a positive correlation between the respective probabilities of a child getting two different genes that contribute to IQ.
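The intuition can be illustrated with a toy simulation (the gene count, population size, and strength of the shared family-level factor are all made-up numbers). Relative to the tail that the independent-genes normal approximation predicts, extreme scores become far more common once the summands are positively correlated:

```python
import random
from statistics import mean, pstdev

random.seed(3)
G, PEOPLE = 200, 20_000

def score(shared_pull):
    # shared_pull is a crude stand-in for assortative mating: a single
    # family-level factor nudges the probability of inheriting every
    # IQ-raising gene at once, correlating the summands.
    p = 0.5 + shared_pull
    return sum(random.random() < p for _ in range(G))

indep = [score(0.0) for _ in range(PEOPLE)]
corr = [score(random.uniform(-0.05, 0.05)) for _ in range(PEOPLE)]

# Both populations have the same mean, but with the shared factor the spread
# (and the far tail) grows well beyond the independence-based prediction.
mu, sd = mean(indep), pstdev(indep)
hi_indep = sum(x > mu + 3 * sd for x in indep)
hi_corr = sum(x > mu + 3 * sd for x in corr)
print(hi_indep, hi_corr)  # many more extreme scorers in the correlated case
```

This is only a sketch of the mechanism, not a model of real genetics: the point is that once summands are correlated, the CLT-based normal tail badly underestimates how many extreme individuals there are.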
is as correlated as possible with the things you care about other than IQ (income, the log odds ratio of winning a Fields medal, etc.)
First question: do you actually care about correlation (given that it’s a linear metric) or do you mean some tight dependency, not necessarily linear?
Second question: if that is the case, don’t you want your function g to produce a distribution shaped similarly to the “thing you care about”? If that thing-you-care-about is distributed normally, you would prefer g to generate a normal(-looking) distribution, if it’s distributed, say, as chi-squared, you would prefer g to give you a chi-squared shape, etc...?
Measures of the practical significance of IQ are plausibly best modeled as a weighted average of many individual genes that increase IQ.
That’s an iffy approach. Take, say, income (as a measure of the practical significance of IQ) -- are you saying income is best modeled as a weighted average of many IQ-related genes? You need the concept (and the link) of IQ to identify these genes to start with, but then you want to throw IQ out and go straight from genes to “practical” outcomes.
I agree that assortative mating would lead to a fat-tailed distribution, but your original goal was to make IQ correlate with “things you care about”, and for that purpose the fat tails are not particularly relevant.
First question: do you actually care about correlation (given that it’s a linear metric) or do you mean some tight dependency, not necessarily linear?
If g(y) is monotonic, then the degree to which there’s a tight dependency is independent of g(y), which is just a change of coordinates. I do want to choose g(y) to maximize the degree to which the dependency is a linear one.
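The change-of-coordinates point can be demonstrated directly: a monotonic g leaves rank (Spearman-style) dependence exactly unchanged, while altering the linear (Pearson) correlation. A minimal sketch, with exp standing in for an arbitrary monotone g:

```python
import math, random

random.seed(2)

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def ranks(xs):
    """Replace each value by its rank; Pearson on ranks = Spearman."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

x = [random.gauss(0, 1) for _ in range(5000)]
y = [xi + random.gauss(0, 1) for xi in x]   # an outcome tightly tied to x
g_of_x = [math.exp(xi) for xi in x]         # a monotone re-coordinatization

# Rank dependence is untouched by the monotone g...
print(round(pearson(ranks(x), ranks(y)), 3), round(pearson(ranks(g_of_x), ranks(y)), 3))
# ...but the *linear* correlation changes with the coordinates.
print(round(pearson(x, y), 3), round(pearson(g_of_x, y), 3))
```

So the underlying dependency is coordinate-free; the choice of g only determines how linear it looks, which is exactly the degree of freedom being argued over here.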
Second question: if that is the case, don’t you want your function g to produce a distribution shaped similarly to the “thing you care about”? If that thing-you-care-about is distributed normally, you would prefer g to generate a normal(-looking) distribution, if it’s distributed, say, as chi-squared, you would prefer g to give you a chi-squared shape, etc...?
Yes, this is true and a good point, though the distribution of “the thing we care about” will vary from thing to thing, and I think that if we have to use a single fixed distribution for IQ across all of them, the log of a fat-tailed distribution is probably the best choice.
are you saying income is best modeled as a weighted average of many IQ-related genes?
Here I’m just adopting an Occamistic approach – I don’t have high confidence – I’m using a linear model because it’s the simplest possible function from genes to outcomes that correlate with IQ. Feel free to suggest an alternative.
I agree that assortative mating would lead to a fat-tailed distribution, but your original goal was to make IQ correlate with “things you care about”, and for that purpose the fat tails are not particularly relevant.
Suppose, hypothetically, that human brains were such that IQ was capped at 145 by present-day standards (e.g. because, unbeknownst to us, babies with IQ above that threshold died in childbirth for some reason having to do with IQ genes). Then if we were to choose g(y) to get a normal distribution, it would look like the correlation between IQ and real-world outcomes vanishes after 145, whereas the actual situation would be that the people who scored above 144 have essentially the same genetic composition (with respect to IQ) as the people who scored 144, so that “IQ doesn’t yield returns past 145” would be connotatively misleading.
I’m saying that defining IQ so that it’s normally distributed has a similar (though much smaller) connotatively distortionary effect.
Suppose, hypothetically, that human brains were such that IQ was capped at 145 … it would look like the correlation between IQ and real world outcomes vanishes after 145
In your hypothetical there would be a lot of warning signs—for example all IQs above 145 would be random, that is, re-testing IQs above 145 would produce a random draw from the appropriate distribution tail.
And I suspect that it should be possible to figure out real-world distributions (the fatness of the tails, in particular), by looking at raw, non-normalized test scores.
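Both the cap hypothetical and the re-test warning sign can be sketched in a toy simulation (the cap location, the noise level, and the sample size are all invented for illustration). After forcing the measured scores to be normal, the bands above the cap’s image contain people of essentially identical latent ability, separated only by test noise:

```python
import random
from statistics import NormalDist, mean

random.seed(0)
N = 100_000
CAP = 3.0  # hypothetical ceiling: no latent ability above +3 SD (IQ 145)

latent = [min(random.gauss(0, 1), CAP) for _ in range(N)]
test = [x + random.gauss(0, 0.3) for x in latent]  # one noisy test sitting

# Force a normal distribution on the measured scores by rank (the "g" step).
order = sorted(range(N), key=lambda i: test[i])
z = [0.0] * N
nd = NormalDist()
for rank, i in enumerate(order):
    z[i] = nd.inv_cdf((rank + 0.5) / N)

def mean_latent(lo, hi):
    """Average latent ability among people whose *measured* z lands in [lo, hi)."""
    return mean(latent[i] for i in range(N) if lo <= z[i] < hi)

# People measured just above the cap vs far above it have nearly the same
# latent ability: the very top measured scores are distinguished only by noise,
# so re-testing them would reshuffle the ranking.
print(round(mean_latent(3.0, 3.3), 2), round(mean_latent(3.7, 10.0), 2))
```

This is the “warning sign” in miniature: in such a world, raw (non-normalized) scores would bunch up against the ceiling, and scores in the top band would fail to replicate on re-test.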
Yes, you and I are on the same page, I was just saying that IQ shouldn’t be defined to be normally distributed.
Would you characterize this post as a reasonable description of what you’re talking about in your discussion of “R”?
Yes, that’s the guts of it.