gjm comments on Innate Mathematical Ability

gjm 18 Feb 2015 22:28 UTC
5 points

I think that his verbal abilities are [...] perhaps only average relative to mathematicians as a group.

Seems plausible—though for what it’s worth I’d rate his verbal abilities substantially above those of mathematicians generally.

weighted average [...] must be lower on that second thing

That would be what I described as “the trivial observation that min < average” :-) and sure, I agree that whatever feature of Tao’s verbal intelligence is worst has to be worse than his overall verbal intelligence, but I don’t see why that’s interesting enough to be worth drawing attention to. I guess your point is that if his general intelligence is so spectacularly high then to average out correctly some aspect of his verbal intelligence must be quite a lot lower than his overall verbal—but it seems equally plausible to me that verbal SAT results just don’t depend all that strongly on the kind of pattern-spotting tested by the really hard Raven matrices.

Do you think there’s an alternative explanation [...] ?

I can think of several. He may have too low an opinion of his own intelligence because of the sort of weird psychological hangups that many very clever people have. He may interpret “intelligence” in a way that weights more-mathematical things less heavily (perhaps because, being so exceptional in the latter, he sees more clearly the distinction between those and other sorts of thinking). He may be a victim of Political Correctness Gone Mad and feel that he has to play down the importance of intelligence. His idea of what constitutes exceptionally high intelligence may be skewed by the fact that he is surrounded by super-smart people. He may have spent less time thinking about intelligence than Scott has (intelligence being something of a preoccupation in the rationalist community, and I suspect less so in Tao’s circles).

But I can’t quite agree with your framing of the question: that is, I am not convinced that he has missed the argument Scott describes. Scott’s argument just says: one person who’s incredibly good at mathematics and incredibly good at Raven’s matrices is evidence that being exceptionally good at Raven’s matrices is important for being incredibly good at mathematics. But one can accept that but still find that, overall, the evidence suggests that the connection (beyond a certain level of Raven’s-matrices skill) isn’t all that strong. (For instance, knowing a few people who are first-rate research mathematicians and not particularly good at super-hard Raven problems is pretty good evidence the other way.) And it seems to me, taking the outside view, that if I have to guess whether Terry Tao or Jonah Sinick has seen more evidence about what makes really exceptional professional mathematicians then I’m going to go with Terry Tao.

(My understanding is that you have spent a lot of time working with extremely able school pupils on Olympiad-ish mathematics—my apologies if I’ve got the wrong end of the stick—and my own experience suggests that Olympiad problems are substantially closer to pure g-tests than doing actual mathematical research. Have you considered the possibility that your estimate of the relationship between IQ-style intelligence and success in pure mathematical research may be biased by this?)
- JonahS 18 Feb 2015 22:56 UTC
  9 points
  Parent
  
  it seems equally plausible to me that verbal SAT results just don’t depend all that strongly on the kind of pattern-spotting tested by the really hard Raven matrices.
  
  I agree, there’s still some effect though.
  
  I can think of several
  
  The things that you list seem to me closely related to my second suggestion under “Is this all depressing?”, e.g. I think that one factor that plays into “the political correctness gone mad” on this point is that people want to believe that life is more fair than it actually is (for reasons overlapping somewhat with the reasons for the just-world fallacy).
  
  And it seems to me, taking the outside view, that if I have to guess whether Terry Tao or Jonah Sinick has seen more evidence about what makes really exceptional professional mathematicians then I’m going to go with Terry Tao.
  
  I would agree, if not for the fact that I’m drawing on many sources (as I described in the introduction of my last post). Some mathematicians more successful than Tao hold a contrary position.
  
  (My understanding is that you have spent a lot of time working with extremely able school pupils on Olympiad-ish mathematics—my apologies if I’ve got the wrong end of the stick—and my own experience suggests that Olympiad problems are substantially closer to pure g-tests than doing actual mathematical research. Have you considered the possibility that your estimate of the relationship between IQ-style intelligence and success in pure mathematical research may be biased by this?)
  
  Your interpretation is very understandable. I wrote a blog post back in October 2010 implicitly expressing a position similar to your own.
  
  What started to change my thinking was point (3) of Carl Shulman’s response to my post. At the time, I was unaware of the phenomenon that he described: that performance on one task is often highly predictive of performance on an apparently unrelated task.
  
  Using a simple machine learning model, I found that amongst International Mathematics Olympiad contestants, those who went on to earn Fields medals and similar prizes had ~5x as great a priori odds relative to the average contestant, based on their IMO scores alone. The effect becomes even more pronounced when one weights prize winners by the significance of their work: for example, Perelman was one of only three perfect scorers in 1982. It doesn’t necessarily agree with the inside view intuition that I’ve formed talking with lots of mathematicians, but the existence of a robust effect is unambiguous. I’ll make a post going into detail later.
  - gjm 19 Feb 2015 14:57 UTC
    5 points
    Parent
    
    Some mathematicians more successful than Tao hold a contrary position.
    
    There aren’t a lot of mathematicians more successful than Tao. I suppose he hasn’t won the Abel Prize yet. (Looking at the list of winners, it looks as if that one tends to go to older mathematicians in recognition of their lifetime’s great achievements. The youngest winner was Gromov: born 1943, Abel Prize in 2009.) Could you name some of the mathematicians you have in mind (and, even better, point us at what they’ve said on the subject)?
    - JonahS 20 Feb 2015 9:29 UTC
      5 points
      Parent
      
      There aren’t a lot of mathematicians more successful than Tao.
      
      I was referring to successful research as opposed to success at winning prizes.
      
      The connection between prizes and quality of research comes apart for a variety of reasons: arbitrary age restrictions (in both directions), ceiling effects (many prizes are awarded once a year independently of the quality of research of potential prize recipients), individual idiosyncrasies of the people on the committees that award prizes, etc.
      
      One mathematician more accomplished than Tao is Robert Langlands, known for the so-called “Langlands Program.”
      
      The program provides a long sought after vast generalization of the Artin reciprocity law (giving a conjectural answer to a 40 year old question). The Artin reciprocity law was in turn a far-reaching generalization of quadratic reciprocity, which Gauss referred to as “theorema aureum” (the golden theorem).
      Three Fields medals have been awarded for work in the area, to Vladimir Drinfeld, Laurent Lafforgue and Ngô Bảo Châu.
      One special case (proved by Langlands) was a crucial ingredient in Andrew Wiles’ proof of Fermat’s last theorem.
      
      In this essay, Langlands wrote:
      
      I would very much have liked to continue reflecting on renormalization, but to attack it seriously requires not only enormous mathematical strength but also broad, concrete experience in the various domains mentioned, fluid mechanics, statistical mechanics, quantum field theory. The first only God can give; the second requires a lifetime to acquire.
      
      I know this is only a single example – it’s hard to find examples of mathematicians writing about the nature of mathematical talent in the public domain altogether. But I’ll try to provide more later.
      - gjm 20 Feb 2015 10:25 UTC
        7 points
        Parent
        I wasn’t meaning to imply that you define success in terms of prizes (and, for that matter, neither do I). I agree that Langlands is a more important mathematician than Tao. But that’s a hell of a bar to clear. (Also, speaking of age effects, I remark that if you define mathematical success in terms of what one has achieved to date and its demonstrated influence in mathematics generally, you’re inevitably going to prefer older mathematicians—Langlands is 78 to Tao’s 40ish—and that’s going to affect what biases they have affecting their ideas about intelligence, native talent, etc.)
        
        The quotation from Langlands that you give is not affirming the same thing as Tao is denying (though it’s possible that Tao would in fact deny it if asked), in at least two ways.
        
        It refers to “mathematical strength” rather than “intelligence”. The assertion Tao made that you were disagreeing with was that you can be an exceptional mathematician without having exceptional intelligence, which is not the same thing as saying that you can be an exceptional mathematician without having exceptional “mathematical strength”.
        It refers to a single particular mathematical/physical problem. It’s perfectly consistent to believe (1) that you can be an exceptional mathematician without exceptional intelligence (or exceptional “mathematical strength”) but (2) that if you’re going to try, you should work on something other than renormalization.
        
        For the avoidance of doubt, I won’t be terribly surprised if it turns out that (say) 75% of world-class mathematicians think top 0.1% IQ is necessary to be a top 0.1% mathematician, but I’m not sure you’ve made much of a case yet. I’d be a little more surprised if it were 75% of world-class mathematicians who have put as much thought into the question as Tao has; I’ve no idea how much Langlands has actually thought about the question, but a throwaway aside in an essay about something else isn’t necessarily the product of deep thought.
        
        I’ll briefly state my own opinions on all this, not that there’s any reason why anyone should care. I’ll use the term “R” for the particular thing that, e.g., Raven’s matrices test. I think it’s obvious that, all else being equal, more of any cognitive strength is better for mathematical success; more R is always going to be an advantage. So, of course, are other cognitive strengths, and other attributes such as love of mathematics, capacity for hard work, etc. They all doubtless correlate somewhat with R.
        
        So the question is something like: at a given overall level of rarity-of-useful-attributes, what things does mathematical success depend on most strongly? If you’re looking at a population for which some particular attribute is extreme, then prima facie you would expect that attribute to drop off in importance by this criterion as it gets more extreme, because of how rarity increases.
        
        I think it’s uncontroversial that to achieve any kind of mathematical success it’s almost essential to have a pretty good R. Beyond a certain level (let’s say roughly corresponding to a measured IQ of 140 or so) I think the importance of different cognitive strengths starts to depend a lot on what kind of mathematics you’re doing. For instance, I would guess that combinatorialists tend to have higher IQ than differential geometers. That doesn’t mean that combinatorialists are smarter than differential geometers, for two reasons. (1) I think equating R with smartness becomes less sensible at the highest levels; extreme R is a specialized talent and in terms of (e.g.) practical success, giving an immediate impression of great brainpower, etc., I strongly suspect that at a given overall level of rarity you’re going to do better with very high R plus very high other things (e.g., Scott-style verbalish reasoning) than with extreme R optimized for acing IQ tests. (2) The particular bundle of talents needed for great success in differential geometry is probably about as rare, and about as clearly constitutive of “smartness” by any reasonable standard, as the particular bundle needed for great success in combinatorics.
        
        And beyond a certain level (whose rarity maybe corresponds to a measured IQ of 150 or so), at any given level of rarity I bet other attributes that aren’t exactly cognitive strengths matter more than all those cognitive strengths do.
        
        So if you stratify prospective mathematicians by IQ (which of course is very similar to R), I would expect a picture like this.
        
        IQ <= 130: mathematical success very very strongly dependent on R.
        IQ <= 150: mathematical success correlated with R, but other cognitive strengths matter more in many fields of mathematics.
        IQ > 150: mathematical success mostly determined by other cognitive and not-so-cognitive strengths.
        
        I conjecture that the above is broadly consistent with your evidence. I think it is also consistent (with a little interpretive licence—e.g., his idea of what constitutes exceptional intelligence is probably skewed upwards relative to most people’s) with what Tao has said. I am willing to be convinced that I’m wrong on either point.
        JonahS 20 Feb 2015 18:38 UTC
        3 points
        Parent
        Thanks for the detailed comment.
        
        I don’t think that exceptional intelligence is either necessary or sufficient to be an exceptional mathematician. Tao’s statement “But an exceptional amount of intelligence has almost no bearing on whether one is an exceptional mathematician.” is a very strong statement: if he had said “plays only a moderate role in whether one is an exceptional mathematician” he would have been on much more solid ground.
        
        I agree that the Langlands quote is by itself not strong evidence against Tao’s assertion for the reasons that you give, but it’s still evidence. I’m relying on many weak arguments. I’ll gradually flesh them out in my sequence.
        
        I share your intuition re: combinatorialists vs. geometers. One of my friends spent a lot of time with Chern, who struck him as being quite ordinary with respect to R, while being exceptional on a number of other dimensions. Grothendieck’s self-assessment suggests that it is in fact possible to be amongst the greatest mathematicians without exceptional R.
        
        A key point that you might be missing (certainly I did for many years) is that there just aren’t many people of exceptional intelligence. Suppose that it were true that IQ is normally distributed: then the number of people of IQ 145+ would be 60x larger than the number of people of IQ 160+. Under this hypothesis, even if only 1 in 20 exceptional mathematicians had IQ 160+, that would mean that people in that range were 3x as likely as their IQ 145+ counterparts to become exceptional mathematicians. It’s been suggested that the distribution of IQ is in fact fat-tailed because of assortative mating, and this blunts the force of the aforementioned argument, but it’s also true that more than 5% of exceptional mathematicians have IQ 160+: I think the actual figure is closer to 50%.
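The arithmetic in this argument can be checked with a short sketch. It assumes IQ is exactly normal with mean 100 and SD 15, and that every exceptional mathematician has IQ 145+; both are simplifying assumptions for illustration, and the function names are hypothetical. Under those assumptions the 145+ group comes out at roughly 43x the size of the 160+ group rather than 60x, so a 1-in-20 share implies a multiplier nearer 2x than 3x. The qualitative point is unaffected, and at a 50% share the multiplier is much larger.

```python
import math

def upper_tail(iq, mean=100.0, sd=15.0):
    """P(IQ >= iq) under a normal distribution (assumed mean 100, SD 15)."""
    z = (iq - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

def relative_likelihood(share_160_plus, population_ratio):
    """How many times likelier a 160+ person is to become an exceptional
    mathematician than a 145+ person, assuming every exceptional
    mathematician has IQ 145+ (an assumption made for this sketch)."""
    return share_160_plus * population_ratio

# Size of the 145+ population relative to the 160+ population.
ratio = upper_tail(145) / upper_tail(160)

print(f"145+ vs 160+ population ratio: {ratio:.1f}")
print(f"multiplier at a 1-in-20 share: {relative_likelihood(1 / 20, ratio):.2f}")
print(f"multiplier at a 50% share:     {relative_likelihood(0.5, ratio):.1f}")
```

Using SD 16 instead of 15 shrinks the population ratio further, so the exact multiplier is sensitive to the scale convention.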
        
        Quill_McGee 20 Feb 2015 19:02 UTC
        1 point
        Parent
        It should be noted that if measured IQ is fat-tailed, this is because there is something wrong with IQ tests. IQ is defined to be normally distributed with a mean of 100 and a standard deviation of either 15 or 16 depending on which definition you’re using. So if measured IQ is fat-tailed, then the tests aren’t calibrated properly (of course, if your test goes all the way up to 160, it is almost inevitably miscalibrated, because there just aren’t enough people to calibrate it with).
        JonahS 20 Feb 2015 19:12 UTC
        4 points
        Parent
        You don’t want to force a normal distribution on the data. You’re free to do so if you’d like, e.g. by asking takers millions of questions so as to get very fine levels of granularity, and then mapping people at the 84th percentile of “questions answered correctly” to IQ 115, people at the 98th percentile to IQ 130, etc.
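The percentile-to-IQ mapping described above can be sketched as follows. It assumes the mean-100, SD-15 convention; `percentile_to_iq` is a hypothetical name, not part of any testing standard. The 84th and 98th percentiles land at about 115 and 131, matching the rounded figures in the comment.

```python
from statistics import NormalDist

# Assumed convention: IQ is scaled to mean 100, SD 15.
IQ_SCALE = NormalDist(mu=100, sigma=15)

def percentile_to_iq(percentile):
    """Map a percentile in (0, 100) to the IQ at that quantile of a normal curve."""
    return IQ_SCALE.inv_cdf(percentile / 100)

for p in (50, 84, 98, 99.9):
    print(f"{p}th percentile -> IQ {percentile_to_iq(p):.1f}")
```

Forcing normality in this way makes the shape of the score distribution a definition rather than a finding.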
        
        But what you really want is a situation where you have a (log)-linear relationship between standard deviations and other things that IQ correlates with, and if you force the data to obey a normal distribution, you’ll lose this.
        
        The rationale for using a normal distribution is the central limit theorem, but that holds only when the summands are uncorrelated: assortative mating can induce correlations between e.g. having gene A that increases IQ and having gene B that increases IQ.
        Lumifer 20 Feb 2015 19:48 UTC
        1 point
        Parent
        
        But what you really want is a situation where you have a (log)-linear relationship between standard deviations and other things that IQ correlates with, and if you force the data to obey a normal distribution, you’ll lose this.
        
        Could you expand on this point? I am not sure I follow it.
        JonahS 20 Feb 2015 20:26 UTC
        1 point
        Parent
        Say that you have a function f: rawScores ---> percentiles and you want to compose it with a function g: percentiles ---> IQ scores so that log(g(f(x))) is as correlated as possible with things that you care about other than IQ (income, the log odds ratio of winning a Fields medal, etc.).
        
        The default choice for g would be the function that takes a percentile to the associated standard deviation under a normal distribution. I’m claiming that the best choice for g is probably instead a function that takes a percentile to the associated standard deviation under a distribution that has fatter tails than the normal distribution.
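A minimal sketch of the two candidate choices of g follows. The fat-tailed version uses a unit-variance Laplace distribution purely as a stand-in; the comment does not say which fat-tailed distribution is meant, so that choice and the names `g_normal` and `g_fat` are assumptions.

```python
import math
from statistics import NormalDist

def g_normal(p):
    """Percentile (as a fraction in (0, 1)) -> standard deviations, normal curve."""
    return NormalDist().inv_cdf(p)

def g_fat(p):
    """Same mapping under a unit-variance Laplace distribution.

    Laplace is only an illustrative stand-in for a fatter-tailed distribution.
    """
    b = 1 / math.sqrt(2)  # Laplace scale b gives variance 2 * b**2 = 1
    if p < 0.5:
        return b * math.log(2 * p)
    return -b * math.log(2 * (1 - p))

for p in (0.84, 0.98, 0.999, 0.99997):
    print(f"p={p}: normal {g_normal(p):.2f} SD, fat-tailed {g_fat(p):.2f} SD")
```

Far out in the tail, the fat-tailed g assigns the same percentile a larger number of standard deviations, so extreme scorers are spread further apart than under the normal mapping.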
        
        The intuition is:
        
        Measures of the practical significance of IQ are plausibly best modeled as a weighted average of many individual genes that increase IQ. If people had been mating with randomly selected members of the opposite sex, the probabilities of getting two such genes would be independent. But in practice, people (weakly) tend to marry people of intelligence similar to their own (link), inducing a positive correlation between the respective probabilities of a child getting two different genes that contribute to IQ.
        Lumifer 20 Feb 2015 21:08 UTC
        1 point
        Parent
        
        is as correlated as possible with things that you care about other than IQ (income, the log odds ratio of winning a Fields medal, etc.)
        
        First question: do you actually care about correlation (given that it’s a linear metric) or do you mean some tight dependency, not necessarily linear?
        
        Second question: if that is the case, don’t you want your function g to produce a distribution shaped similarly to the “thing you care about”? If that thing-you-care-about is distributed normally, you would prefer g to generate a normal(-looking) distribution, if it’s distributed, say, as chi-squared, you would prefer g to give you a chi-squared shape, etc...?
        
        Measures of the practical significance of IQ are plausibly best modeled as a weighted average of many individual genes that increase IQ.
        
        That’s an iffy approach. Take, say, income (as a measure of the practical significance of IQ) -- are you saying income is best modeled as a weighted average of many IQ-related genes? You need the concept (and the link) of IQ to identify these genes to start with, but then you want to throw IQ out and go straight from genes to “practical” outcomes.
        
        I agree that assortative mating would lead to a fat-tailed distribution, but your original goal was to make IQ correlate with “things you care about” and for that purpose the fat tails are not particularly relevant.
        JonahS 20 Feb 2015 21:47 UTC
        1 point
        Parent
        
        First question: do you actually care about correlation (given that it’s a linear metric) or do you mean some tight dependency, not necessarily linear?
        
        If g(y) is monotonic, then the degree to which there’s a tight dependency is independent of g(y), which is just a change of coordinates. I do want to choose g(y) to maximize the degree to which the dependency is a linear one.
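A toy illustration of this distinction, using made-up data in which the outcome is exponential in the raw score (an assumption chosen only to make the effect visible). Rank (Spearman) correlation is unchanged by a monotonic g, while linear (Pearson) correlation depends on which g is used.

```python
import math

def pearson(xs, ys):
    """Pearson (linear) correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def ranks(values):
    """Ranks 1..n (assumes no ties, which holds for the toy data below)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

scores = [float(v) for v in range(1, 11)]    # toy percentile-like scores
outcome = [math.exp(v / 2) for v in scores]  # toy outcome, exponential in score

g_identity = scores                          # one monotonic choice of g
g_exp = [math.exp(v / 2) for v in scores]    # another monotonic choice of g

print("Spearman:", spearman(g_identity, outcome), spearman(g_exp, outcome))
print("Pearson: ", pearson(g_identity, outcome), pearson(g_exp, outcome))
```

Both choices of g give the same rank correlation, but only the second makes the relationship linear, which is the sense in which g is chosen to maximize linearity.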
        
        Second question: if that is the case, don’t you want your function g to produce a distribution shaped similarly to the “thing you care about”? If that thing-you-care-about is distributed normally, you would prefer g to generate a normal(-looking) distribution, if it’s distributed, say, as chi-squared, you would prefer g to give you a chi-squared shape, etc...?
        
        Yes, this is true and a good point, though the distribution of “the thing we care about” will vary from thing to thing, and I think that if we have to use a fixed distribution for IQ that’s uniform over all of them, the log of a fat-tailed distribution is probably the best choice.
        
        are you saying income is best modeled as a weighted average of many IQ-related genes?
        
        Here I’m just adopting an Occamistic approach – I don’t have high confidence – I’m just using a linear model because it’s the simplest possible function from genes to outcomes that are correlated with IQ. Feel free to suggest an alternative.
        
        I agree that assortative mating would lead to a fat-tailed distribution, but your original goal was to make IQ correlate with “things you care about” and for that purpose the fat tails are not particularly relevant.
        
        Suppose, hypothetically, that human brains were such that IQ was capped at 145 by present day standards (e.g. because unbeknownst to us babies with IQ above that threshold died in childbirth for some reason having to do with IQ genes). Then if we were to choose g(y) to get a normal distribution, it would look like the correlation between IQ and real world outcomes vanishes after 145, whereas the actual situation would be that the people who scored above 144 have essentially the same genetic composition (with respect to IQ) as the people who scored 144, so that “IQ doesn’t yield returns past 145” would be connotatively misleading.
        
        I’m saying that defining IQ so that it’s normally distributed has a connotatively distortionary effect similar to this one, though a much smaller one.
        Lumifer 23 Feb 2015 18:12 UTC
        1 point
        Parent
        
        Suppose, hypothetically, that human brains were such that IQ was capped at 145 … it would look like the correlation between IQ and real world outcomes vanishes after 145
        
        In your hypothetical there would be a lot of warning signs—for example all IQs above 145 would be random, that is, re-testing IQs above 145 would produce a random draw from the appropriate distribution tail.
        
        And I suspect that it should be possible to figure out real-world distributions (the fatness of the tails, in particular), by looking at raw, non-normalized test scores.
        JonahS 23 Feb 2015 18:20 UTC
        1 point
        Parent
        Yes, you and I are on the same page; I was just saying that IQ shouldn’t be defined to be normally distributed.
        dxu 20 Feb 2015 16:39 UTC
        0 points
        Parent
        Would you characterize this post as a reasonable description of what you’re talking about in your discussion of “R”?
        gjm 20 Feb 2015 17:19 UTC
        0 points
        Parent
        Yes, that’s the guts of it.