Lately I’ve been feeling particularly incompetent mathematically, to the point that I question how much of a future I have in the subject. Therefore I quite often wonder what mathematical ability is all about, and I look forward to hearing if your perspective gels with my own.
More later, but just a brief remark – I think that one issue is that the top ~200 mathematicians are of such high intellectual caliber that they’ve plucked all of the low hanging fruit and that as a result mathematicians outside of that group have a really hard time doing research that’s both interesting and original. (The standard that I have in mind here is high, but I think that as one gains perspective one starts to see that superficially original research is often much less so than it looks.) I know many brilliant people who have only done so once over an entire career.
Outside of pure math, the situation is very different – it seems to me that there’s a lot of room for “normal” mathematically talented people to do highly original work. Note for example that the Gale-Shapley theorem was considered significant enough so that Gale and Shapley were awarded a Nobel prize in economics for it, even though it’s something that a lot of mathematicians could have figured out in a few days (!!!). I think that my speed dating project is such an example, though I haven’t been presenting it in a way that’s made it clear why.
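Since the claim above is that many mathematicians could have rederived the Gale-Shapley result in a few days, a minimal Python sketch of their deferred-acceptance algorithm may help convey how simple it is; the function name and toy preference lists here are illustrative assumptions, not anything from the discussion.

```python
def gale_shapley(proposer_prefs, reviewer_prefs):
    """Return a stable matching as a dict {proposer: reviewer}.

    proposer_prefs / reviewer_prefs: dict mapping each agent to a list of
    the other side's agents, most preferred first.
    """
    # rank[r][p] = how highly reviewer r ranks proposer p (lower is better)
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in reviewer_prefs.items()}
    free = list(proposer_prefs)                   # proposers not yet matched
    next_choice = {p: 0 for p in proposer_prefs}  # next reviewer to propose to
    engaged = {}                                  # reviewer -> current proposer

    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        if r not in engaged:
            engaged[r] = p                  # reviewer accepts a first proposal
        elif rank[r][p] < rank[r][engaged[r]]:
            free.append(engaged[r])         # reviewer trades up; old partner freed
            engaged[r] = p
        else:
            free.append(p)                  # proposal rejected; p tries again later

    return {p: r for r, p in engaged.items()}

# Toy usage with two agents per side:
men = {"a": ["x", "y"], "b": ["y", "x"]}
women = {"x": ["a", "b"], "y": ["b", "a"]}
matching = gale_shapley(men, women)         # a stable matching
```

The striking point the comment makes is that this short loop, with its termination and stability argument, was enough for a Nobel prize in economics.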
Of course, if you’re really committed to pure math in particular, my observation isn’t so helpful, but my later posts might be.
More later, but just a brief remark – I think that one issue is that the top ~200 mathematicians are of such high intellectual caliber that they’ve plucked all of the low hanging fruit and that as a result mathematicians outside of that group have a really hard time doing research that’s both interesting and original.
I disagree with this. I think it is a feature that all the low hanging fruit looks picked, until you pick another one. Also I am not entirely sure if there is a divide between pure math and stuff pure mathematicians would consider “applied” (e.g. causal inference, theoretical economics, ?complexity theory? etc.) other than a cultural divide.
I disagree with this. I think it is a feature that all the low hanging fruit looks picked, until you pick another one.
Maybe our difference here is semantic, or we have different standards in mind for what constitutes “fruit.” Googling you, I see that you seem to be in theoretical CS? My impression from talking with people in the field is that there is in fact a lot more low hanging fruit there.
Also I am not entirely sure if there is a divide between pure math and stuff pure mathematicians would consider “applied” (e.g. causal inference, theoretical economics, ?complexity theory? etc.) other than a cultural divide.
I strongly agree with this, which is one point that I’ll be making later in my sequence.
But the cultural divide is significant, and it seems that in practice the most mathematically talented do skew heavily toward going into “pure math” so that more low hanging fruit has been plucked in the areas that mathematicians in math departments work in. I say this based on knowledge of apples-to-apples comparisons coming from people who work on math within and outside of “pure math.” For example, Razborov’s achievements in TCS have been hugely significant, but he’s also worked in combinatorics and hasn’t had similar success there. This isn’t very much evidence – it could be that the combinatorics problems that he’s worked on are really hard, or that he’s only done it casually – but it’s still evidence, and there are other examples.
I think that one issue is that the top ~200 mathematicians are of such high intellectual caliber that they’ve plucked all of the low hanging fruit and that as a result mathematicians outside of that group have a really hard time doing research that’s both interesting and original.
And I think that anyone who makes even the slightest substantial contribution to homotopy type theory is doing interesting, original work. I think the Low-Hanging Fruit Complaint is more often a result of not knowing where there’s a hot, productive research frontier than of the universe actually lacking interesting new mathematics to uncover.
I think the Low-Hanging Fruit Complaint is more often a result of not knowing where there’s a hot, productive research frontier than of the universe actually lacking interesting new mathematics to uncover.
There’s a lot of potential for semantic differences here, and risk of talking past each other. I’ll try to be explicit. I believe that:
There are very few people who have a nontrivial probability of discovering statements about the prime numbers that are both true, that people didn’t already believe to be true, and that people find fascinating.
The same is not far from being true for all areas of math that have been mainstream for 100+ years: algebraic topology, algebraic geometry, algebraic number theory, analytic number theory, partial differential equations, Lie groups, functional analysis, etc.
There is a lot of rich math to be discovered outside of the areas that pure mathematicians have focused on historically, and that people might find equally fascinating. In particular, I believe this to be true within the broad domain of machine learning.
There are few historical examples of mathematicians discovering interesting new fields of math without being motivated by applications.
There is a lot of rich math to be discovered outside of the areas that pure mathematicians have focused on historically, and that people might find equally fascinating. In particular, I believe this to be true within the broad domain of machine learning.
That’s largely because machine learning is in its infancy. It is still a field largely defined by three very limited approaches:
PAC learning — even when we allow ourselves inefficient (i.e., super-poly-time) PAC learning, we’re still ultimately kept stuck by the reliance on prior knowledge to generate a hypothesis class with a known, finite VC dimension. I’ve sometimes idly pondered trying to leverage algorithmic information theory to do something like what Hutter did, and prove a fully general counter-theorem to No Free Lunch saying that when the learner can have “more information” and “more algorithmic information” (more compute power) than the environment, the learner can then win. (On the other hand, I tend to idly ponder a lot about AIT, since it seems to be a very underappreciated field of theoretical CS that remains underappreciated because of just how much mathematical background it requires!)
Structural risk minimization (support-vector machines and other approaches that use regularization to work on high-dimensional data) — still ultimately a kind of PAC learning, and still largely making very unstructured predictions based on very unstructured data
Stochastic gradient descent, and most especially neural networks: useful in properly general environments, but doesn’t tell the learner’s programmer much of anything that makes a human kind of sense. Often overfits or finds non-global minima.
To those we are rapidly adding a fourth approach, that I think has the potential to really supplant many of the others:
Probabilistic programming: fully general, more capable of giving “sensible” outputs, capable of expressing arbitrary statistical models… but really slow, and modulo an Occam’s Razor assumption, subject to the same sort of losses in adversarial environments as any other Bayesian methods. But a lot better than what was there before.
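As a concrete illustration of the stochastic-gradient-descent approach listed above, here is a minimal sketch fitting a logistic-regression model in plain Python; the toy AND-shaped dataset, learning rate, and epoch count are all illustrative assumptions, not anything from the discussion.

```python
import math
import random

def sgd_logistic(data, lr=0.1, epochs=200, seed=0):
    """data: list of (features, label) pairs with label in {0, 1}.
    Returns learned weights; the last entry is the bias term."""
    rng = random.Random(seed)
    data = list(data)                      # avoid mutating the caller's list
    dim = len(data[0][0])
    w = [0.0] * (dim + 1)                  # feature weights + bias
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:                  # one noisy gradient step per example
            z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
            p = 1.0 / (1.0 + math.exp(-z))
            for i in range(dim):
                w[i] -= lr * (p - y) * x[i]
            w[-1] -= lr * (p - y)          # bias gradient is just (p - y)
    return w

# Toy usage: learn logical AND, which is linearly separable.
points = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 0), ([1.0, 1.0], 1)]
weights = sgd_logistic(points)
```

This also illustrates the complaint in the list: the loop reliably finds separating weights on a toy problem, but the learned numbers themselves don’t explain anything in human terms.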
I think that one issue is that the top ~200 mathematicians are of such high intellectual caliber that they’ve plucked all of the low hanging fruit and that as a result mathematicians outside of that group have a really hard time doing research that’s both interesting and original.
Your standards seem unusually high. I can cite several highly interesting and original works by mathematicians who would most probably not be on your, or any, top ~200 list. For example,
Recursively enumerable sets of polynomials over a finite field are Diophantine by Jeroen Demeyer, Inventiones mathematicae, December 2007, Volume 170, Issue 3, pp 655-670
Maximal arcs in Desarguesian planes of odd order do not exist by S. Ball, A. Blokhuis and F. Mazzocca, Combinatorica, 17 (1997) 31--41.
The blocking number of an affine space by A. Brouwer and A. Schrijver, JCT (A), 24 (1978) 251-253.
I would like to know more about the perspective you claim to have gained which makes you think this particular way.
Yes, this is true. There are a number of reasons for this, but one is an encounter with Goro Shimura back in 2008 that left an impression on me – I thought about his words for many years.
More later, but just a brief remark – I think that one issue is that the top ~200 mathematicians are of such high intellectual caliber that they’ve plucked all of the low hanging fruit and that as a result mathematicians outside of that group have a really hard time doing research that’s both interesting and original. (The standard that I have in mind here is high, but I think that as one gains perspective one starts to see that superficially original research is often much less so than it looks.) I know many brilliant people who have only done so once over an entire career.
I am so confused as to why your standard seems to be so absurdly high to me.
Is it because my particular subfield is unusually full of low-hanging fruit? Or because so few of those ~200 top mathematicians work in it?
Is it because I don’t see how “superficially original” all of the work done in my field is? I lack perspective?
I am so confused as to why your standard seems to be so absurdly high to me.
The way in which I operationalize the originality / interest of research is “in 50 years, what will the best mathematicians think about it?” I think that this perspective is unusual amongst mathematicians as a group, but not among the greatest ones. I’d be interested in how it jibes with your own.
Anyway, I think that if one adopts this perspective and takes a careful look at current research using Bayesian reasoning, one is led to the conclusion that almost all of it will be considered to be irrelevant (confidence ~80%).
When I was in grad school, I observed people proving lots of theorems in low dimensional topology that were sort of interesting to me, but it’s also my best guess that most of them will be viewed in hindsight as similar to how advanced Euclidean geometry theorems are today – along the lines of “that’s sort of pretty, but not really worthy of serious attention.”
Is it because I don’t see how “superficially original” all of the work done in my field is? I lack perspective?
How old are you?
When I started grad school, I was blown away by how much the professors could do.
A few years out of grad school, I saw that a lot of the theorems were things that experts knew could be proved using certain techniques, and that proving them was in some sense a matter of the researchers dotting their i’s and crossing their t’s.
And in situations where something seemed strikingly original, the basic idea often turned out to be due to somebody other than the author of a paper (not to say that the author plagiarized – on the contrary, the author almost always acknowledged the source of the idea – but a lot of times people don’t read the fine print well enough to notice).
For example, the Wikipedia page on Paul Vojta reads
In formulating a number of striking conjectures, he pointed out the possible existence of parallels between the Nevanlinna theory of complex analysis, and diophantine analysis. This was a novel contribution to the circle of ideas around the Mordell conjecture and abc conjecture, suggesting something of large importance to the integer solutions (affine space) aspect of diophantine equations. It has been taken up in his own work, and that of others.
I had the chance to speak with Vojta and ask how he discovered these things, and he said that his advisor Barry Mazur suggested that he investigate possible parallels between Nevanlinna theory and diophantine analysis.
Similarly, even though Andrew Wiles’ work on Fermat’s Last Theorem does seem to be regarded by experts as highly original, the conceptual framework that he used had been developed by Barry Mazur, and I would guess (weakly – I don’t have an inside view – just extrapolating based on things that I’ve heard) that people with deep knowledge of the field would say that Mazur’s contribution to the solution of Fermat’s Last Theorem was more substantial than that of Wiles.
The way in which I operationalize the originality / interest of research is “in 50 years, what will the best mathematicians think about it?”
Eegads. How do you even imagine what those people will be like?
I think that this perspective is unusual amongst mathematicians as a group, but not among the greatest ones. I’d be interested in how it jibes with your own.
Sure, I don’t think anyone I know really thinks of their work that way.
Is it because I don’t see how “superficially original” all of the work done in my field is? I lack perspective?
How old are you?
29, graduating in a few months.
A few years out of grad school, I saw that a lot of the theorems were things that experts knew could be proved using certain techniques, and that proving them was in some sense a matter of the researchers dotting their i’s and crossing their t’s.
Yeah, sure, that’s the vast majority of everything I’ve done so far, and some fraction of the work my subfield puts out.
The people two or three levels above me, though, they’re putting out genuinely new stuff on the order of once every three to five years. Maybe not “the best mathematicians 50 years from now think this is amazing” stuff, but I think the tools will still be in use in the generation after mine. Similar to the way most of my toolbox was invented in the ’70s–’80s.
I don’t understand your concept of originality. It has to be created in a vacuum to be original?
In the counterfactual where Vojta doesn’t exist, does Mazur go on to write similar papers? Is that the problem?
Eegads. How do you even imagine what those people will be like?
Well, if, e.g. you’re working on a special case of an unsolved problem using an ad hoc method with applicability that’s clearly limited to that case, and you think that the problem will probably be solved in full generality with a more illuminating solution within the next 50 years, then you have good reason to believe that work along these lines has no lasting significance.
Sure, I don’t think anyone I know really thinks of their work that way.
Not consciously, but there’s a difference between doing research that you think could contribute substantially to human knowledge and research that you know won’t. I think that a lot of mathematicians’ work falls into the latter category.
This is a long conversation, but I think there’s a major issue of the publish-or-perish system (together with social pressures to be respectful to one’s colleagues) leading to doublethink: on an explicit level, people think that their own research and the research of their colleagues is interesting, because they’re trying to make the best of the situation, but there’s a large element of belief-in-belief, and they don’t actually enjoy doing their work or hearing about their colleagues’ work in seminars. Even when people do enjoy their work, they often don’t know what they’re missing out on by not working on the things they find most interesting on an emotional level.
The people two or three levels above me, though, they’re putting out genuinely new stuff on the order of once every three to five years. Maybe not “the best mathematicians 50 years from now think this is amazing” stuff, but I think the tools will still be in use in the generation after mine. Similar to the way most of my toolbox was invented in the ’70s–’80s.
This sounds roughly similar to what I myself believe – the differences may be semantic. I think that work can be valuable even if people don’t find it amazing. I also think that there are people outside of the top 200 mathematicians who do really interesting work of lasting historical value – just that it doesn’t happen very often. (Weil said that you can tell that somebody is a really good mathematician if he or she has made two really good discoveries, and that Mordell is a counterexample.) It’s also possible that I’d consider the people who you have in mind to be in the top 200 mathematicians even if they aren’t considered to be so broadly.
I don’t understand your concept of originality. It has to be created in a vacuum to be original?
It’s hard to convey effect sizes in words. The standard that I have in mind is “producing knowledge that significantly changes experts’ Bayesian priors” (whether it be about what mathematical facts are true, or which methods are useful in a given context, or what the best perspective on a given topic is). By “significantly changes” I mean something like “uncovers something that some experts would find surprising.”
In the counterfactual where Vojta doesn’t exist, does Mazur go on to write similar papers? Is that the problem?
I don’t have enough subject matter knowledge to know how much Vojta added beyond what Mazur suggested (it could be that upon learning more I would consider his marginal contributions to be really huge). I guess in bringing up those examples I didn’t so much mean “Vojta and Wiles didn’t do original work – it had already essentially been done by Mazur” as much as “the original contributions in math are more densely concentrated in a smaller number of people than one would guess from the outside,” which in turn bears on the question of how someone should assess his or her prospects for doing genuinely original work in a given field.
I guess in bringing up those examples I didn’t so much mean “Vojta and Wiles didn’t do original work – it had already essentially been done by Mazur” as much as “the original contributions in math are more densely concentrated in a smaller number of people than one would guess from the outside,” which in turn bears on the question of how someone should assess his or her prospects for doing genuinely original work in a given field.
I agree with your assessment of things here, but I do think it’s worth taking a moment to honor people who take correct speculation and turn it into a full proof. This is useful cognitive specialization of labor, and I don’t think it makes much sense to value originality over usefulness.
Well, if, e.g. you’re working on a special case of an unsolved problem using an ad hoc method with applicability that’s clearly limited to that case, and you think that the problem will probably be solved in full generality with a more illuminating solution within the next 50 years, then you have good reason to believe that work along these lines has no lasting significance.
It is really hard to tell when an ad hoc method will turn out many years later to be a special case of some more broad technique. It may also be that the special case will still need to be done if some later method uses it for bootstrapping.
the original contributions in math are more densely concentrated in a smaller number of people than one would guess from the outside
I’m not sure about this at all. Have you tried talking to people who aren’t already in academia about this? As far as I can tell, they think that there are a tiny number of very smart people who are mathematicians and are surprised to find out how many there are.
It is really hard to tell when an ad hoc method will turn out many years later to be a special case of some more broad technique. It may also be that the special case will still need to be done if some later method uses it for bootstrapping.
There are questions of quantitative effect sizes. Feel free to give some examples that you find compelling.
I’m not sure about this at all. Have you tried talking to people who aren’t already in academia about this? As far as I can tell, they think that there are a tiny number of very smart people who are mathematicians and are surprised to find out how many there are.
By “from the outside” I mean “from the outside of a field” (except to the extent that you’re able to extrapolate from your own field.)
Feel free to give some examples that you find compelling.
Fermat’s Last Theorem. The proof assumed that p >= 11, and so the ad hoc cases from the 19th century were necessary to round it out. Moreover, the attempt to extend those ad hoc methods led to the entire branch of algebraic number theory.
Primes in arithmetic progressions: much of what Tao and Greenberg did here extended earlier methods in a deep systematic way that were previously somewhat ad hoc. In fact, one can see a large fraction of modern work that touches on sieves as taking essentially ad hoc sieve techniques and generalizing them.
The proof assumed that p >= 11, and so the ad hoc cases from the 19th century were necessary to round it out. Moreover, the attempt to extend those ad hoc methods led to the entire branch of algebraic number theory.
I don’t recall Wiles’ proof assuming that p >= 11 – can you give a reference? I can’t find one quickly.
The n = 3 and 4 cases were proved by Euler and Fermat. It’s prima facie evident that Euler’s proof (which introduced a new number system with no historical analog) points to the existence of an entire field of math. I find this less so of Fermat’s proof as he stated it, but Fermat is also famous for the obscurity of his writings.
I don’t know the history around the n = 5 and n = 7 cases, and so don’t know whether they were important to the development of algebraic number theory, but exploring them is a natural extension of the exploration of new kinds of number systems that Euler had initiated.
They were subsumed by Kummer’s work, which I understand to have been motivated more by a desire to understand algebraic number fields and reciprocity laws than by Fermat’s last theorem in particular. For this, he developed the theory of ideal numbers, which is very general.
Primes in arithmetic progressions: much of what Tao and Greenberg did here extended earlier methods in a deep systematic way that were previously somewhat ad hoc. In fact, one can see a large fraction of modern work that touches on sieves as taking essentially ad hoc sieve techniques and generalizing them.
Ben Green, not Greenberg :-).
Sure, but the ultimate significance of the work remains to be seen. Of course, tastes vary, and there’s an element of subjectivity, but I think we can agree that even if there’s a case for the proof being something that people will find interesting in 50 years, the prior in its favor is much weaker than the corresponding prior for, e.g., the Gross-Zagier formula.
I don’t recall Wiles’ proof assuming that p >= 11 – can you give a reference? I can’t find one quickly.
I think this is in the original paper that modularity implies FLT, but I’m on vacation and don’t have a copy available to check. Does this suffice as a reference?
Ben Green, not Greenberg
Yes, thank you.
They were subsumed by Kummer’s work, which I understand to have been motivated more by a desire to understand algebraic number fields and reciprocity laws than by Fermat’s last theorem in particular. For this, he developed the theory of ideal numbers, which is very general.
Sure, but Kummer was aware of the literature before him, and almost certainly used their results to guide him.
Sure, but the ultimate significance of the work remains to be seen. Of course, tastes vary, and there’s an element of subjectivity, but I think we can agree that even if there’s a case for the proof being something that people will find interesting in 50 years, the prior in its favor is much weaker than the corresponding prior for, e.g., the Gross-Zagier formula.
Agreement there may depend very strongly on how you unpack “much weaker,” but I’d be inclined to agree at least with “weaker,” without the “much.”
The way in which I operationalize the originality / interest of research is “in 50 years, what will the best mathematicians think about it?”
How good do you consider past mathematicians to have been at judging future interest 50 years down the line?
Do you think that 50 years ago mathematicians understood the significance of all findings made at the time that turned out to be significant?
“in 50 years, what will the best mathematicians think about it?”
How do you make a priori judgments on who the best mathematicians are going to be? In your opinion, what qualities/achievements would put someone in the group of best mathematicians?
Anyway, I think that if one adopts this perspective and takes a careful look at current research using Bayesian reasoning, one is led to the conclusion that almost all of it will be considered to be irrelevant (confidence ~80%).
How different would your deductions be if you were living in a different time period? How much does that depend on the areas in mathematics that you are considering in that reasoning?
More later, but just a brief remark – I think that one issue is that the top ~200 mathematicians are of such high intellectual caliber that they’ve plucked all of the low hanging fruit and that as a result mathematicians outside of that group have a really hard time doing research that’s both interesting and original.
I’m not sure what gives you this impression. In my own field (number theory) I don’t get that feeling at all. There may not be much low hanging fruit, but there’s more than enough for people who aren’t that top 200 to do very useful work.
I’d certainly defer to you in relation to subject matter knowledge (my knowledge of number theory really only extends through 1965 or so), but this is not the sense that I’ve gotten from speaking with the best number theorists.
When I met Shimura, he was extremely dismissive of contemporary number theory research, to a degree that seemed absurd to me (e.g. he characterized papers in the Annals of Mathematics as “very mediocre”). I would ordinarily be hesitant to write about a private conversation publicly, but he freely and eagerly expresses his views to everyone he meets. Have you read The Map of My Life? He’s very harsh and cranky and perhaps even paranoid, but that doesn’t undercut his track record of being an extremely fertile mathematician. As I reflected on his comments and learned more over the years after meeting him in 2008, his position came to seem progressively more sound (to my great surprise!).
A careful reading of Langlands’ Reflexions on receiving the Shaw Prize hints that he thinks that the methods that Taylor and collaborators have been using to prove theorems such as the Sato-Tate conjecture won’t have lasting value, though he’s very guarded in how he expresses himself. I remember coming across a more recent essay where he was more explicit and forceful, but I forget where it is (somewhere on his website, sorry, I realize that this isn’t so useful). It’s not clear to me that Taylor would disagree – he may explicitly be more committed to solving problems in the near term than by creating work of lasting value.
One can speculate that these views are driven by arrogance, but they’re not even that exotic outside of the set of people who have unambiguously done great work. For example, the author of the Galois Representations blog, whom you probably know of, wrote in response to Jordan Ellenberg:
That said, there is a secondary argument which portrays mathematics as a grand collective endeavour to which we can all contribute. I think that this is a little unrealistic. In my perspective, the actual number of people who are advancing mathematics in any genuine sense is very low. This is not to say that there aren’t quite a number of people doing interesting mathematics. But it’s not so clear the extent to which the discovery of conceptual breakthroughs is contingent on others first making incremental progress. This may sound like a depressing view of mathematics, but I don’t find it so. Merely to be an observer in the progress of number theory is enough for me — I know how to prove Fermat’s Last Theorem, how exciting is that?
apparently implicitly characterizing his own work as insignificant. And there aren’t very many number theorists as capable as him.
Replying separately so it isn’t missed. I wonder also how much of these issues is the two-cultures problem that Gowers talks about. The top-people conception seems to lean heavily toward the theory-builder side.
I agree with you, and there are strong problem solver types who conceptualize mathematical value in a different way from the people who I’ve quoted and from myself.
Still, there are some situations where one has apples-to-apples comparisons.
There’s a large body of work giving unconditional proofs of theorems that would follow from the Riemann hypothesis and its generalizations. Many problem solvers would agree that a proof of the Riemann hypothesis and its generalizations would be more valuable than all of this work combined.
We don’t yet know how or when the Riemann hypothesis will be proved. But suppose that Alain Connes’ approach using noncommutative geometry (which seems most promising right now, though I don’t know how promising) turns out to be possible to implement over the next 40 years or so. In this hypothetical, what attitude do you think problem solvers would take toward the prior unconditional proofs of consequences of RH?
I agree with you, and there are strong problem solver types who conceptualize mathematical value in a different way from the people who I’ve quoted and from myself.
Yes, but at the same time (against my earlier point) the best problem solvers are finding novel techniques that can then be applied to a variety of different problems; that’s essentially what Gowers seems to be focusing on.
There’s a large body of work giving unconditional proofs of theorems that would follow from the Riemann hypothesis and its generalizations. Many problem solvers would agree that a proof of the Riemann hypothesis and its generalizations would be more valuable than all of this work combined.
I’m not sure I’d agree with that, but I feel like I’m in the middle of the two camps so maybe I’m not relevant? All those other results tell us what to believe well before we actually have a proof of RH. So that’s at least got to count for something. It may be true that a proof of GRH would be that much more useful, but GRH is a much broader idea. Note also that part of the point of proving things under RH is to then try and prove the same statements with weaker or no assumptions, and that’s a successful process.
What attitude do you think that problem solvers would take to the prior unconditional proofs of consequences of RH?
I’m not sure. Can you expand on what you think would be their attitude?
I’d certainly defer to you in relation to subject matter knowledge (my knowledge of number theory really only extends through 1965 or so), but this is not the sense that I’ve gotten from speaking with the best number theorists.
I’ll volunteer another reason not to necessarily pay attention to my viewpoint: I’m pretty clearly one of those weaker mathematicians, so I have obvious motivations for seeing all of that side work as relevant.
I suspect that one could get similar statements from people who more or less think the opposite, but that they aren’t very vocal because theirs is closer to being the default viewpoint; my evidence for this is very weak, though. It is also worth noting that when one does read papers by the top named people, they often cite papers from people who clearly aren’t in that top, using little constructions or generalizing bits or the like.
I’ll volunteer another reason not to necessarily pay attention to my viewpoint: I’m pretty clearly one of those weaker mathematicians, so I have obvious motivations for seeing all of that side work as relevant.
I’ll note that I think that there are people other than top researchers who have contributed enormously to the mathematical community through things other than research. For example, John Baez is listed amongst the mathematicians who influenced MathOverflow participants the most, in the same range as Fields medalists and historical greats, based on his expository contributions.
It is also worth noting that when one does read papers by the top named people, they often cite papers from people who clearly aren’t in that top, using little constructions or generalizing bits or the like.
Yes, this is true and a good point. It can serve as a starting point for estimating effect sizes.
I’m not qualified to say judge the accuracy of these claims, but I was speaking with a PhD in physics who said that he thought that only ~50 people in theoretical physics were doing anything important.
More later, but just a brief remark – I think that one issue is that the top ~200 mathematicians are of such high intellectual caliber that they’ve plucked all of the low hanging fruit and that as a result mathematicians outside of that group have a really hard time doing research that’s both interesting and original. (The standard that I have in mind here is high, but I think that as one gains perspective one starts to see that superficially original research is often much less so than it looks.) I know many brilliant people who have only done so once over an entire career.
Outside of pure math, the situation is very different – it seems to me that there’s a lot of room for “normal” mathematically talented people to do highly original work. Note for example that the Gale-Shapley theorem was considered significant enough so that Gale and Shapley were awarded a Nobel prize in economics for it, even though it’s something that a lot of mathematicians could have figured out in a few days (!!!). I think that my speed dating project is such an example, though I haven’t been presenting it in a way that’s made it clear why.
Of course, if you’re really committed to pure math in particular, my observation isn’t so helpful, but my later posts might be.
I disagree with this. I think it is a feature that all the low hanging fruit looks picked, until you pick another one. Also I am not entirely sure if there is a divide between pure math and stuff pure mathematicians would consider “applied” (e.g. causal inference, theoretical economics, ?complexity theory? etc.) other than a cultural divide.
Maybe our difference here is semantic, or we have different standards in mind for what constitutes “fruit.” Googling you, I see that you seem to be in theoretical CS? My impression from talking with people in the field is that there is in fact a lot more low hanging fruit there.
I strongly agree with this, which is one point that I’ll be making later in my sequence.
But the cultural divide is significant, and it seems that in practice the most mathematically talented do skew heavily toward going into “pure math” so that more low hanging fruit has been plucked in the areas that mathematicians in math departments work in. I say this based on knowledge of apples-to-apples comparisons coming from people who work on math within and outside of “pure math.” For example, Razborov’s achievements in TCS have been hugely significant, but he’s also worked in combinatorics and hasn’t had similar success there. This isn’t very much evidence – it could be that the combinatorics problems that he’s worked on are really hard, or that he’s only done it casually – but it’s still evidence, and there are other examples.
Let’s say I am at an intersection of foundations of statistics and philosophy (?).
The (?) proves you right about the philosophy part.
The (?) was meant to apply to the conjunction, not the latter term alone.
And I think that anyone who makes even the slightest substantial contribution to homotopy type theory is doing interesting, original work. I think the Low-Hanging Fruit Complaint is more often a result of not knowing where there’s a hot, productive research frontier than of the universe actually lacking interesting new mathematics to uncover.
I partially respond to this here.
There’s a lot of potential for semantic differences here, and risk of talking past each other. I’ll try to be explicit. I believe that:
There are very few people who have a nontrivial probability of discovering statements about the prime numbers that are true, that people didn’t already believe to be true, and that people find fascinating.
The same is not far from being true for all areas of math that have been mainstream for 100+ years: algebraic topology, algebraic geometry, algebraic number theory, analytic number theory, partial differential equations, Lie groups, functional analysis, etc.
There is a lot of rich math to be discovered outside of the areas that pure mathematicians have focused on historically, and that people might find equally fascinating. In particular, I believe this to be true within the broad domain of machine learning.
There are few historical examples of mathematicians discovering interesting new fields of math without being motivated by applications.
That’s largely because machine learning is in its infancy. It is still a field largely defined by three very limited approaches:
PAC learning—even when we allow ourselves inefficient (ie: super-poly-time) PAC learning, we’re still ultimately kept stuck by the reliance on prior knowledge to generate a hypothesis class with a known, finite VC Dimension. I’ve sometimes idly pondered trying to leverage algorithmic information theory to do something like what Hutter did, and prove a fully general counter-theorem to No Free Lunch saying that when the learner can have “more information” and “more algorithmic information” (more compute-power) than the environment, the learner can then win. (On the other hand, I tend to idly ponder a lot about AIT, since it seems to be a very underappreciated field of theoretical CS that remains underappreciated because of just how much mathematical background it requires!)
Structural Risk Minimization (support-vector machines and other approaches that use regularization to work on high-dimensional data)—still ultimately a kind of PAC learning, and still largely making very unstructured predictions based on very unstructured data
Stochastic Gradient Descent, and most especially neural networks: useful in properly general environments, but doesn’t tell the learner’s programmer much of anything that makes a human kind of sense. Often overfits or finds non-global minima.
To those we are rapidly adding a fourth approach, that I think has the potential to really supplant many of the others:
Probabilistic programming: fully general, more capable of giving “sensible” outputs, capable of expressing arbitrary statistical models… but really slow, and modulo an Occam’s Razor assumption, subject to the same sort of losses in adversarial environments as any other Bayesian method. But a lot better than what was there before.
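Of the approaches listed above, stochastic gradient descent is the easiest to make concrete. Here’s a minimal sketch in plain Python (no ML library; the toy data, learning rate, and epoch count are all invented for illustration), fitting a two-parameter linear model by per-sample gradient steps on squared error:

```python
import random

# Toy sketch of stochastic gradient descent: fit y = w*x + b
# to data generated (noiselessly) from y = 2x + 1.
random.seed(0)
data = [(x, 2.0 * x + 1.0) for x in [i / 10 for i in range(-10, 11)]]

w, b = 0.0, 0.0
lr = 0.1  # learning rate

for epoch in range(200):
    random.shuffle(data)        # "stochastic": visit samples in random order
    for x, y in data:
        pred = w * x + b
        err = pred - y          # d/d(pred) of 0.5 * (pred - y)^2
        w -= lr * err * x       # gradient step for the weight
        b -= lr * err           # gradient step for the bias

print(round(w, 2), round(b, 2))  # converges to roughly w = 2.0, b = 1.0
```

On clean data like this it recovers the generating parameters; the failure modes mentioned above (overfitting, non-global minima) only show up once the model is flexible relative to the data, which is exactly why the method “doesn’t tell the programmer much.”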
What do you mean by “people find fascinating,” and how many people? It seems like a lot of the work in your first bullet point is being done by the last three words.
Upvoted for being specific.
Your standards seem unusually high. I can cite several highly interesting and original works by mathematicians who would most probably not be in your, or anyone’s, top ~200 list. For example:
Jeroen Demeyer, “Recursively enumerable sets of polynomials over a finite field are Diophantine,” Inventiones Mathematicae 170 (2007), no. 3, 655–670.
S. Ball, A. Blokhuis and F. Mazzocca, “Maximal arcs in Desarguesian planes of odd order do not exist,” Combinatorica 17 (1997), 31–41.
A. Brouwer and A. Schrijver, “The blocking number of an affine space,” J. Combin. Theory Ser. A 24 (1978), 251–253.
I would like to know more about the perspective you claim to have gained which makes you think this particular way.
Yes, this is true. There are a number of reasons for this, but one is an encounter with Goro Shimura back in 2008 that left an impression on me – I thought about his words for many years.
I’ll write more tomorrow.
I am so confused as to why your standard seems to be so absurdly high to me.
Is it because my particular subfield is unusually full of low-hanging fruit? Or because so few of those ~200 top mathematicians work in it?
Is it because I don’t see how “superficially original” all of the work done in my field is? I lack perspective?
Anyway, this is really weird.
The way in which I operationalize the originality / interest of research is “in 50 years, what will the best mathematicians think about it?” I think that this perspective is unusual amongst mathematicians as a group, but not among the greatest ones. I’d be interested in how it jibes with your own.
Anyway, I think that if one adopts this perspective and takes a careful look at current research using Bayesian reasoning, one is led to the conclusion that almost all of it will be considered to be irrelevant (confidence ~80%).
When I was in grad school, I observed people proving lots of theorems in low dimensional topology that were sort of interesting to me, but it’s also my best guess that most of them will be viewed in hindsight as similar to how advanced Euclidean geometry theorems are today – along the lines of “that’s sort of pretty, but not really worthy of serious attention.”
How old are you?
When I started grad school, I was blown away by how much the professors could do.
A few years out of grad school, I saw that a lot of the theorems were things that it was well known to experts that it was possible to prove by using certain techniques, and that proving them was in some sense a matter of the researchers dotting their i’s and crossing their t’s.
And in situations where something seemed strikingly original, the basic idea often turned out to be due to somebody other than the author of a paper (not to say that the author plagiarized – on the contrary, the author almost always acknowledged the source of the idea – but a lot of times people don’t read the fine print well enough to notice).
For example, the Wikipedia page on Paul Vojta reads
I had the chance to speak with Vojta and ask how he discovered these things, and he said that his advisor Barry Mazur suggested that he investigate possible parallels between Nevanlinna theory and diophantine analysis.
Similarly, even though Andrew Wiles’ work on Fermat’s Last Theorem does seem to be regarded by experts as highly original, the conceptual framework that he used had been developed by Barry Mazur, and I would guess (weakly – I don’t have an inside view – just extrapolating based on things that I’ve heard) that people with deep knowledge of the field would say that Mazur’s contribution to the solution of Fermat’s Last Theorem was more substantial than that of Wiles.
Eegads. How do you even imagine what those people will be like?
Sure, I don’t think anyone I know really thinks of their work that way.
29, graduating in a few months.
Yeah, sure, that’s the vast majority of everything I’ve done so far, and some fraction of the work my subfield puts out.
The people two or three levels above me, though, they’re putting out genuinely new stuff on the order of once every three to five years. Maybe not “the best mathematicians 50 years from now think this is amazing” stuff, but I think the tools will still be in use in the generation after mine. Similar to the way most of my toolbox was invented in the ’70s and ’80s.
I don’t understand your concept of originality. It has to be created in a vacuum to be original?
In the counterfactual where Vojta doesn’t exist, does Mazur go on to write similar papers? Is that the problem?
Well, if, e.g. you’re working on a special case of an unsolved problem using an ad hoc method with applicability that’s clearly limited to that case, and you think that the problem will probably be solved in full generality with a more illuminating solution within the next 50 years, then you have good reason to believe that work along these lines has no lasting significance.
Not consciously, but there’s a difference between doing research that you think could contribute substantially to human knowledge and research that you know won’t. I think that a lot of mathematicians’ work falls into the latter category.
This is a long conversation, but I think that there’s a major issue of the publish or perish system (together with social pressures to be respectful to one’s colleagues) leading to doublethink, where on an explicit level, people think that their own research and the research of their colleagues is interesting, because they’re trying to make the best of the situation, but where there’s a large element of belief-in-belief, and that they don’t actually enjoy doing their work or hearing about their colleagues’ work in seminars. Even when people do enjoy their work, they often don’t know what they’re missing out on by not working on things that they find most interesting on an emotional level.
This sounds roughly similar to what I myself believe – the differences may be semantic. I think that work can be valuable even if people don’t find it amazing. I also think that there are people outside of the top 200 mathematicians who do really interesting work of lasting historical value – just that it doesn’t happen very often. (Weil said that you can tell that somebody is a really good mathematician if he or she has made two really good discoveries, and that Mordell is a counterexample.) It’s also possible that I’d consider the people who you have in mind to be in the top 200 mathematicians even if they aren’t considered to be so broadly.
It’s hard to convey effect sizes in words. The standard that I have in mind is “producing knowledge that significantly changes experts’ Bayesian priors” (whether it be about what mathematical facts are true, or which methods are useful in a given context, or what the best perspective on a given topic is). By “significantly changes” I mean something like “uncovers something that some experts would find surprising.”
I don’t have enough subject matter knowledge to know how much Vojta added beyond what Mazur suggested (it could be that upon learning more I would consider his marginal contributions to be really huge). I guess in bringing up those examples I didn’t so much mean “Vojta and Wiles didn’t do original work – it had already essentially been done by Mazur” as much as “the original contributions in math are more densely concentrated in a smaller number of people than one would guess from the outside,” which in turn bears on the question of how someone should assess his or her prospects for doing genuinely original work in a given field.
I agree with your assessment of things here, but I do think it’s worth taking a moment to honor people who take correct speculation and turn it into a full proof. This is useful cognitive specialization of labor, and I don’t think it makes much sense to value originality over usefulness.
It is really hard to tell when an ad hoc method will turn out many years later to be a special case of some more broad technique. It may also be that the special case will still need to be done if some later method uses it for bootstrapping.
I’m not sure about this at all. Have you tried talking to people who aren’t already in academia about this? As far as I can tell, they think that there are a tiny number of very smart people who are mathematicians and are surprised to find out how many there are.
There are questions of quantitative effect sizes. Feel free to give some examples that you find compelling.
By “from the outside” I mean “from the outside of a field” (except to the extent that you’re able to extrapolate from your own field.)
Yes, and I’m not sure how to measure that.
Fermat’s Last Theorem. The proof assumed that p >= 11, and so the ad hoc cases from the 19th century were necessary to round it out. Moreover, the attempt to extend those ad hoc methods led to the entire branch of algebraic number theory.
Primes in arithmetic progressions: much of what Tao and Greenberg did here extended, in a deep and systematic way, earlier methods that were previously somewhat ad hoc. In fact, one can see a large fraction of modern work that touches on sieves as taking essentially ad hoc sieve techniques and generalizing them.
I don’t recall Wiles’ proof assuming that p >= 11 – can you give a reference? I can’t find one quickly.
The n = 3 and 4 cases were proved by Euler and Fermat. It’s prima facie evident that Euler’s proof (which introduced a new number system with no historical analog) points to the existence of an entire field of math. I find this less so of Fermat’s proof as he stated it, but Fermat is also famous for the obscurity of his writings.
I don’t know the history around the n = 5 and n = 7 cases, and so don’t know whether they were important to the development of algebraic number theory, but exploring them is a natural extension of the exploration of new kinds of number systems that Euler had initiated.
They were subsumed by Kummer’s work, which I understand to have been motivated more by a desire to understand algebraic number fields and reciprocity laws than by Fermat’s last theorem in particular. For this, he developed the theory of ideal numbers, which is very general.
Ben Green, not Greenberg :-).
Sure, but the ultimate significance of the work remains to be seen. Of course, tastes vary, and there’s an element of subjectivity, but I think that we can agree that even if there’s a case for the proof being something that people will find interesting in 50 years, the prior in favor of it is much weaker than the prior in favor of this being the case for, e.g., the Gross–Zagier formula.
I think this is in the original paper that modularity implies FLT, but I’m on vacation and don’t have a copy available to check. Does this suffice as a reference?
Yes, thank you.
Sure, but Kummer was aware of the literature before him, and almost certainly used their results to guide him.
Agreement there may depend very strongly on how you unpack “much weaker,” but I’d be inclined to agree at least with “weaker,” without the “much.”
How good do you consider past mathematicians to have been at judging interest 50 years down the line? Do you think that 50 years ago mathematicians understood the significance of all findings made at the time that turned out to be significant?
How do you make a priori judgments on who the best mathematicians are going to be? In your opinion, what qualities/achievements would put someone in the group of best mathematicians?
How different would your deductions be if you were living in a different time period? How much does that depend on the areas in mathematics that you are considering in that reasoning?
I’m not sure what gives you this impression. In my own field (number theory) I don’t get that feeling at all. There may not be much low hanging fruit, but there’s more than enough for people who aren’t that top 200 to do very useful work.
I’d certainly defer to you in relation to subject matter knowledge (my knowledge of number theory really only extends through 1965 or so), but this is not the sense that I’ve gotten from speaking with the best number theorists.
When I met Shimura, he was extremely dismissive of contemporary number theory research, to a degree that seemed absurd to me (e.g. he characterized papers in the Annals of Mathematics as “very mediocre”). I would ordinarily be hesitant to write about a private conversation publicly, but he freely and eagerly expresses his views to everyone he meets. Have you read The Map of My Life? He’s very harsh and cranky and perhaps even paranoid, but that doesn’t undercut his track record of being an extremely fertile mathematician. As I reflected on his comments and learned more over the years after meeting him in 2008, his position came to seem progressively more sound (to my great surprise!).
A careful reading of Langlands’ Reflexions on receiving the Shaw Prize hints that he thinks that the methods that Taylor and collaborators have been using to prove theorems such as the Sato-Tate conjecture won’t have lasting value, though he’s very guarded in how he expresses himself. I remember coming across a more recent essay where he was more explicit and forceful, but I forget where it is (somewhere on his website, sorry, I realize that this isn’t so useful). It’s not clear to me that Taylor would disagree – he may explicitly be more committed to solving problems in the near term than to creating work of lasting value.
One can speculate that these views are driven by arrogance, but they’re not even that exotic outside of the set of people who have unambiguously done great work. For example, the author of the Galois Representations blog, who you probably know of, wrote in response to Jordan Ellenberg:
apparently implicitly characterizing his own work as insignificant. And there aren’t very many number theorists as capable as him.
Replying separately so it isn’t missed. I wonder also how much of these issues is the “two cultures” problem that Gowers talks about. The “top people” conception seems to lean heavily toward the theory-builder side.
I agree with you, and there are strong problem solver types who conceptualize mathematical value in a different way from the people who I’ve quoted and from myself.
Still, there are some situations where one has apples-to-apples comparisons.
There’s a large body of work giving unconditional proofs of theorems that would follow from the Riemann hypothesis and its generalizations. Many problem solvers would agree that a proof of the Riemann hypothesis and its generalizations would be more valuable than all of this work combined.
We don’t yet know how or when the Riemann hypothesis will be proved. But suppose that Alain Connes’ approach using noncommutative geometry (which seems most promising right now, though I don’t know how promising) turns out to be possible to implement over the next 40 years or so. In this hypothetical, what attitude do you think problem solvers would take to the prior unconditional proofs of consequences of RH?
Yes, but at the same time (against my earlier point) the best problem solvers are finding novel techniques that can be then applied to a variety of different problems- that’s essentially what Gowers seems to be focusing on.
I’m not sure I’d agree with that, but I feel like I’m in the middle of the two camps so maybe I’m not relevant? All those other results tell us what to believe well before we actually have a proof of RH. So that’s at least got to count for something. It may be true that a proof of GRH would be that much more useful, but GRH is a much broader idea. Note also that part of the point of proving things under RH is to then try and prove the same statements with weaker or no assumptions, and that’s a successful process.
I’m not sure. Can you expand on what you think would be their attitude?
I’ll volunteer another reason not to necessarily pay attention to my viewpoint: I’m pretty clearly one of those weaker mathematicians, so I have obvious motivations for seeing all of that side work as relevant.
I suspect that one can get similar viewpoints from people who more or less think the opposite but that they aren’t very vocal because it is closer to being a default viewpoint, but my evidence for this is very weak. It is also worth noting that when one does read papers by the top named people, they often cite papers from people who clearly aren’t in that top, using little constructions or generalizing bits or the like.
I’ll note that I think that there are people other than top researchers who have contributed enormously to the mathematical community through things other than research. For example, John Baez is listed amongst the mathematicians who influenced MathOverflow participants the most, in the same range as Fields medalists and historical greats, based on his expository contributions.
Yes, this is true and a good point. It can serve as a starting point for estimating effect sizes.
I’m not qualified to judge the accuracy of these claims, but I was speaking with a PhD in physics who said that he thought that only ~50 people in theoretical physics were doing anything important.