I think the Low-Hanging Fruit Complaint is more often a result of not knowing where there’s a hot, productive research frontier than of the universe actually lacking interesting new mathematics to uncover.
There’s a lot of potential for semantic differences here, and risk of talking past each other. I’ll try to be explicit. I believe that:
There are very few people who have a nontrivial probability of discovering statements about the prime numbers that are true, that people didn’t already believe to be true, and that people find fascinating.
The same is not far from true for every area of math that has been mainstream for 100+ years: algebraic topology, algebraic geometry, algebraic number theory, analytic number theory, partial differential equations, Lie groups, functional analysis, etc.
There is a lot of rich math to be discovered outside of the areas that pure mathematicians have focused on historically, and that people might find equally fascinating. In particular, I believe this to be true within the broad domain of machine learning.
There are few historical examples of mathematicians discovering interesting new fields of math without being motivated by applications.
There is a lot of rich math to be discovered outside of the areas that pure mathematicians have focused on historically, and that people might find equally fascinating. In particular, I believe this to be true within the broad domain of machine learning.
That’s largely because machine learning is in its infancy. It is still a field largely defined by three very limited approaches:
Structural risk minimization (support-vector machines and other approaches that use regularization to handle high-dimensional data): still ultimately a kind of PAC learning, and still largely making very unstructured predictions from very unstructured data.
PAC learning—even when we allow ourselves inefficient (i.e., super-polynomial-time) PAC learning, we’re still ultimately held back by the reliance on prior knowledge to generate a hypothesis class with a known, finite VC dimension. I’ve sometimes idly pondered trying to leverage algorithmic information theory to do something like what Hutter did: prove a fully general counter-theorem to No Free Lunch, saying that when the learner has “more information” and “more algorithmic information” (more compute power) than the environment, the learner can win. (Then again, I tend to idly ponder a lot about AIT, since it seems to be a very underappreciated field of theoretical CS—one that remains underappreciated because of just how much mathematical background it requires!)
Stochastic gradient descent, and most especially neural networks: useful in properly general environments, but doesn’t tell the learner’s programmer much of anything that makes human sense. Often overfits or gets stuck in non-global minima.
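The PAC-learning complaint above can be made concrete. For a finite, realizable hypothesis class, the classic sample-complexity bound says roughly m ≥ (1/ε)(ln|H| + ln(1/δ)) examples suffice—logarithmic in the size of the class, but only once you’ve committed to a class in advance. A minimal sketch (the function name is mine, not from any library):

```python
import math

def pac_sample_bound(hypothesis_count, epsilon, delta):
    """Classic PAC bound for a finite, realizable hypothesis class:
    with m >= (1/epsilon) * (ln|H| + ln(1/delta)) examples, any
    hypothesis consistent with the sample has true error <= epsilon
    with probability >= 1 - delta."""
    return math.ceil((math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon)

# The bound grows only logarithmically in |H| -- but it presupposes we
# could fix a finite (or finite-VC-dimension) class up front, which is
# exactly the "reliance on prior knowledge" being complained about.
print(pac_sample_bound(10**6, 0.05, 0.01))  # -> 369
```

Note how forgiving the bound is in |H| (a million hypotheses cost only ~14 nats) and how unforgiving it is in ε: halving the error tolerance doubles the sample requirement.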
To those we are rapidly adding a fourth approach, one that I think has the potential to supplant many of the others:
Probabilistic programming: fully general, more capable of giving “sensible” outputs, and capable of expressing arbitrary statistical models… but really slow, and, modulo an Occam’s Razor assumption, subject to the same sort of losses in adversarial environments as any other Bayesian method. Still, a lot better than what was there before.
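Both halves of that trade-off—“fully general” and “really slow”—show up in even the simplest probabilistic-programming scheme, inference by rejection sampling. The model below is a toy I made up for illustration: a coin whose bias is drawn from a uniform prior.

```python
import random

def model():
    """A tiny generative 'program': draw a coin bias from a uniform
    prior, then simulate 10 flips of that coin."""
    bias = random.random()                          # bias ~ Uniform(0, 1)
    flips = [random.random() < bias for _ in range(10)]
    return bias, flips

def rejection_posterior(observed_heads, n_samples=2000):
    """Inference by rejection: run the program forward repeatedly and
    keep only executions whose output matches the observation. This
    works for ANY model we can run forward -- that's the generality --
    but it gets painfully slow as the observation gets less likely."""
    kept = []
    while len(kept) < n_samples:
        bias, flips = model()
        if sum(flips) == observed_heads:
            kept.append(bias)
    return sum(kept) / len(kept)                    # posterior mean of the bias

random.seed(0)
print(round(rejection_posterior(8), 2))  # posterior mean near 0.75 (Beta(9, 3))
```

With a uniform prior and 8 heads in 10 flips, the exact posterior is Beta(9, 3) with mean 0.75; the sampler recovers it, at the cost of discarding roughly 10 of every 11 program runs. Real probabilistic-programming systems replace rejection with MCMC or variational inference, but the same generality-versus-speed tension remains.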
I partially respond to this here.
What do you mean by “people find fascinating,” and how many people? It seems like a lot of the work in your first bullet point is being done by the last three words.
Upvoted for being specific.