There is a lot of rich math to be discovered outside of the areas that pure mathematicians have focused on historically, and that people might find equally fascinating. In particular, I believe this to be true within the broad domain of machine learning.
That’s largely because machine learning is in its infancy. It is still a field defined mostly by three very limited approaches:
Structural Risk Minimization (support-vector machines and other approaches that use regularization to cope with high-dimensional data): still ultimately a kind of PAC learning, and still largely making very unstructured predictions from very unstructured data (a minimal sketch follows this list).
PAC learning: even when we allow ourselves inefficient (i.e., super-poly-time) PAC learning, we’re still ultimately stuck relying on prior knowledge to pick a hypothesis class with a known, finite VC dimension (the standard sample-complexity bound that creates this dependence is quoted after this list). I’ve sometimes idly pondered trying to leverage algorithmic information theory to do something like what Hutter did, and prove a fully general counter-theorem to No Free Lunch saying that when the learner has “more information” and “more algorithmic information” (more compute-power) than the environment, the learner can win. (Then again, I tend to idly ponder a lot about AIT; it seems to be a very underappreciated field of theoretical CS, and it stays underappreciated because of just how much mathematical background it requires!)
Stochastic Gradient Descent, and most especially neural networks: useful in properly general environments, but doesn’t tell the learner’s programmer much of anything that makes sense to a human. It often overfits or gets stuck in non-global minima (a toy illustration follows this list).
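To make the first item concrete, here is a minimal sketch of structural risk minimization in the SVM setting. The data is synthetic and the particular C values are my own illustrative choices; the point is only that the regularization strength trades empirical fit against the effective complexity of the hypothesis class:

```python
# Minimal sketch of structural risk minimization with a linear SVM.
# Assumes scikit-learn and numpy; the data and C values are illustrative.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))          # high-dimensional inputs
w_true = rng.normal(size=50)
y = (X @ w_true + 0.5 * rng.normal(size=500) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Small C = heavy regularization (simpler effective hypothesis class,
# more bias); large C = weak regularization (richer class, more variance).
for C in (0.001, 1.0, 1000.0):
    clf = LinearSVC(C=C, max_iter=10000).fit(X_tr, y_tr)
    print(f"C={C:g}: train={clf.score(X_tr, y_tr):.2f}, "
          f"test={clf.score(X_te, y_te):.2f}")
```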
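As for the second item, the dependence on a finite VC dimension is visible in the textbook (realizable-case) PAC sample-complexity bound:

```latex
m(\epsilon, \delta) \,=\, O\!\left( \frac{1}{\epsilon} \left( d \ln\frac{1}{\epsilon} + \ln\frac{1}{\delta} \right) \right)
```

where d is the VC dimension of the hypothesis class. No finite d, no guarantee, which is exactly the prior-knowledge bottleneck described above.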
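And the third item’s failure mode is easy to exhibit directly. Here is a toy sketch (the function and step size are arbitrary choices of mine) of plain gradient descent, the deterministic core of SGD, on a non-convex function: different random initializations settle into different minima, only one of them global:

```python
# Toy sketch: gradient descent on a non-convex 1-D function.
# f has a global minimum near x = -1.8 and a shallower local
# minimum near x = 1.6; which one we find depends entirely on
# the random initialization.
import numpy as np

def f(x):
    return 0.05 * x**4 - 0.3 * x**2 + 0.1 * x   # non-convex

def grad_f(x):
    return 0.2 * x**3 - 0.6 * x + 0.1

rng = np.random.default_rng(1)
for _ in range(5):
    x = rng.uniform(-4, 4)        # random initialization
    for _ in range(2000):
        x -= 0.01 * grad_f(x)     # fixed small step size
    print(f"converged to x={x:.3f}, f(x)={f(x):.3f}")
```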
To those we are rapidly adding a fourth approach, one that I think has the potential to supplant many of the others:
Probabilistic programming: fully general, more capable of giving “sensible” outputs, capable of expressing arbitrary statistical models… but really slow, and, modulo an Occam’s Razor assumption, subject to the same sort of losses in adversarial environments as any other Bayesian method. Still, it’s a lot better than what was there before (see the sketch below).
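As a taste of what that looks like (a hand-rolled sketch, not any particular probabilistic-programming system), here is the core trick: write the model as an ordinary generative program, then recover a posterior by brute-force likelihood weighting. The model, data, and sample count below are all illustrative, and the sheer number of program runs needed is exactly the “fully general but really slow” trade-off described above:

```python
# Hand-rolled probabilistic-programming sketch: a generative model as a
# plain Python function, with inference by likelihood weighting.
# Everything here (model, data, sample count) is illustrative.
import numpy as np

rng = np.random.default_rng(0)
data = np.array([9.8, 10.1, 10.4, 9.7, 10.0])   # observed values

def gaussian_logpdf(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

def model():
    """Generative program: sample from the prior, weight by the likelihood."""
    mu = rng.normal(0.0, 10.0)                    # broad prior on the mean
    log_w = gaussian_logpdf(data, mu, 1.0).sum()  # log-likelihood of the data
    return mu, log_w

# Brute-force inference: run the program many times, weight each run.
samples = [model() for _ in range(200_000)]
mus = np.array([m for m, _ in samples])
log_ws = np.array([w for _, w in samples])
ws = np.exp(log_ws - log_ws.max())                # normalize for stability
posterior_mean = (ws * mus).sum() / ws.sum()
print(f"posterior mean of mu ~ {posterior_mean:.2f}")  # close to 10
```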