It seems suspicious to me that this hype is coming from fields where it seems hard to verify (is the LLM actually coming up with original ideas, or is it just fusing standard procedures? Are the ideas the bottleneck, or is experimental time the bottleneck? Are the ideas actually working, or do they just sound impressive?). And of course this is Twitter.
Why not progress on hard (or even easy but open) math problems? Are LLMs afraid of proof verifiers? On the contrary, it seems like this is the area where we should be able to best apply RL, since there is a clear reward signal.
On the contrary, it seems like this is the area where we should be able to best apply RL, since there is a clear reward signal.
Is there? It’s one thing to verify whether a proof is correct, i.e. whether one expression (posed by a human!) is tautologous to another expression (also posed by a human!). But what’s the ground-truth signal for “the framework of Bayesian probability/category theory is genuinely practically useful”?
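To be fair, the “clear reward signal” part really is easy to make concrete. A minimal sketch, assuming a Lean 4 toolchain on PATH (the wrapper function and the sample theorem here are purely illustrative, not anyone’s actual training setup): the verifier’s pass/fail verdict is the entire reward, which is exactly what makes proof search RL-friendly, and exactly what has no analogue for “this framework is genuinely useful”.

```python
import subprocess
import tempfile
from pathlib import Path

def proof_reward(lean_source: str) -> float:
    """Binary reward: 1.0 if the Lean source checks, 0.0 otherwise (illustrative)."""
    with tempfile.NamedTemporaryFile("w", suffix=".lean", delete=False) as f:
        f.write(lean_source)
        path = Path(f.name)
    try:
        # `lean <file>` elaborates the file and exits non-zero on any error,
        # so the verifier's verdict collapses into a single 0/1 reward.
        result = subprocess.run(["lean", str(path)], capture_output=True)
        return 1.0 if result.returncode == 0 else 0.0
    finally:
        path.unlink(missing_ok=True)

# A candidate proof of a human-posed statement (term-mode, core Lean 4 only).
candidate = """
theorem my_add_comm (a b : Nat) : a + b = b + a := Nat.add_comm a b
"""
print(proof_reward(candidate))  # 1.0 if it checks, 0.0 if it doesn't
```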
This is the reason I’m bearish on the reasoning models even for math. The realistic benefits of them seem to be:
1. Much faster feedback loops on mathematical conjectures.
2. Solving long-standing mathematical challenges such as the Riemann Hypothesis or P vs. NP.
3. Mathematicians might be able to find hints of whole new math paradigms in the proofs the models generate for those long-standing challenges.
Of those:
(1) still requires mathematicians to figure out which conjectures are useful. It compresses hours, days, weeks, or months (depending on how well it scales) of a very specific and niche type of work into minutes, which is cool, but not Singularity-tier.
(2) is very speculative. It’s basically “compresses decades of work into minutes”, while the current crop of reasoning models can barely solve problems that ought to be pretty “shallow” from their perspective. Maybe Altman is right, the paradigm is in its GPT-2 stage, and we’re all about to be blown away by what they’re capable of. Or maybe it doesn’t scale past the frontier of human mathematical knowledge very well at all, and the parallels with AlphaZero are overstated. We’ll see.
(3) is dependent on (2) working out.
(The reasoning-model hype is so confusing for me. Superficially there’s a ton of potential, but I don’t think there’s been any real indication they’re up to the real challenges still ahead.)
That’s a reasonable suspicion but as a counterpoint there might be more low-hanging fruit in biomedicine than math, precisely because it’s harder to test ideas in the former. Without the need for expensive experiments, math has already been driven much deeper than other fields, and therefore requires a deeper understanding to have any hope of making novel progress.
edit: Also, if I recall correctly, the average IQ of mathematicians is higher than that of biologists, which is consistent with it being harder to make progress in math.
On the other hand, frontier math (pun intended) is much more poorly funded than biomedicine, because most PhD-level math has barely any practical applications worth spending the man-hours of high-IQ mathematicians on (which often pushes them to switch careers, you know). So, I would argue, if the productivity of math postdocs armed with future LLMs rises by, let’s say, an order of magnitude, they will be able to attack more laborious problems.
Not that I expect it to make much difference to the general populace, or even the scientific community at large, though.