Most of the heavy lifting in these proofs seem to be done by the Lean tactics. The comment “arguments to nlinarith are fully invented by our model” above a proof which is literally the single line
nlinarith [sq_nonneg (b - a), sq_nonneg (c - b), sq_nonneg (c - a)] makes me feel like they’re trying too hard to convince me this is impressive.
The other proof involving multiple steps is more impressive, but this still feels like a testament to the power of “traditional” search methods for proving algebraic inequalities, rather than an impressive AI milestone. People on twitter have claimed that some of the other problems are also one-liners using existing proof assistant strategies—I find this claim totally plausible.
I would be much more impressed with an AI-generated proof of a combinatorics or number theory IMO problem (eg problem 1 or 5 from 2021). Someone with more experience in proof assistants probably has a better intuition for which problems are hard to solve with “dumb” searching like nlinarith, but this is my guess.
I think it’s worth distinguishing how hard it is for a lean programmer to write the solution, how hard it is to solve the math problem in the first place, and how hard it is to write down an ML algorithm that spits out the right lean tactics.
Like, even if something can be written in a compact form, there might be only a dozen of combinations of ~10 tokens that give us a correct solution like nlinarith (b- a), …, where by token I count “nlinarith”, “sq_nonneg”, “b”, “-”, “a”, etc., and the actual search space for something of length 10 is probably ~(grammar size)^10 where the grammar is possibly of size 10-100. (Note: I don’t know how traditional solvers perform on statement of that size, it’s maybe not that hard.)
I agree that traditional methods work well for algebraic problems where proofs are short, and that AI doing search with nlinarith seems “dumb”, but the real question here is whether OAI has found a method to solve such problems at scale.
As you said, the one liner is not really convincing, but the multi step solution, introducing a new axiom in the middle, seems like a general construction to solve all algebraic problems, and even more. (Though they do mention how infinite action space and no no-self-play limits scaling in general.)
I do agree with the general impression that it’s not a huge breakthrough. To me, it’s mostly an update like “look, two years after gpt-f, it’s still hard but se can solve a theorem which requires multiple steps with transformers now!”.
I think we’re basically in agreement here (and I think your summary of the results is fair, I was mostly pushing back on a too-hyped tone coming from OpenAI, not from you)
Most of the heavy lifting in these proofs seem to be done by the Lean tactics. The comment “arguments to
nlinarith
are fully invented by our model” above a proof which is literally the single linenlinarith [sq_nonneg (b - a), sq_nonneg (c - b), sq_nonneg (c - a)]
makes me feel like they’re trying too hard to convince me this is impressive.The other proof involving multiple steps is more impressive, but this still feels like a testament to the power of “traditional” search methods for proving algebraic inequalities, rather than an impressive AI milestone. People on twitter have claimed that some of the other problems are also one-liners using existing proof assistant strategies—I find this claim totally plausible.
I would be much more impressed with an AI-generated proof of a combinatorics or number theory IMO problem (eg problem 1 or 5 from 2021). Someone with more experience in proof assistants probably has a better intuition for which problems are hard to solve with “dumb” searching like
nlinarith
, but this is my guess.I think it’s worth distinguishing how hard it is for a lean programmer to write the solution, how hard it is to solve the math problem in the first place, and how hard it is to write down an ML algorithm that spits out the right lean tactics.
Like, even if something can be written in a compact form, there might be only a dozen of combinations of ~10 tokens that give us a correct solution like nlinarith (b- a), …, where by token I count “nlinarith”, “sq_nonneg”, “b”, “-”, “a”, etc., and the actual search space for something of length 10 is probably ~(grammar size)^10 where the grammar is possibly of size 10-100. (Note: I don’t know how traditional solvers perform on statement of that size, it’s maybe not that hard.)
I agree that traditional methods work well for algebraic problems where proofs are short, and that AI doing search with nlinarith seems “dumb”, but the real question here is whether OAI has found a method to solve such problems at scale.
As you said, the one liner is not really convincing, but the multi step solution, introducing a new axiom in the middle, seems like a general construction to solve all algebraic problems, and even more. (Though they do mention how infinite action space and no no-self-play limits scaling in general.)
I do agree with the general impression that it’s not a huge breakthrough. To me, it’s mostly an update like “look, two years after gpt-f, it’s still hard but se can solve a theorem which requires multiple steps with transformers now!”.
I think we’re basically in agreement here (and I think your summary of the results is fair, I was mostly pushing back on a too-hyped tone coming from OpenAI, not from you)