AlphaProof works in the “obvious” way: an LLM generates candidate next steps which are checked using a formal proof-checking system, in this case Lean.
I don’t think this is true, actually. Look at the diagram of AlphaProof’s reinforcement learning training loop. The proof part (the “solver network”) seems to be purely RL-based; it even uses the same algorithm as AlphaZero. (The article actually describes AlphaProof as “a new reinforcement-learning based system”.)
The contribution of the LLM to AlphaProof seems to be only in providing the left part of the diagram (the “formalizer network”). Namely translating, and somehow expanding, one million human-written proofs into 100 million formal Lean proofs. I’m not sure how the LLM increases the number of proofs by 100x, but I assume that for each successfully formalized human-written proof it also generates around 100 (simple?) synthetic variations. This interpretation is also corroborated by this statement:
We established a bridge between these two complementary spheres by fine-tuning a Gemini model to automatically translate natural language problem statements into formal statements, creating a large library of formal problems of varying difficulty.
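To make the formalization step concrete, here is a hypothetical example of the kind of translation described. The Lean statement below is my own illustration, not taken from DeepMind’s data; note the article says the model translates problem *statements*, so the proof term here is only to complete the example:

```lean
-- Natural language: "The sum of two even integers is even."
-- A possible Lean 4 / Mathlib formalization (illustrative only):
theorem even_add_even (a b : ℤ) (ha : Even a) (hb : Even b) :
    Even (a + b) :=
  Even.add ha hb
```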
So AlphaProof is actually a rather narrow RL system, not unlike the original AlphaGo. The latter was also bootstrapped on human data (expert Go games), similar to how AlphaProof uses (formalized) human proofs for initial training. (Unlike AlphaGo and AlphaProof, AlphaGo Zero and the original AlphaZero did not rely on any non-synthetic / human training data.)
So, if LLMs like GPT-4 and Gemini are considered to be fairly general AI systems (some even consider them AGI), while systems like AlphaGo or AlphaZero are regarded as narrow AI, then AlphaProof counts as narrow AI. This contradicts views like that of Dwarkesh Patel, who thought solving IMO math problems may be AGI-complete, and confirms the opinion of people like Grant Sanderson (3Blue1Brown), who thought that training on and solving formal math proofs is formally very similar to AlphaGo solving board games.
That being said, unlike AlphaProof, AlphaGeometry 2, which solved the one geometry problem, is not RL-based. It does use an LLM (a Gemini-like model trained on large amounts of synthetic training data) when coming up with proofs, though it works in tandem with a non-ML symbolic deduction engine. Such engines are largely based on brute-force search, similar to how Deep Blue played chess. So AlphaGeometry is simultaneously more general (LLM) and less general (symbolic deduction engine) than AlphaProof (an RL system).
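As a toy illustration of what a symbolic deduction engine does (my own sketch, far simpler than AlphaGeometry’s actual engine): it forward-chains, exhaustively applying inference rules to known facts until no new facts appear. This is brute-force search rather than learned intuition:

```python
def forward_chain(facts, rules):
    """Toy forward-chaining deduction: repeatedly apply rules
    (premises -> conclusion) to the fact set until a fixed point
    is reached, i.e. nothing new can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in derived and all(p in derived for p in premises):
                derived.add(conclusion)
                changed = True
    return derived
```

For example, from the facts `AB = CD` and `CD = EF` plus a transitivity rule, the engine mechanically derives `AB = EF`, with no model weights involved.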
Namely translating, and somehow expanding, one million human-written proofs into 100 million formal Lean proofs.
We obviously should wait for the paper and more details, but I am certain this is incorrect. Both your quote and the diagram are clear that it is one million problems, not proofs.
Well, what kind of problem do you think will help with learning how to prove things, if not proofs? AlphaGo & co learned to play games by being trained with games. And AlphaProof uses the AlphaZero algorithm.
The plain interpretation is that only statements to be proved (or disproved) were sourced from human data, without any actual proof steps. In the Go analogy, it is like being given Go board positions without the next moves.
It makes a lot of sense that this is needed and helpful, because winning a game of Go from the empty board is a different and easier problem than playing the best moves from arbitrary Go positions. Igo Hatsuyoron, mentioned in the original post, is a good example: additional training was needed because such positions never come up in actual games.
Imagine AlphaZero trained from randomly sampled Go positions, each intersection being black/white/empty with uniform probability. It would play a much worse game of Go. Fortunately, how to sample “relevant” Go positions is an easy problem: you just play the game, with the initial N moves sampled at a higher temperature for diversity.
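The sampling trick described above can be sketched as follows (a minimal illustration, not actual AlphaZero code; `policy_logits` stands in for the policy network’s raw move scores):

```python
import math
import random

def sample_move(policy_logits, temperature):
    """Sample a move index from policy logits at the given temperature.
    Higher temperature flattens the distribution (more diverse opening
    moves); temperature near zero approaches greedy argmax."""
    scaled = [logit / temperature for logit in policy_logits]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # numerically stable softmax
    r = random.random() * sum(weights)
    cumulative = 0.0
    for i, w in enumerate(weights):
        cumulative += w
        if r < cumulative:
            return i
    return len(weights) - 1  # guard against floating-point edge cases
```

Playing the first N moves with `temperature > 1` and the rest greedily yields diverse but still game-like positions, which is exactly the “relevant position” distribution the argument above relies on.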
In comparison, how to sample relevant math positions is unclear. Being good at finding proofs in arbitrary formal systems from arbitrary sets of axioms is actually quite different from being good at math. Using human data sidesteps this problem.
I don’t understand what you mean here. How would merely seeing conjectures help with proving them? You arguably need to see many example proofs of example conjectures. Otherwise it would be like expecting a child who has never seen a proof to learn to prove conjectures merely by being shown a lot of conjectures.
The feedback is from Lean, which can validate attempted formal proofs.
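Since candidate proofs are checked formally in Lean (as described at the top), every attempt yields a clear accept/reject signal even without human-written proofs to imitate. A trivial illustrative example:

```lean
-- Lean accepts this: the proof type-checks.
example : 2 + 2 = 4 := rfl

-- Lean would reject this: `rfl` cannot prove a false equation.
-- example : 2 + 2 = 5 := rfl
```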
I don’t think this [sc. that AlphaProof uses an LLM to generate candidate next steps] is true, actually.
Hmm, maybe you’re right. I thought I’d seen something that said it did that, but perhaps I hallucinated it. (What they’ve written isn’t specific enough to make it clear that it doesn’t do that either, at least to me. They say “AlphaProof generates solution candidates”, but nothing about how it generates them. I get the impression that it’s something at least kinda LLM-like, but could be wrong.)
The diagram actually says it uses the AlphaZero algorithm. Which obviously doesn’t involve an LLM.
The AlphaZero algorithm doesn’t obviously not involve an LLM. It has a “policy network” to propose moves, and I don’t know what that looks like in the case of AlphaProof. If I had to guess blindly I would guess it’s an LLM, but maybe they’ve got something else instead.
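For what it’s worth, the way the policy network’s proposals enter AlphaZero-style search is through the PUCT selection rule. A minimal sketch (my own illustration with made-up inputs, not anything from DeepMind):

```python
import math

def puct_score(q_value, prior, parent_visits, child_visits, c_puct=1.5):
    """AlphaZero-style PUCT score: the move's Q-value plus an exploration
    bonus weighted by the policy network's prior probability for it."""
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q_value + exploration

def select_child(children, parent_visits, c_puct=1.5):
    """children: list of (q_value, prior, visit_count) per candidate move.
    Returns the index of the move the search should expand next."""
    scores = [
        puct_score(q, p, parent_visits, n, c_puct) for (q, p, n) in children
    ]
    return scores.index(max(scores))
```

Whatever form the network takes (LLM-like or not), all the search needs from it is the `prior` over candidate moves, so the algorithm itself is agnostic on that question.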