I think AlphaProof is pretty far from being just RL from scratch:
- they use a pretrained language model; I think the model is trained on human math in particular ( https://archive.is/Cwngq#selection-1257.0-1272.0:~:text=Dr. Hubert’s team,frequency was reduced. )
- do we have good reason to think they didn’t specifically train it on human Lean proofs? It seems plausible to me that they did, but idk
- the curriculum of human problems teaches it human tricks
- Lean sorta “knows” a bunch of human tricks (Mathlib’s lemmas and tactics encode a lot of them)
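To illustrate that last point, here’s a sketch (assuming a recent Lean 4 + Mathlib setup): the `nlinarith` tactic, given the human-style hint “consider the square of the difference,” closes an AM–GM-flavored goal in one line. Any prover searching over Mathlib tactics inherits these packaged human techniques for free.

```lean
import Mathlib.Tactic

-- A human trick, mechanized: (a - b)² ≥ 0 implies a·b ≤ (a² + b²)/2.
-- `sq_nonneg (a - b)` supplies the key human insight; `nlinarith` does the rest.
example (a b : ℝ) : a * b ≤ (a ^ 2 + b ^ 2) / 2 := by
  nlinarith [sq_nonneg (a - b)]
```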
We could argue about whether AlphaProof “is mostly human imitation or mostly RL”, but I feel like it’s pretty clear that it’s more analogous to AlphaGo than to AlphaZero.
(a relevant thread: https://www.lesswrong.com/posts/sTDfraZab47KiRMmT/views-on-when-agi-comes-and-on-strategy-to-reduce?commentId=ZKuABGnKf7v35F5gp )
Okay, great, then we just have to wait a year for AlphaProofZero to get a perfect score on the IMO.