I think AlphaProof is pretty far from being just RL from scratch:
- they use a pretrained language model; I think the model is trained on human math in particular ( https://archive.is/Cwngq#selection-1257.0-1272.0:~:text=Dr. Hubert’s team,frequency was reduced. )
- do we have good reason to think they didn’t specifically train it on human Lean proofs? It seems plausible to me that they did, but idk
- the curriculum of human problems teaches it human tricks
- Lean sorta “knows” a bunch of human tricks (Mathlib’s lemmas and tactics encode a lot of them)
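To illustrate that last point, here’s a sketch (assuming a recent Lean 4 + Mathlib setup): the `nlinarith` tactic, given the human-style hint “consider the square of the difference,” closes an AM–GM-flavored goal in one line. Any prover searching over Mathlib tactics inherits these packaged human techniques for free.

```lean
import Mathlib.Tactic

-- A human trick, mechanized: (a - b)² ≥ 0 implies a·b ≤ (a² + b²)/2.
-- `sq_nonneg (a - b)` supplies the key human insight; `nlinarith` does the rest.
example (a b : ℝ) : a * b ≤ (a ^ 2 + b ^ 2) / 2 := by
  nlinarith [sq_nonneg (a - b)]
```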
We could argue about whether AlphaProof “is mostly human imitation or mostly RL”, but I feel like it’s pretty clear that it’s more analogous to AlphaGo than to AlphaZero.
(a relevant thread: https://www.lesswrong.com/posts/sTDfraZab47KiRMmT/views-on-when-agi-comes-and-on-strategy-to-reduce?commentId=ZKuABGnKf7v35F5gp )
Okay, great, then we just have to wait a year for AlphaProofZero to get a perfect score on the IMO.