Sam: “The model (o1) is going to get so much better so fast […] Maybe this is the GPT-2 moment, we know how to get it to GPT-4”. So plan for the model to get rapidly smarter.
I notice I am skeptical, because of how I think about the term ‘smarter.’ What I think we can do is make it, maybe the word is ‘cleverer’? Have it use its smarts better.
AlphaZero gets to be actually smarter, and that’s a real possibility here too. If they have only scaled reasoning RL slightly, and there are four orders of magnitude more where that came from, then even o1-mini might turn into an alien tiger once it no longer needs to spend its parameters remembering all the world’s trivia. It is merely “cleverer” now, but a natural reading of the GPT-2 to GPT-4 transition permits actual improvement in smartness, distinct from how LLMs get smarter with scale.