If capabilities improve on a log scale with compute, then having three places to spend that compute (pretraining, RL training, and inference) rather than one is a rather big improvement.
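To put rough numbers on that, here is a minimal sketch under the toy assumption that capability gains add up as the log of each compute budget; the function, the constant k, and every budget figure are illustrative assumptions, not anyone's measured scaling law.

```python
import math

def capability(pretrain_c, rl_c, inference_c, k=1.0):
    """Toy score: assume gains are k * log10 of each compute budget, summed."""
    return k * (math.log10(pretrain_c) + math.log10(rl_c) + math.log10(inference_c))

# Baseline budgets in arbitrary units (all numbers here are illustrative).
base = capability(1e25, 1e22, 1e20)

# 100x more pretraining compute alone buys ~2 units of log-scale gain...
only_pretrain = capability(1e27, 1e22, 1e20)

# ...while 100x on all three axes buys ~6 units, three times as much.
all_three = capability(1e27, 1e24, 1e22)

print(only_pretrain - base)  # ~2.0
print(all_three - base)      # ~6.0
```

The point of the toy model is only that when each axis contributes logarithmically, starting to scale the two neglected axes buys gains that pretraining alone can no longer match.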
RL training and inference are far from catching up to pretraining in scale, so the initial gains from scaling their compute could soon prove massive compared to the remaining potential for scaling pretraining. RL training might crucially depend on human labels, and if so it won't scale much for now; for inference compute, OpenAI's Noam Brown says:
"o1 thinks for seconds, but we aim for future versions to think for hours, days, even weeks."
This doesn't even account for the additional compute that could go into wider inference-time search, where a generative model and a process reward model work together to refine the reasoning trace (unlike producing ever longer traces, this work can be done in parallel).
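As a rough illustration of what such parallel search could look like, here is a minimal best-of-N sketch: `generate_trace` and `score_trace` are hypothetical stand-ins for the generative model and the process reward model, not real APIs.

```python
from concurrent.futures import ThreadPoolExecutor

def generate_trace(prompt: str, seed: int) -> str:
    """Stub: sample one full reasoning trace from the generative model."""
    return f"trace {seed}: candidate reasoning for {prompt!r}"

def score_trace(prompt: str, trace: str) -> float:
    """Stub: process reward model score for how sound the trace's steps look."""
    return (hash((prompt, trace)) % 1000) / 1000.0

def best_of_n(prompt: str, n: int = 16) -> str:
    """Wider inference-time search: sample n candidate traces side by side and
    keep the one the process reward model rates highest. Unlike stretching one
    trace to be ever longer, the extra compute here is spent in parallel."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        traces = list(pool.map(lambda s: generate_trace(prompt, s), range(n)))
    return max(traces, key=lambda t: score_trace(prompt, t))

print(best_of_n("Why is the sky blue?", n=4))
```

A version closer to what the paragraph describes would score and prune partial traces step by step (more like beam search over reasoning steps), but the best-of-N form keeps the parallelism visible in a few lines.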