I guess I’m a bit confused about where o3 comes into this analysis. To me, this discussion appears to be focused on base models? Is data really the bottleneck these days for o-series-type advancements? I thought the compute available for RL in self-play / CoT / long-time-horizon agentic setups would be a bigger consideration.
Edit: I guess, on another reading, this article seems like an argument against AI capabilities hitting a plateau in the coming years, whereas the o3 announcement makes me more curious about whether we’re going to hyper-accelerate capabilities in the coming months.
My thesis is that the o3 announcement is timelines-relevant in a strange way. The causation goes from o3 to the impressiveness or utility of its successors trained on 1 GW training systems, then to decisions to build 5 GW training systems, and it’s those 5 GW training systems that have a proximate effect on timelines (in comparison to a world that only has 1 GW training systems for a few years). The argument goes through even if o3 and its successors don’t particularly move timelines directly through their capabilities; they can remain a merely successful, normal technology.
Previously, a funding constraint stopping $150bn training systems seemed more plausible, but with o3 it might be lifted. This is timelines-relevant precisely because there aren’t any other constraints that come into play before that point.