I agree with the possibility of pre-training plateauing at some point, possibly even in the next few years.
It would change timelines significantly. But there are other factors besides scaling pre-training — for example, reasoning models like o3 crushing ARC-AGI (https://arcprize.org/blog/oai-o3-pub-breakthrough). Reasoning in latent space is still too new to judge, but it might be the next breakthrough of a similar magnitude.
Why not take GPT-4.5 for what it is? OpenAI has literally stated that it’s not a frontier model. OK, so GPT-5 will not be a 100x-scaled GPT-4, but maybe GPT-6 will be, and that might be enough for AGI.
You should not look for progress in autonomy/agency in commercial offerings like GPT-4.5. At this point OpenAI is focusing on what sells well (better personality and EQ); I think they care less about the path to AGI. Rapid advances toward agency/autonomy are better gauged from the academic literature.
I agree that we should not fall for “vibe checks”.
But don’t bail on benchmarks: many people are working on benchmarks and evals, there is constant progress, and benchmarks are getting more objective and harder to game. Rather than looking at benchmarks pushed by OpenAI, it’s better to look for cutting-edge ones in the academic literature. Evaluating a SOTA model with a benchmark that is a few years old does not make sense at this point.