I think that people don’t account for the fact that scaling means decreasing space for algorithmic experiments, you can’t train GPT-5 10000 times making small tweaks each time. Some algorithmic improvements can show effect only on large scales or if they are implemented in training from scratch, therefore, such improvements are hard to find.
I don’t think recent and further scaling really changes the ever-present tradeoff between large full runs and small experimental runs. That’s been a factor in training large neural networks since 2004 at least, the first time I was involved in attempts to deal with real-world datasets that benefit from scaling networks as far as the hardware allows.
Personally I believe that a novel algorithm/architecture which is substantially better than transformer-based LLMs is findable, and would show up even at small scale.
I think the effect you are discussing is more of an issue for incremental improvements on the existing paradigm.
I think that people don’t account for the fact that scaling means decreasing space for algorithmic experiments, you can’t train GPT-5 10000 times making small tweaks each time. Some algorithmic improvements can show effect only on large scales or if they are implemented in training from scratch, therefore, such improvements are hard to find.
I don’t think recent and further scaling really changes the ever-present tradeoff between large full runs and small experimental runs. That’s been a factor in training large neural networks since 2004 at least, the first time I was involved in attempts to deal with real-world datasets that benefit from scaling networks as far as the hardware allows.
Personally I believe that a novel algorithm/architecture which is substantially better than transformer-based LLMs is findable, and would show up even at small scale. I think the effect you are discussing is more of an issue for incremental improvements on the existing paradigm.
My point is that people can perceive difficulties with getting incremental improvement as strong evidence about LLMs being generally limited.