The scaler view is not that LLMs scale to superintelligence directly and without bound, but that they scale just enough to start fixing their own remaining crippling flaws and assisting with further scaling, which, given their digital and high-speed nature, massively accelerates timelines compared to human labor alone. So the crux is a relevant capability threshold that is relatively low, though it might still prove too high to reach without further advances.
To do that and achieve something looking like take-off, they would need to reach the level of an advanced AI researcher, rather than just a coding assistant. That is, they would need to come up with novel architectures to test. Even if the LLM could write all the code for a top researcher 10× faster, that is not a 10× speedup in timelines; probably a 50% speedup at most, since much of the time goes into thinking up theoretical concepts and waiting for training runs to produce results.
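The arithmetic here is essentially Amdahl's law: only the coding share of a researcher's time accelerates, so the end-to-end speedup is bounded by the share that stays serial. A minimal sketch, with the coding fraction being an illustrative assumption rather than a measured figure:

```python
def overall_speedup(coding_fraction: float, coding_speedup: float) -> float:
    """Amdahl's-law style estimate: only the coding share of the
    work accelerates; thinking and waiting on training runs do not."""
    return 1.0 / ((1.0 - coding_fraction) + coding_fraction / coding_speedup)

# Hypothetical numbers: if ~40% of a top researcher's time is coding
# and the LLM makes that part 10x faster, the end-to-end speedup is
# only ~1.56x, nowhere near 10x.
print(round(overall_speedup(0.4, 10.0), 2))
```

On these assumed numbers, even a perfect 10× coding assistant yields roughly the ~1.5× overall speedup the comment suggests; the bottleneck shifts entirely to the non-coding work.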
An LLM might be able to take a few steps of advanced research (though not necessarily more than that) into many current topics at once if it were pre-trained on the right synthetic data. Continual improvement through search/self-play also seems to be getting closer.
Even without autonomous research, another round of scaling (one that is currently unaffordable) gets unlocked by the economic value of becoming able to do routine long-horizon tasks. The question is always the least capability sufficient to keep the avalanche going.