Aschenbrenner’s argument starts to be much more plausible
To be fair, Aschenbrenner explicitly mentions that what he terms “unhobbling” of LLMs will also be needed: he just expects progress in that to continue. The question then is whether the various weaknesses you’ve mentioned (and any other important ones) will be beaten by scaling, by unhobbling, or by a combination of the two.
Agreed. I added some thoughts on the relevance of Aschenbrenner’s unhobbling claims in a footnote:
“Aschenbrenner also discusses ‘unhobbling’, which he describes as ‘fixing obvious ways in which models are hobbled by default, unlocking latent capabilities and giving them tools, leading to step-changes in usefulness’. He breaks that down into categories here. Scaffolding and tooling I discuss here; RLHF seems unlikely to help with fundamental reasoning issues. Increased context length serves roughly as a kind of scaffolding for purposes of this discussion. ‘Posttraining improvements’ is too vague to really evaluate. But note that his core claim (the graph here) ‘shows only the scaleup in base models; “unhobblings” are not pictured’.”