‘That means, around every three months, it is possible to achieve performance comparable to current state-of-the-art LLMs using a model with half the parameter size.’
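To make the quoted rate concrete, here is a minimal back-of-the-envelope sketch of the implied exponential, assuming the trend is a clean 2× halving every 3 months and taking an arbitrary 70B-parameter starting point purely for illustration (neither number is from the quoted source):

```python
# Rough sketch (illustrative assumption, not from the quoted source):
# if the parameter count needed to match a fixed capability level halves
# every ~3 months, the required size after t months is P(t) = P0 * 2**(-t/3).

def params_needed(p0_billion: float, months: float, halving_months: float = 3.0) -> float:
    """Parameters (in billions) needed to match the original capability after `months`."""
    return p0_billion * 2 ** (-months / halving_months)

# Example: a capability level that takes ~70B parameters today (hypothetical).
for t in (0, 3, 6, 12, 24):
    print(f"after {t:>2} months: ~{params_needed(70, t):.1f}B params")
# after  0 months: ~70.0B params
# after  3 months: ~35.0B params
# after  6 months: ~17.5B params
# after 12 months: ~4.4B params
# after 24 months: ~0.3B params
```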
If this trend continues, and is combined with (better / more extensible) inference scaling laws, LM agents could soon become much more competitive on many AI R&D capabilities, including on much longer-horizon tasks.
E.g., figure 11 from RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts:
Also related: Before smart AI, there will be many mediocre or specialized AIs.