Would better token prediction loss help towards AGI? I wonder if scaling is no longer relevant given ChatGPT's existing level of performance, and whether there is already some token prediction loss overhang. The missing pieces needed to make LLMs autonomously productive (mainly the ability to plan and develop their own training, and to assemble datasets) are probably unrelated; even better prediction loss won't help with them.