The paper seems to be about scaling laws for a static dataset as well?
Similar to the initial study of scale in LLMs, we focus on the effect of scaling on a generative pre-training loss (rather than on downstream agent performance, or reward- or representation-centric objectives), in the infinite data regime, on a fixed offline dataset.
To learn to act you’d need to do reinforcement learning, which is massively less data-efficient than the current self-supervised training.
More generally: I think almost everyone agrees that you'd need to scale the right thing for further progress. The question is just what the right thing is, if text is not it, because text encodes highly powerful abstractions (produced by humans and human culture over many centuries) in a very information-dense way.
If you look at the Active Inference community, there's a lot of work going into probabilistic programming languages (PPLs) for more efficient world modelling, but that is far from easy and, as you say, a lot more compute-heavy.
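To make the world-modelling point concrete, here is a minimal sketch of the kind of Bayesian belief update that a PPL automates for far richer models. Everything here (the two hidden states, the observation model, and the numbers) is made up for illustration; real PPLs like Pyro or Gen let you write the generative model and leave inference to the framework.

```python
# Toy discrete world model: a hidden weather state ("rain" or "sun")
# generates noisy observations ("wet" or "dry"). The agent maintains a
# posterior over the hidden state and updates it with Bayes' rule after
# each observation. All probabilities below are invented for the example.

def bayes_update(prior, likelihoods, obs):
    """One Bayesian update: posterior is proportional to prior * P(obs | state)."""
    unnorm = {s: prior[s] * likelihoods[s][obs] for s in prior}
    z = sum(unnorm.values())
    return {s: p / z for s, p in unnorm.items()}

# Observation model: P(obs | hidden state), numbers are illustrative only.
likelihoods = {
    "rain": {"wet": 0.9, "dry": 0.1},
    "sun":  {"wet": 0.2, "dry": 0.8},
}

posterior = {"rain": 0.5, "sun": 0.5}  # uniform prior over hidden states
for obs in ["wet", "wet", "dry"]:
    posterior = bayes_update(posterior, likelihoods, obs)

# After two "wet" observations and one "dry", belief still favours rain.
print(posterior)
```

The point of the sketch is the separation of concerns: the generative model (the `likelihoods` table) is declared once, and inference is mechanical. Scaling that mechanical inference to continuous, high-dimensional world models is exactly where the compute cost the comment mentions comes in.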
I think there'll be a scaling break due to this, but once it is algorithmically figured out we will be back, and back with a vengeance. I think most safety challenges have a self-vs-environment model as a necessary condition to be properly engaged, and current LLM world modelling doesn't engage with that.