My report estimates that the amount of training data required to train a model with N parameters scales as N^0.8, based significantly on results from Kaplan et al. 2020. In 2022, the Chinchilla scaling result (Hoffmann et al. 2022) showed that the amount of data should instead scale linearly with N.
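To get a feel for how much the exponent matters, here is a minimal sketch comparing the two scaling assumptions. It only uses the exponents quoted above (0.8 and 1.0); the function name `data_growth` and the choice of growth factors are illustrative, and constant prefactors are ignored, so only the ratios are meaningful.

```python
def data_growth(param_growth: float, exponent: float) -> float:
    """Factor by which required data grows when the parameter count grows
    by `param_growth`, assuming data scales as N**exponent."""
    return param_growth ** exponent


# Compare the older N^0.8 assumption with linear (Chinchilla-style) scaling.
for k in (10, 100, 1000):
    old = data_growth(k, 0.8)
    new = data_growth(k, 1.0)
    print(f"{k:>5}x params: {old:7.1f}x data (N^0.8) vs {new:7.1f}x data (N); "
          f"gap {new / old:.2f}x")
```

The gap between the two assumptions grows as k^0.2, so it is under 2x for a 10x larger model but roughly 4x for a 1000x larger one.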
Are you concerned that pretrained language models might hit data constraints before TAI? Nostalgebraist estimates that there are roughly 3.2T tokens available publicly for language model pretraining. This estimate misses important potential data sources such as transcripts of audio and video, private text conversations, and email. But the BioAnchors report estimates that the median transformative model will require 22T data points, nearly an order of magnitude more than this estimate.
The BioAnchors estimate was also based on older scaling laws that placed a lower priority on data relative to compute. With the new Chinchilla scaling laws, more data would be required for compute-optimal training. Of course, training runs don’t need to be compute-optimal: You can get away with using more compute and less data if you’re constrained by data, even if it’s going to cost more. And text isn’t the only data a transformative model could use: audio, video, and RLHF on diverse tasks all seem like good candidates.
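As a rough back-of-the-envelope check on these numbers, the sketch below asks how large a compute-optimal model each token budget would support. It rests on two assumptions that are not stated in this thread: a ratio of about 20 training tokens per parameter (roughly what Chinchilla itself used, 1.4T tokens for 70B parameters) and the common approximation of 6 FLOPs per parameter per token for training compute. The function and constant names are hypothetical.

```python
# Assumption: ~20 tokens per parameter, roughly Chinchilla's own ratio
# (1.4T tokens / 70B parameters). This is a rule of thumb, not an exact law.
TOKENS_PER_PARAM = 20.0


def chinchilla_optimal_params(tokens: float) -> float:
    """Parameter count for which `tokens` is roughly the compute-optimal
    amount of data, under the assumed tokens-per-parameter ratio."""
    return tokens / TOKENS_PER_PARAM


def approx_training_flops(params: float, tokens: float) -> float:
    """Training compute under the common C ~ 6 * N * D approximation."""
    return 6.0 * params * tokens


# Token counts quoted in the comment above.
for label, tokens in [("public text estimate", 3.2e12),
                      ("BioAnchors median", 22e12)]:
    n = chinchilla_optimal_params(tokens)
    c = approx_training_flops(n, tokens)
    print(f"{label}: {tokens:.1e} tokens -> ~{n:.1e} params, ~{c:.1e} FLOPs")
```

Under these assumptions, 3.2T tokens corresponds to a compute-optimal model of about 160B parameters. Going beyond that on the same data means trading extra compute for data, as described above.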
Does the limited available public text data affect your views of how likely GPT-N is to be transformative? Are there any considerations overlooked here, or questions that could use a more thorough analysis? Curious about anybody else’s opinions, and thanks for sharing the update, I think it’s quite persuasive.
I suspect Chinchilla’s implied data requirements aren’t going to be that much of a blocker for capability gain. It is an important result, but it’s primarily about the behavior of current backprop-trained, transformer-based LLMs.
The data inefficiency of many architectures was known before Chinchilla, but the industry worked around it because it wasn’t yet a bottleneck. After Chinchilla, it has become one of the largest architectural optimization targets. Given the increase in focus and the relative infancy of the research, I would guess the next two years will see the picking of some very juicy low-hanging fruit. There are a lot of options floating nearby in conceptspace and there is a lot of room to grow; I’d be surprised if data limitations still feel as salient in 2025.
What’s RLHF?
Reward Learning from Human Feedback
Reinforcement* learning from human feedback
Well, I thought about that, but I wasn’t sure whether reinforcement learning from human feedback wouldn’t just be a strict subset of reward learning from human feedback. If reinforcement is indeed the strict definition then I concede, but I don’t think it makes sense.
The acronym is definitely used for reinforcement learning. [“RLHF” “reinforcement learning from human feedback”] gets 564 hits on Google, [“RLHF” “reward learning from human feedback”] gets 0.
Thanks for verifying! I retract my comment.
I think historically reinforcement has been used more in that particular constellation (see e.g. the deep RL from human preferences paper), but as I noted I find reward learning more apt, as it points to the hard thing being the reward learning, i.e. distilling human feedback into an objective, rather than the optimization of any given reward function (which technically need not involve reinforcement learning).