Over the past couple of weeks, lots of people have been saying that the scaling labs have hit the data wall, based on rumors of slowdowns in capability improvements. But before that, I was hearing at least some people at those labs say they expected to wring another 0.5–1 order of magnitude of human-generated training data out of what they already had access to, and that still seems very plausible to me.
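To make the "0.5–1 order of magnitude" phrasing concrete, here's a quick sketch of the multipliers involved. The baseline token count is a hypothetical placeholder for illustration, not a figure from the labs or from Epoch.

```python
# Quick sketch: what "another 0.5-1 order of magnitude" of data means as a
# multiplier. The baseline below is a made-up placeholder, not a real estimate.

def scale_by_ooms(baseline_tokens: float, ooms: float) -> float:
    """Scale a token count by a given number of orders of magnitude (10**ooms)."""
    return baseline_tokens * 10 ** ooms

baseline = 1e13  # hypothetical baseline: ~10 trillion training tokens
for ooms in (0.5, 1.0):
    scaled = scale_by_ooms(baseline, ooms)
    print(f"+{ooms} OOM -> {scaled:.2e} tokens ({10 ** ooms:.1f}x the baseline)")
```

So +0.5 OOM is roughly a 3x increase and +1 OOM is 10x, whatever the true baseline turns out to be.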
Epoch’s analysis from June supports this view, and suggests it may even be a bit too conservative:
(and that’s just for text; there are also other significant sources of data for multimodal models, e.g. video)