Would the prediction also apply to inference scaling (laws) - and maybe more broadly various forms of scaling post-training, or only to pretraining scaling?
Some of the underlying evidence, like e.g. Altman’s public statements, is relevant to other forms of scaling. Some of the underlying evidence, like e.g. the data wall, is not. That cashes out to differing levels of confidence in different versions of the prediction.
Would the prediction also apply to inference scaling (laws) - and maybe more broadly various forms of scaling post-training, or only to pretraining scaling?
Some of the underlying evidence, like e.g. Altman’s public statements, is relevant to other forms of scaling. Some of the underlying evidence, like e.g. the data wall, is not. That cashes out to differing levels of confidence in different versions of the prediction.