This Epoch analysis (Optimally Allocating Compute Between Inference and Training) suggests something similar, especially assuming Chinchilla scaling laws keep holding:

Our analysis indicates that AI labs should spend comparable resources on training and running inference, assuming they can flexibly balance compute between these tasks to maintain model performance.
Yes, though notably I must make an additional assumption: that the returns to capability are such that, when allocating compute, you stay close to Chinchilla-optimal rather than trying to make inference cheaper.

That is, my argument has two steps:

1. Comparable resources on training and inference (matching Epoch's analysis).
2. Returns to capabilities imply that you want to be near Chinchilla-optimal rather than overtrained.

I think this is unlikely to be too far off in practice, though it might lose you an order of magnitude or so (the sketch below illustrates the trade-off).
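To make step 2 concrete, here is a minimal numerical sketch (my illustration, not Epoch's calculation). It uses the published Chinchilla loss fit from Hoffmann et al. (2022), plus the standard approximations of ~6ND FLOP for training and ~2N FLOP per token for inference, and asks which model size N minimizes total compute for a fixed loss target as inference demand grows. The loss target and token volumes are made-up numbers chosen only for illustration.

```python
import math

# Chinchilla loss fit L(N, D) = E + A / N**alpha + B / D**beta,
# with the constants fitted by Hoffmann et al. (2022).
E, A, B = 1.69, 406.4, 410.7
alpha, beta = 0.34, 0.28

def tokens_for_loss(N, target_loss):
    """Training tokens D needed for an N-parameter model to reach target_loss."""
    residual = target_loss - E - A / N**alpha
    if residual <= 0:
        return None  # this model size cannot reach the target at any D
    return (B / residual) ** (1 / beta)

def total_compute(N, target_loss, inference_tokens):
    """Training compute (~6*N*D) plus inference compute (~2*N per token served)."""
    D = tokens_for_loss(N, target_loss)
    if D is None:
        return math.inf
    return 6 * N * D + 2 * N * inference_tokens

# Sweep model sizes 5B-200B at a fixed (illustrative) capability target.
target = 1.93
grid = [i * 5e9 for i in range(1, 41)]
for T in (1e11, 1e14):  # light vs. heavy lifetime inference demand, in tokens
    best = min(grid, key=lambda N: total_compute(N, target, T))
    print(f"T={T:.0e} tokens served: cheapest N ~ {best / 1e9:.0f}B params, "
          f"total ~ {total_compute(best, target, T):.2e} FLOP")
```

Under these numbers, the pure cost-minimizing model drifts from roughly Chinchilla-optimal (here ~30B parameters) toward a smaller, overtrained one (~10B) as inference volume grows; step 2's claim is that steep enough returns to capability pull you back toward the Chinchilla-optimal end, plausibly at a cost within an order of magnitude.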