Yes, though notably I must make an additional assumption: that the returns to capability are such that, when allocating compute, you go closer to Chinchilla-optimal rather than trying to make inference cheaper.
As in, my argument has two steps:
1. Comparable resources (matching Epoch’s analysis).
2. Returns to capabilities imply that you want to be near Chinchilla-optimal rather than overtrained.
I think this is unlikely to be far off in practice, though it might lose you an order of magnitude or so.
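
To make the trade-off in step 2 concrete, here is a rough sketch of the two ways to spend a fixed training budget. It assumes the common rules of thumb that training compute is about 6 × parameters × tokens and that a Chinchilla-optimal run uses about 20 tokens per parameter; neither number comes from the comment above, and the budget and the `shrink` factor are purely hypothetical. It only shows how the allocation shifts, not how much capability is lost by overtraining.

```python
import math

# Rule-of-thumb assumptions (not taken from the comment above):
#   training FLOPs ~= 6 * params * tokens
#   Chinchilla-optimal data ~= 20 tokens per parameter
FLOPS_PER_PARAM_TOKEN = 6
CHINCHILLA_TOKENS_PER_PARAM = 20


def chinchilla_optimal(compute_flops):
    """Return (params, tokens) spending compute_flops at ~20 tokens/param."""
    params = math.sqrt(
        compute_flops / (FLOPS_PER_PARAM_TOKEN * CHINCHILLA_TOKENS_PER_PARAM)
    )
    tokens = CHINCHILLA_TOKENS_PER_PARAM * params
    return params, tokens


def overtrained(compute_flops, shrink):
    """Same compute, but `shrink`x fewer params and `shrink`x more tokens."""
    params, tokens = chinchilla_optimal(compute_flops)
    return params / shrink, tokens * shrink


budget = 1e24  # hypothetical training budget in FLOPs
n_opt, d_opt = chinchilla_optimal(budget)
n_small, d_small = overtrained(budget, shrink=4)  # hypothetical shrink factor

print(f"near-optimal: {n_opt:.2e} params, {d_opt:.2e} tokens")
print(f"overtrained:  {n_small:.2e} params, {d_small:.2e} tokens")
```

Under these assumptions the overtrained model costs the same to train but is cheaper per token at inference, roughly in proportion to its smaller parameter count. The argument above is that the returns to capability favor the near-optimal allocation anyway.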