Large language models aren’t trained enough
Link post
Finbarr Timbers makes a point that is obvious in retrospect, but that many people, including people forecasting AI timelines, seem to miss: since training cost is amortized over inference, the optimal training budget depends on the expected amount of inference. The scaling laws from both OpenAI and DeepMind assume zero (or negligible) inference, which is obviously incorrect. Any forecasting that relies on those scaling laws is similarly suspect and should be revised.
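
A minimal sketch of why this matters (the FLOP approximations and symbols below are my own illustration, not from the post): with the standard rule of thumb that training a model with $N$ parameters on $D$ tokens costs roughly $6ND$ FLOPs, and that a forward pass costs roughly $2N$ FLOPs per token, lifetime compute is

$$C_{\text{total}} \;\approx\; \underbrace{6ND}_{\text{training}} \;+\; \underbrace{2N D_{\text{inf}}}_{\text{inference}},$$

where $D_{\text{inf}}$ is the expected number of tokens served over the model's deployment. Chinchilla-style analysis minimizes loss subject to the training term alone, implicitly setting $D_{\text{inf}} = 0$. Once $D_{\text{inf}}$ is large, the same loss can be reached at lower total cost with a smaller $N$ and a larger $D$, i.e. a model trained on far more tokens than the Chinchilla-optimal count.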