This point is semi-correct now, but mostly incorrect for future systems.
A larger model learns faster per data point, which is increasingly important as we move towards AGI. If you want a system that has mostly memorized the internet, then sure, overtraining a small model now makes sense. If you want a system that can rapidly and continuously transfer-learn from minimal amounts of new data, to compete with smart humans, then you probably want something far larger than even the naive[1] Chinchilla optimum.
[1] Naive in the sense that it optimizes only for the total compute cost of training, ignoring future downstream data efficiency.
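For concreteness, here is a minimal sketch of what that naive objective looks like: minimize the fitted Chinchilla loss L(N, D) = E + A/N^α + B/D^β at a fixed training budget C ≈ 6ND FLOPs. The constants are the Hoffmann et al. (2022) parametric fits, and the grid search is purely illustrative; nothing here is anyone's production recipe.

```python
# Sketch of the "naive" Chinchilla optimum: choose (N, D) to minimize the
# fitted pretraining loss at a fixed training budget C ~ 6*N*D FLOPs.
# Constants are the Hoffmann et al. (2022) fits; treat them as illustrative.
import numpy as np

E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(N, D):
    """Fitted Chinchilla loss for N parameters trained on D tokens."""
    return E + A / N**alpha + B / D**beta

def naive_optimum(C, grid=2000):
    """Grid-search the loss-minimizing (N, D) for a fixed compute budget C."""
    N = np.logspace(7, 13, grid)   # candidate model sizes (params)
    D = C / (6 * N)                # tokens implied by the budget C ~ 6*N*D
    i = np.argmin(loss(N, D))
    return N[i], D[i]

for C in [1e21, 1e23, 1e25]:
    N, D = naive_optimum(C)
    print(f"C={C:.0e}: N~{N:.2e} params, D~{D:.2e} tokens, D/N~{D/N:.0f}")
```

Note what the objective contains: only pretraining loss per unit of training compute. Nothing in it rewards a model for learning quickly from small amounts of new data later, which is exactly the term the argument above says will dominate for future systems.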