I don’t think recent or future scaling really changes the ever-present tradeoff between large full runs and small experimental runs. That tradeoff has been a factor in training large neural networks since at least 2004, when I was first involved in efforts to handle real-world datasets that benefit from scaling networks as far as the hardware allows.