I don’t think recent or future scaling really changes the ever-present tradeoff between large full runs and small experimental runs. That tradeoff has been a factor in training large neural networks since at least 2004, when I was first involved in efforts to handle real-world datasets that benefit from scaling networks as far as the hardware allows.