I agree that most of the recent large model gains have been due to the surplus of compute and data, and that theory and technique will have to catch up eventually … what I'm not convinced of is why that catch-up would necessarily be slow.
I would argue there's a theory-and-technique overhang, with self-supervised learning being just one area of active research. We haven't needed to dip very deeply into it yet, since training bigger transformers on more data "just works."
There's very weak evidence that we're hitting the limits of deep learning itself, or even just of the transformer architecture. Ultimately, those limits are the real constraint … data and compute are certainly the conceptually easier problems to solve. Maybe in the short term that's enough.