The observation that two SOTA language models trained close together in time differed substantially in measured performance provides evidence of a discontinuity, in the usual sense of a large residual relative to the extrapolation of the prior trend.
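(To make the "large residual from prior extrapolation" reading concrete, here's a minimal sketch; every date, score, and model in it is made up purely for illustration, not real benchmark data:)

```python
# Minimal sketch: a "discontinuity" as a large residual from the trend
# extrapolated from prior SOTA results. All dates and scores are invented.
import numpy as np

# Hypothetical (release year, benchmark score) points for past SOTA models.
history = np.array([
    [2016.0, 40.0],
    [2017.5, 46.0],
    [2019.0, 52.0],
    [2020.3, 57.0],
])
years, scores = history[:, 0], history[:, 1]

# Fit a linear trend to the prior points (score gained per year).
slope, intercept = np.polyfit(years, scores, deg=1)

def discontinuity_in_years(year, score):
    """Residual above the extrapolated trend, expressed as how many years of
    progress at the prior rate the jump corresponds to."""
    expected = slope * year + intercept
    return (score - expected) / slope

# A hypothetical new model that lands well above the extrapolated trend.
new_year, new_score = 2021.0, 72.0
print(f"Residual from extrapolation: "
      f"{discontinuity_in_years(new_year, new_score):.1f} years of progress "
      f"at the prior rate")
```

On this measure it doesn't matter which of two near-simultaneous releases happened to come out first: the comparison is against the fitted trend, not against the single most recent SOTA point.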
I can answer your question literally: I don’t think that would be infinitely fast progress. I am genuinely unsure what your point is though. :)
I think there’s a significant point here: that it only makes sense to compare with the expected trend rather than with one data point. In particular, note that if Gopher had been released one day before GPT-3, then GPT-3 wouldn’t have been SOTA, and the time-to-achieve-x-progress would look a lot longer.
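(As a toy illustration of that ordering sensitivity, again with entirely made-up names, dates, and scores:)

```python
# Toy illustration: measuring each release's jump against whatever happened to
# be SOTA immediately before it makes the answer depend on release order.
# All names, dates, and scores are invented for illustration.

def report_jumps(prior_sota, releases):
    """Print each release's score jump and elapsed time vs. the current SOTA."""
    sota_year, sota_score = prior_sota
    for name, year, score in releases:
        if score > sota_score:
            print(f"{name}: +{score - sota_score:.0f} points in "
                  f"{year - sota_year:.2f} years")
            sota_year, sota_score = year, score
        else:
            print(f"{name}: not SOTA at release")

prior = (2018.5, 55.0)                 # hypothetical earlier SOTA
model_b = ("model_B", 2020.40, 62.0)   # stands in for the earlier of the two
model_c = ("model_C", 2020.41, 70.0)   # stands in for the stronger, later one

print("Actual order (B, then C one day later):")
report_jumps(prior, [model_b, model_c])   # C's jump looks huge and near-instant

print("Counterfactual order (C one day before B):")
report_jumps(prior, [model_c, model_b])   # same progress looks gradual; B never SOTA
```

The trend-residual measure above doesn't flip under this reordering, which is the case for comparing against the expected trend rather than against one data point.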
I’m confused what you’re asking.
(FWIW, it still seems like a discontinuity to me)