If two people trained language models at the same time and one was better than the other, would you call it infinitely fast progress?
I’m confused what you’re asking.
The observation that two SOTA language models trained close together in time were substantially different in measured performance provides evidence of a discontinuity, in the usual sense of a large residual from prior extrapolation.
I can answer your question literally: I don’t think that would be infinitely fast progress. I am genuinely unsure what your point is though. :)
I think there’s a significant point here: it only makes sense to compare with the expected trend rather than with a single data point.
In particular, note that if Gopher had been released one day before GPT-3, then GPT-3 wouldn’t have been SOTA, and the time-to-achieve-x-progress would look a lot longer.
(FWIW, it still seems like a discontinuity to me)
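To make the "large residual from prior extrapolation" framing concrete, here is a minimal sketch in Python. All numbers are made up for illustration (they are not real GPT-3 or Gopher benchmark figures); the point is only that the comparison is against the extrapolated trend, not against the single most recent SOTA data point:

```python
import numpy as np

# Made-up (date, benchmark score) pairs for past SOTA models -- purely
# illustrative, not real GPT-3/Gopher numbers.
dates  = np.array([2018.5, 2019.2, 2019.9, 2020.4, 2021.0])
scores = np.array([0.35, 0.42, 0.48, 0.55, 0.60])

# Fit a linear trend to the prior SOTA points and extrapolate it forward.
slope, intercept = np.polyfit(dates, scores, 1)

new_date, new_score = 2021.9, 0.78            # hypothetical new model
predicted = slope * new_date + intercept      # what the trend predicts
residual  = new_score - predicted             # gap from extrapolation

# A "discontinuity" here means this residual is large relative to the
# historical scatter around the trend -- not that the new model beats
# the single most recent SOTA model by a large margin.
scatter = np.std(scores - (slope * dates + intercept))
print(f"residual = {residual:.3f}, typical scatter = {scatter:.3f}")
```

On this view, two near-simultaneous models with very different scores can still constitute evidence of a discontinuity if the better one sits far above the trend line, without implying anything like "infinitely fast progress."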