I think there’s a significant point here: that it only makes sense to compare with the expected trend rather than with one data point. In particular, note that if Gopher had been released one day before GPT-3, then GPT-3 wouldn’t have been SOTA, and the time-to-achieve-x-progress would look a lot longer.
I think there’s a significant point here: that it only makes sense to compare with the expected trend rather than with one data point.
In particular, note that if Gopher had been released one day before GPT-3, then GPT-3 wouldn’t have been SOTA, and the time-to-achieve-x-progress would look a lot longer.
(FWIW, it still seems like a discontinuity to me)