Rereading this, I think a lot of the same things could be said about GPT-2 and GPT-3.
--A very simple, off-the-shelf transformer architecture, plus tons of compute, beats all sorts of more specialized algorithms without even any fine-tuning (few-shot learning; see the sketch after this list).
--Then it gets substantially better a year later, just by making it bigger, with performance-improvement curves that show no signs of diminishing returns.
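(To spell out what "without even any fine-tuning" means here: few-shot learning is just pasting a handful of worked examples into the prompt and letting the frozen model continue the pattern, with no gradient updates. A minimal sketch in Python; `complete()` is a hypothetical stand-in for whatever text-completion endpoint you use, not any particular library's API.)

```python
# Few-shot prompting sketch: no fine-tuning, no gradient updates.
# The "training" is just a handful of examples pasted into the prompt.

def complete(prompt: str) -> str:
    # Hypothetical stand-in for a language-model completion call.
    raise NotImplementedError("plug in your preferred model endpoint here")

def few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Build a prompt from (input, output) pairs plus the new query."""
    blocks = [f"Q: {x}\nA: {y}" for x, y in examples]
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)

# e.g. translation posed as pattern continuation
examples = [
    ("Translate to French: cheese", "fromage"),
    ("Translate to French: house", "maison"),
]
prompt = few_shot_prompt(examples, "Translate to French: dog")
# answer = complete(prompt)  # the frozen model just continues the pattern
```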
Is it a case of one project pulling far ahead of everyone else? That's less clear. GPT-3 doesn't beat SOTA on many tasks, though IIRC it does on some. However, I'd bet that on the metric "how good are you, without fine-tuning, across this whole ensemble of tasks?" the GPT series blows everything else out of the water.
With a year of hindsight, it seems GPT-3 was about a year or two ahead of everyone else. By my imperfect reckoning, it has only now been decisively surpassed. (Maybe there were private, unpublished systems that eclipsed it earlier.)