Judging from what I’ve read here on LW, it’s maybe around three-quarters as significant as GPT-3? I might be wrong here, though.
Disclaimer that I’m not very experienced here either and might also be wrong, but I’m not sure that’s the right comparison. It seems to me that GPT-2 (or the original GPT, though I don’t know much about it) was a breakthrough in having one model that learns new tasks from little data, while GPT-3 was a breakthrough in showing how far capabilities like that extend with more compute. This feels more like the former than the latter, but it also sounds more significant than GPT-2 from a pure generalization standpoint, so maybe slightly more significant than GPT-2?