The date published vs date trained was on my mind because of Gopher. It seemed to me very relevant that DeepMind trained a significantly larger model within basically half a year of the publication of GPT-3.
In addition to Google Brain also being quite coy about their 100+B model, it made me update a lot in the direction of “the big players will replicate any new breakthrough very quickly but not necessarily talk about it.”
To be clear, I also think it probably doesn’t make sense to include this information in the list, because it is too rarely relevant.
It’s worth noting that aside from the ridiculous situation where Googlers aren’t allowed to name LaMDA (despite at least 5 published papers so far), Google has been very coy about MUM & Pathways (to the point where I’m still not sure if ‘Pathways’ is an actual model that exists, or merely an aspirational goal/name of a research programme). You also have the situation where a model like LG’s new 300b Exaone is described in a research paper which makes no mention of Exaone (the Korean coverage briefly mentions the L-Verse arch, but none of the English coverage does), or where we still have little idea what the various Wudao models (the MoE recordholders...?) do. And how about that Megatron-NLG-500b, eh? Is it cool, or not? A blog post, and one paper about how efficiently it can censor tweets, is not much to evaluate it on.
And forget about real evaluation! I’m sure OA’s DALL-E successor, GLIDE, is capable of very cool things, which people would find if they could poke at it to establish things like CLIP’s* ability to do visual analogies or “the Unreal Engine prompt”; but we’ll never know because they aren’t going to release it, and if they do, it’ll be locked behind an API where you can’t do many of the useful things like backpropping through it.
We are much more ignorant about the capabilities of the best models today than we were a year ago.
Increasingly, we’re gonna need range/interval notation and survival/extremes analysis to model things, since exact dates, benchmarks, petaflop/s-days, and parameter counts will be unavailable. Better start updating your data model & graphs now.
* One gets the impression that if OA had realized just how powerful CLIP was for doing more than just zero-shot ImageNet classification or re-ranking DALL-E samples, they probably wouldn’t’ve released the largest models of it. Another AI capabilities lesson: even the creators of something don’t always know what it is capable of. “Attacks only get better.”
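To make the range/interval suggestion above concrete, here is a minimal sketch of what such a data model might look like. Everything in it is an assumption for illustration: the `Interval` and `ModelRecord` classes, the helper names, and the choice to store an undisclosed size as a lower bound with an infinite upper bound (what survival analysis would call a right-censored observation). The three records just restate figures mentioned in this thread or publicly reported; this is not how the tracker actually stores anything.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed range [lo, hi]. hi=inf means only a lower bound is public."""
    lo: float
    hi: float = math.inf

    def __post_init__(self):
        if self.lo > self.hi:
            raise ValueError(f"lower bound {self.lo} exceeds upper bound {self.hi}")

    @property
    def is_exact(self) -> bool:
        # A point estimate is an interval of zero width.
        return self.lo == self.hi

    @property
    def is_right_censored(self) -> bool:
        # We know "at least lo", but nothing about the upper end.
        return math.isinf(self.hi)


@dataclass(frozen=True)
class ModelRecord:
    # Hypothetical schema: just a name and a parameter-count interval.
    name: str
    params: Interval


def exact(value: float) -> Interval:
    """Fully disclosed figure."""
    return Interval(value, value)


def at_least(value: float) -> Interval:
    """Figure disclosed only as 'N+' (a lower bound)."""
    return Interval(value)


def frontier_lower_bound(records: list[ModelRecord]) -> float:
    """Largest parameter count we can be sure exists among the records."""
    return max(r.params.lo for r in records)


# Illustrative records only.
RECORDS = [
    ModelRecord("GPT-3", exact(175e9)),
    ModelRecord("Gopher", exact(280e9)),
    ModelRecord("unnamed Google Brain model", at_least(100e9)),
]

censored_names = [r.name for r in RECORDS if r.params.is_right_censored]
print(f"frontier is at least {frontier_lower_bound(RECORDS):.3g} parameters")
print("lower bound only:", censored_names)
```

The point of the design is that a graph built on this can draw exact entries as points and censored ones as open-ended bars, so the reader sees which numbers are merely lower bounds rather than mistaking them for measurements.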
This is an excellent point and it’s indeed one of the fundamental limitations of a public tracking approach. Extrapolating trends in an information environment like this can quickly degenerate into pure fantasy. All one can really be sure of is that the public numbers are merely lower bounds — and plausibly, very weak ones.
Yeah, great point about Gopher. We noticed the same thing and included a note to that effect in Gopher’s entry in the tracker.
I agree there’s reason to believe this sort of delay could become a bigger factor in the future, and may already be a factor now. If we see this pattern develop further (and if folks start publishing “model cards” more consistently like DM did, which gave us the date of Gopher’s training) we probably will begin to include training date as separate from publication date. But for now, it’s a possible trend to keep an eye on.
Much better now!
Thanks again!