It’s roughly an order of magnitude more compute than GPT-3.
Which is reasonable. It has been a bit under 2.5 years since GPT-3 was trained (they mention the move to Azure disrupting training, IIRC, which lets you date it earlier than just ‘May 2020’). Under the 3.4-month “AI and Compute” doubling trend, you’d expect ~8.8 doublings, ie. the top run today being ~445x GPT-3. I do not think anyone has a 445x run they are about to unveil any second now. Whereas on the slower >5.7-month doubling in that link, you would expect <36x, which is still ~3x PaLM’s actual 10x, but at least the right order of magnitude.
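For concreteness, here is the trend arithmetic as a quick sketch; the elapsed time and the doubling times are the assumptions from the paragraph above, not official figures:

```python
# Sketch of the compute-doubling arithmetic, under assumed inputs:
# ~2.5 years since GPT-3 was trained, and doubling times of 3.4 months
# ("AI and Compute" trend) vs. >5.7 months (the slower trend linked above).

def implied_scale_up(elapsed_months: float, doubling_months: float) -> float:
    """Multiple over the previous largest run implied by a fixed compute-doubling time."""
    return 2 ** (elapsed_months / doubling_months)

elapsed = 2.5 * 12  # ~30 months, an upper bound on the time since GPT-3 was trained

print(implied_scale_up(elapsed, 3.4))  # ~8.8 doublings -> roughly 450x (the ~445x figure above)
print(implied_scale_up(elapsed, 5.7))  # ~5.3 doublings -> high 30s; a bit under 2.5 years gives <36x
```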
There may also be other runs around PaLM scale, pushing the peak closer to 30x. (eg. Gopher was kept secret for a long time; a larger Chinchilla would be a logical thing to do, and we might not know about it until next year; and no one has actually computed the total FLOPS for ERNIE-Titan AFAIK, and it may still be running, so who knows what it’s up to in total compute consumption.) So PaLM’s 10x is only a lower bound, and 5 years from now, we may look back and say “ah yes, XYZ nailed the compute trend exactly, we just didn’t learn about it until recently when they happened to disclose exact numbers.” (Somewhat like how some StarCraft predictions were falsified but retroactively turned out to be right, because we just didn’t know about AlphaStar and no one had noticed Vinyals’s Blizzard talk implying they were positioned for AlphaStar.)