Russia is not at all an AI superpower. China also seems to be quite far behind the west in terms of LLMs, so overall, six months would very likely not lead to any of them catching up.
This doesn’t match my impression. For example, THUDM (the Tsinghua University Data Mining lab) is one of the most impressive groups in the world in terms of actually doing large LLM training runs.
Yeah, you’re pretty much just wrong about China, as far as I can tell. It’s hard to be sure, but they’ve been hitting >1T parameters sparse regularly [edit: whoops, I was wrong about that scale level, they’re just reaching 1T of useful model scale now]. I find it hard to tell whether they’re getting good performance out of their scaling; my impression is they scaled slightly too early, but with the different dataset I’m not really sure.

1. I haven’t seen an impressive AI product come out of China (please point me to some if you disagree).
2. They can’t import A100/H100 GPUs anymore after the US chip restrictions.
So, it’s not clear that they got the target performance out of this model. However, they did manage to scale it, which is all it takes. They don’t need to buy more GPUs; they’ve got what they need, as long as they can find the algorithms, which are mostly published.
https://twitter.com/arankomatsuzaki/status/1637983258880122881 - https://arxiv.org/abs/2303.10845
Thanks! I haven’t found good comments on that paper (and lack the technical insight to evaluate it myself).
Are you implying that China has access to the compute required for a) GPT-4-type models or b) AGI?
Well, they get to run however much compute they do have for six more months with no competition. Probably several years, actually, since this pause would obviously get renewed again and again until someone honoring it defects. Note that enormous models are a function of total cluster memory and interconnect. Many current clusters have enough memory for theoretically enormous models, 10 trillion weights plus. Having too few GPUs, so that training takes a year or more, is a problem unless your competition is all idle.
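A rough back-of-envelope sketch of that point: a cluster can have enough memory to hold a ~10-trillion-parameter model while still being far too slow to train it on a competitive timescale. The cluster sizes, token count, utilization figure, and the ~6·N·D FLOPs rule of thumb below are illustrative assumptions, not numbers from the thread.

```python
# Back-of-envelope check: enough memory to *hold* a 10T-parameter model,
# but far too slow to train it quickly.  All numbers are illustrative.

def training_memory_tb(n_params, bytes_per_param=16):
    """Rough training footprint: ~16 bytes/param is a common rule of thumb
    covering bf16 weights plus fp32 gradients and Adam optimizer states."""
    return n_params * bytes_per_param / 1e12

def training_days(n_params, n_tokens, n_gpus,
                  peak_flops_per_gpu=312e12, utilization=0.4):
    """Wall-clock training time using the ~6*N*D FLOPs approximation for
    dense transformers (A100-class bf16 peak of ~312 TFLOP/s assumed)."""
    total_flops = 6 * n_params * n_tokens
    cluster_flops = n_gpus * peak_flops_per_gpu * utilization
    return total_flops / cluster_flops / 86_400

n_params = 10e12   # the "10 trillion weights" figure from the comment
n_tokens = 2e12    # assumed training tokens, purely illustrative

# 2,000 x 80 GB GPUs = 160 TB of device memory: enough to hold the model...
print(f"memory needed:       {training_memory_tb(n_params):,.0f} TB")
# ...but training on 2,000 GPUs would take years,
print(f"days on 2,000 GPUs:  {training_days(n_params, n_tokens, 2_000):,.0f}")
# while a 10x larger cluster finishes 10x sooner -- a gap that only
# matters if the competition is not idle.
print(f"days on 20,000 GPUs: {training_days(n_params, n_tokens, 20_000):,.0f}")
```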
a.
Worth keeping in mind that the current focus on generative AI is very much a Western phenomenon, and the focus in China has, until recently at least, been on non-generative applications, so progress is going to look different there than it does here.