The relevant tooling already exists: https://github.com/THUDM/CodeGeeX is a 13b param model which would have been SOTA two years ago, trained on Chinese hardware, using a Chinese deep learning framework.
Anyone who thinks that compute restrictions will help with x-risk should explain how they plan to convince China that this isn’t just a geopolitical ploy by the US to cripple their advanced industry and related military capabilities. (despite, you know, that being the motivation for export controls to date)
It used Huawei Ascend 910 AI processors, which were fabbed by TSMC, which will no longer be allowed to make such chips for China.
Yep, this is a bigger deal than I realized last week.
Wow, if those stats are correct, the training of CodeGeeX used up to 1e24 nominal FLOPs (2.56e14 FLOPs/chip * 1536 chips * 2.6e6 seconds), which would put it a bit ahead of Chinchilla, although it's seemingly light on parameter count. But it is somewhat easier to tile a chip with fp16 units than it is to utilize them effectively, so the true useful FLOPs may be lower.
Nonetheless, that’s quite surprising, impressive, and perhaps concerning.
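For concreteness, here is a minimal sketch of that arithmetic; the Chinchilla comparison figure (the standard 6*N*D estimate with 70B parameters and 1.4T tokens) is my own assumption for context, not a number from either paper.

```python
# Back-of-the-envelope check of the nominal training compute quoted above.
# The Ascend figures are from the comment above; the Chinchilla comparison
# (6*N*D with 70B params and 1.4T tokens) is my own assumption for context.

peak_flops_per_chip = 2.56e14   # Ascend 910 peak fp16 throughput, FLOP/s
num_chips = 1536
training_seconds = 2.6e6        # roughly one month of wall-clock time

nominal_flops = peak_flops_per_chip * num_chips * training_seconds
print(f"CodeGeeX nominal compute:  {nominal_flops:.2e} FLOPs")    # ~1.0e24

chinchilla_flops = 6 * 70e9 * 1.4e12   # standard 6*N*D estimate (assumed)
print(f"Chinchilla 6*N*D estimate: {chinchilla_flops:.2e} FLOPs")  # ~5.9e23
```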
Distributed training runs never manage to fully utilize the hardware's nominal FLOPs, and are easy to stuff up in other ways too, but I'd expect the chips themselves to be pretty well designed; it's obvious early in the design stage if you're going to be bottlenecked on something else.
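As a rough illustration of that utilization gap, here's a small sketch of how model-FLOPs utilization (MFU) is usually estimated; the parameter and token counts plugged in below are placeholders I'm assuming for a 13B run, not reported figures.

```python
# Rough sketch of model-FLOPs utilization (MFU): the share of nominal hardware
# throughput that goes into the model's forward/backward passes. The parameter
# and token counts below are assumed placeholders, not reported figures.

def model_training_flops(n_params: float, n_tokens: float) -> float:
    """Standard ~6*N*D estimate of training compute for a dense transformer."""
    return 6 * n_params * n_tokens

nominal = 2.56e14 * 1536 * 2.6e6            # peak FLOP/s * chips * seconds
useful = model_training_flops(13e9, 850e9)  # assumed: 13B params, ~850B tokens

print(f"approx. MFU = {useful / nominal:.0%}")  # well under 100% is typical
```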
I'm open to the idea that this is good from an x-risk perspective, but I'm definitely not 100% sold on it. I agree with you that China knows this is directly aimed at crippling their advanced industry and related military capabilities. We're entering an era of open hostilities, and that's not news to anyone on either side. I don't think that necessarily means this is bad from an x-risk perspective.
A few claims relevant to analyzing x-risk in the context of US-China policy:
1. AGI alignment is more likely if AGI is developed by US groups rather than Chinese groups, because some influential people in the US take alignment seriously while almost nobody in China does.
2. Slowing Chinese AI progress is good because it makes the US more likely to be the first to AGI, or gives more time for the necessary alignment work to happen and spread to China.
3. US policy can actually slow Chinese AI development with export bans on compute.
I currently believe these three claims. CodeGeeX is pretty good evidence against 3, showing that Chinese compute and tooling are catching up to those of the US. But high-performance compute depends on a complicated supply chain with a lot of choke points, and it seems plausible that the US can slow Chinese progress by a few years. See e.g. this CSET article: https://cset.georgetown.edu/wp-content/uploads/Preserving-the-Chokepoints.pdf
IMO the strongest argument against this policy from an x-risk perspective is that it spends a one-time slowdown right now, reducing future influence over Chinese AI development. If the most critical time for slowing AI progress is in the future, this bullet will no longer be in the chamber. But I also haven't spent much time thinking about this and would welcome better arguments.
I was wrong that nobody in China takes alignment seriously! Concordia Consulting led by Brian Tse and Tianxia seem to be leading the charge. See this post and specifically this comment. To the degree that poor US-China relations slow the spread of alignment work in China, current US policy seems harmful.
I think that convincing Chinese researchers and policy-makers of the importance of the alignment problem would be very valuable, but it also risks changing the focus of race dynamics to AGI, and is therefore very risky. The last thing you want to do is leave the CCP convinced that AGI is very important but safety isn’t, as happened to John Carmack! Also beware thinking you’re in a race.
I think (3) is only true over timescales of 2-5 years: the thing that really matters is performance per unit cost, and if you own the manufacturer you're not paying Nvidia's ~80% unit margins.
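To make the margin point concrete, here's a toy cost-per-FLOP comparison; the list price is a rough assumption for illustration, and the ~80% margin is just the figure from the comment above, not a quoted financial number.

```python
# Toy illustration of performance per unit cost: the same silicon looks very
# different per dollar if you buy it at cost rather than at market price.
# The list price and margin below are rough assumptions, not quoted figures.

peak_flops = 3.12e14      # approx. A100-class fp16 tensor peak, FLOP/s
list_price = 15_000.0     # assumed market price, USD
unit_margin = 0.80        # the ~80% margin figure from the comment above

cost_per_flops_list = list_price / peak_flops
cost_per_flops_build = list_price * (1 - unit_margin) / peak_flops

print(f"USD per peak FLOP/s at list price: {cost_per_flops_list:.2e}")
print(f"USD per peak FLOP/s at build cost: {cost_per_flops_build:.2e}")  # 5x less
```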
Unfortunately for us LessWrongers, it's probably rational for them to believe that a race is happening, because, to put it bluntly, a China with AGI would become the world power that exceeds even the US. The problem is, the US government doesn't care about the alignment problem.