Musk claims xAI built a cluster of 100,000 Nvidia H100 GPUs—one of the most advanced broadly available chips—in a facility in Memphis, Tenn.
In a post on Monday, Musk said the 100,000-chip cluster, known as Colossus, is already up and running and is the “most powerful AI training system in the world.” Two people with direct knowledge of xAI’s chip order and power capacity at the site said they believed that fewer than half of those chips are currently in operation, largely because of constraints involving power or networking gear.
Whether Musk’s claims are embellished or not, they have caused a stir among other top AI developers, which fear falling behind. OpenAI CEO Sam Altman, for instance, has told some Microsoft executives he is concerned that xAI could soon have more access to computing power than OpenAI does, according to someone who heard his comments.
Keep in mind Musk never said it was “fully online” or “100,000 GPUs are running concurrently” or anything like that. He only said that the cluster was “online”, which could mean just about anything, and that it is “the most powerful AI training system”, which is unfalsifiable (who can know how powerful every AI training system is worldwide, including all of the secret proprietary ones by FANG, etc.?) and obviously pure puffery (“best pizza in the world!”). If you fell for it, well, then the tweet was for you.
I now think this is false. From The Information: