Thanks for writing this!
I think the crux of your estimate of compute usage is the following line:
In May 2020 (!) Microsoft announced that they had built a supercomputer with 10,000 GPUs for OpenAI, which is often suggested to be the machine GPT-3 was trained on: https://news.microsoft.com/source/features/ai/openai-azure-supercomputer/
So it’s very possible (albeit unlikely) that the number of total GPUs used for GPT-4 training could be higher than 15,000!
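To make it concrete why that line is the crux: under the usual estimation approach, total training compute scales roughly linearly with the number of GPUs, so moving from 10,000 to >15,000 GPUs shifts the whole estimate proportionally. Here is a minimal back-of-the-envelope sketch; the per-GPU throughput, utilization, and training duration are illustrative assumptions, not figures from the post or the comments.

```python
# Back-of-the-envelope training-compute estimate as a function of cluster size.
# All constants below are illustrative assumptions (roughly A100-class hardware),
# not numbers taken from the post or the comments.

PEAK_FLOP_PER_SEC = 312e12   # assumed per-GPU peak throughput (A100 BF16), FLOP/s
UTILIZATION = 0.35           # assumed effective hardware utilization
TRAINING_DAYS = 90           # assumed wall-clock training time

def training_flop(n_gpus: int) -> float:
    """Total training compute (FLOP) for a cluster of n_gpus under the assumptions above."""
    seconds = TRAINING_DAYS * 24 * 3600
    return n_gpus * PEAK_FLOP_PER_SEC * UTILIZATION * seconds

for n_gpus in (10_000, 15_000, 25_000):
    print(f"{n_gpus:>6} GPUs -> ~{training_flop(n_gpus):.2e} FLOP")
```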
Corrections/nitpicks:
GPT-3.5 finished training in early 2022, was released in November 2022, and demonstrated better quality answers than GPT-3. In December 2022, OpenAI released ChatGPT which is based on GPT-3.5 and fine-tuned for conversation.
code-davinci-002 and text-davinci-002 were first released in mid-March 2022, soon after the InstructGPT paper, not November 2022. Source: https://openai.com/blog/gpt-3-edit-insert/ (See also this reddit thread talking about text-davinci-002.) Also, a nitpick: ChatGPT was released November 30th, 2022: https://openai.com/blog/chatgpt/
OAers have noted that the cluster has, of course, been expanded heavily since the original 10k (albeit not what it is now). Morgan Stanley is saying that GPT-5 is being trained right now on 25,000 GPUs, up heavily from the original 10k, and implying that ‘most’ of the GPT-5 GPUs were used for GPT-4 which finished ‘some time ago’; the mean of 10 & 25 is 17.5, so >15k seems entirely possible, especially if those GPUs weren’t just installed.
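The ‘mean of 10 & 25’ step is just linear interpolation between the two reported cluster sizes; a trivial sketch of that arithmetic (the midpoint is only a rough guess, not a claim about the actual GPT-4 cluster):

```python
# Midpoint of the two reported cluster sizes, as in the comment above.
original_cluster = 10_000   # GPUs in the 2020 Azure supercomputer
current_cluster = 25_000    # GPUs reportedly in use for GPT-5 training

midpoint = (original_cluster + current_cluster) / 2
print(midpoint)            # 17500.0
print(midpoint > 15_000)   # True: a >15k-GPU GPT-4 run is at least arithmetically plausible
```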
Thanks for the comment! I updated the paragraph to:
The GPT-3.5 models finished training and were released in 2022, and demonstrated better quality answers than GPT-3. In late 2022, OpenAI released ChatGPT which is based on GPT-3.5 and fine-tuned for conversation.
The March blog post mentions text-davinci-003, but you say only text-davinci-002 was released in March. The latter seems more plausible, since it matches with the newsletter OpenAI sent out at the end of November: “New GPT-3 model: text-davinci-003”.
Starting today, you can access text-davinci-003 through our API and playground at the same price as our other Davinci base language models ($0.0200 / 1k tokens).
So I think the “March” blog post has probably been edited and isn’t decisive evidence that code-davinci-002 (the GPT-3.5 base model) actually came out in March.