There’s GPT-5 though, or a GPT-4 math fine-tune. You saw the Minerva results. You know there will be a significant gain from a fine-tune, likely enough to satisfy 2-3 of your conditions.
As I said, it’s ridiculous to think that no one in either the Google or OAI camp will have more than $1 billion USD of training hardware in service for a single model (training many instances in parallel).
Think about what that means. One A100 is about $25k, and the cluster Meta uses is 2,048 of them, so roughly $50 million total.
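A quick back-of-the-envelope on those figures (a sketch only; the ~$25k per-A100 price and the 2,048-GPU cluster size are the numbers above, and it ignores hosts, networking, and discounts):

```python
# Rough cluster cost from the figures quoted above.
A100_PRICE_USD = 25_000       # assumed list price per A100
META_CLUSTER_GPUS = 2_048     # cluster size cited above

cluster_cost = META_CLUSTER_GPUS * A100_PRICE_USD
print(f"Meta-scale cluster: ~${cluster_cost / 1e6:.0f}M")      # ~ $51M

# How many A100s a $1B hardware budget would buy at that price,
# counting only the GPUs themselves.
budget_usd = 1_000_000_000
print(f"$1B buys roughly {budget_usd // A100_PRICE_USD:,} A100s")  # ~ 40,000
```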
Why would you not go for the most powerful model possible as soon as you can? Either the world’s largest tech giant is about to lose it all, or it is going to put in proportional effort.
I think you’re reading this condition incorrectly. The $1 billion would need to be spent on a single model. If OpenAI buys a $2 billion supercomputer but trains 10 models with it, that won’t necessarily qualify.
Then why did you add the term? I assume you meant that the entire supercomputer is working on instances of the same model at once. Obviously training is massively parallel.
Once the model is done, the supercomputer will obviously be used for other things.