Hmm? The $10 billion funding increase to OpenAI and the arms race with Google pretty much guarantee that the 10^30 / $1 billion USD machine-for-training condition will be satisfied. So we can mark that one as “almost certainly” satisfied by EOY 2023. The only way it isn’t is a shortage of GPUs/TPUs.
GPT-4 likely satisfies the MMLU condition. So that’s two “almost certain” conditions met, and even if by some fluke they aren’t met by 2026, there are still several other ways Matt can lose the bet.
I think you’re overconfident here. I’m quite skeptical that GPT-4 already got above 80% on every single task in the MMLU since there are 57 tasks and it got 86.4% on average. I’m also skeptical that OpenAI will very soon spend >$1 billion to train a single model, but I definitely don’t think that’s implausible. “Almost certain” for either of those seems wrong.
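To make the averaging point concrete, here’s a minimal sketch with made-up per-task numbers (not GPT-4’s actual task breakdown) showing that an 86.4% macro-average across 57 tasks is perfectly compatible with several tasks landing under 80%:

```python
# Hypothetical per-task MMLU scores: 50 strong tasks and 7 weaker ones.
# These numbers are invented purely for illustration.
per_task_scores = [88.0] * 50 + [75.0] * 7

average = sum(per_task_scores) / len(per_task_scores)
worst = min(per_task_scores)

print(f"average over 57 tasks: {average:.1f}%")  # ~86.4%
print(f"worst single task:     {worst:.1f}%")    # 75.0% -- below the 80% bar
```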
There’s GPT-5 though, or a GPT-4.math.finetune. You saw the Minerva results. You know there will be a significant gain from a fine-tune, likely enough to satisfy 2-3 of your conditions.
As I said, it’s ridiculous to think that no one in either the Google or OpenAI camp will have more than $1 billion USD in training hardware in service for a single model (training many instances in parallel).
Think about what that means. One A100 is ~$25k. The cluster Meta uses is 2,048 of them, so about $50 million.
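For scale, a rough back-of-the-envelope version of that arithmetic (the ~$25k A100 price and the 2,048-GPU cluster figure are the ones cited here; everything else is an assumption that ignores networking, power, and datacenter overhead):

```python
# Back-of-the-envelope cluster cost arithmetic.
A100_PRICE_USD = 25_000      # assumed per-GPU price
META_CLUSTER_GPUS = 2_048    # cluster size cited above

meta_cluster_cost = A100_PRICE_USD * META_CLUSTER_GPUS
print(f"Meta-style cluster: ~${meta_cluster_cost / 1e6:.0f}M")  # ~$51M

# How many A100s would a $1B hardware budget buy at that price?
budget = 1_000_000_000
gpus_for_1b = budget // A100_PRICE_USD
print(f"$1B at $25k/GPU:    ~{gpus_for_1b:,} A100s")            # 40,000 GPUs
```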
Why would you not go for the most powerful model possible as soon as you can? Either the world’s largest tech giant is about to lose it all, or they are going to put in proportional effort.
As I said, it’s ridiculous to think that no one in either the Google or OpenAI camp will have more than $1 billion USD in training hardware in service for a single model (training many instances in parallel).
I think you’re reading this condition incorrectly. The $1 billion would need to be spent for a single model. If OpenAI buys a $2 billion supercomputer but they train 10 models with it, that won’t necessarily qualify.
Then why did you add the term? I assume you meant that the entire supercomputer is working on instances of the same model at once. Obviously training is massively parallel.
Once the model is done obviously the supercomputer will be used for other things.