You assume that in 2023 “The multimodal transformers are now even bigger; the biggest are about half a trillion parameters”, while GPT-3 had 175 billion parameters in 2020 (and was not multimodal). That is roughly 3× growth in 3 years, compared with an order-of-magnitude jump in about 3 months just before GPT-3. So you assume a significant slowdown in parameter growth.
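As a sanity check on those figures, here is a minimal sketch of the implied growth rates. The pre-GPT-3 “order of magnitude in ~3 months” is read here as Turing-NLG (17B, Feb 2020) → GPT-3 (175B, May 2020); that reading, and treating the ~500B multimodal model as the 2023 endpoint, are assumptions for illustration, not claims from the original comment:

```python
# Rough sanity check on the growth factors quoted above.
# Assumed readings (not from the original comment):
#   - pre-GPT-3 burst: Turing-NLG (17B, Feb 2020) -> GPT-3 (175B, May 2020)
#   - 2023 endpoint: the ~500B multimodal model from the quoted prediction

def annualized_growth(start_params: float, end_params: float, years: float) -> float:
    """Equivalent per-year multiplicative growth factor for the given jump."""
    return (end_params / start_params) ** (1.0 / years)

# Pre-GPT-3 burst: ~10x in about a quarter of a year.
pre_gpt3 = annualized_growth(17e9, 175e9, 0.25)

# Predicted 2020 -> 2023 growth: ~3x in 3 years.
predicted = annualized_growth(175e9, 500e9, 3.0)

print(f"Pre-GPT-3 pace, extrapolated: ~{pre_gpt3:,.0f}x per year")
print(f"Predicted 2020-2023 pace:     ~{predicted:.1f}x per year")
```

Extrapolating the 3-month burst gives an absurdly large per-year factor (~10,000×), while the predicted path works out to roughly 1.4× per year, which is what makes it a “significant slowdown.”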
I heard a rumor that GPT-4 could be as large as 32 trillion parameters. If it turns out to be true, will that affect your prediction?
Indeed, my median future involves a significant slowdown in dense-network parameter growth.
If there is a 32 trillion parameter dense model by 2023, I’ll be surprised and update towards shorter timelines, unless it turns out to be underwhelming compared to the performance predicted by the scaling trends.
What would your new median be? (If you do observe a 32-trillion-parameter model in 2023.)
Hard to say; it depends a lot on the rest of the details. If the performance is as good as the scaling trends would predict, it’ll be almost human-level at text prediction, multiple-choice questions on diverse topics, and so forth. After fine-tuning it would probably be a beast.
I suppose I’d update my 50% mark to, like, 2027 or so? IDK.
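For concreteness, here is a rough sketch of what “as good as the scaling trends would predict” could mean, using the pre-Chinchilla power-law fit of loss versus non-embedding parameter count from Kaplan et al. (2020). The fitted constants are from that paper; the 32T figure is just the rumored size above, and the fit assumes training to convergence on ample data, so treat the output as an illustrative upper bound rather than a forecast:

```python
# Sketch: loss predicted by the Kaplan et al. (2020) parameter scaling law,
#   L(N) = (N_c / N) ** alpha_N
# with the paper's fitted constants. Pre-Chinchilla, data/compute limits
# are ignored, so this is an optimistic illustration, not a prediction.

ALPHA_N = 0.076   # fitted exponent (Kaplan et al. 2020)
N_C = 8.8e13      # fitted constant, in non-embedding parameters

def predicted_loss(n_params: float) -> float:
    """Cross-entropy loss (nats/token) given by the power-law fit."""
    return (N_C / n_params) ** ALPHA_N

for name, n in [("GPT-3 (175B)", 175e9), ("rumored 32T", 32e12)]:
    print(f"{name:>14}: ~{predicted_loss(n):.2f} nats/token")
```

Plugging in gives roughly 1.6 nats/token at 175B versus roughly 1.1 at 32T, i.e. the kind of gap that motivates the “almost human-level at text prediction” expectation above.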
I get the idea. I would also update to a very short timeline (4-5 years) given no slowdown in dense-network parameter growth and performance that keeps following the scaling trend. And I was pretty scared when GPT-3 was released. Like many here, I expected further growth in that direction very soon, which did not happen. So I am less scared now.
This was all well before the Chinchilla scaling paper, but it has still turned out to be absolutely true by 2023: we have PaLM-E 540B, just for starters.