[10/50/90% = 2021/2022/2025] 50% chance scenario: At the end of 2020, Google releases its own API with a 100x larger model. In 2021, there is a race over who can scale it to an even larger model, with Manhattan-Project-level investments from nation-states like China and the US. Physicists, computer scientists, and mathematicians work on the bottlenecks privately, either at top companies or in secret nation-state labs. People have already built products on top of Google's API which make researchers and engineers more productive. At the end of 2021, one country (or lab) reaches a GPT-5-like model size but doesn't publish anything. Instead, its researchers use this new model to predict what a better architecture for a 100x bigger model should be, and their productivity is vastly improved. At the same time, researchers from private labs (think OpenAI/DeepMind) move to this leading country (or lab) to help build the next version (safely?). In 2022, they come up with something we would today call both transformative AI and GPT-6.
Your 90% is 2025? This suggests that you assign <10% credence to the disjunction of the following:
--The “Scaling hypothesis” is false. (One reason it could be false: Bigger does equal better, but transformative AI requires 10 orders of magnitude more compute, not 3. Another reason it could be false: We need better training environments, or better architectures, or something, in order to get something actually useful. Another reason it could be false: The scaling laws paper predicted the scaling of GPTs would start breaking down right about now. Maybe it’s correct. A rough sketch of that scaling-laws argument follows this comment.)
--There are major engineering hurdles that need to be crossed before people can train models 3+ orders of magnitude bigger than GPT-3.
--There is some other bottleneck (chip fab production speed?) that slows things down a few more years.
--There is some sort of war or conflict that destroys (or distracts resources from) multibillion-dollar AI projects.
Seems to me that this disjunction should get at least 50% probability. Heck, I think my credence in the scaling hypothesis is only about 50%.
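(For reference, a rough sketch of the scaling-laws point above, assuming it refers to Kaplan et al.’s “Scaling Laws for Neural Language Models”: the test loss is fit by separate power laws in parameter count N, dataset size D, and training compute C, and since the compute-efficient trend falls faster than the data trend allows, the paper notes these fits cannot all hold indefinitely and, if memory serves, conjectures a breakdown within a few orders of magnitude of GPT-3’s scale. The exponents below are approximate recollections, not exact quotes.)

```latex
% Approximate power-law fits recalled from Kaplan et al. (2020);
% exponents are rough values and should be checked against the paper.
\begin{align*}
L(N) &\approx \left(\frac{N_c}{N}\right)^{\alpha_N}, & \alpha_N &\approx 0.076,\\
L(D) &\approx \left(\frac{D_c}{D}\right)^{\alpha_D}, & \alpha_D &\approx 0.095,\\
L(C_{\min}) &\approx \left(\frac{C_c}{C_{\min}}\right)^{\alpha_C}, & \alpha_C &\approx 0.050.
\end{align*}
```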
To be clear, 90% is the probability of getting transformative AI before 2025, conditional on (i) “GPT-6 or GPT-7 might do it” and (ii) getting a model 100x larger than GPT-3 in a matter of months. So essentially the question boils down to how long it would take to get to something like GPT-6, given that we’re already at GPT-4 at the end of the year.
-- re scaling hypothesis: I agree that there might be compute bottlenecks. I’m not sure exactly how compute scales with model size, but assuming we’re aiming for a GPT-6-sized model, we’re actually aiming for roughly 6-9 orders of magnitude bigger models (~100-1000x scale-up per generation), not 3, so the same would hold for compute? (See the arithmetic sketch at the end of this comment.) For architecture, I’m assuming that at each scale-up they find new architecture hacks etc. to make it work. It’s true that I’m not taking into account major architecture changes, which might take more time to find. I haven’t looked into the scaling laws paper; what’s the main argument for scaling breaking down about now?
-- engineering hurdles: again, this is conditional on (ii), i.e. we get GPT-4 in a matter of months. As for the engineering hurdles after that, it’s important to note that this leaves between 5 years and 5 years and 4 months to solve them (for the next two iterations, GPT-5 and GPT-6).
-- re chip fab production speed: it’s true that if we ask for models ~10^10 times bigger, the chip economy might take some time to adapt. However, my main intuition is that we could get better chips (like we got TPUs), or that people will have found more efficient ways to train language models (like people are now showing how to train GPT-3 more efficiently).
-- re political instability/war: it could change how things evolve for the worse. There are also strong economic incentives to develop transformative AI _for_ war.
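(To make the order-of-magnitude reply above concrete, here is a minimal arithmetic sketch, assuming, as in this thread, a hypothetical 100-1000x parameter jump per GPT generation starting from GPT-3’s ~175B parameters; the generation names and jump sizes are assumptions from the discussion, not published figures.)

```python
# Back-of-the-envelope parameter counts for hypothetical GPT-4/5/6 models,
# assuming a 100-1000x scale-up per generation starting from GPT-3 (~175B params).
# These jump sizes are assumptions taken from the discussion above.

GPT3_PARAMS = 1.75e11            # GPT-3: ~175 billion parameters
JUMP_LOW, JUMP_HIGH = 100, 1000  # assumed per-generation scale-up factor

for generations, label in enumerate(["GPT-4", "GPT-5", "GPT-6"], start=1):
    low = GPT3_PARAMS * JUMP_LOW ** generations
    high = GPT3_PARAMS * JUMP_HIGH ** generations
    print(f"{label}: ~{low:.1e} to {high:.1e} parameters "
          f"({2 * generations}-{3 * generations} orders of magnitude above GPT-3)")

# Rough compute note: training FLOPs are commonly approximated as ~6 * params * tokens,
# so if the training set grows at all alongside the model, total compute scales up
# even faster than the parameter count does.
```

This reproduces the “6-9 orders of magnitude, not 3” figure above for a GPT-6-sized model.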