For what it’s worth, my median scenario looks like:
Leading AI labs continue doing AI-assisted but primarily human-driven improvement to AI over the next 1-3 years. At some point during this time, a sufficiently competent general reasoning & coding model is created that shifts the balance between AI and human inputs. So the next generation of AI starts being built with an increasing share of contributions from the previous generation. With this new help, the lab releases a new model in somewhat less than a year. This new model contributes even more. In just a few months, a yet newer model is designed and trained. A few months after that, another model. Then the lab says, ‘hold up, this thing is now getting scary powerful. How can we be sure we trust it?’ (maybe a few more or fewer model generations will be required).
Then we have a weird stalemate situation where the lab has this new powerful model which is clearly superior to what it had a year ago, but is unsure how trustworthy it is. The model is kept safely contained and extensive tests are run, but the tests are far from definitive.
Meanwhile, the incautious open-source community continues to advance… The countdown is ticking until the open-source community catches up.
So there we are in 2026, being like, “What do we do now? We have definitely created a model powerful enough to be dangerous, but still don’t have a sure way to align it. We know what advances and compute it took for us to get this far, and we know that the open-source efforts will catch up in around 3-4 years. Can we solve the alignment problem before that happens?”
I don’t have a clear answer to what happens then. My guess is that a true, complete solution to the alignment problem won’t be found in time, and that we’ll have to ‘make do’ with some sort of incomplete solution that is hopefully adequate to prevent disaster.
I also think there’s an outside possibility of a research group not associated with one of the main labs making and publishing an algorithmic breakthrough which brings recursive self-improvement into reach of the open-source community suddenly and without warning. If that happens, what does humanity do? If this process has been kicked off in an open-source, anyone-can-do-it scenario, and we suspect we have only a few months before further advances push the open-source models to dangerous levels of competence, what then? I don’t know. If anything, I’m hoping that the big labs, with their reasonable safety precautions (which hopefully get improved over the next year or two), do actually manage to come in first. Just because then it’ll be at least temporarily contained, and the world’s governments and large corporate actors will have the opportunity to officially verify for themselves that ‘yes, this is a real thing that exists and is dangerous now’. That seems like a scenario more likely to go well than the sudden open-source breakthrough.
Thanks for this comment. I don’t have much to add, other than: have you considered fleshing out and writing up this scenario in a style similar to “What 2026 looks like”?