I wish I shared your optimism! You’ve talked about some of your reasons for it elsewhere, but I’d be interested to hear even a quick sketch of roughly how you imagine the next decade to go in the context of the thought experiment, in the 70-80% of cases where you expect things to go well.
The next decade, from 2026-2036, will probably be wild, conditional on your scenario starting to come to pass, and my guess is that robotics is solved 2-5 years after the new AI is introduced.
But to briefly talk about the 70-80% of worlds where we make it through, several common properties appear:
Data still matters a great deal for capabilities and alignment, and the sparse-RL problem, where you try to get an AI to do something based on very little data, will essentially not contribute to capabilities for the next several decades, if ever. (I'm defining sparse RL here as tasks whose goals play out over, say, 1-10 year timescales, or maybe even just 1-year timescales, with no reward shaping or feedback on intermediate steps at all.)
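To make that definition a bit more concrete, here is a toy sketch in pure Python (the environment, numbers, and "policy" are made up purely for illustration) contrasting a sparse reward that only fires on final success with a shaped reward that gives feedback on intermediate progress:

```python
import random

GOAL = 10          # target position on a 1-D line
EPISODE_LEN = 50   # steps per episode

def sparse_reward(pos):
    # Feedback only if the final goal is actually reached:
    # there is nothing to learn from until the agent stumbles onto success.
    return 1.0 if pos == GOAL else 0.0

def shaped_reward(pos):
    # Dense feedback every step: closer to the goal is better.
    return -abs(GOAL - pos)

def run_episode(reward_fn):
    pos, total = 0, 0.0
    for _ in range(EPISODE_LEN):
        pos += random.choice([-1, 1])   # stand-in for a policy's action
        total += reward_fn(pos)
    return total

# With sparse_reward, most episodes return 0.0 and carry no training signal;
# with shaped_reward, every episode is informative.
print(run_episode(sparse_reward), run_episode(shaped_reward))
```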
Unlearning becomes more effective, such that we can remove certain capabilities without damaging the rest of the system; this result is pretty illustrative:
https://x.com/scaling01/status/1865200522581418298
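For a rough sense of what that could look like mechanically, here is a minimal sketch of one common family of unlearning approaches, gradient ascent on a "forget" set combined with ordinary training on a "retain" set. This is not necessarily the method in the linked post; the model, data, and weighting are placeholders.

```python
import torch
import torch.nn.functional as F
from torch import nn

def unlearning_step(model, optimizer, forget_batch, retain_batch, forget_weight=1.0):
    # One update that pushes loss *up* on the forget set (the capability to
    # remove) and *down* on the retain set (everything we want to keep).
    optimizer.zero_grad()
    forget_loss = F.cross_entropy(model(forget_batch[0]), forget_batch[1])
    retain_loss = F.cross_entropy(model(retain_batch[0]), retain_batch[1])
    loss = retain_loss - forget_weight * forget_loss
    loss.backward()
    optimizer.step()
    return forget_loss.item(), retain_loss.item()

# Toy stand-ins: a tiny classifier and random "forget"/"retain" batches.
model = nn.Linear(16, 4)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
forget = (torch.randn(8, 16), torch.randint(0, 4, (8,)))
retain = (torch.randn(8, 16), torch.randint(0, 4, (8,)))
print(unlearning_step(model, opt, forget, retain))
```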
AI control becomes a bigger deal in labs, giving them more ability to prevent self-exfiltration.
I like Mark Xu's arguments, and if I'm wrong about alignment being easy, AI control would be even more important for safety.
https://www.lesswrong.com/posts/A79wykDjr4pcYy9K7/mark-xu-s-shortform#FGf6reY3CotGh4ewv
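As one concrete flavor of what "control" can mean here, consider trusted monitoring: every action proposed by the untrusted model is scored by a weaker trusted monitor, and suspicious actions get blocked and escalated instead of executed. The sketch below is purely illustrative; the threshold, stub functions, and exfiltration markers are all invented.

```python
SUSPICION_THRESHOLD = 0.8  # invented cutoff; a real deployment would tune this

def untrusted_model_propose(task: str) -> str:
    # Stub for the powerful, untrusted model proposing an action (e.g. a command).
    return f"run_analysis --task '{task}'"

def trusted_monitor_score(action: str) -> float:
    # Stub for a weaker, trusted monitor rating how suspicious the action looks.
    exfil_markers = ("scp ", "curl ", "weights", "ssh ")
    return 1.0 if any(m in action for m in exfil_markers) else 0.1

def controlled_execute(task: str) -> str:
    action = untrusted_model_propose(task)
    score = trusted_monitor_score(action)
    if score >= SUSPICION_THRESHOLD:
        return f"BLOCKED for human review (suspicion={score:.2f}): {action}"
    return f"EXECUTED (suspicion={score:.2f}): {action}"

print(controlled_execute("summarize training logs"))
print(controlled_execute("scp model weights to external host"))
```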
As for my sketch of how the world goes in the median future: conditional on the labs achieving something like a research AI in 2026, they first automate their own research, which will take 1-5 years, then solve robotics, which will take another 2-5 years, and by 2036 the economy starts seriously feeling the impact of an AI that can replace everyone's jobs.
The big reason this change is slower than a lot of median predictions is a combination of AI research being more disconnectable from the rest of the economy than most fields, and the problems being solvable but with a lot of edge cases that will take time to iron out (similar to how self-driving cars went from being very bad in the 2000s to actually working in 2021-2023).
The big question is whether distributed training works out.
Thanks for sketching that out; I appreciate it. Unlearning significantly improving the safety outlook is something I may not have fully priced in.
My guess is that the central place we differ is that I expect dropping in, say, 100k extra capabilities researchers gets us to greater-than-human intelligence fairly quickly—we’re already seeing LLMs scoring better than humans in various areas, so clearly there’s no hard barrier at human level—and at that point control gets extremely difficult.
I do certainly agree that there’s a lot of low-hanging fruit in control that’s well worth grabbing.