I think that if we saw the working AI alignment solution used in 2050 in a paper written in 2026, we wouldn’t be confident it would work. That’s because there are a lot of uncertainties about how hard the AI alignment problem is in the first place, how ML behaves when it’s scaled up, etc. I think most plans for AI safety need to go something like “we make the theory now, then we keep working on it as ML scales up and adapt accordingly”.
Yes, if you have a solution in 2026, it isn’t likely to be relevant to something used in 2050. But 2026 is the planned solution date and 2050 is the median TAI date.
The numbers I used above are just to demonstrate the point, though. The broad idea is that coming up with a solution/theory to alignment takes longer than planned. Having a theory isn’t enough; you still need some time to make it count. Then TAI might come at the early end of your probability distribution.
It’s pretty optimistic to plan that TAI will come at your median estimate and that you won’t run into the planning fallacy.
What I’m trying to say is that it’s much harder to do AI alignment research while models are still small, so TAI timelines somewhat dictate the progress of AI alignment research. If I wanted my 5-year plan to have the best chance of success, I would include “test this on a dog-intelligence-level AI” in my plan, even if I thought that probably wouldn’t arrive by 2036, because that would make AI alignment research much easier.
With the plan and numbers I lay out above, you actually finish friendly AI in 2036, which is the 10% point.
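For concreteness, here is a minimal sketch of how a timeline with a 2050 median can put its 10% point around 2036. The lognormal shape and the spread value are my own assumptions for illustration, not something from the discussion above:

```python
# Sketch: a TAI timeline modeled as lognormal in years after 2026 (assumed shape),
# with a median of 24 years (2050). A spread of sigma ~0.68 (assumed) puts the
# 10th percentile at roughly 10 years out, i.e. around 2036.
from scipy.stats import lognorm

median_years = 24                      # median TAI date: 2026 + 24 = 2050
sigma = 0.68                           # assumed spread of the timeline distribution
timeline = lognorm(s=sigma, scale=median_years)

print(2026 + timeline.ppf(0.5))        # ~2050, the median estimate
print(2026 + timeline.ppf(0.1))        # ~2036, the 10% point
```

So under these assumptions, finishing in 2036 means accepting roughly a 10% chance that TAI arrives before the solution is ready.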