We’d need to get more quantitative here about how much AI labor we can use for alignment before it’s too dangerous. My answer is that we could be confident in safely using AIs roughly 1-2 OOMs smarter than humans at inference, and IMO close to an arbitrary number of copies of such an AI, conditional on good control techniques being used.
To address 2 comments:
and I have high confidence that control measures will not be used consistently and correctly in practice.
Yeah, this seems pretty load-bearing for the plan, and a lot of the reason I don’t have probabilities of extinction below 0.1-1% is that I am genuinely worried about labs not applying control measures consistently.
I assign more moderate probabilities than you do, in that both scenarios (labs doing control properly and labs failing to do so) now seem somewhat plausible to me, but yeah, it would be really high-value for labs to prepare themselves to do control work properly.
To address this:
I just don’t think that extends to ASI
Maybe initially, but critically, I think the evidence we get at pre-AGI levels will heavily constrain our expectations of what an ASI will do re its alignment, and that we will learn a lot more about both alignment and control techniques once we have human-level models. I think we can trust a lot of that evidence to generalize at least 2 OOMs up.
So I expect a lot of the uncertainty to be resolved as AI scales up.
I agree with this:
I think almost anything that counts as AGI is very nearly ASI by default (not because of RSI, just because of hardware scaling ability)
Even without recursive self-improvement, it’s pretty easy to scale by several OOMs, and while there are enough bottlenecks to prevent FOOM, they aren’t enough to slow it down by a decade except in tail scenarios.