I have no specific comments on the details; I thought it was interesting and learned from it. I disagree in many places, but any counterpoints I might raise have been raised better elsewhere.
That said, I hope you’re right! That would be great. Every time the world turns out to be better geared towards success than we expect, it’s a great thing. Zero downside to that outcome except maybe some wasted effort on what turn out to be non-issues.
But also: the thing about fatal risks is that you need to avoid all of them, every time, or you die. If the fatal risks include extinction, you need very, very high confidence in success against all plausible such risks, or your species will go extinct. The alternative is Russian roulette, which you will eventually lose if you keep playing. In that context, if there is disagreement among experts, or even well-reasoned arguments by outsiders, about whether an extinction risk is plausible, then our confidence in its implausibility is too low for us to be OK with it.
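To make the Russian roulette point concrete (numbers purely illustrative): if each round carries an independent probability $p$ of a fatal outcome, the probability of surviving $n$ rounds is

$$(1 - p)^n,$$

so even a seemingly small $p = 0.01$ leaves only $0.99^{100} \approx 0.37$, i.e. about a 37% chance of surviving a hundred such bets.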
In other words, if I were ever somehow in the position of being next to someone about to create an AGI with what I considered dangerous capabilities, and presented with these categories of reasons and arguments for why they are actually safe, I would take Oliver Cromwell’s tack: “I beseech you, in the bowels of Christ, think it possible you may be mistaken.” Even if you’re right, the arguments you’ve presented for your position are not strong enough to rely on.
So in this position, you are correct that my arguments aren’t strong enough for that: while there is quite a bit of evidence for my thesis, it isn’t overwhelming (though I do think the evidence is enough to rule out MIRI’s models of AI doom entirely).
However, there is one important implication for AI control, and it concerns which models can safely be used: the models trained on more human-values data are the better candidates for trusted AIs that monitor untrusted AIs. This both increases the amount of work that can be done under control, by raising the threshold of intelligence that can be used, and gives you stronger guarantees, because there is at least one trusted superintelligent model.
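To make the trusted-monitoring point concrete, here is a minimal sketch of the kind of control loop I have in mind; the interfaces, names, and threshold below are hypothetical assumptions for illustration, not any lab’s actual API:

```python
# Minimal sketch of trusted monitoring, the control setup described above.
# Everything here is illustrative: the model interfaces, the threshold, and
# the helper names are assumptions, not anything a lab actually exposes.

SUSPICION_THRESHOLD = 0.8  # illustrative; in practice tuned on audited examples


def run_with_trusted_monitor(task, untrusted_model, trusted_model, human_audit):
    """Let the capable-but-untrusted model do the work, gated by a trusted monitor."""
    proposal = untrusted_model.propose_action(task)

    # The trusted model only has to judge the proposal, not generate it,
    # which is why a model we actually trust is what matters here.
    suspicion = trusted_model.suspicion_score(task, proposal)

    if suspicion >= SUSPICION_THRESHOLD:
        # Escalate suspicious proposals to a human (or a slower, more careful process).
        return human_audit(task, proposal)

    return execute(proposal)


def execute(proposal):
    # Apply the approved action, e.g. in a sandboxed environment (stub here).
    ...
```

The design point of the sketch is that the monitor’s job (flagging suspicious proposals) is easier than the untrusted model’s job (producing them), so trust matters more than raw capability for the monitor, and having at least one trusted, highly capable model strengthens the guarantees.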
While I agree my AI alignment arguments aren’t strong enough on their own to justify creating an AGI, I think the emerging AI control agenda is enough to justify creating an AGI with dangerous capabilities, as long as control measures are used.
I would agree that these kinds of arguments significantly raise the bar for what level of capabilities an AGI would need to have before it’s too dangerous to create and use, as long as the control measures are used correctly and consistently.
I just don’t think that extends to ASI. I think almost anything that counts as AGI is very nearly ASI by default (not because of RSI, just because of hardware scaling ability), and I have high confidence that control measures will not be used consistently and correctly in practice.
We’d need to get more quantitative here about how much AI labor we can use for alignment before it’s too dangerous. My answer is that we could go about 1-2 OOMs smarter than humans at inference while still being confident we can use those models safely, and IMO we could run close to an arbitrary number of copies of such an AI, conditional on good control techniques being used.
To address 2 comments:
and I have high confidence that control measures will not be used consistently and correctly in practice.
Yeah, this seems pretty load-bearing for the plan, and a lot of the reason I don’t have probabilities of extinction below 0.1-1% is because I am actually worried about labs not doing control measures consistently.
I assign more moderate probabilities than you do, in that both scenarios (labs doing control properly and labs not doing it properly) now seem somewhat plausible to me, but yeah, it would be really high-value for labs to prepare themselves to do control work properly.
To address this:
I just don’t think that extends to ASI
Maybe initially, but critically, I think the evidence we get at pre-AGI capability levels will heavily constrain our expectations of how aligned an ASI will be, and that we will learn a lot more about both alignment and control techniques once we have human-level models. I think we can trust a lot of that evidence to generalize at least 2 OOMs up.
So I think a lot of the uncertainty will start to be resolved as AI scales up.
I agree with this:
I think almost anything that counts as AGI is very nearly ASI by default (not because of RSI, just because of hardware scaling ability)
Even without recursive self-improvement, it’s pretty easy to scale by several OOMs, and while there are enough bottlenecks to prevent FOOM, they are not enough to slow it down by a decade except in tail scenarios.
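As a rough illustration of the hardware-scaling point (my framing, with purely illustrative numbers): if a lab can run $N$ parallel copies of a human-level model, each at $s$ times human speed, the available cognitive labor is on the order of

$$N \cdot s \ \text{human-equivalents},$$

so $N = 10^4$ copies at $s = 10\times$ speed is already about 5 OOMs of labor over a single human, with no algorithmic self-improvement at all.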