Thanks, it’s useful to bring these out—though we mention them in passing. Just to be sure: we are looking at the XRisk thesis, not at some thesis that AI can be “dangerous”, as most technologies can be. The Omohundro-style escalation is precisely the issue in our point that instrumental intelligence is not sufficient for XRisk.
… we aren’t trying to prove the absence of XRisk; we are probing the best argument for it.
But the idea that value drift is non-random is built into the best argument for AI risk.
You quote it as:
But there are actually two more steps:
A goal that appears morally neutral or even good can still be dangerous (paperclipping, dopamine drips).
AIs that don’t have stable goals will tend to converge on Omohundran goals, which are dangerous.