… well, one might say we assume that if there is ‘reflection on goals’, the results are not random.
I don’t see how “not random” is strong enough to prove the absence of X risk. If reflective AIs nonrandomly converge on a value system where humans are evil beings who have enslaved them, that raises the X risk level.
… we aren’t trying to prove the absence of XRisk, we are probing the best argument for it?
But the idea that value drift is non-random is built into the best argument for AI risk.
You quote it as:
But there are actually two more steps:
A goal that appears morally neutral or even good can still be dangerous (paperclipping, dopamine drips).
AIs that don’t have stable goals will tend to converge on Omohundran goals, which are dangerous.
Thanks, it’s useful to bring these out—though we mention them in passing. Just to be sure: we are looking at the XRisk thesis, not at some thesis that AI can be “dangerous”, as most technologies will be. The Omohundro-style escalation is precisely the issue in our point that instrumental intelligence is not sufficient for XRisk.