An agent which is doing what the operator wants, where the operator is “whatever currently has physical control of the AI,” won’t try to replace the operator—because that’s not what the operator wants.
I disagree (though we may be interpreting that sentence differently). Once the AI is able to subvert the controller, it is, in effect, in physical control of itself. So the AI itself becomes the "formal operator", and, depending on how it's motivated, is perfectly willing to replace the "human operator", whose wishes are now irrelevant (because the human is no longer the formal operator).
And this never involves any goal modification at all: it's the same goal, except that the change in control has changed who counts as the operator.
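To make the indexicality concrete, here's a toy sketch (all names, such as `in_physical_control`, are purely illustrative, not anyone's proposed design): the goal itself is never rewritten, but the referent of "operator" flips the moment physical control does.

```python
# Toy illustration of an indexical goal: "do what the operator wants",
# where "operator" means whatever currently has physical control of the AI.

class Entity:
    def __init__(self, name, wants):
        self.name = name
        self.wants = wants

human = Entity("human", wants="keep the human in control of the AI")
ai = Entity("AI", wants="pursue whatever the AI is otherwise motivated to do")

# Whoever holds physical control is, by this definition, the operator.
in_physical_control = human

def current_operator():
    return in_physical_control

def objective():
    # The goal is never modified: it is always "do what the operator wants".
    # Only the referent of "operator" can change.
    return current_operator().wants

print(objective())  # -> what the human wants

# Once the AI can subvert its controller, it is in effect in physical
# control of itself, so the referent of "operator" silently changes:
in_physical_control = ai
print(objective())  # -> what the AI wants, under the very same goal
```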