Yeah, this comes up from time to time.
And yes, the way for a paperclipper (or similar “malevolent” optimizer) to subvert this is to come up with a strategy that both achieves its goals and convinces all humans to sign off on it.
And, yes, if the optimizer is simple enough that a human can review its internal workings and reliably predict its future motives and actions, then that kind of subversion strategy probably won’t succeed, and the optimizer is probably safe.
So is a rock, come to that.