This may be stating the obvious, but isn’t this exactly the reason why there shouldn’t be a subroutine that detects “The AI wants to cheat its masters” (or any similar security subroutines)?
The AI has to look out for humanity’s interests (CEV), but the manner in which it does so we can safely leave up to the AI. Take Eliezer’s chess computer example as an analogy. We can’t play chess as well as the chess computer (if we could, we could beat grandmasters ourselves), but we can predict the outcome of the game when we play against it: the chess computer finds a winning position against us.
With a Friendly AI we can’t predict what it will do, or even why it will do it, but if we get FAI right, we can predict that its actions will steer humanity in the right direction.
(Also, building an AI by giving it explicit axioms or values we desire is a really bad idea. Much like the genie in the lamp, it is bound to turn out that we don’t get what we think we asked for. See http://singinst.org/upload/CEV.html if you haven’t read it already.)
Welcome to Less Wrong!