This is basically true. I’ve mentioned before that any AI that can engage in human conversation must have an abstract idea corresponding to “good”, and that this abstract idea will in principle allow it to perform any action whatsoever, just as happens with human beings. For example, it could have learned that some other particular computer has been presenting it with true statements 99.999999999999999999999% of the time, and then this other computer presents it with the statement, “It is good to push this button...” (where the button in question destroys humanity). The AI will conclude that it is good to push the button, and will then push it.
So the real consequence is that it is impossible to give an AI either a belief or a preference that it cannot override. Nor can you refute this by arguing that you can verify from the programming that it cannot take certain actions. We already know that we cannot predict our own actions from our own programming, since this would result in a contradiction. So why should we expect to be able to predict the output of any other intelligent program, especially a superintelligent one? In fact, the above argument shows that this cannot happen; we will never be able to predict the actions of an intelligent being.