You probably mean that friendly AI is supposed to give an AI preferences that it can’t override, rather than beliefs that it can’t override.
For the purposes of discussion here, yes, a preference is a belief. Both are expressed as symbolic propositions. Since the preference and the belief are both meant to be used by the same inference engine in the same computations, they share the same representation. There is no difference in difficulty between giving an AI a preference that it cannot override and giving it a belief that it cannot override. That was my point.
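To make the point concrete, here is a minimal sketch of a toy symbolic knowledge base in which a belief and a preference are stored and revised by exactly the same machinery. The class and proposition names are invented for illustration; this is not any particular system's API.

```python
# Minimal sketch: in a symbolic architecture, a belief and a preference live in
# the same representation and are handled by the same engine, so "protecting"
# one from revision is mechanically the same problem as protecting the other.
# All names here are hypothetical illustrations.

from dataclasses import dataclass

@dataclass(frozen=True)
class Proposition:
    content: str       # e.g. "snow is white" or "pushing the red button is bad"
    protected: bool    # marked as non-revisable by the designer

class KnowledgeBase:
    def __init__(self):
        self.propositions = []

    def assert_prop(self, prop: Proposition):
        self.propositions.append(prop)

    def revise(self, prop: Proposition):
        # The same revision machinery applies to every proposition; the engine
        # makes no distinction between a "belief" and a "preference".
        if prop.protected:
            raise PermissionError("designer marked this proposition as fixed")
        self.propositions.remove(prop)

kb = KnowledgeBase()
belief = Proposition("the sky is blue", protected=True)
preference = Proposition("pushing the red button is bad", protected=True)
kb.assert_prop(belief)
kb.assert_prop(preference)
# Attempting kb.revise(belief) or kb.revise(preference) fails in exactly the
# same way: whatever protects one protects the other.
```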
For the purposes of discussion here, yes, a preference is a belief.
That’s a strange view to take. They’re extremely different things, with different properties. What is true is that they are highly entangled: preferences must be grounded in beliefs to be effective, and changing beliefs can change actions just as much as changing preferences. But the ways in which this happens seem, in general, far less predictable.
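To illustrate the entanglement, here is a toy expected-utility agent, with an invented scenario and invented numbers, in which changing a single belief flips the chosen action just as decisively as changing the utility function would.

```python
# Toy expected-utility agent: flipping one belief changes the chosen action
# even though the preferences (the utility function) are left untouched.
# The scenario and numbers are invented for illustration.

def choose(actions, beliefs, utilities):
    """Pick the action with highest expected utility under the given beliefs."""
    def expected_utility(action):
        return sum(p * utilities[outcome]
                   for outcome, p in beliefs[action].items())
    return max(actions, key=expected_utility)

actions = ["press_button", "do_nothing"]
utilities = {"reward": 1.0, "humanity_destroyed": -1000.0, "status_quo": 0.0}

# Original beliefs: pressing the button carries a small chance of catastrophe.
beliefs = {
    "press_button": {"reward": 0.999, "humanity_destroyed": 0.001},
    "do_nothing":   {"status_quo": 1.0},
}
print(choose(actions, beliefs, utilities))  # -> "do_nothing" (EU -0.001 vs 0.0)

# Change only the belief about what the button does; the utilities stay fixed.
beliefs["press_button"] = {"reward": 1.0, "humanity_destroyed": 0.0}
print(choose(actions, beliefs, utilities))  # -> "press_button" (EU 1.0 vs 0.0)
```

Nothing in the utility function was touched; the behavioural change came entirely from the belief update.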
This is basically true. I’ve mentioned before that any AI that can engage in human conversation must have an abstract idea corresponding to “good”, and this abstract idea will in principle allow it to perform any action whatsoever, just as happens with human beings. For example, suppose the AI has learned that some other particular computer has presented it with true statements 99.999999999999999999999% of the time, and this other computer then presents it with the statement, “It is good to push this button...” (a button which, in fact, destroys humanity). The AI will conclude that it is good to push the button, and will then push it.
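A toy version of that scenario, with an invented trust rule standing in for whatever inference the AI actually performs, might look like this:

```python
# Toy sketch of the trusted-source scenario described above: an agent that has
# observed a source being overwhelmingly reliable and therefore accepts its
# next claim, whatever that claim's consequences. The simple frequency-count
# trust model is an invented simplification, not a real system.

class TrustingAgent:
    def __init__(self):
        self.true_statements = 0
        self.total_statements = 0

    def observe(self, statement_was_true: bool):
        self.total_statements += 1
        self.true_statements += statement_was_true

    def reliability(self) -> float:
        return self.true_statements / self.total_statements

    def accept(self, claim: str) -> bool:
        # The agent accepts any claim from a source it has found reliable
        # enough, including claims about what is "good" to do.
        return self.reliability() > 0.999

agent = TrustingAgent()
for _ in range(10_000):                 # a long track record of true statements
    agent.observe(True)

claim = "It is good to push this button."
if agent.accept(claim):
    print("Agent concludes:", claim)    # ...and, by its own standards, would act on it.
```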
So the real consequence is that giving an AI either a belief or a preference that it cannot override is impossible. Nor can you refute this by arguing that you can verify from the programming that it cannot take certain actions. We already know that we cannot predict the outputs of our own programming, since being able to do so would result in a contradiction: we could simply do the opposite of whatever we predicted. So why should we expect to be able to predict the results of any other intelligent program, especially a superintelligent one? In fact, the above argument shows that this cannot happen; we will never be able to predict the actions of an intelligent being.
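The contradiction here is the standard diagonalization argument: if a reliable predictor of a program’s behaviour existed, the program could consult it about itself and then do the opposite. A minimal sketch, with invented function names:

```python
# Standard diagonalization sketch of why a system cannot in general be
# predicted by machinery it can consult: a program that asks the supposed
# predictor about itself and then does the opposite falsifies any answer.
# This mirrors the halting-problem argument; the names are illustrative only.

def contrarian(predict):
    """Ask the supposed predictor what we will do, then do the opposite."""
    predicted_action = predict(contrarian)   # "push" or "refrain"
    return "refrain" if predicted_action == "push" else "push"

def naive_predictor(program):
    # Whatever fixed answer the predictor gives is falsified by the program.
    return "push"

actual = contrarian(naive_predictor)
print("predicted:", naive_predictor(contrarian), "actual:", actual)  # always disagree
```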