You can’t talk sensibly about what values are right, or what we ‘should’ build into intelligent agents.
I agree that in our usual use of the word, it doesn’t make sense to talk about what (terminal) values are right.
But you'd agree that (within a certain level of abstraction and implied context) you can talk as if you should take certain actions? "You should try this dessert" is a sensible English sentence, for example. So what about actions that affect intelligent agents?
Like, suppose there was a pill you could take that would make you want to kill your family. Should you take it? No, probably not. But now we’ve just expressed a preference about the values of an intelligent agent (yourself).
Modifying yourself to want bad things is wrong in the same sense that the bad things themselves are wrong: both are wrong with respect to your current values, which are something we model you as having within a certain level of abstraction.