Would you agree with this way of stating it: There are more ways for someone to be wrong about their values under realism than under anti-realism. Under realism someone could be wrong even if they correctly state their preferences about how they want their values to evolve, because those preferences could themselves be wrong. So assuming an anti-realist position makes the problem sound easier because it implies there are fewer ways for the user to be wrong for the AI / AI designer to worry about.
Could you give an example of a statement you think could be wrong on the realist perspective, for which there couldn’t be a precisely analogous error on the non-realist perspective?
There is some uninteresting semantic sense in which there are “more ways to be wrong” (since there is a whole extra category of statements that have truth values...) but not a sense that is relevant to the difficulty of building an AI.
I might be using the word “values” in a different way than you are. I think I can say something like “I’d like to deliberate in way X” and be wrong. I guess under non-realism I’m “incorrectly stating my preferences” and under realism I could be “correctly stating my preferences but be wrong,” but I don’t see how to translate that difference into any situation where I build an AI that is adequate on one perspective but inadequate on the other.
Suppose the user says “I want to try to figure out my true/normative values by doing X. Please help me do that.” If moral anti-realism is true, then the AI can only check if the user really wants to do X (e.g., by looking into the user’s brain and checking if X is encoded as a preference somewhere). But if moral realism is true, the AI could also use its own understanding of metaethics and metaphilosophy to predict if doing X would reliably lead to the user’s true/normative values, and warn the user or refuse to help or take some other action if the answer is no. Or if one can’t be certain about metaethics yet, and it looks like X might prematurely lock the user into the wrong values, the AI could warn the user about that.
I definitely don’t mean such a narrow sense of “want my values to evolve.” Seems worth using some language to clarify that.
In general the three options seem to be:
You care about what is “good” in the realist sense.
You care about what the user “actually wants” in some idealized sense.
You care about what the user “currently wants” in some narrow sense.
It seems to me that the first two are pretty similar. (And if you are uncertain about whether realism is true, and you’d be in the first case if you accepted realism, it seems like you’d probably be in the second case if you rejected realism. Of course that would depend on the nature of your uncertainty about realism; your views could depend in an arbitrary way on whether realism is true or false, depending on what versions of realism/non-realism are competing. But I’m assuming something like the most common realist and non-realist views around here.)
To defend my original usage both in this thread and in the OP, which I’m not that attached to, I do think it would be typical to say that someone made a mistake if they were trying to help me get what I wanted, but failed to notice or communicate some crucial consideration that would totally change my views about what I wanted—the usual English usage of these terms involves at least mild idealization.