What scares me is the possibility that moral anti-realism is false, but we build an AI under the assumption that it’s true.
One way of dealing with this, in part, is to figure out what would convince you that moral realism was true, and put that in as a strong conditional meta-preference.
I can see two possible ways to convince me that moral realism is true:
1. I spend hundreds or more years in a safe environment with a bunch of other philosophically minded people and we try to come up with arguments for and against moral realism, counterarguments, counter-counterarguments and so on, and we eventually exhaust the space of such arguments and reach a consensus that moral realism is true.
2. We solve metaphilosophy, program/teach an AI to “do philosophy”, somehow reach high confidence that we did that correctly, and the AI solves metaethics and gives us a convincing argument that moral realism is true.
Do these seem like things that could be “put in as a strong conditional meta-preference” in your framework?
Do these seem like things that could be “put in as a strong conditional meta-preference” in your framework?
Yes, very easily.
The main issue is whether these should count as an overwhelming meta-preference, one that outweighs all other considerations. As I currently have things set up, the answer is no. I have no doubt that you feel strongly about potentially true moral realism. But I’m certain that this strong feeling is not absurdly strong compared to other preferences at other moments in your life. So if we synthesised your current preferences, and 1. or 2. ended up being true, then moral realism would end up playing a large-but-not-dominating role in your moral preferences.
I wouldn’t want to change that, because what I’m aiming for is an accurate synthesis of your current preferences, and your current preference for moral-realism-if-it’s-true is not, in practice, dominating your preferences. If you wanted to ensure the potential dominance of moral realism, you’d have to put that directly into the synthesis process, as a global meta-preference (section 2.8 of the research agenda).
But the whole discussion feels a bit peculiar to me. One property often assumed of moral realism is that it is, in some sense, ultimately convincing: that all systems of morality (or all systems derived from humans) will converge to it. Yet when I said a “large-but-not-dominating role in your moral preferences”, I was positing that moral realism is true, but that we have a system of morality, U_H, that does not converge to it. I’m not really grasping how this could be possible (you could argue that the moral realism U_R is some sort of acausal trade convergent function, but that gives an instrumental reason to follow U_R, not an actual reason to have U_R; and I know that a moral system need not be a utility function ^_^).
So yes, I’m a bit confused by true-but-not-convincing moral realisms.