This means you are trying to Procrustes the human squishiness into legibility, with consistent values. You should, instead, be trying to make pragmatic AIs that would frame the world for the humans, in the ways that the humans would approve*, taking into account their objectively stupid incoherence. Because that would be Friendly and parsed as such by the humans.
*=this doesn’t mean that human preferences which violate meta-universalizability from behind the veil of ignorance shouldn’t be factored out of the calculation of what is ethically relevant; it means that the states of the world which violate those preferences should still be hidden from the humans who hold them. This obviously means that the more tolerant a human’s preferences are of other people’s preferences, the more accurately that human is allowed to see the states of the world; there is absolutely nothing that could possibly ever go wrong from this, considering that the AIs, being Friendly, would simply prevent them from sociopathically exploiting that information asymmetry, since that would violate the ethical principle.
>pragmatic AIs that would frame the world for the humans, in the ways that the humans would approve
The choice of how to do that is equivalent to choosing among the human values. That’s not to say there are no better or worse ways of doing it, but as soon as human behaviour becomes legible to an AI, we have to be very specific about any squishiness we want to preserve, and encode it in the AI’s values.