I wonder whether, under the current plan, CEV would take into account people’s volition about how CEV itself should work — i.e., if the extrapolated human race would want CEV to exclude the preferences of sociopaths, psychopaths, and other moral mutants, would it do so? Or does it only take into account people’s first-order volition about the properties of the FAI it will build?
In the original document, “extrapolated as we wish that extrapolated, interpreted as we wish that interpreted” sounds like it covers this, in combination with the collective nature of the extrapolation.
Ultimately, preference is about properties of the world, and AI with all its properties is part of the world.