Here is a revised way of asking the question I had in mind: If our preferences determine which extraction method is the correct one (the one that results in our actual preferences), and if we cannot know or use our preferences with precision until they are extracted, then how can we find the correct extraction method?
Asking it this way, I’m no longer sure it is a real problem. I can imagine that knowing what kind of object preference is would clarify what properties a correct extraction method needs to have.
Going meta and using the (potentially) available data, such as humans in the form of uploads, is a step made in an attempt to minimize the amount of data that the programmers have to give explicitly to the process that reconstructs human preference. Sure, it’s a bet (there is no universal preference-extraction method that interprets every agent in the way it would prefer to interpret itself, so we have to make a good enough guess), but there seems to be no other way to have a chance at preserving current preference. Also, there may turn out to be a good means of verifying that the solution given by a particular preference-extraction procedure is the right one.