It seems to me that we should be very liberal in this regard: biases which remain in the AI’s model of SO+UO are likely to be minor ones (since major biases will have been stated by humans as things to avoid). These are biases so small that we’re probably not even aware of them. Compared with the possibility of losing something human-crucial that we didn’t think to state explicitly, I’d say there’s a strong case for erring on the side of increased complexity, with more biases and preferences allowed. Essentially, we’re unlikely to have missed any biases we’d really care about eliminating, but very likely to have missed some preference we’d really miss if it were gone.
You frame the issue as though the cost of being liberal is that we’ll have more biases standing in the way of achieving our preferences, but I think this understates the difficulty. Precisely because it’s difficult to distinguish biases from preferences, accidentally preserving unnecessary biases amounts to unnecessarily adding entirely new values to human beings. We’re not merely faced with biases that would act as instrumental obstacles to achieving our goals, but with direct changes to the end-points of those goals themselves.
I agree there’s much more investigation to be done in that area.