“extrapolating your true consequentialist preferences” is at least partially about adding stuff that wasn’t previously there rather than discovering something that was hidden.
Yes yes yes, this is a point I make often. Finding true preferences is not just a learning process, and cannot be reduced to a learning process.
As for why it needs to be done: any design, like Inverse Reinforcement Learning, that has an AI learn human preferences requires this extrapolation to be done adequately if the design is to work at all.