Why should your True Preferences have to be selfish?
What I mean is: For most people I meet, it seems very plausible to me that, say, self-preservation is a big part of their extrapolated values. And it seems much less plausible that their extrapolated value is monotonically increasing in the amount of consciousness or the number of conscious beings that exist.
Any given outcome might have hints that it's part of extrapolated value rather than a fake utility function. Examples of such hints: the preference for it persists as a feeling over a long time and through many changes of circumstance; there are evolutionary reasons why it might be so strong an instrumental value that it becomes terminal; etc.
Self-preservation has a lot of hints in its support. Monotonicity in consciousness seems less obvious (maybe strictly less obvious, in that every hint supporting monotonicity might also support self-preservation, with some further hint supporting self-preservation but not monotonicity).