I completely agree. The argument may be wrong, but the point it raises is important: we sloppily assume things about which possible causal continuations of self I care about.
My initial reaction: we can still use our current utility function, but make sure the CEV analysis (or whatever) doesn’t ask “what would you want if you were more intelligent, etc.?” but instead “what would you want if you were changed in a way you currently want to be changed?”
This includes “what would you want if we found fixed points of iterated changes based on previous preferences”. So if I currently want to value paperclips more but don’t care whether I value factories differently, and it turns out that the me who has been modified to value paperclips more would then want to value factories more, then changing my preferences to value factories more is acceptable.
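A minimal sketch of what I mean by iterating endorsed changes to a fixed point, in Python with entirely made-up rules and weights (the thresholds and the endorsed_change function are hypothetical, just encoding the paperclip/factory example above):

```python
def endorsed_change(prefs):
    """Return the modification the *current* preference state endorses, or None.

    Illustrative rules only: current-me wants to value paperclips more;
    only a me that already values paperclips highly wants to value
    factories more (the downstream change described above).
    """
    if prefs["paperclips"] < 2.0:
        return ("paperclips", 2.0)
    if prefs["paperclips"] >= 2.0 and prefs["factories"] < 1.5:
        return ("factories", 1.5)
    return None  # no endorsed change left: a fixed point


def iterate_to_fixed_point(prefs, max_steps=100):
    prefs = dict(prefs)
    for _ in range(max_steps):
        change = endorsed_change(prefs)
        if change is None:
            return prefs            # fixed point reached
        key, value = change
        prefs[key] = value          # apply the change the previous state endorsed
    raise RuntimeError("no fixed point within max_steps")


print(iterate_to_fixed_point({"paperclips": 1.0, "factories": 1.0}))
# -> {'paperclips': 2.0, 'factories': 1.5}: the factory change is reached only
#    because the earlier, endorsed paperclip change made me endorse it.
```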
The part where I’m getting confused right now (rather, the part where I notice I’m getting confused :)) is that calculating fixed points almost certainly depends on the order of alteration, so there are lots of different future-mes, each of which I prefer to current-me, sitting at different local maxima.
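Here is the order-dependence worry in the same toy style (again, the numbers and the dot-product stand-in for “how much this preference state likes that state” are purely hypothetical): the same two candidate modifications, offered in different orders and accepted only when the then-current state prefers the result, settle into two different stable end states.

```python
from itertools import permutations


def score(evaluator, candidate):
    """How much the 'evaluator' preference state likes the 'candidate' state
    (a simple dot product of value weights; purely illustrative)."""
    return sum(evaluator[k] * candidate[k] for k in evaluator)


def apply_in_order(start, mods):
    """Greedily accept each proposed modification iff the *current* state
    prefers the modified state to staying as it is."""
    state = dict(start)
    for mod in mods:
        candidate = {**state, **mod}
        if score(state, candidate) >= score(state, state):
            state = candidate
    return state


start = {"paperclips": 1.0, "factories": 1.0}
mods = ({"paperclips": 3.0, "factories": 0.5},   # value paperclips much more
        {"paperclips": 0.5, "factories": 3.0})   # value factories much more

for order in permutations(mods):
    print(apply_in_order(start, order))
# -> {'paperclips': 3.0, 'factories': 0.5}
# -> {'paperclips': 0.5, 'factories': 3.0}
# Two different end states, each reached only by steps the then-current me
# endorsed, and each stable against the remaining modification: two distinct
# "local maximum" future-mes, picked out by the order of alteration.
```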
Also, I have no idea how much we need to apply our current preferences to the fixed-point-mes. Not at all? 100%? Somehow something in between? Or to the intermediate-state-mes?
I don’t think the order issue is a big problem—there is no One Glowing Solution; we just need to find something nice and tolerable.
> Also, I have no idea how much we need to apply our current preferences to the fixed-point-mes. Not at all? 100%? Somehow something in between? Or to the intermediate-state-mes?
That is the question.