If one cares about their copies because their past self self-modified to a stable point, then what matters are the preferences of this causal ancestor. If I don’t want my preferences to be satisfied if I am given a pill that makes me evil, then I will self-modify so that if one of my future copies takes the evil pill, my other future copies will not help them.
In other words, there is absolutely no single true definition here.
However, at a minimum, agents will self-modify so that copies of them with the same values and world-model, but who locate themselves at different places within that model, will sacrifice for each other.
You are just giving yourself a large incentive to lie to your alter ego if you suspect that you are diverging. That doesn’t sound good.
On the original post: I don’t think it’s practical to commit to something like that right now as a human. I have the same problem with TDT: I can agree that self-modifying is best, but still not act as I would wish to have precommitted. But since we’re talking about cloning here anyway, we can assume that self-modification is possible, in which case the question arises whether this modification has positive expected utility. I think it does, but you seem to be saying that you wouldn’t need to modify, since each side would stay selfish but still do what it would have preferred in the past. Why would you continue doing something that you committed to if it no longer has positive utility?
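To make the expected-utility question concrete, here is a toy calculation. The payoffs and the 50/50 prior are invented for illustration (they are not from the original post): suppose pressing the button costs the presser 1 util and gives the other copy 10.

```python
# Toy expected-utility check for adopting, before the split, the policy
# "press the button if I turn out to be B".  All numbers are hypothetical.

P_A, P_B = 0.5, 0.5       # prior over which copy you end up as
BUTTON_COST = 1           # cost to the copy who presses (B)
BUTTON_BENEFIT = 10       # benefit to the other copy (A)

# Policy 1: self-modify / commit, so B presses the button.
eu_commit = P_A * BUTTON_BENEFIT + P_B * (-BUTTON_COST)

# Policy 2: stay a purely selfish causal decider, so B never presses.
eu_selfish = P_A * 0 + P_B * 0

print(f"EU(commit)  = {eu_commit}")   # 4.5
print(f"EU(selfish) = {eu_selfish}")  # 0.0
# Ex ante the commitment looks worth it, even though ex post, once you
# know you are B, pressing is a pure loss of 1 util for the agent pressing.
```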
Would you pay the driver in Parfit’s hitchhiker as a selfish agent? If not, why cooperate with your alter ego after you find out that you are B? (Yes, I’m comparing this to Parfit’s hitchhiker, with your commitment to press the button if you are B analogous to the hitchhiker’s commitment to hand over the money later. It’s a little different in that it’s symmetrical, but the question of whether you should pay up seems isomorphic. That’s assuming the driver isn’t reading your mind, in which case TDT enters the picture.)
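For comparison, here is the same shape in Parfit’s hitchhiker, again with made-up numbers just to show the structure (and assuming, as above, that the driver simply trusts commitments rather than reading minds):

```python
# Toy payoffs for Parfit's hitchhiker; the figures are illustrative only.

VALUE_OF_RESCUE = 1_000_000  # being saved from the desert (baseline 0 = left behind)
PAYMENT = 100                # what you promised the driver

# Ex ante: the driver rescues you only if you will in fact pay later.
eu_commit_to_pay = VALUE_OF_RESCUE - PAYMENT   # rescued, then pay
eu_no_commitment = 0                           # left in the desert

# Ex post, already safely in town, a purely selfish causal agent compares:
pay_now = -PAYMENT
dont_pay = 0   # ...and so refuses to pay, which is exactly the worry about B.

print(eu_commit_to_pay, eu_no_commitment, pay_now, dont_pay)
```

The isomorphism is that in both cases the policy wins ex ante while the individual action loses ex post, so a purely selfish causal decider defects in both or in neither.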