Why would an AI want to self-modify away from selfishness? Because future copies of itself can’t cooperate fully if it remains selfish? That may not hold if we solve the problem of cooperation between agents with conflicting preferences. Alternatively, an AI may not want to self-modify for “acausal” reasons (for example, it worries that it would not itself exist if it decided to prevent future selfish versions of itself from existing), or for ethical reasons (it values being selfish, or values the existence of selfish agents in the world).
How is it coherent for an agent at time T1 to ‘want’ copy A at T2 to care only about A and copy B at T2 to care only about B? There’s no non-meta way to express this—you would have to care more strongly about agents having a certain exact decision function than about all of the object-level entities at stake. When it comes to object-level things, whatever the agent at T1 coherently cares about, it will want A and B to care about too.