If selfishness is reflectively inconsistent, and an AI can self-modify, then I don’t see how an AI can stay selfish. Do you have any ideas?
Why would an AI want to self-modify away from selfishness? Because future copies of itself couldn't cooperate fully if it stayed selfish? That may not hold if we solve the problem of cooperation between agents with conflicting preferences. Alternatively, an AI may not want to self-modify for "acausal" reasons (for example, it is worried about itself not existing if it decided to prevent future selfish versions of itself from existing), or for ethical reasons (it values being selfish, or values the existence of selfish agents in the world).
How is it coherent for an agent at time T1 to 'want' copy A at T2 to care only about A and copy B at T2 to care only about B? There's no non-meta way to express this: you would have to care more strongly about agents having a certain exact decision function than about all the object-level entities at stake. When it comes to object-level things, whatever the agent at T1 coherently cares about, it will want both A and B to care about those same things.
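One rough way to make that concrete (just a sketch, with made-up weights): for the agent at T1 to genuinely prefer that A cares only about A and B cares only about B, its utility would have to look something like

U_T1 = λ · [A and B each run the designated selfish decision function] + (1 − λ) · V(object-level outcomes)

with λ large enough to swamp V. The indicator term over decision functions has to outweigh everything the agent cares about at the object level, which is why it can only be expressed as a meta-preference.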
It strikes me that a persistently selfish agent may be somewhat altruistic towards its future selves. The agent might want its future versions to be free to follow their own selfish preferences, rather than binding them to its current selfish preferences.
Another alternative is that the agent is not only selfish but lazy… it could self-modify to bind its future selves, but that takes effort, and it can’t be bothered.
Either way, it’s going to take a weird sort of utility function to reproduce human selfishness in an AI.
Now that I think of it, caring about making more copies of yourself might be more fundamental than caring about object-level things in the world… I wonder what kind of math could be used to model this.
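A toy model, nothing worked out: instead of a utility over world-states alone, give the agent a utility over pairs of a world-state w and a set of successors S, something like

U(w, S) = Σ_{s in S} sim(s, current decision procedure) + ε · V(w)

where sim measures how closely a successor's decision procedure matches the agent's current one and ε is small. Such an agent would mostly care about propagating its own decision procedure and only weakly about object-level outcomes; whether a utility function like that is reflectively stable under self-modification is exactly the question.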