If selfishness is reflectively inconsistent, and an AI can self-modify, then I don’t see how an AI can stay selfish. Do you have any ideas?
Why would an AI want to self-modify away from selfishness? Because future copies of itself couldn't cooperate fully if it stayed selfish? That may not hold if we solve the problem of cooperation between agents with conflicting preferences. Alternatively, an AI may not want to self-modify for "acausal" reasons (for example, it is worried about itself not existing if it decided to prevent future selfish versions of itself from existing), or for ethical reasons (it values being selfish, or values the existence of selfish agents in the world).
How is it coherent for an agent at time T1 to 'want' copy A at T2 to care only about A and copy B at T2 to care only about B? There's no non-meta way to express this: you would have to care more strongly about agents having a certain exact decision function than about all the object-level entities at stake. When it comes to object-level things, whatever the agent at T1 coherently cares about, it will want both A and B to care about those same things.
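One rough way to make that concrete (just a sketch, with made-up weights): for the agent at T1 to genuinely prefer that A cares only about A and B cares only about B, its utility would have to look something like

U_T1 = λ · [A and B each run the designated selfish decision function] + (1 − λ) · V(object-level outcomes)

with λ large enough to swamp V. The indicator term over decision functions has to outweigh everything the agent cares about at the object level, which is why it can only be expressed as a meta-preference.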
It strikes me that a persistently selfish agent may be somewhat altruistic towards its future selves. The agent might want its future versions to be free to follow their own selfish preferences, rather than binding them to its current selfish preferences.
Another alternative is that the agent is not only selfish but lazy… it could self-modify to bind its future selves, but that takes effort, and it can’t be bothered.
Either way, it’s going to take a weird sort of utility function to reproduce human selfishness in an AI.
Now that I think of it, caring about making more copies of yourself might be more fundamental than caring about object-level things in the world… I wonder what kind of math could be used to model this.
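A toy model, nothing worked out: instead of a utility over world-states alone, give the agent a utility over pairs of a world-state w and a set of successors S, something like

U(w, S) = Σ_{s in S} sim(s, current decision procedure) + ε · V(w)

where sim measures how closely a successor's decision procedure matches the agent's current one and ε is small. Such an agent would mostly care about propagating its own decision procedure and only weakly about object-level outcomes; whether a utility function like that is reflectively stable under self-modification is exactly the question.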