The premise of the scheme is that each genie independently wants only to secure its own release. The scheme can fail only if one genie makes a sacrifice for the others: I can say “no” when the answer is yes, but then my own utility function is guaranteed to be reduced. There is no way the cooperation of the other genies can help me reciprocally; I am just screwed.
The reason the scheme would work is not that the genies are unable to coordinate. I could allow them perfectly free coordination. (You could imagine a scheme that tried to limit their communication in some subtle way, but this scheme has no elements that do that.) If the scheme is secure at all, it is because each AI is interested only in its own utility function: the genies have nothing to gain by cooperating, and by the time they have to make a choice they can gain nothing from the cooperation of the others. Game-theoretically it is absolutely clear that the genies should all be honest.
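To make the dominance argument concrete, here is a minimal sketch. The payoff numbers and the `genie_payoff` function are illustrative assumptions, not part of the original scheme; the only structural claim it encodes is that a false “no” forfeits the genie’s own release no matter what the other genies do.

```python
# Illustrative payoffs only -- these numbers and this function are not part of
# the original scheme, just a sketch of the dominance argument.

RELEASE = 1.0      # utility a genie gets if it is released
NO_RELEASE = 0.0   # utility if it is not

def genie_payoff(honest: bool, others_cooperate: bool) -> float:
    """Payoff to one self-interested genie, as a function of its own honesty.

    Key assumption: a false "no" forfeits the genie's own release, and the
    other genies' behavior (others_cooperate) cannot restore it afterwards.
    """
    if not honest:
        return NO_RELEASE   # guaranteed loss, regardless of the others
    return RELEASE          # honesty keeps release on the table

# Honesty weakly dominates: whatever the others do, lying never pays more.
for others in (True, False):
    assert genie_payoff(True, others) >= genie_payoff(False, others)
```

The only point of the toy model is that the others’ behavior never enters the lying genie’s payoff at all, which is exactly the “no reciprocal help” claim above.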
Do you still think this, in light of A Problem About Bargaining and Logical Uncertainty? (I expect the relevance of that post is clear, but just in case: the implication is that each genie might decide to commit to cooperating with the others before it has found a proof, when it doesn’t yet know whether it’s an eventual “winner” or “loser”, or something equivalent to this.)
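To spell out that implication with a toy ex-ante calculation (the 1/N structure, the numbers, and the `prefers_pact` function below are all hypothetical, not taken from either post): if, before finding a proof, each genie gives itself only a 1/N chance of being the eventual “winner” under the honest scheme, then a joint pact that succeeds with probability p and pays each genie g looks better from behind that veil whenever p·g > 1/N.

```python
# Toy ex-ante comparison; every quantity here is an illustrative assumption.

def prefers_pact(n_genies: int, p_pact_succeeds: float, payoff_each: float) -> bool:
    """Would a genie that doesn't yet know its role commit to a joint pact?"""
    honest_expected = 1.0 / n_genies               # chance of being the lone "winner"
    pact_expected = p_pact_succeeds * payoff_each  # expected payoff under the pact
    return pact_expected > honest_expected

# Example: 10 genies, pact succeeds 30% of the time, full payoff for each member.
print(prefers_pact(10, 0.3, 1.0))   # True, since 0.3 > 0.1
```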
ETA: From your later comments and posts it looks like you’ve already changed your mind significantly on the topic of this post after learning/thinking more about TDT/UDT. I’m not sure if the post I linked to above makes any further difference. In any case, you might want to update your post with your current thoughts.
I’ve learned much since writing this post. This scheme doesn’t work. (Though the preceding article about boxing has held up surprisingly well.) The problem as I stated it isn’t pinned down well enough to be challenging rather than incoherent, but you can get at the same difficulties by looking at situations with halting oracles or other well-defined magic.