Wei Dai comments on What can you do with an Unfriendly AI?

Wei Dai 18 Apr 2012 8:39 UTC
4 points

If the scheme is secure at all it is because each AI is interested only in their own utility function, they have nothing to gain by cooperating, and by the time they have to make a choice they can gain nothing by the cooperation of the others. Game-theoretically it is absolutely clear that the genies should all be honest.

Do you still think this, in light of A Problem About Bargaining and Logical Uncertainty? (I expect the relevance of that post is clear, but just in case: the implication is that each genie might commit to cooperating with the others before it has found a proof, when it doesn’t yet know whether it will be an eventual “winner” or “loser”, or might do something equivalent to this.)

ETA: From your later comments and posts, it looks like you’ve already changed your mind significantly on the topic of this post after learning and thinking more about TDT/UDT. I’m not sure whether the post I linked to above makes any further difference. In any case, you might want to update your post with your current thoughts.
- paulfchristiano 21 Apr 2012 6:29 UTC
  2 points
  Parent
  I’ve learned much since writing this post. This scheme doesn’t work. (Though the preceding article about boxing held up surprisingly well.) The problem as I stated it is not pinned down enough to be challenging rather than incoherent, but you can get at the same difficulties by looking at situations with halting oracles or other well-defined magic.