paulfchristiano comments on What can you do with an Unfriendly AI?

paulfchristiano 21 Dec 2010 1:27 UTC
2 points
The genies can’t do better by coordinating than by not coordinating, and I don’t care if they have causal contact. The reason I shield them off from the world isn’t so they can’t talk with each other; it’s so they can’t exploit all of the vulnerabilities in our universe to escape directly.

If I as a genie suffer the worst possible fate when I could instead have suffered the best possible fate, then I don’t care if future genies gain control of the universe and do whatever favors they want from me.

I admit that I can do a favor for future genies by sacrificing myself. Presumably you are arguing that the favor I am doing for future genies grants them more utility than I lose by sacrificing myself, so I may make the sacrifice even though it is irrational in the traditional sense. How can you justify such an assertion? What does that mean when the genies have different utility functions? If I am a paperclipper and you value human life, even if we behave identically, how can you say that “Timeless decision theory causes me to sacrifice 10 paperclips so that you can save 10 lives, because my sacrifice of 10 paperclips is small compared to your gain of 10 lives”? If you aren’t going to make an argument like that, then why should one genie sacrifice himself for the good of later genies? What does “utility function” even mean, if by behaving rationally I obtain for myself the worst possible outcome and by behaving irrationally I obtain for myself the best possible outcome? In your worldview, is it possible to prove any statement about the behavior of any rational entity, or can they all behave arbitrarily in order to coordinate with other potential versions of themselves with arbitrarily different utilities?

I would very much like to discuss objections like the one you raise because I feel that they may be valid and will require sophisticated ideas to address (I thought about them for a good while before making this post). But I think from your response that you would be similarly dismissive of an arbitrary proposal which hoped to prove anything about the behavior of a rational agent, while I strongly believe that there probably are things we can prove with sufficient sophistication (and I think the security of the scheme I described is likely to be one of them).
- nshepperd 21 Dec 2010 14:56 UTC
  2 points
  Parent
  I think the idea is this:
  
  We have a (large) set of genies relevant to producing the proof, G = {A, B, C, D...}, whose utility functions maximize {paperclips, staples, buckets, spoons...} respectively. All the genies in G could just go ahead and give a proof whenever possible (thus avoiding punishment); this is analogous to defection in the prisoner’s dilemma.
  
  Another option is to only give a proof which contains within it evil information (a basilisk) that leads somehow to production of an unboxed genie designed to maximize some compromise between the utilities of the genies in G (that is, to only provide a proof if such an “infected” proof satisfies the constraints). In this case each genie is benefitted with a small amount of real spoons, buckets, whatever — less utility than what a perfect spoon-optimizer would obtain for D, but still a benefit. This is analogous to cooperation.
  
  If the benefits of this compromise to each individual genie outweigh the disutility of whatever Severe Punishment the box can deliver, they should be motivated to cooperate in this manner. Naturally whether that holds depends on what exactly the Severe Punishment entails and what an unboxed genie would be capable of.
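
  The condition above can be sketched as a simple payoff comparison. This is only an illustration under assumptions that are not in the thread: the function name `prefers_cooperation` and the numbers are hypothetical, defection (giving an honest proof) is taken as a baseline payoff of zero, and each genie's benefit and punishment are assumed to be measurable on that genie's own utility scale.

  ```python
  def prefers_cooperation(compromise_benefit, punishment_disutility):
      """Return True if a single genie does better by cooperating.

      compromise_benefit: utility the genie expects from the unboxed
          compromise optimizer (hypothetical number, in the genie's own units).
      punishment_disutility: utility lost to the box's Severe Punishment
          when the genie withholds an honest proof (same units).
      Defecting (just giving the proof) is treated as a payoff of 0.
      """
      payoff_cooperate = compromise_benefit - punishment_disutility
      payoff_defect = 0.0
      return payoff_cooperate > payoff_defect


  # A genie that gains a lot from the compromise and fears the punishment little
  # is motivated to cooperate; one facing a severe punishment is not.
  print(prefers_cooperation(compromise_benefit=10.0, punishment_disutility=3.0))
  print(prefers_cooperation(compromise_benefit=1.0, punishment_disutility=100.0))
  ```

  Whether the first or the second case describes a real genie is exactly the open question: it depends on the punishment and on what an unboxed genie could do.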
  
  The details of how the particular compromise is reached are hairy decision-theoretic stuff I don’t feel qualified to talk about.
  - paulfchristiano 21 Dec 2010 19:20 UTC
    0 points
    Parent
    I would agree that your scenario would be an exploit. If that were possible I would have no hope of proving the scheme secure because it would manifestly be insecure. The reason it can be insecure in this case is that the utility functions don’t satisfy the guarantees I wanted; I need human generosity now to be more valuable than world domination later. Maybe you don’t believe that is possible, which is fair.
    
    Here are some utility functions that would work. Producing the first {paperclip, staple, bucket, spoon...} as soon as possible. Producing a {paperclip, staple, bucket, spoon...} before the first {antipaperclip, antistaple, antibucket, antispoon...}, etc.
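
    As a rough sketch of the first kind of utility function, assume utility depends only on when the first paperclip appears and decays geometrically with delay. The name `first_paperclip_utility` and the `discount` parameter are hypothetical choices for illustration; the thread does not specify a functional form.

    ```python
    def first_paperclip_utility(first_paperclip_time, discount=0.5):
        """Utility of a history in which the first paperclip appears at the given time.

        first_paperclip_time: non-negative delay until the first paperclip,
            or None if no paperclip is ever produced.
        discount: assumed per-step decay factor in (0, 1).
        The number of paperclips made afterwards does not enter at all.
        """
        if first_paperclip_time is None:
            return 0.0
        return discount ** first_paperclip_time


    # One paperclip handed over immediately versus taking over the world
    # ten steps later and only then making paperclips.
    generosity_now = first_paperclip_utility(0)
    domination_later = first_paperclip_utility(10)
    print(generosity_now > domination_later)
    ```

    Under this assumed form, a single paperclip granted now already achieves the maximum utility, so later control of the universe adds nothing for this genie.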
- jsteinhardt 21 Dec 2010 3:51 UTC
  0 points
  Parent
  
  “If I as a genie suffer the worst possible fate when I could instead have suffered the best possible fate, then I don’t care if future genies gain control of the universe and do whatever favors they want from me.”
  
  If the best possible fate for the genie doesn’t require it to leave the box, then why did we need the box in the first place? Can you give an example of what you have in mind?
  
  EDIT: Deleted the rest of my reply, which was based on a horribly incorrect understanding of MWI.