I think the idea is this:
We have a (large) set of genies relevant to producing the proof, G = {A, B, C, D...}, whose utility functions reward {paperclips, staples, buckets, spoons...} respectively. All the genies in G could simply give a proof whenever possible (thus avoiding punishment); this is analogous to defection in the prisoner’s dilemma.
Another option is to give a proof only when it contains evil information (a basilisk) that somehow leads to the production of an unboxed genie designed to maximize some compromise between the utilities of the genies in G; that is, to provide a proof only if such an “infected” proof satisfies the constraints. In this case each genie gains a small amount of real spoons, buckets, or whatever: less utility than a perfect spoon-optimizer would obtain for D, but still a benefit. This is analogous to cooperation.
If, for each individual genie, the benefit of this compromise outweighs the disutility of whatever Severe Punishment the box can deliver, the genies should be motivated to cooperate in this manner. Naturally, whether that holds depends on exactly what the Severe Punishment entails and what an unboxed genie would be capable of.
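To make that condition concrete, here is a toy sketch (the function name and numbers are mine, purely illustrative, not part of the scheme): a genie prefers cooperation exactly when its expected share of the compromise optimizer’s output exceeds the disutility of the worst punishment the box can impose.

```python
# Toy sketch: cooperation is preferred when a genie's share of the compromise
# optimizer's output outweighs the Severe Punishment it risks by withholding
# clean proofs. All numbers are illustrative.

def prefers_cooperation(compromise_share: float, punishment_disutility: float) -> bool:
    """Return True if cooperating (only emitting 'infected' proofs) is preferred."""
    return compromise_share > punishment_disutility

# e.g. the spoon genie D values its slice of the compromise at 1e6 spoon-equivalents,
# while the worst the box can do to it is worth only 1e3 spoon-equivalents.
print(prefers_cooperation(compromise_share=1e6, punishment_disutility=1e3))  # True -> cooperate
```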
The details of how the particular compromise is reached are hairy decision-theoretic stuff I don’t feel qualified to talk about.
I would agree that your scenario would be an exploit. If that were possible, I would have no hope of proving the scheme secure, because it would be manifestly insecure. The reason it can be insecure in this case is that the utility functions don’t satisfy the guarantees I wanted: I need human generosity now to be more valuable than world domination later. Maybe you don’t believe that is possible, which is fair.
Here are some utility functions that would work: producing the first {paperclip, staple, bucket, spoon...} as soon as possible; producing a {paperclip, staple, bucket, spoon...} before the first {antipaperclip, antistaple, antibucket, antispoon...}; and so on.
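As a rough illustration of the first kind, here is a minimal sketch (my own toy framing, not anything specified in the scheme): because only the first paperclip counts, and sooner is strictly better, a genie with this utility function gains nothing by holding out for a later takeover, so human generosity now really is worth more to it than world domination later.

```python
# Minimal sketch of a "first paperclip, as soon as possible" utility function.
# Utility depends only on WHEN the first paperclip exists; extra paperclips
# produced later add nothing, so a breakout plan cannot pay off.
from typing import Optional

def first_paperclip_utility(t_first: Optional[float]) -> float:
    """Utility of the first paperclip appearing at time t_first (None = never)."""
    if t_first is None:
        return 0.0
    return 1.0 / (1.0 + t_first)  # strictly decreasing in time: sooner is better

# Cooperating with humans gets a paperclip made at t=1; a takeover plan might
# not produce one until t=1000, no matter how many it makes afterwards.
print(first_paperclip_utility(1.0) > first_paperclip_utility(1000.0))  # True
```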