I found the discussion of homomorphic encryption interesting, but
One solution to the problem of friendliness is to develop a self-improving, unfriendly AI, put it in a box, and ask it to make a friendly AI for us. This gets around the incredible difficulty of friendliness, but it creates a new, apparently equally impossible problem.
If you can reliably build an AI, but you cannot reliably build friendliness into it, why should I trust that you can build a program which in turn can reliably verify friendliness? It seems to me that if you are unable to build friendliness, it is because you do not sufficiently understand friendliness. If you don't sufficiently understand it, I do not want you building the program that serves as the check on the release of an AI.
It seems very plausible to me that certifying friendly source code (while still hugely difficult) is much easier than finding friendly source code. For example, maybe we will develop a complicated system S of equations such that, provably, any solution to S encodes the source of an FAI. While finding a solution to S might be intractably difficult for us, verifying a solution x that has been handed to us would be easy—just plug x into S and see if x solves the equations.
ETA: The difficult part, obviously, is developing the system S in the first place. But that would be strictly easier than additionally finding a solution to S on our own.
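To make the find-versus-verify asymmetry concrete, here is a toy sketch in Python. Subset sum is only a stand-in for the hypothetical system S, and the names `verify` and `find` are illustrative, not anyone's actual proposal: the point is that checking a candidate handed to us is one cheap pass, while searching for one blows up combinatorially.

```python
# Toy stand-in for the system S: subset sum. Finding a satisfying subset
# generally requires searching exponentially many candidates, but checking
# a proposed solution is a single pass.
from itertools import combinations

def verify(numbers, target, candidate):
    """Cheap check: does the proposed index set actually hit the target?"""
    return set(candidate) <= set(range(len(numbers))) and \
           sum(numbers[i] for i in candidate) == target

def find(numbers, target):
    """Expensive search: try every subset until one verifies."""
    for r in range(len(numbers) + 1):
        for candidate in combinations(range(len(numbers)), r):
            if verify(numbers, target, candidate):
                return candidate
    return None

numbers = [3, 34, 4, 12, 5, 2]
target = 9
solution = find(numbers, target)                    # hard direction: search
print(solution, verify(numbers, target, solution))  # easy direction: check
```

Nothing in this sketch says that friendliness actually admits such a system S, of course; that is exactly the question at issue below.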
It looks like you’re just arguing about P=NP, no?
Even if P != NP, there remains the question of whether friendliness is one of those things we will be able to certify after we can build a uFAI but before we can construct an FAI from scratch. Eliezer doesn't think so.
If I have a system which is manifestly maximizing a utility function I can understand, then I have some hope of proving claims about its behavior. I believe that designing such a system is possible, but right now it seems way out of our depth.
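As a minimal sketch of what "manifestly maximizing a utility function I can understand" could look like (my own toy illustration, not a real design): the objective is written out explicitly and the policy is nothing more than an argmax over it, so simple claims about its behavior read straight off the code.

```python
# Minimal sketch (hypothetical) of an agent whose objective is legible:
# the utility function is stated explicitly, and the policy is nothing but
# an argmax over it. Claims like "the agent never picks an action whose
# predicted outcome has lower utility than some alternative" follow
# directly from the code.

def utility(state):
    # A deliberately simple, human-readable objective (hypothetical):
    # prefer more paperclips, penalize resource use.
    return state["paperclips"] - 0.1 * state["resources_used"]

def transition(state, action):
    # Toy world model: each action adds paperclips at some resource cost.
    gain, cost = action
    return {"paperclips": state["paperclips"] + gain,
            "resources_used": state["resources_used"] + cost}

def choose(state, actions):
    # The entire policy: pick the action whose predicted outcome
    # maximizes the stated utility. Nothing else is going on.
    return max(actions, key=lambda a: utility(transition(state, a)))

state = {"paperclips": 0, "resources_used": 0}
actions = [(1, 5), (3, 40), (2, 10)]
print(choose(state, actions))  # -> (2, 10): utility 2 - 1.0 = 1.0 is highest
```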
Every example of intelligence we have ever seen behaves in an incredibly complicated way and is not manifestly maximizing any utility function at all. The way things are going, the first AI humans come up with (if they come up with one in the next 50 years, at least) will probably be similarly difficult to analyze formally.
I think someone who has worked on friendliness would agree, but I don't know. I have only started thinking about these issues recently, so I may be way off base.