I am sorry that my writing is unclear, but I think you should be a little more charitable.
In particular, note that I wrote explicitly “the output of the program will be homomorphically encrypted.” This is exactly the difference between program obfuscation and homomorphic encryption. A homomorphic encryption scheme allows you to do exactly what I just did with it. In this case, you as the user still have access to the private key which can be used to decrypt the homomorphically encrypted outputs, once the AI has been safely dismantled.
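To make the interface concrete: below is a toy, deliberately insecure Haskell sketch of the “somewhat homomorphic encryption over the integers” construction of van Dijk, Gentry, Halevi and Vaikuntanathan, in its symmetric-key form, with small fixed values standing in for the large random parameters a real instantiation needs (this is not Gentry’s lattice scheme, just the simplest construction with the same interface). Anyone can add or multiply ciphertexts, but reading any result requires the odd secret p.

    -- Toy sketch only: a bit m is encrypted as m + 2*noise + p*q for an odd
    -- secret p. Adding ciphertexts XORs the underlying bits, multiplying ANDs
    -- them, and decryption is (c mod p) mod 2. A real instantiation uses large,
    -- random q and noise; the constants here are fixed just so this runs.
    secretKey :: Integer
    secretKey = 1000003                  -- an odd number standing in for the key

    encryptBit :: Integer -> Integer -> Integer -> Integer
    encryptBit p m noise = m + 2 * noise + p * 7   -- q fixed at 7 for the demo

    decryptBit :: Integer -> Integer -> Integer
    decryptBit p c = (c `mod` p) `mod` 2

    main :: IO ()
    main = do
      let p  = secretKey
          ca = encryptBit p 1 5          -- an encryption of 1
          cb = encryptBit p 0 3          -- an encryption of 0
      -- Anyone can combine ciphertexts without p; the results stay encrypted.
      print (decryptBit p (ca + cb))     -- 1, the XOR of the two bits
      print (decryptBit p (ca * cb))     -- 0, the AND of the two bits

Once you can XOR and AND encrypted bits (and keep the noise under control), you can evaluate any circuit over them, which is why running the evaluator reveals nothing until the key comes back.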
The source code verifier for friendliness is basically the weakest possible thing a friendliness researcher could hope for. It doesn’t verify all source code: it verifies source code of an arbitrarily special form. If you can ever prove that any AI is friendly, then you have to implement something at least this strong. I believe that the possibility of putting an AI in a box has been discussed at reasonable length here as a solution to precisely the problem I am considering. I think that trying to reduce the demands on friendliness to a minimum, and motivating the cryptographic tools we will need for that reduction, is worthwhile. After all, I am a cryptographer, not an AI researcher, so I do what I can.
I am sure that the uFAI is non-quantum because I am the one running it on my computer, and it is a classical computer. If I were using the scheme of Gentry, I could not possibly run a quantum AI in this way. The measurements I perform during the homomorphic operations destroy any quantum computation the AI tries to do, even if the computer isn’t classical. I believe that, on top of all this, Gentry’s scheme is conjecturally secure against quantum adversaries.
In terms of the security guarantees, I don’t trust them all that much, but it’s a lot better than nothing. There is some significant probability that the schemes are completely secure, in which case we are really safe. Otherwise, we lose nothing except a small amount of time. We can also hope that by the time this is an issue we will have unconditionally secure schemes (although this seems sort of hopelessly optimistic, I will admit).
In the last section I am suggesting precisely destroying all copies of the private key and then expecting to decrypt the answer myself. This requires producing a new object which allows you to decode the output of a certifiably friendly AI, but not to decode any of the homomorphic encryptions in which the uFAI is running. You are right that this is probably impossible classically, but I have reasonably good evidence that it is possible quantum mechanically. I think this probably isn’t reasonable to discuss until the much more basic confusion is cleared up.
The source code verifier for friendliness is basically the weakest possible thing a friendliness researcher could hope for.
This is a good observation—I think the article would be stronger if you added it to the intro paragraph. As it is, I was a bit amused when reading it: you say “this creates a problem” and then mention the problem of the box, while the problem of the verifier seems immensely harder...
I am sorry that my writing is unclear, but I think you should be a little more charitable.
Please revise. What I quoted in the grandparent is not merely misleading, it is wrong. I feel like I was being at least somewhat charitable with adding “I don’t think” to my comment, rather than just pointing out the inaccuracies.
A homomorphic encryption scheme allows you to do exactly what I just did with it.
I still don’t think it does. You are still referring to “a homomorphically encrypted AI”, as if the AI itself is encrypted. Gentry’s results indicate that you can encrypt the inputs to the AI, which might be enough to make your scheme work. However, they do not extend to encrypting the program itself. To be able to do that, you have to demonstrate a flaw in Barak’s proof.
The source code verifier for friendliness is basically the weakest possible thing a friendliness researcher could hope for.
Make it probabilistic, and I’ll stop whining. The idea of a formal proof that a program is friendly seems bizarre to me, but I can accept that we need some way to verify to a high degree of certainty that a program is friendly, so I have to agree.
The measurements I perform during the homomorphic operations destroy any quantum computation the AI tries to do, even if the computer isn’t classical.
This is interesting, and I wasn’t aware of this. It seems natural for this to be the case, but why must it be the case?
I am suggesting precisely destroying all copies of the private key and then expecting to decrypt the answer myself. [...] You are right that this is probably impossible classically, but I have reasonably good evidence that it is possible quantum mechanically.
What?! I’m willing to drop all the other lines of discussion to focus on this, if you can demonstrate this.
You are confused about the difference between homomorphic encryption and obfuscation. The difference is NOT the difference between encrypting the program and encrypting the inputs. Those two are the same (this is perhaps the central observation of theoretical computer science), because you can just view a program as an input to the interpreter for whatever language it is written in. Homomorphic encryption does allow me to run a program without learning anything about the program.
The difference is that a homomorphic encryption of a program only allows me to learn homomorphic encryptions of that program’s outputs, while an obfuscated version of the program allows me to actually learn the program’s outputs.
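Stated as (hypothetical) Haskell types, with stand-in wrappers and placeholder bodies rather than any real scheme:

    newtype Enc a        = Enc Integer    -- stand-in for a ciphertext of a value of type a
    newtype Obfuscated a = Obfuscated a   -- stand-in for an unintelligible but runnable program
    newtype SecretKey    = SecretKey Integer

    -- Homomorphic evaluation: ciphertexts in, ciphertext out. The party doing
    -- the evaluating never sees a plaintext.
    evalHom :: Enc (a -> b) -> Enc a -> Enc b
    evalHom _ (Enc c) = Enc c             -- placeholder body; a real scheme goes here

    -- Only the key holder can turn the encrypted output into a plaintext.
    decrypt :: SecretKey -> Enc a -> a
    decrypt = undefined                   -- signature only

    -- Obfuscation: whatever is hidden inside, you can run it and read the
    -- plaintext output directly, no key required.
    runObf :: Obfuscated (a -> b) -> a -> b
    runObf (Obfuscated f) = f

The only way to get a plaintext out of an Enc is with the decryption key, which is exactly the distinction being drawn here.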
To see that homomorphic encryption breaks the quantumness of computation, try and do a quantum computation homomorphically. The first thing you notice is that you can’t even implement a Hadamard transform or a phase transform. This can be fixed with work. The second thing you notice is that it is impossible to uncompute garbage at the end of the computation, which seems to be a fundamental difficulty.
Regarding my last claim, note that program obfuscation, in the sense of Barak’s impossibility result you cited, would be completely adequate (just obfuscate the program that does what you want to do, with the secret key built in). Barak’s impossibility proof does not rule out obfuscation quantum mechanically, or in all cases. I don’t really care whether or not you believe I can do it in this case, but I thought I would add it to the discussion for completeness.
Homomorphic encryption does allow me to run a program without learning anything about the program.
This is definitely true in some cases; even if we were talking about obfuscation, Barak only proves that a universal obfuscator is impossible. As you noted, if a program is homomorphically encrypted, we learn the program’s length. We also learn how long it takes to execute certain operations. We can observe any side effects of the running program, which don’t really change. If we are allowed to give it inputs that haven’t also been homomorphically encrypted (I know you don’t plan to), we might learn much more about the program.
I’m not sure that any of those properties make what you propose a Bad Idea, or even render it unworkable, since our adversary is the AI, not the compiler/translator. On the other hand, I’m still not sure what we’re gaining from the scheme. What are we scared of? I assume we are worried that our AI might write a virus for the verifier, but the AI can attempt that whether or not it has the key to decrypt its own output (though I’ll grant that it might be much more difficult). Once it does that, it can output an AI that it understands, even if the compiler/translator provably can’t, and send the signal that the verifier was supposed to send.
So this is really my objection; a homomorphically encrypted program still understands itself, and its inputs and outputs (it couldn’t work any other way). The guarantee is only that the interpreter running the program cannot decipher any of those without the key, and that doesn’t seem to help in this case.
A homomorphically encrypted program doesn’t really understand the environment it is running in, in that it is computationally incapable of having any effect on the world until you bring it the secret key. The AI can write a virus for the verifier in the sense of a program which causes the verifier to incorrectly accept (if our test for friendliness was not actually correct) but it can’t write some code which, when the verifier attempts to verify it, does horrible damage to the universe. It can’t do anything except try and trick the verifier. This seems to me like the best guarantee you could possibly hope for.
If you run a homomorphically encrypted AI, sufficiently isolated from the private key, you get the very strong guarantee that the AI’s execution has no computationally detectable effect on the rest of the world.
A homomorphically encrypted program doesn’t really understand the environment it is running in, in that it is computationally incapable of having any effect on the world until you bring it the secret key.
I don’t understand something here. I read many of the articles on homomorphic encryption research and skimmed the article itself, and the impression I had was that homomorphic encryption only meant that the encrypted data could be manipulated by an unencrypted program (a native program, machine code) to produce another batch of encrypted data.
This is interesting, yes. But I don’t see how it protects us from a sandboxed AI. You give the AI its homomorphically encrypted data, and it goes, hmm, very interesting, I’ll put it over here and in the meantime let’s examine the jnz instruction* - oh look, it probably gives off some radiation from the flipping gate which may let me tamper with a WiFi broadcast and let me out onto another x86-based computer from which I can access the broader Internet… etc.
Any encrypted output may be 100% safe, but what about the side-effects? In functional programming, we always remember that ‘purely functional’ only means that our functions don’t have any explicit side-effects but said functions can still easily have side-effects like using 10GB of RAM and causing the OOM killer to be invoked. And cryptography is well aware of side-channel attacks these days.

* example totally made up
Sorry that this was unclear. The intended use is that the code of the AI itself is the encrypted input, and the plaintext program manipulating it is an interpreter for the language in which the AI is written. For example, the AI could be specified as a LISP program and a homomorphically encrypted copy would be given as input to a LISP interpreter.
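As a toy of that setup (the tiny language and all names here are mine, purely for illustration): the “AI” is just a data value handed to an interpreter. In the proposal, the interpreter itself is what the homomorphic evaluator runs, so the Expr it interprets only ever exists as ciphertext.

    data Expr = Lit Integer
              | Add Expr Expr
              | Mul Expr Expr
              | If  Expr Expr Expr       -- nonzero condition selects the first branch
      deriving Show

    -- An ordinary plaintext interpreter; the "program" is just input data.
    interp :: Expr -> Integer
    interp (Lit n)    = n
    interp (Add a b)  = interp a + interp b
    interp (Mul a b)  = interp a * interp b
    interp (If c t e) = if interp c /= 0 then interp t else interp e

    main :: IO ()
    main = print (interp (If (Lit 1) (Add (Lit 2) (Mul (Lit 3) (Lit 4))) (Lit 0)))  -- prints 14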
So (in Haskell notation) our total setup would be something like eval (evilAI ++ problemDescription ++ verifier) where eval :: EncryptedBitstream -> EncryptedBitstream, and the homomorphic argument/input to our evaluator is effectively one gigantic inlined/combined function with no further input?

Correct. The result of that call then gets decrypted.

In Haskell, HomomorphicEncryption would probably be a monad.

Edit: actually, it makes sense syntactically for it to be a monad but Haskell doesn’t really have the machinery to make it work so scratch that.
No no, Haskell probably does. Look at the ST monad—you go ‘into’ it and then you can perform all sorts of destructive updates but purity is preserved by having to ‘escape’ it before its results are available anywhere else.
The problem is that the only thing Haskell can do with functions is use them as black boxes, to the best of my knowledge. To apply a function to homomorphically encrypted data, you can’t use it as a black box—you need to use an explicit description of the function.
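Concretely: the most you can do with a value of type Bool -> Bool is apply it to plaintext Bools, whereas a function given as data can be walked gate by gate, each gate replaced by its ciphertext analogue. A sketch, reusing the toy integer encoding from the earlier comment (names hypothetical):

    data Gate = Inp Int              -- index of an encrypted input bit
              | Xor Gate Gate        -- becomes ciphertext addition
              | And Gate Gate        -- becomes ciphertext multiplication
              | Not Gate             -- becomes XOR with an encryption of 1

    -- Evaluate the circuit over ciphertext bits, never seeing a plaintext.
    -- oneCt is an encryption of 1; xs are the encrypted inputs.
    evalEnc :: Integer -> [Integer] -> Gate -> Integer
    evalEnc _     xs (Inp i)   = xs !! i
    evalEnc oneCt xs (Xor a b) = evalEnc oneCt xs a + evalEnc oneCt xs b
    evalEnc oneCt xs (And a b) = evalEnc oneCt xs a * evalEnc oneCt xs b
    evalEnc oneCt xs (Not a)   = evalEnc oneCt xs a + oneCt

There is nothing comparable to recurse over in a black-box Haskell function.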
Well, my point was that you can do something akin to Perl’s taint - force operations to be done only within a particular type context.
So you could do something similar to the ST monad but instead of accepting functions which generate any type output, it operates on, say, a parse tree/ADT representing a Lisp function which is evaluated with the rest of the homomorphic data.
But it’s not really important; any such strategy would probably be done in a new language (for efficiency, if nothing else) and the into/escape invariant enforced by manual code inspection or something.
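For what it’s worth, the into/escape invariant can be pushed into the types instead of manual inspection, using the same rank-2 trick that runST uses. A sketch, with plaintext stand-ins for the ciphertext operations and every name hypothetical:

    {-# LANGUAGE RankNTypes #-}

    newtype Ct  s a = Ct a        -- stand-in for a ciphertext, tagged with region s
    newtype Hom s a = Hom a       -- a computation allowed to combine ciphertexts

    instance Functor (Hom s) where
      fmap f (Hom a) = Hom (f a)
    instance Applicative (Hom s) where
      pure = Hom
      Hom f <*> Hom a = Hom (f a)
    instance Monad (Hom s) where
      Hom a >>= k = k a

    -- A gate available inside the region (placeholder body, not real crypto).
    xorCt :: Ct s Bool -> Ct s Bool -> Hom s (Ct s Bool)
    xorCt (Ct a) (Ct b) = pure (Ct (a /= b))

    -- The rank-2 type means the result type cannot mention s, so no Ct s value
    -- can leak out; this is the same way runST keeps STRefs contained.
    runHom :: (forall s. Hom s a) -> a
    runHom (Hom a) = a

Because the result type of runHom cannot mention s, nothing tagged with the region ever escapes, which is the ST-style guarantee being pointed at above.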
In order to hack the substrate, the AI would probably need to crack the homomorphic encryption key used to run it. Noticing that jnz gives off radiation won’t do any good if both the timing of when that instruction runs and the results of running it are on the other side of a computationally hard problem.