A homomorphically encrypted program doesn’t really understand the environment it is running in, in that it is computationally incapable of having any effect on the world until you bring it the secret key. The AI can write a virus for the verifier, in the sense of a program which causes the verifier to incorrectly accept (if our test for friendliness was not actually correct), but it can’t write code which, when the verifier attempts to verify it, does horrible damage to the universe. It can’t do anything except try to trick the verifier. This seems to me like the best guarantee you could possibly hope for.
If you run a homomorphically encrypted AI, sufficiently isolated from the private key, you get the very strong guarantee that the AI’s execution has no computationally detectable effect on the rest of the world.
I don’t understand something here. I read many of the articles on homomorphic encryption research and skimmed the article itself, and the impression I had was that homomorphic encryption only meant that the encrypted data could be manipulated by an unencrypted program (a native program, machine code) to produce another batch of encrypted data.
This is interesting, yes. But I don’t see how it protects us from a sandboxed AI. You give the AI its homomorphically encrypted data, and it goes, hmm, very interesting, I’ll put it over here and in the meantime let’s examine the jnz instruction* - oh look, it probably gives off some radiation from the flipping gate which may let me tamper with a WiFi broadcast and let me out onto another x86-based computer from which I can access the broader Internet… etc.
Any encrypted output may be 100% safe, but what about the side-effects? In functional programming, we always remember that ‘purely functional’ only means that our functions don’t have any explicit side-effects, but said functions can still easily have side-effects like using 10GB of RAM and causing the OOM killer to be invoked. And cryptography is well aware of side-channel attacks these days.

* example totally made up
Sorry that this was unclear. The intended use is that the code of the AI itself is the encrypted input, and the plaintext program manipulating it is an interpreter for the language in which the AI is written. For example, the AI could be specified as a LISP program and a homomorphically encrypted copy would be given as input to a LISP interpreter.
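To make the intended setup concrete, here is a minimal Haskell sketch. Every name in it is hypothetical (HomBit, interpreterStep, and the Bool instance are stand-ins, not any real FHE library), and the “interpreter step” is a placeholder circuit; the point is only the shape of the arrangement: the trusted side runs plaintext code that manipulates the AI’s program strictly as ciphertext, through gate-level homomorphic operations, without ever needing the secret key.

```haskell
-- A minimal structural sketch (hypothetical names, no real cryptography):
-- the plaintext evaluator only touches ciphertext bits through this
-- interface and never needs the secret key.
class HomBit c where
  hXor :: c -> c -> c   -- homomorphic XOR on encrypted bits
  hAnd :: c -> c -> c   -- homomorphic AND on encrypted bits
  hNot :: c -> c

-- The AI's code, encrypted bit by bit; in the intended use this would be
-- the encoding of, say, a Lisp program.
type EncryptedBitstream c = [c]

-- Stand-in for "one step of the interpreter", written as a circuit over
-- ciphertext bits.  A real interpreter would be a much larger circuit of
-- exactly the same kind.
interpreterStep :: HomBit c => EncryptedBitstream c -> EncryptedBitstream c
interpreterStep bits = zipWith hXor bits (map hNot bits)  -- placeholder circuit

-- Run the interpreter for a fixed number of steps; the result is still
-- ciphertext, useless without the secret key.
evalSteps :: HomBit c => Int -> EncryptedBitstream c -> EncryptedBitstream c
evalSteps n bits = iterate interpreterStep bits !! n

-- Insecure stand-in instance so the sketch runs; a real scheme would
-- supply actual ciphertexts here instead of bare Bools.
instance HomBit Bool where
  hXor = (/=)
  hAnd = (&&)
  hNot = not

main :: IO ()
main = print (evalSteps 2 [True, False, True])
```

Swapping the insecure Bool stand-in for a real fully homomorphic scheme would change the instance, not the evaluator; that separation is the point of the setup described above.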
So (in Haskell notation) our total setup would be something like eval (evilAI ++ problemDescription ++ verifier) where eval :: EncryptedBitstream -> EncryptedBitstream, and the homomorphic argument/input to our evaluator is effectively one gigantic inlined/combined function with no further input?

Correct. The result of that call then gets decrypted.

In Haskell, HomomorphicEncryption would probably be a monad.

Edit: actually, it makes sense syntactically for it to be a monad, but Haskell doesn’t really have the machinery to make it work, so scratch that.
No no, Haskell probably does. Look at the ST monad—you go ‘into’ it and then you can perform all sorts of destructive updates but purity is preserved by having to ‘escape’ it before its results are available anywhere else.
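For reference, a minimal example of the ST pattern being described here: destructive updates on an STRef inside, an ordinary pure function outside, with the rank-2 type of runST preventing any mutable reference from escaping.

```haskell
import Control.Monad.ST (runST)
import Data.STRef (modifySTRef', newSTRef, readSTRef)

-- Destructive updates inside, a pure function outside: nothing mutable
-- can leak past runST :: (forall s. ST s a) -> a.
sumST :: [Int] -> Int
sumST xs = runST $ do
  acc <- newSTRef 0
  mapM_ (\x -> modifySTRef' acc (+ x)) xs
  readSTRef acc
-- e.g. sumST [1..10] == 55
```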
The problem is that the only thing Haskell can do with functions is use them as black boxes, to the best of my knowledge. To apply a function to homomorphically encrypted data, you can’t use it as a black box—you need to use an explicit description of the function.
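To spell out the distinction (a sketch; nothing here is a real FHE API): a Haskell function can only be applied, whereas an explicit description of the same function is ordinary data that an evaluator can walk and translate, gate by gate, into homomorphic operations on ciphertexts.

```haskell
-- A Haskell function is a black box: the only thing you can do is apply it.
blackBox :: Bool -> Bool -> Bool
blackBox x y = x && not y

-- An explicit description of "the same" function is ordinary data, so it
-- can be pattern-matched and hence compiled into homomorphic gates.
data Expr
  = Var Int          -- i-th input
  | Not Expr
  | And Expr Expr
  deriving Show

described :: Expr
described = And (Var 0) (Not (Var 1))

-- The description can be interpreted over plain Bools...
evalPlain :: [Bool] -> Expr -> Bool
evalPlain env (Var i)   = env !! i
evalPlain env (Not e)   = not (evalPlain env e)
evalPlain env (And a b) = evalPlain env a && evalPlain env b

-- ...or, given a homomorphic-bit interface like the HomBit sketch above,
-- over ciphertexts instead.  No such interpretation exists for blackBox.
```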
Well, my point was that you can do something akin to Perl’s taint - force operations to be done only within a particular type context.
So you could do something similar to the ST monad but instead of accepting functions which generate any type output, it operates on, say, a parse tree/ADT representing a Lisp function which is evaluated with the rest of the homomorphic data.
But it’s not really important; any such strategy would probably be done in a new language (for efficiency, if nothing else) and the into/escape invariant enforced by manual code inspection or something.
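For the record, a rough sketch of what that ST-style boundary might look like. HE, Encrypted, homEval, and runHE are all made-up names with no real cryptography behind them; the sketch only shows the type discipline, with the phantom parameter playing the role of ST’s s.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Hypothetical names throughout; only the type discipline is the point.
-- The phantom parameter s plays the same role as in the ST monad.
newtype HE s a = HE a

instance Functor (HE s) where
  fmap f (HE a) = HE (f a)
instance Applicative (HE s) where
  pure = HE
  HE f <*> HE a = HE (f a)
instance Monad (HE s) where
  HE a >>= f = f a

-- Encrypted data carries the same tag, so it cannot leave its context.
newtype Encrypted s a = Encrypted a      -- stand-in, no real crypto

-- A Lisp-ish parse tree: the function to apply is data, not a black box.
data Lisp = Atom String | App Lisp [Lisp]

-- The only operation available inside the context: apply a *described*
-- function to encrypted data, yielding more encrypted data.
homEval :: Lisp -> Encrypted s a -> HE s (Encrypted s a)
homEval _ e = pure e                     -- placeholder evaluation

-- The single sanctioned escape.  As with runST, the rank-2 type makes it
-- a type error to return anything still tagged with s.
runHE :: (forall s. HE s a) -> a
runHE (HE a) = a
```

Inside HE, code can shuffle Encrypted values around and apply described functions to them, but the only way out is runHE, and only for results not tagged with s: the same into/escape invariant, enforced by types rather than by manual inspection.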
In order to hack the substrate, the AI would probably need to crack the homomorphic encryption key used to run it. Noticing that jnz gives off radiation won’t do any good if both the timing of when that instruction runs and the results of running it are on the other side of a computationally hard problem.