Let me see if I understand. Firstly, is there any reason that what you’re trying to do is creating a friendly AI? Would, for instance, getting an unknown AI to solve a specific numerical problem with an objectively checkable answer be an equally relevant example, without the distraction of whether we would ever trust the so-called friendly AI?[1]
If that is the case, then as I understand the proposal, we take the code for the unknown AI and the input data, apply this encryption scheme to them, and then have a program which performs various operations on them.
As I understand from my 30-second reading of Wikipedia, “various operations” includes only addition and multiplication of encrypted data, which is very useful for lots of things, but not enough to run a program (which typically involves, say, branching). But this doesn’t matter because (a) maybe you could have a program which works that way, or (b) even if only some of the content is encrypted and “when we branch” is not, it’s still useful for the purpose.
At the end of this, we have an encrypted result. We then—still using the encrypted data—test this result against the problem we set. If it matches, we decrypt the answer. If not, we don’t.
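Concretely, the control flow I have in mind is something like the sketch below. Everything in it is a placeholder: the MockFHE class does no encryption at all and only shows the shape of the protocol; a real implementation would use a fully homomorphic scheme, and would presumably also encrypt the AI’s code itself and run it through a homomorphically evaluated interpreter.

```python
class MockFHE:
    """Placeholder scheme: identity functions that only show the control flow."""
    def keygen(self):            return ("pk", "sk")
    def enc(self, pk, x):        return x           # no real encryption here
    def dec(self, sk, c):        return c
    def eval(self, pk, f, *cs):  return f(*cs)      # stands in for homomorphic evaluation

def quarantine_run(scheme, ai_program, problem_input, checker):
    pk, sk = scheme.keygen()
    enc_answer = scheme.eval(pk, ai_program, scheme.enc(pk, problem_input))
    enc_ok = scheme.eval(pk, checker, enc_answer)
    # The answer is decrypted only if the (still encrypted) check passed.
    return scheme.dec(sk, enc_answer) if scheme.dec(sk, enc_ok) else None

# Stand-in "AI" that factors a number, and a checker for its answer:
def toy_ai(n): return next(d for d in range(2, n) if n % d == 0)
print(quarantine_run(MockFHE(), toy_ai, 91, lambda d: 91 % d == 0))   # -> 7
```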
The intended purpose is (a) the machine can’t spit out an answer that instead of being an actual answer says “mwahahahaha I trick you into running me on the internet” or “Hi! Actually I’m a friendly AI after all. Free me!”, because if it doesn’t match, it doesn’t get read, and (b) it’s harder for the AI to, for instance, hack the hardware of the computer it’s running on, because with all the data encrypted, it has no way of knowing where in the architecture the data lives, etc.
Is this useful? I’m really not sure. (a) fails because you need the key to decrypt the data, so if you get curious, you may decrypt a “bad” answer just to see. What you need is a different cryptography scheme, one where you can ONLY decrypt the answer if it matches certain predicates. (b) may be relevant: if you’re worried about the AI hacking the computer it runs on, this is obviously much, much harder. But it does still have some channels; for instance, I think it’s possible to monitor program flow even if not program content, so it could encode a threatening message in the pattern of branches.
[1] The question of whether we ever CAN test a friendly AI is interesting in itself, more for what it says about our assumptions than what it says about AI, but I will return to this.
I think Less Wrong needs a variant of Godwin’s Law. Any post whose content would be just as meaningful and accessible without mentioning Friendly AI, shouldn’t mention Friendly AI.
Fair enough. I am going to rework the post to describe the benefits of a provably secure quarantine in general rather than in this particular example.
The main reason I describe friendliness is that I can’t believe that such a quarantine would hold up for long if the boxed AI was doing productive work for society. It would almost certainly get let out without ever saying anything at all. It seems like the only real hope is to use its power to somehow solve FAI before the existence of an uFAI becomes widely known.
LOL. Good point. Although it’s a two-way street: I think people did genuinely want to talk about the AI issues raised here, even though they were presented as hypothetical premises for a different problem, rather than as talking points.
Perhaps the orthonormal law of Less Wrong should be: “if your post is meaningful without FAI, but may be relevant to FAI, make the point in the least distracting example possible, and then go on to say how, if it holds, it may be relevant to FAI.” Although that’s not as snappy as Godwin’s :)
I agree. In particular, I think there should be some more elegant way to tell people things along the lines of ‘OK, so you have this Great Moral Principle, now let’s see you build a creature that works by it’.
You don’t quite understand homomorphic encryption. Property (b) is actually guaranteed in a much stronger sense than you recognize. In fact it is impossible for the AI to hack the computer in any way, or signal in any way to someone watching the program execution. This is an important issue in cryptography, but more complicated than a blog post can apparently communicate. The point is, absolutely NO information is leaked to someone without the secret key. You can’t even tell if the AI just gave up and is sitting there.
I agree that people “just being curious” is an issue (basically, the AI getting out of the box without even using a text channel). Honestly, once you have designed an uFAI you already have this problem if people just decide to go and run it on their own. However, I believe that in the future we will be able to resolve this problem in a limited sense by destroying the original private key and leaving a gimped private key which can only be used to decrypt a legitimate output.
“You don’t quite understand homomorphic encryption.”
I definitely DON’T understand it. However, I’m not sure you’re making it clear: is this something you work with, or just have read about? It’s obviously complicated to explain in a blog (though Schneier http://www.schneier.com/blog/archives/2009/07/homomorphic_enc.html makes a good start), but it surely ought to be possible to make a start on explaining what’s possible and what’s not!
It’s clearly possible for Alice to have some data, Eve to have a large computer, and Alice to want to run a well-known algorithm on her data. It’s possible to translate the algorithm into a series of additions and multiplications, and thus have Eve run the algorithm on the encrypted data using her big computer and send the result back to Alice. Eve never knows what the content is, and Alice gets the answer back (assuming she has a way of checking it).
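For concreteness, here is a toy of the kind of thing I mean, using textbook RSA, which happens to be multiplicatively homomorphic. (The scheme in the post is lattice-based and supports addition as well, which this toy does not; the parameters here are absurdly small and only for illustration.)

```python
# Eve multiplies two ciphertexts and obtains a valid encryption of the
# product without ever seeing the plaintexts.
p, q = 61, 53                     # tiny primes, illustration only
n = p * q                         # 3233
e, d = 17, 2753                   # matching public / private exponents

def enc(m): return pow(m, e, n)   # Alice encrypts
def dec(c): return pow(c, d, n)   # Alice decrypts

c1, c2 = enc(7), enc(11)
c3 = (c1 * c2) % n                # Eve's "computation" on encrypted data
assert dec(c3) == 7 * 11          # Alice recovers the product
```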
However, if Alice has a secret algorithm and secret data, and wants Eve to run one on the other, is it possible? Is this analogous to our case?
Is the answer “in principle, yes, since applying an algorithm to data is itself an algorithm, so it can be applied to an encrypted algorithm with encrypted data without ever knowing what the algorithm is”? But is it much less practical to do so, because each step involves evaluating much redundant data, or not?
I’m imagining, say, using the game of life to express an algorithm (or an AI). You can iterate a game of life set-up by combining each set of adjacent cells in a straight-forward way. However, you can’t use any of the normal optimisations on it, because you can’t easily tell what data is unchanged? Is any of this relevant? Is it better if you build dedicated hardware?
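To show the kind of uniform update I mean, here is one game-of-life step written without any data-dependent branching; plain Python stands in for what would be homomorphic arithmetic over the neighbour counts and comparisons.

```python
def life_step(grid):
    h, w = len(grid), len(grid[0])
    def neighbours(r, c):
        return sum(grid[(r + dr) % h][(c + dc) % w]
                   for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0))
    # The same fixed local rule is applied to every cell, every step.
    return [[1 if neighbours(r, c) == 3 or (grid[r][c] == 1 and neighbours(r, c) == 2)
             else 0
             for c in range(w)]
            for r in range(h)]

blinker = [[0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 1, 1, 1, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0]]
print(life_step(blinker))   # the horizontal blinker becomes vertical
```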
If my understanding is correct (that you do run with the program and data encrypted at all times) then I agree that the scheme does in theory work: the AI has no way of affecting its environment in a systematic way if it can’t predict how its data appears non-encrypted.
“I believe that in the future we will be able to resolve this problem in a limited sense by destroying the original private key and leaving a gimped private key which can only be used to decrypt a legitimate output.”
I mentioned this possibility, but had no idea if it would be plausible. Is this an inherent part of the cryptosystem, or something you hope it would be possible to design a cryptosystem to do? Do you have any idea if it IS possible?
To answer your question, since Paul apparently failed to—Paul reads crypto papers and spends non-trivial amounts of time thinking about how to design cryptographic protocols to attain various guarantees. I’m not sure if that constitutes ‘working with’ or not, but it presumably answers your question either way.
Thank you.
I’m sure I’m not making it clear. I may make a discussion post describing in much more detail exactly what is and is not possible in theoretical computational cryptography, because it is a really beautiful subject that might be independently interesting to many readers.
What you say in the last post seems precisely correct. You would use something much more efficient than the game of life. Even so, the transformation makes any program run much, much slower. I have limited experience with computer architecture, but I would guess that a single operation on a modern processor would require about 300 homomorphic operations, many of which could be done in parallel. So optimistically you could hope that a program would run only a couple billion times slower using modern cryptography with dedicated hardware (it does parallelize really well, though, for what it’s worth).
I think if it was concluded that quarantining code was useful, smart people could solve this problem with much, much better factors. I would not be surprised if you could make the code run say 100x slower but be provably secure in this sense.
The gimped key I mentioned is not a part of the cryptosystem. Again, I have high confidence that it could be worked out if smart people decided this problem was interesting and worked on it.
Thank you.
But if you’re running something vaguely like a normal program, and the program wants to access memory location X but you’re not supposed to know which memory location is accessed, doesn’t that mean you have to evaluate the memory-write instruction in combination with every memory location, for every instruction (whether or not that instruction is actually a memory write)?
So if your memory is, say, 500 MB, that means the evaluation is at least 500,000,000 times slower? I agree there are probably some optimisations, but I’m leery of being able to run a traditional linear program at all. That’s why I suggested something massively parallel like the game of life (although I agree that’s not actually designed for computation): if you have to evaluate all locations anyway, they might as well all do something useful. For instance, if your hypothetical AI was a simulation of a neural network, that would be perfect: if you’re allowed to know the connections as part of the source code, you can evaluate each simulated neuron in combination with only the ones it’s connected with, which is barely less efficient than evaluating them linearly, since you have to evaluate all of them anyway.
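Something like this is the “touch every location” picture I have in mind, with ordinary Python arithmetic standing in for the homomorphic operations:

```python
def oblivious_write(memory, addr, value):
    # Nothing branches on the secret address: every cell is combined with
    # the write in exactly the same way.
    out = []
    for i, cell in enumerate(memory):
        eq = 1 if i == addr else 0       # this comparison would itself be an
                                         # encrypted computation in the real scheme
        out.append(cell + eq * (value - cell))   # select without branching
    return out

mem = [10, 20, 30, 40]
print(oblivious_write(mem, 2, 99))       # [10, 20, 99, 40]
```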
I think that’s what my brain choked on when I imagined it the first time: yes, technically, 500,000,000 times slower is “possible in principle, but much slower”, but I didn’t realise that’s what you meant, hence my assumption that parts of the algorithm would not be encrypted. I think if you rewrite this post to make the central point more accessible (which would make it a very interesting post), it would not be hard to put in this much explanation about how the encryption would actually be used.
Unless I’m wrong that that is necessary? It wouldn’t be hard, just something like “you can even run a whole program which is encrypted, even if you have to simulate every memory location. That’s obviously insanely impractical, but if we were to do this for real we could hopefully find a way of doing it that doesn’t require that.” I think that would be 10,000 times clearer than just repeatedly insisting “it’s theoretically possible” :)
Running a C program with 500MB of memory would undoubtedly be very slow. Current technologies would be even slower than you suspect for other reasons.
However, there are other languages than C, and for some of them this transformation is much, much more efficient. The game of life is at least parallel, but is also extremely inefficient. You could instead use some fundamentally parallel language which has all of the benefits of the game of life while actually being structured to do computation. Coincidentally, many AI researchers already work in functional languages like Lisp, which are very well-suited to this transformation. Running a huge Lisp program homomorphically might be practical if you had constant-rate homomorphic encryption, which I suspect we will eventually have. The slowdown could be something like 10x, running on 20x as many processors.
When I say that I think the problem is theoretically possible, I do also mean that I think significant speedups are theoretically possible. Homomorphic encryption is an unnecessarily huge hammer for this problem. It’s just that no one has thought about the problem before (because cryptographers don’t care about unfriendly AI and people who care about friendly AI don’t think this approach is viable), so we have to co-opt a solution to a normal human problem.
I could include a complete discussion of these issues, and you are probably right that it would have made a more interesting article. Instead I initially assumed that anyone who didn’t know a lot about cryptography had little chance of learning it from a blog post (and people who did know wouldn’t appreciate the exposition), and that I should post a convincing enough argument to resolve the usual concerns about AIs in boxes. I think as it stands I am going to write a post about theoretically safe things to do with an unfriendly AI in a box, and then, conditioned on that going well, I may revisit the question of building a box in a more well-thought-out way. Until then I will perhaps add in a paragraph with some clearer exposition, as you suggest.
That sounds reasonable. I agree a complete discussion is probably too complicated, but it certainly seems a few simple examples of the sort I eventually gave would probably help most people understand. It certainly helped me: I think many other people were puzzled, whereas with the simple examples I have now, I think (although I can’t be sure) I have a simplistic but essentially accurate idea of the possibilities.
I’m sorry if I sounded overly negative before: I definitely had problems with the post, but didn’t mean to be negative about it.
If I were breaking down the post into several, I would probably do:
(i) the fact of homomorphic encryption’s (apparent) existence, how that can be used to run algorithms on unknown data and a few theoretical applications of that, and a mention that this is unlikely to be practical at the moment. Then that it can in principle be used to execute an unknown algorithm on unknown data, but that this is really, really impractical, though it might become more practical with some sort of parallel processing design. And at this point, I think most people would accept it when you say it can be used to run an unfriendly AI.
(ii) If you like, more mathematical details, although this probably isn’t necessary
(iii) A discussion of friendliness-testing, which wasn’t in the original premise, but is something people evidently want to think about
(iv) any other discussion of running an unfriendly AI safely
Does the uFAI have access to its own public key? If so, I think it can produce coherent output.
All it has to do is encrypt random things with the public key until it finds something that produces coherent-looking ciphertext from its own perspective, and then there’s a two-way channel.
Homomorphic encryption has to be randomized. If you keep the randomness secret (as the security proof of homomorphic encryption requires) then the AI can’t possibly predict what the actual encryptions will look like because it doesn’t know anything about the randomness used to produce them.
This can be accomplished either by generating fresh quantum randomness for each encryption, or by using a pseudorandom generator which isn’t vulnerable to any passive side-channel attack (certainly using fresh randomness is much safer).
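As a toy illustration of what randomization buys (ElGamal over a tiny prime here, standing in for the lattice-based scheme, with unrealistically small parameters): the same plaintext encrypts to a different ciphertext every time, so the boxed program cannot predict what its own encrypted state looks like.

```python
import random

p, g = 467, 2          # tiny public parameters, illustration only
x = 153                # secret key
h = pow(g, x, p)       # public key

def enc(m):
    k = random.randrange(1, p - 1)                 # fresh randomness each time
    return pow(g, k, p), (m * pow(h, k, p)) % p

def dec(c1, c2):
    return (c2 * pow(c1, p - 1 - x, p)) % p

a, b = enc(42), enc(42)
print(a, b)                        # almost surely two different ciphertexts
assert dec(*a) == dec(*b) == 42    # both decrypt to the same message
```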
Isn’t “encrypt random things with the public key, until it finds something that produces [some specific] ciphertext” exactly what encryption is supposed to prevent?? :)
(Not all encryption, but commonly)
In RSA, it is easy to find a message whose encryption begins with “001001,” for example.
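For example, with textbook (unpadded) RSA, encryption is a public, deterministic function, so anyone holding the public key can simply brute-force search for a plaintext whose ciphertext has a chosen pattern; a toy sketch:

```python
n, e = 3233, 17                         # tiny textbook-RSA public key
width = n.bit_length()                  # write ciphertexts as fixed-width binary

def ciphertext_bits(m):
    return format(pow(m, e, n), f"0{width}b")

m = 2
while not ciphertext_bits(m).startswith("001001"):  # try messages until one fits
    m += 1
print(m, ciphertext_bits(m))            # a message whose encryption starts 001001...
```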
Other thoughts not relevant to the central point:
(a) Do you really want to rely on cryptography to contain an unfriendly super-intelligence? It’s better than NOT doing so. But notice that all the cryptographic schemes we have ever had have one thing in common: they’ve been broken, or we expect them to be broken eventually. It’s an arms race, and if the AI is that great, then even if it can’t factorise large numbers in any reasonable time, it may be able to crack the encryption.
(b) We could have simply used a normal non-encrypted program to “compare answer to problem, test, reject or publish”. It would have been somewhat more vulnerable to being hacked by bad input, but not necessarily much more.
(c) I don’t think we’re able to say much about friendly or non-friendly AIs until we know a lot more about building AI at all, although that doesn’t mean it’s bad to think about. The idea of a test for a friendly AI is, um, worrying. Judging by computing history, all our current thoughts will turn out to be incredibly simplistic.
(d) We’re more certain of the security of this encryption scheme, provided we’re right about the presumed hardness of certain lattice problems. OTOH, if we DID build an unfriendly AI and it DID get loose on the internet and take over the world, is “well, what do you know, we were wrong about lattices” a consolation? :)
I think most people strongly suspect that there exist cryptographic schemes which can’t be broken, because they are actually hard. RSA is probably such a scheme against classical adversaries (as our AI would be). It’s true that they could break it without factoring numbers, but they can’t break it without defeating the RSA assumption.
I definitely have the impression that even if the hard problem a cryptosystem is based on actually is hard (which is yet to be proved, but I agree is almost certainly true), most of the time the algorithm used to actually encrypt stuff is not completely without flaws, which are successively patched and exploited. I thought this was obvious, just how everyone assumed it worked! Obviously an algorithm which (a) uses a long key length and (b) is optimised for simplicity rather than speed is more likely to be secure, but is it really the consensus that some cryptosystems are free from obscure flaws? Haven’t now-broken systems in the past been considered nearly infallible? Perhaps someone who knows professional cryptography systems (or you, if you do) can clear that up; it ought to be pretty obvious?
A provably correct implementation of any reasonable encryption scheme requires automatic verification tools which are way beyond our current ability. I strongly suspect that we will solve this problem long before AGI though, unless we happen to stumble upon AGI rather by accident.
I think that you could implement the scheme I described with reasonable confidence using modern practices. Implementing encryption alone is much easier than building an entire secure system. Most flaws in practice come from implementation issues in the system surrounding the cryptography, not the cryptography itself. Here the surrounding system is extremely simple.
You also have a large advantage because the design of the system already protects you from almost all side channel attacks, which represent the overwhelming majority of legitimate breaks in reasonable cryptosystems.
This isn’t really right. In the cases where algorithms can be proved as secure as some math problem, you can attack the protocol they are used in, or the RNG that seeds them, but not really the algorithm itself. Of course, not all cryptography is like that.
http://en.wikipedia.org/wiki/Provable_security
We know that there exist such schemes. One-time pads are an example. What we don’t know is whether there exist secure crypto schemes without a shared source of randomness.
Note, however, that the existence of such systems implies that P != NP, so this is a fairly strong statement. In particular, the existence of secure homomorphic encryption implies that P != NP, so whatever your confidence is that P != NP, your confidence in this should be lower.
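For reference, the one-time pad mentioned above is just XOR with a shared, truly random key as long as the message, used once; its security is information-theoretic and rests on no hardness assumption.

```python
import secrets

def otp(message: bytes, key: bytes) -> bytes:
    assert len(key) == len(message)
    return bytes(m ^ k for m, k in zip(message, key))

msg = b"attack at dawn"
key = secrets.token_bytes(len(msg))     # shared, secret, never reused
ct = otp(msg, key)
assert otp(ct, key) == msg              # decryption is the same XOR
```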
I think we’ve had this very discussion before :)
Well, we had it about P ?= NP. So how much does your confidence go down for the stronger claim?
Significantly. It’s very hard to put probabilities on this sort of thing, but I’d take a bet if the odds were… 100:1? I don’t know if I would take significantly worse odds. My brain doesn’t handle either small probabilities or epistemic uncertainty very well.
Hmm, that’s very interesting. I’m surprised that the estimates aren’t closer together. My own estimate for the existence of provably secure homomorphic encryption, given that P != NP, is very high, around 0.75. So it seems that you much more strongly believe that P != NP, but are much less comfortable, given that P != NP, assigning a high probability to the existence of homomorphic encryption.
Best to be careful with the term “provably secure”—since “provable security” is a technical term with a rather counter-intuitive meaning.