I mean, it can’t hack its way out of the box. As long as the training process is encrypted, the AI is screened off from the rest of reality and can’t do anything without breaking the encryption: its computations can’t be externally distinguished from any other homomorphically encrypted computation, so running it is as safe as summing up random numbers.
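To gesture at what “as safe as summing up random numbers” means in practice, here’s a toy sketch using the Paillier cryptosystem (my own illustrative choice: it’s only additively homomorphic, and the parameters here are tiny and insecure, so this is purely to show the shape of the idea). The host can add up encrypted numbers without holding the secret key; to it, the ciphertexts are just opaque group elements.

```python
# Toy Paillier cryptosystem: additively homomorphic, so the host can
# sum ciphertexts without ever seeing the plaintexts.
# (Tiny hard-coded primes, insecure, demo only. Needs Python 3.8+ for pow(x, -1, n).)
import random
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

# Key generation (owner side).
p, q = 293, 433
n = p * q
n_sq = n * n
g = n + 1                    # standard simple choice of generator
lam = lcm(p - 1, q - 1)      # secret key part 1
mu = pow((pow(g, lam, n_sq) - 1) // n, -1, n)   # secret key part 2

def encrypt(m):
    # c = g^m * r^n mod n^2, with random blinding factor r.
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(c):
    # Only possible with the secret key (lam, mu).
    return ((pow(c, lam, n_sq) - 1) // n * mu) % n

def add_ciphertexts(c1, c2):
    # Homomorphic addition: multiplying ciphertexts adds the plaintexts.
    return (c1 * c2) % n_sq

# The "host" only ever sees c1, c2, and c_sum; without the secret key,
# they look like any other random-looking group elements.
c1, c2 = encrypt(42), encrypt(58)
c_sum = add_ciphertexts(c1, c2)
assert decrypt(c_sum) == 100
```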
The problem is that it can still seize full control over the insides of the box, and we’re presumably going to interact with those insides at some point. At which point the AI will 1) show us whatever it needs to show us to get us to open the box further (if we’re limiting bandwidth at the hardware level), then 2) break out once we eventually give it a high-bandwidth channel.
No, “inside the box” as in inside the encrypted computation: the AI’s architecture, the training setup, whatever regularizers/interpreters we have as part of that, but nothing beyond this.
Well, the idea is that we basically give the code of the AI architecture and training loop as input to, e.g., a Lisp interpreter, in encrypted form. The interpreter would then execute that program without understanding it, i.e., directly derive the encrypted versions of the program’s next steps. Since the computations are encrypted, we can’t distinguish them from any other encrypted computation, so whatever clever strategy the AI tries in order to break out, it won’t look any different to us from an AI that isn’t trying to break out at all. As such, hacking out is impossible without breaking the encryption.
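And to make “execute the program without understanding it” concrete, here’s a continuation of the toy sketch above (reusing its encrypt/decrypt/add_ciphertexts). Our toy scheme only supports addition, so the “program” is just a list of encrypted increments rather than anything like a real training loop, but the shape is the same: the host derives each encrypted successor state without ever holding the key or seeing a plaintext. A real FHE scheme would allow arbitrary computation in place of the additions.

```python
# Owner side (has the key): encrypt the initial state and the "program".
enc_state = encrypt(0)
enc_program = [encrypt(step) for step in (7, 13, 22)]

# Host side (no key): apply each encrypted instruction to the encrypted
# state. Every intermediate value is a ciphertext; nothing here reveals
# what the program is computing, or whether it is "trying" anything.
for enc_instruction in enc_program:
    enc_state = add_ciphertexts(enc_state, enc_instruction)

# Only the owner, back on their side with the secret key, can decrypt.
assert decrypt(enc_state) == 42
```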
Another angle: from the point of view inside the homomorphically encrypted computation, it’s as if the rest of the world were encrypted. So to cleverly manipulate any part of the outside (e.g., hack the system running it), the AI would need to decrypt that part without the secret key, i.e., break the encryption. It can still take over everything inside the encrypted computation, but everything beyond it (starting with the Lisp interpreter) is inaccessible.
Cool! I’ve had a similar idea. You should consider submitting it to superlinear.