It’s clear that in such a system, the ‘weak point’ would be the people in control of the private key.
If the AI is out of the box, I don’t think humans are the weak point.
Humans physically do something when they reward the AI. To get a reward, the AI has only to figure out what the humans would physically do and mimic that itself. If the humans reward the AI by pressing a big red button, then the AI can just kill the humans and press the big red button itself. It wouldn’t matter if the big red button uses 512-bit elliptic curve cryptography—the AI just has to find a paperweight and put it on the button.
If humans can perform RSA encryption silently in their heads, then you might be on to something. A human could memorize a private key and produce a cryptographically signed reward for the AI when the human deemed the AI worthy. The AI would not know the private key, would not be able to produce signed rewards, and would not be able to mimic humans. This setup works because it is cryptographically difficult to mimic a human doing RSA. But if the human did not perform the cryptography silently in their head, it would not be cryptographically difficult to mimic their rewarding behavior.
But I doubt that humans could perform RSA or elliptic curve cryptography mentally. Unless humans can compute trapdoor functions silently in their head, I don’t see how public key cryptography could buy you anything over a reward-button.
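For concreteness, here is a toy sketch of the signed-reward setup described above. Everything in it is illustrative: the primes are tiny textbook values, the names sign/verify/reward are mine, and a real system would use a vetted library and 2048-bit keys (the modular-inverse call needs Python 3.8+):

```python
# Toy sketch of the signed-reward scheme (illustrative only: the primes
# are tiny textbook values; a real deployment would use a vetted
# library and 2048-bit keys).

p, q = 61, 53                    # secret primes (hundreds of digits in practice)
n = p * q                        # public modulus
phi = (p - 1) * (q - 1)
e = 17                           # public exponent
d = pow(e, -1, phi)              # private exponent: the secret "in the head"

def sign(message: int) -> int:
    """Produce a reward token only the key-holder can create."""
    return pow(message, d, n)

def verify(message: int, signature: int) -> bool:
    """Anyone, including the AI, can check a token but cannot forge one."""
    return pow(signature, e, n) == message

reward = 42                      # hypothetical reward payload
token = sign(reward)
assert verify(reward, token)          # legitimate reward accepted
assert not verify(reward, token + 1)  # forgery rejected
```

Note that all the security lives in keeping d secret—which is exactly the step that has to happen inside a human head for the scheme to survive an AI that watches everything else.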
For the same reason that Linus Torvalds is not able to hack every single Linux system in existence, it is reasonable to assume that the probability of ‘backdoor’-type attacks could be reduced or eliminated.
If you’re not talking about a back door, then I’m not sure what you’re trying to say here. Are you implying that the AI will find the ‘reward-producing machine’ and somehow use it to produce rewards? It doesn’t work that way, because the machine would still need the key to produce a cryptographically valid reward, and the key would not be stored on the machine. That’s the whole point of using cryptography. For instance, you could use secret sharing (http://en.wikipedia.org/wiki/Secret_sharing), where the key is divided among several participants and never actually assembled in one place.
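As a rough sketch of how that could work, here is Shamir’s scheme over a prime field; every parameter and function name below is illustrative, not taken from any particular library:

```python
# Minimal sketch of Shamir secret sharing: the key is split into n
# shares so that any k reconstruct it, and no single participant (or
# machine) ever holds the whole key. Parameters are illustrative.

import secrets

PRIME = 2**127 - 1               # field modulus; real systems size this to the key

def split(secret: int, n: int, k: int):
    """Split `secret` into n shares such that any k reconstruct it."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * -xj % PRIME
                den = den * (xi - xj) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total

shares = split(secret=123456789, n=5, k=3)
assert reconstruct(shares[:3]) == 123456789   # any 3 of 5 shares suffice
assert reconstruct(shares[1:4]) == 123456789
```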
Of course any cryptography system has vulnerabilities. The point is not to build a 100% secure system. The point is to make it so that attempting to hack the system has less expected utility than simply doing what the humans say. And if that fails, then the AI will attack the system at its weakest point: the people controlling the key. Hence my question.
If you’re not talking about a back door, then I’m not sure what you’re trying to say here.
Yeah, we’re talking past each other. I think I understand what you’re saying, and I’ll try to rephrase what I’m saying.
The AI is out. It is free to manipulate the world at will. Sensors are everywhere. The AI can hear every word you say, feel every keystroke you make, and see everything you see. The only secrets left are the ones in your head.
How do humans reward the AI? You say “cryptographically”, but cryptography requires difficult arithmetic. How do you perform difficult arithmetic on a secret that can’t leave your head?
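To put a number on “difficult”: here is a back-of-envelope estimate of mental RSA signing. The time-per-multiplication figure is pure guesswork on my part, but even generous assumptions put it at months of nonstop arithmetic per reward:

```python
# Rough back-of-envelope for the "difficult arithmetic" claim above.
# Square-and-multiply needs at most ~2 modular multiplications per key
# bit; the seconds-per-multiplication figure is a guess, not a measurement.

bits = 2048                      # typical RSA key size today
mults = 2 * bits                 # upper bound on modular multiplications
digits = len(str(2 ** bits))     # ~617 decimal digits per operand
seconds_per_mult = 3600          # assume an hour per 617-digit multiply

days = mults * seconds_per_mult / (3600 * 24)
print(f"~{mults} multiplications of {digits}-digit numbers "
      f"≈ {days:.0f} days of nonstop mental arithmetic")
```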
Too many assumptions are being made here. What is the basis for believing the AI will have sensors everywhere, especially while it’s still under human control? And if it has the ability to put clandestine sensors in even the most secure locations, why couldn’t it plant clandestine brain implants in the people controlling the key?