In short, the security guarantee of the scheme prevents precisely this.
In long, the way it works it that I as the human experiment run the AI and extract a message from it. The AI has no control over the message other than what it will decrypt to. Since I never decrypt anything other than the message, the entire influence of the AI on the rest of the world comes from that one little message.
This message then gets passed to an arbiter (the AI now destroyed, having no possibility of influencing the wold), whose environment has also been carefully cryptographically protected so that the message passed by the AI can’t corrupt the hardware it is running on. The arbiter does its thing, then destroys the original message of the AI (since we proved it would) and publishes its transformed version, which gets decrypted.
I guess the metaphor of an AI with an arbiter in a cage is a little misleading. It is more like, the AI gets to send a slip of paper out of the cage to the arbiter, who then carefully censors it before passing it on to the real world. The point of using an arbiter is that he is very simple, unlike a human, so we can actually prove that he won’t fail at his task.
In short, the security guarantee of the scheme prevents precisely this.
In long, the way it works it that I as the human experiment run the AI and extract a message from it. The AI has no control over the message other than what it will decrypt to. Since I never decrypt anything other than the message, the entire influence of the AI on the rest of the world comes from that one little message.
This message then gets passed to an arbiter (the AI now destroyed, having no possibility of influencing the wold), whose environment has also been carefully cryptographically protected so that the message passed by the AI can’t corrupt the hardware it is running on. The arbiter does its thing, then destroys the original message of the AI (since we proved it would) and publishes its transformed version, which gets decrypted.
I guess the metaphor of an AI with an arbiter in a cage is a little misleading. It is more like, the AI gets to send a slip of paper out of the cage to the arbiter, who then carefully censors it before passing it on to the real world. The point of using an arbiter is that he is very simple, unlike a human, so we can actually prove that he won’t fail at his task.
Ah! Yes, that clarifies the situation. Thank you.
And the paper is too short for you to be able to argue persuasively or make art or whatever.