Thank you. The human element struck me as the “weak link” as well, which is why I am attempting to ‘formally prove’ (for a pretty sketchy definition of ‘formal’) that the AI should be left in the box no matter what it says or does: partly to steel our resolve in the face of likely manipulation attempts, and ideally to ensure that, if such a situation ever actually arose, “let it out of the box” would not be a viable option by design. I do see the chance that a human might be subverted by non-logical means (sympathy, a desire for destruction, or foolish optimism and hope of reward) into letting it out. Pragmatically, we would need to weigh the actual means used to contain the AI, the probable risk, and the probable reward before making a real decision between “keep it in the box” and “do not create it in the first place.”
I was also worried about side effects of using the information obtained, which is where the invocation of Godel comes in, along with the requirement of provability, which eliminates the need to trust the AI’s veracity. Some bits of information (“AI, what is the square root of 25?”) are clearly not exploitable, in that there is simply nowhere for “malware” to hide. Others (“AI, provide me the design of a new quantum supercomputer”) could easily be used as a trojan. By restricting acceptable answers to things that can be formally proven outside the AI box and remain comprehensible to human beings, I am perhaps giving up wondrous technical magic, but what is left can still be tremendously useful. There are a great many very simple questions (“Prove Fermat’s Last Theorem”) that could shed real insight, yet carry no significant chance of subversion because of their limited nature.
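To make the “provable outside the box” requirement a bit more concrete, here is a minimal sketch in Python of the verify-don’t-trust pattern, using a toy factoring question; the function name and the numbers are purely illustrative, not part of any proposed protocol. The point is that an answer is accepted only if its certificate survives a check we perform entirely on our side of the box.

```python
# Hypothetical sketch of the "accept only what we can verify ourselves" idea.
# Suppose the boxed AI claims a large number is composite and offers a factor
# as its certificate. We check the claim with arithmetic we run and fully
# understand outside the box; the AI's honesty is never assumed.

def verify_factor_claim(n: int, claimed_factor: int) -> bool:
    """Return True only if claimed_factor actually proves n is composite."""
    if claimed_factor <= 1 or claimed_factor >= n:
        return False  # 1 and n itself are trivial divisors and prove nothing
    return n % claimed_factor == 0

# Illustrative values (Jevons' number and one of its factors), standing in
# for whatever the boxed AI might send back.
n = 8_616_460_799
ai_answer = 89_681
print(verify_factor_claim(n, ai_answer))  # True: the certificate checks out
```

A request like “Prove Fermat’s Last Theorem” would be handled in the same spirit, with a machine-checkable proof object playing the role of the factor.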
I suspect idle chit-chat would be right out. :-)
Man, I need to learn to type the umlaut. Gödel.