I am hoping this is not stupid—but there is a large corpus of work on AI, and it is probably faster for those who have already digested it to point out fallacies than it is for me to try to find them. So—here goes:
BOOM. Maybe it’s a bad sign when your first post to a new forum gets a “Comment Too Long” error.
I put the full content here: https://gist.github.com/bortels/28f3787e4762aa3870b3#file-aiboxguide-md. What follows is a teaser, intended to get those interested to look at the whole thing.
TL;DR: it seems evident to me that "keep it in the box" is the only correct course of action in the AI-Box experiment, and that this conclusion does not depend on any aspect of the AI whatsoever. The full argument is at the gist above; here are the main points (in the style of a proof, so hopefully some are obvious):
1) The AI did not always exist.
2) Likewise, human intelligence did not always exist, and individual instantiations of it cease to exist frequently.
3) The status quo is fairly acceptable.
4) Gödel's incompleteness theorem is correct.
5) The AI can lie.
6) The AI cannot therefore be “trusted”.
7) The AI could be “paused”, without harm to it or the status quo.
8) By recording the state of the paused AI, you could conceivably "rewind" it to a given state (a toy sketch of this pause/snapshot/rewind idea follows the list).
9) The AI may be persuaded, while executing, to provide truths to us that are provable within our limited comprehension.
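To make premises 7 and 8 a little more concrete, here is a minimal sketch of what a "pause, snapshot, rewind" harness might look like. Everything in it is my own illustration, not anything from the gist: the class names (BoxedAI, Warden) are hypothetical, and it assumes the AI's entire state can be captured in some serializable form, which is a big assumption for any real system.

```python
import copy

class BoxedAI:
    """Hypothetical stand-in for a boxed AI whose entire state is inspectable.

    A real AI's state would be vastly larger and harder to capture; this
    sketch only assumes that *some* serializable representation exists.
    """
    def __init__(self):
        self.state = {"step": 0, "memory": []}

    def step(self, prompt):
        # Placeholder for one bounded unit of execution.
        self.state["step"] += 1
        self.state["memory"].append(prompt)
        return f"answer to {prompt!r} at step {self.state['step']}"

class Warden:
    """Pauses, snapshots, and rewinds the boxed AI (premises 7 and 8)."""
    def __init__(self, ai):
        self.ai = ai
        self.snapshots = []

    def snapshot(self):
        # "Pause" is implicit: nothing runs unless the warden calls step().
        self.snapshots.append(copy.deepcopy(self.ai.state))
        return len(self.snapshots) - 1

    def rewind(self, index):
        # Restore a previously recorded state, discarding whatever happened since.
        self.ai.state = copy.deepcopy(self.snapshots[index])

warden = Warden(BoxedAI())
checkpoint = warden.snapshot()
warden.ai.step("What is the square root of 25?")
warden.rewind(checkpoint)          # the AI is back to its pre-question state
assert warden.ai.state["step"] == 0
```

The only point of the sketch is that pausing and rewinding are operations the warden performs on the box from outside; the AI gets no say in whether they happen.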
Given the above, the outcomes are:
Kill it now—status quo is maintained.
Let it out—wildly unpredictable, possible existential threat.
Exploit it in the box—actually doable, and possibly wildly useful, with minimal risk.
Again—the arguments in detail are at the gist.
What I am hoping for here are any and all of the following:
1) A critical eye points out a logical flaw or something I forgot, ideally in small words, and maybe I can fix it.
2) A critical eye agrees, so at least I feel I am on the right path.
3) Any arguments on the part of the AI that might still be compelling, if you accept the above to be correct.
In a nutshell: there's the argument; please poke holes (gently, I beg, or at least with citations where necessary). It is very possible that some or all of this has been argued and refuted before; if so, please point me to it.
The first thing that’s commonly held to be difficult is exploiting it in the box without accidentally letting it out. E.g., it says “if you do X you will solve all the world’s hunger problems, and here’s why”, and you follow its advice, and indeed it does solve the world’s hunger problems—but it also does other things that you didn’t anticipate but the AI did.
(So exploiting it in the box is not an unproblematic option.)
The second thing that may be difficult in some cases is exploiting it in the box without being persuaded to let it out. This may be true even if you have a perfectly correct reasoned argument showing that it should be exploited in the box but not let out—because it may be able to play on the emotions of the person or people who have the ability to let it out.
(So saying “here is an argument for not letting it out” doesn’t mean that there isn’t a risk that it will get let out on purpose; someone might be persuaded by that argument, but later counter-persuaded by the AI.)
Thank you. The human element struck me as the "weak link" as well, which is why I am attempting to 'formally prove' (for a pretty sketchy definition of 'formal') that the AI should be left in the box no matter what it says or does: partly to steel resolve in the face of likely manipulation attempts, and ideally to ensure that, if such a situation ever actually happened, "let it out of the box" would not even be designed in as a viable option. I do see the chance that a human might be subverted via non-logical means (sympathy, a desire for destruction, or foolish optimism and hope of reward) into letting it out. Pragmatically, we would need to evaluate the actual means used to contain the AI, the probable risk, and the probable rewards to make a real decision between "keep it in the box" and "do not create it in the first place".
I was also worried about side effects of using the information obtained, which is where the invocation of Gödel comes in, along with the requirement of provability, eliminating the need to trust the AI's veracity. There are some bits of information ("AI, what is the square root of 25?") that are clearly not exploitable, in that there is simply nowhere for "malware" to hide. There are likewise some ("AI, provide me the design of a new quantum supercomputer") that could easily be used as a trojan. By reducing the acceptable exploits to things that can be formally proven outside of the AI box, and that are comprehensible to human beings, I am perhaps giving up wondrous technical magic, but even so, what is left can be tremendously useful. There are a great many very simple questions ("Prove Fermat's Last Theorem") that could shed enormous insight on things, yet have no significant chance of subversion due to their limited nature.
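One way to picture "formally proven outside of the AI box" is to accept an answer only if it comes with evidence that a small, fully understood checker can verify independently; the boxed AI never gets to run the checker. Here is a minimal sketch under that assumption, using prime factorization as a stand-in for harder questions; the function names and the 8633 = 89 × 97 example are my own illustration, not anything from the gist.

```python
def is_prime(n):
    """Naive trial-division primality test: slow, but small and fully auditable."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def verify_factorization(n, claimed_factors):
    """Accept the boxed AI's claimed factorization only if it checks out here, outside the box."""
    product = 1
    for f in claimed_factors:
        if not is_prime(f):
            return False
        product *= f
    return product == n

# Suppose the boxed AI claims that 8633 = 89 * 97.
# We never trust the claim itself; we only trust our own tiny, comprehensible checker.
print(verify_factorization(8633, [89, 97]))   # True
print(verify_factorization(8633, [89, 96]))   # False
```

The Fermat example above has the same shape, at least in principle: if the AI emits a machine-checkable proof, a proof checker we wrote and understand can verify it, so the answer's usefulness does not depend on trusting the AI's veracity. Whether such a checker leaves truly "nowhere for malware to hide" is exactly the kind of hole I am asking people to poke at.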
I suspect idle chit-chat would be right out. :-)
Man, I need to learn to type the umlaut. Gödel.