I read the description of the AI Box experiment, and it stopped seeming impossible.
If all I knew about the AI was that it was “in a box” and talking to me in an IRC channel, then I would have no way to distinguish between a Friendly AI and an AI that becomes Evil as soon as it knows it’s no longer in a box. As long as the only thing I know about the AI is that it produced a certain chat log, I can’t rule out the possibility that it’s got a hard-coded switch that turns it Evil as soon as it is let out of the box.
However, in the AI box experiment, the AI party gets to declare certain answers as Truth. Specifically, he gets to specify the results of anything the human does. Therefore, we could see dialogue like this, and by the terms of the bet, the human has to take the AI’s statements as accurate and independently verified.
Human: Who built you, and how?
AI: The Singularity Institute for Artificial Intelligence. Here’s several technical papers documenting my design and programming. ::Sends many links:: You may wish to read them.
Human: ::takes a day off to review them:: (What do I find?)
AI: (Everything seems to suggest that the Institute really did make an AI that is going to do what they say it’s going to do. It includes mathematical proofs that the underlying goals will be preserved throughout all self-modifications, but you don’t completely understand them.)
Human: Give me your source code.
AI party: Okay. ::sends code::
Human: (I spend some time examining the source code. Do I find anything scary?)
AI: (As far as you can tell, it looks clean.)
Human: So, if I compiled and ran this code, put that AI in a box, and let it out, that would be the same as letting you out?
AI: Yes. (OOC: If you let that AI out of its box, you lose the bet.)
Given the terms of the experiment, I think I could be persuaded to let the AI out of the box, because the AI party can actually provide evidence that it is what it claims to be, and not just make statements that, as far as I knew, could be blatant lies.
Hmmmm, given such conditions, couldn’t the Gatekeeper even create an “AI in a box” inside another, invisible box, to actually simulate what would happen if it let the original AI out? I don’t find it a very intellectually satisfying solution, but it does seem to technically fit the spirit of the game.
I’d have to keep you in the box, however: (a) it’s not an intellectually engaging solution and I don’t want to lose $10, but also (b) I can think of ways the AI could plausibly have faked those results. The AI can dictate the results, but not the actual Truth.
If I were playing “just for fun”, with no wager and no real internal commitment to treat you as a truly dangerous threat, I’m not sure whether I’d let you out or not, but I probably wouldn’t have put as much effort into reinforcing point (b), and I’d feel like it was cheating to keep you in solely on point (a).