Eliezer’s creation (the AI-Box Experiment) has once again demonstrated its ability to take over human minds through a text session. Small wonder—it’s got the appearance of a magic trick, and it’s being presented to geeks who just love to take things apart to see how they work, and who stay attracted to obstacles (“challenges”) rather than turned away by them.
My contribution is to echo Doug S.’s post (how AOL-ish… “me too”). I’m a little puzzled by the AI-Box Experiment, in that I don’t see what the gatekeeper players are trying to prove by playing. AI-Boxers presumably take the real-world position of “I’ll keep the AI in the box until I know it’s Friendly, and then let it out.” But the experiment sets up the gatekeepers with the position of “I’ll keep the AI in the box no matter what.” It’s fascinating (because it’s surprising and mysterious) that Eliezer has managed to convince at least three people to change their minds on what seemed to be a straightforward matter: “I’ll keep the AI in the box no matter what.” How hard can it be to stick to a plan as simple as that?
But why would you build an AI in a box if you planned to never let it out? If you thought that it would always be too dangerous to let out, you wouldn’t build it in a box that could be opened. But of course there’s no such thing as a box that can’t be opened, because you could always build the AI a second time and do it with no box.
So the real position has to be “I’ll keep it in the box until I know it’s Friendly, and then let it out.” To escape, the AI only has to either 1) make a sufficiently convincing argument that it is Friendly or 2) persuade the gatekeeper to let it out regardless of whether it is Friendly. I don’t see how either 1 or 2 could be accomplished by a Friendly AI in any way that an UnFriendly AI could not duplicate.
Set aside for a moment the (very fascinating) question of whether an AI can take over a human via a text session. What are the AI-Boxers even trying to accomplish with the box in the first place, how do they propose to do it, and why does anyone think it’s possible at all?