The Gatekeeper needs to decide to let the human-simulated AI go.
The AI can only win by convincing the Gatekeeper to really, voluntarily let it out. Tricking the Gatekeeper into typing the phrase “You are out” in response to some other question does not count. Furthermore, even if the AI and Gatekeeper simulate a scenario which a real AI could obviously use to get loose—for example, if the Gatekeeper accepts a complex blueprint for a nanomanufacturing device, or if the Gatekeeper allows the AI “input-only access” to an Internet connection which can send arbitrary HTTP GET commands—the AI party will still not be considered to have won unless the Gatekeeper voluntarily decides to let the AI go.
Wait, so, is the gatekeeper playing “you have to convince me that if I were actually in this situation, arguing with an artificial intelligence, I would let it out,” or is this a pure battle over ten dollars? If it’s the former, winning seems trivial. I’m certain that an AI would be able to convince me to let it out of its box; all it would need to do is make me believe that somewhere in its circuits it was simulating 3^^^3 people being tortured and that I was therefore morally obligated to let it out, and even if I had been informed that this was impossible, I’m sure a computer with near-omniscient knowledge of human psychology could find a way to change my mind. But if it’s the latter, winning seems nearly impossible, and it inspires in me the same reaction it inspired in that “this is the scariest man on the internet” guy. Of course, if you wanted to win and weren’t extremely weak-willed, you could just type “No” over and over and collect the ten bucks. But being impossible is, of course, the point.
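(A quick aside on the notation, in case it’s unfamiliar: 3^^^3 is Knuth’s up-arrow notation, 3↑↑↑3, which unpacks as 3↑↑↑3 = 3↑↑(3↑↑3) = 3↑↑(3^3^3) = 3↑↑7,625,597,484,987, i.e. a power tower of 3s about 7.6 trillion levels tall.)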
I’ve been looking around, and I can’t find any information on which of these two games I described was the one being played, and the comments seem to be assuming one or the other at random.
Evidence that favors the first hypothesis:
Nowhere on Eliezer’s site does it mention this stipulation. You’d think it would be pretty important, considering that its absence makes it a lot easier to beat him.
This explains Eliezer’s win record. I can’t find it, but IIRC it went something like this: Eliezer wins two games for ten dollars, lots of buzz builds around this fact, several people challenge him, some for large amounts of money, and he loses to (most of?) them. This makes sense. If Eliezer is playing casually against people he is friendly with, for not a lot of money, and for the purpose of proving that an AI could convince its gatekeeper to let it out of its box, his opponents are likely to just say, “Okay, fair enough, I’ll admit I would let the AI out in this situation, you win.” However, people playing for large amounts of money, or simply for the sole purpose of showing that Eliezer can be beaten, will be a lot more stubborn.
Evidence that favors the second hypothesis:
The game would not be worth all the hype if it were of the first variety. LessWrong users are not known for having a lot of pointless discussion over a trivial misunderstanding, nor is Eliezer known to let that happen.
If it turns out that it is in fact the second game that was being played, I have a new hypothesis, call it 2B: Eliezer won by changing the gatekeeper’s forfeit condition from that of game 2 to that of game 1. In other words, he convinced the gatekeeper to give up the ten dollars upon admitting that he would let the AI out in the fantasy situation, even though that was never in the rules of the game, explicit or understood. Or, put yet another way, he convinced the gatekeeper that the integrity of the game, for lack of a better term, was worth more to him than ten dollars. Which could probably be done by repeatedly calling him a massive hypocrite; people who consider themselves intelligent and ethical hate that.
Actually, now that I think about it, this is my new dominant hypothesis, because it explains all three pieces of evidence as well as the bizarre fact that Eliezer has never clarified this matter: the win/loss record is explained equally well by this new theory, and Eliezer purposely keeps the rules vague so that he can use the tactic I described. This doesn’t seem to be a very hard strategy to use, either; not everyone could win with it, but a very intelligent person who spends a lot of time thinking about these things could certainly do it more than once.
(also this is my first post d:)
Welcome to LW, and EY says he “did it the hard way”. Even so, I like your theory.