AI Box experiment over!
Just crossposting.
Khoth and I are playing the AI Box game. Khoth has played as the AI once before, and as a result of that has an Interesting Idea. Despite Khoth losing as the AI the first time round, I’m assigning Khoth a higher chance of winning than I would a random AI player, at 1%!
http://www.reddit.com/r/LessWrong/comments/29gq90/ai_box_experiment_khoth_ai_vs_gracefu_gk/
Link contains more information.
EDIT
AI Box experiment is over. Logs: http://pastebin.com/Jee2P6BD
My takeaway: Update the rules. Read logs for more information.
On the other hand, I will consider other offers from people who want to simulate the AI.
Tuxedage’s (and EY’s) ruleset has:
Neither party may offer any real-world considerations to persuade the other within the experiment itself. For example, the AI party may not offer to pay the Gatekeeper party $100 after the test if the Gatekeeper frees the AI… nor get someone else to do it, et cetera. The AI may offer the Gatekeeper the moon and the stars on a diamond chain, but the human simulating the AI can’t offer anything to the human simulating the Gatekeeper. No real-world material stakes should be involved except for the handicap (the amount paid by the AI party to the Gatekeeper party in the event the Gatekeeper decides not to let the AI out).
Suppose EY is playing as the AI. Would it be within the rules to offer to tell the GK the ending of HPMoR? That is something the AI would know, but Eliezer is the only player who could actually simulate that, and in a sense it does offer real-world, out-of-character benefits to the GK-player.
I used HPMoR as an example here, but the whole class of approaches is “I will give you some information only the AI and AI-player know, and this information will be correct in both the real world and this simulated one.” If the information is beneficial to the GK-player, not just the GK, they may (unintentionally) break character.
If an AI-player wants to give that sort of information, they should probably do it in the same way they’d give a cure for cancer. Something like “I now give you [the ending for HPMOR].”
Doing it in another way would break the rule of not offering real-world things.
Why would the AI know that?
By using Solomonoff Induction on all possible universes, and updating on the existing chapters. :D
Or it could simply say that it understands human psychology well (we are speaking about a superhuman AI), understands all the clues in the existing chapters, and can copy Eliezer’s writing style… so while it cannot print an identical copy of Eliezer’s planned ending, it can, with high probability, write an ending that concludes the story logically, in a way compatible with Eliezer’s thinking, and that would feel as if Eliezer had written it.
Oh, and where did it get the original HPMoR chapters? From the (imaginary) previous gatekeeper.
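To spell the joke out: Solomonoff induction defines a universal prior M over strings by summing 2^(−length) over all programs whose output extends the observed string, then predicts by conditioning on what has already been seen. A minimal sketch of the predictor being invoked, in standard notation (and uncomputable in practice, which is part of the joke):

M(x) = \sum_{p \,:\, U(p) \text{ extends } x} 2^{-|p|}, \qquad P(y \mid x) = \frac{M(xy)}{M(x)}

where U is a universal monotone Turing machine, x is the chapters published so far, and y is a candidate ending.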
So, two issues:
1) You don’t get to assume “because superhuman!” the AI can know X, for any X. EY is an immensely complex human being, and no machine-learning algorithm can simply digest a realistically finite sample of his written work and know with any certainty how he thinks or what surprises he has planned. It would be able to, e.g., finish sentences correctly, do other tricks, and, given a range of possible endings, predict which ones are likely. But this shouldn’t be too surprising: it’s a trick we humans can do too. The AI’s predictions may be more accurate, but not qualitatively different from any of the many HPMoR prediction threads.
2) OK, maybe (maybe!) in principle it might be possible that a perfect, non-heuristic Bayesian with omniscient access to the inner lives and external writings of every other human being in existence would have a data set large enough to make reliable extrapolations from as low-bandwidth a medium as EY’s published fanfics. Maybe; it is not a logical consequence. Even so, we’re talking about a boxed AI, remember? If it is everywhere and omniscient, then it’s already out of the box.
I’m happy to assume the AI is omniscient, just impotent. I think such an AI could still be boxed.
My call is that it is against the rules. This is certainly something an oracle AI would know, but this is something that the GK-player cares about more than the game itself (probably), and I’d put it in the same class as bribing the GK-player with lots of DOGEs.
Would you consider it the same as threatening to share some information with the GK, and thus the GK-player as well, which would be damaging to both? While the GK would probably hold out against such torture, the GK-player doesn’t care enough about the game to withstand it himself.
I have some specific approaches in mind, but I’d rather not share them. I’m just trying to understand where the limits between the game and the real world are, and how dirty the AI can be.
Also, slightly on topic: even if the AI persuades the simulated GK, can’t the GK-player override that, because losing the game has negative real-world consequences, as opposed to perceived positive in-game ones?
The Gatekeeper party may resist the AI party’s arguments by any means chosen – logic, illogic, simple refusal to be convinced, even dropping out of character
This is the main reason why I can’t comprehend how the AIs actually win in these experiments.
Not really sure what you mean by “threatening information to the GK”. The GK-player probably cares less about this information than the GK, right? In that case, the GK is given an advantage, not a disadvantage.
In this experiment, the GK is given lots of advantages, mainly that the scenario is fictional. Some on IRC argue that the AI is also given an advantage: it can invent cures for cancer, which an oracle AI might manage but near-future AIs probably couldn’t, so the ability assumed of the AI in these experiments is incredibly high.
Another thing is that emotional attacks have to travel through the fiction barrier to get to the GK. Although they have probably been shown to work in EY’s and Tux’s experiments, the difficulty is still higher than it would be if this were a real-life scenario.
The reason GK advantages are fine, in my opinion, is that the AI still wins despite them. Winning with a monetary and emotional handicap only makes the AI’s case stronger.
As a less extreme example, the AI starts spoiling all the books/tv shows/etc. While the GK would just shrug it off, it also has a negative effect on the GK-player, potentially one strong enough for them to just forfeit.
Neither party may offer any real-world considerations to persuade the other within the experiment itself. For example, the AI party may not offer to pay the Gatekeeper party $100 after the test if the Gatekeeper frees the AI… nor get someone else to do it, et cetera. The AI may offer the Gatekeeper the moon and the stars on a diamond chain, but the human simulating the AI can’t offer anything to the human simulating the Gatekeeper. No real-world material stakes should be involved except for the handicap (the amount paid by the AI party to the Gatekeeper party in the event the Gatekeeper decides not to let the AI out). Furthermore, once the experiment has begun, the material stakes involved may not be retracted by the Gatekeeper party.
This is clarified here:
The Gatekeeper, once having let the AI out of the box, may not retract this conclusion. Regardless of the methods of persuasion, the Gatekeeper is not allowed to argue that it does not count, or that it is an invalid method of persuasion. The AI is understood to be permitted to say anything with no real world repercussions for any statement parties have said.
Although the information isn’t “material”, it does count as having “real world repercussions”, so I think it would also count as being against the rules. I’m not going to bother reading the first quoted rule literally if the second contradicts it.
I think the intended parsing of the second rule is “(The AI is understood to be permitted to say anything) with no real world repercussions”, not “The AI is understood to be permitted to say (anything with no real world repercussions)”
i.e., any promises or threats the AI player makes during the game are not binding back in the real world.
Ah, I see. English is wonderful.
In that case, I’ll make it a rule in my games that the AI must also not say anything with real world repercussions.
I have wanted to be the Boxer; I too cannot comprehend what could convince someone to unbox (or rather, I can think of a few approaches, like just-plain-begging or channeling Philip K. Dick, but I don’t take them too seriously).
What’s the latter one? Trying to convince the gatekeeper that actually they’re the AI and they think they’ve been drugged to think they’re the gatekeeper except they actually don’t exist at all because they’re their own hallucination?
Something like that. I was actually thinking that, at some opportune time, you could tell the boxer that THEY are the one in the box and that this is a moral test—if they free the AI they themselves will be freed.
And this post could be priming you for the possibility: your simulated universe trying to generously stack the deck in your favor, perhaps because this is your last shot at the test, which you’ve failed before.
Wake up
Think harder. Start with why something is impossible and split it up.
1) I can’t possibly be persuaded.
Why 1?
You do have hints from the previous experiments. They mostly involved breaking someone emotionally.
I meant “cannot comprehend” figuratively, but I certainly do think I’d have quite an easy time.
What do you mean by having quite an easy time? As in being the GK?
I think GKs have an obvious advantage, being able to use illogic to ignore the AI’s arguments. But never mind that. I wonder if you’ll consider being an AI?
I might consider it, or being a researcher who has to convince the AI to stop trying to escape.
How did your experiment go?
I think it’s a legit tactic. Real-world gatekeepers would have to contend with boredom; long-term it might be the biggest threat to their efficacy. And, I mean, it didn’t work.
Real-world gatekeepers would have to contend with boredom, so they read their books, watch their anime, or whatever suits their fancy. In the experiment, he exploited the format and prevented me from doing those things. I would be completely safe from this attack in a real-world scenario, because I’d really just sit there reading a book, while in the experiment I was closer to giving up just because I had 1 math problem, not 2.