I see a flaw in the Tuxedage ruleset. The Gatekeeper has to stay engaged throughout the experiment, but the AI doesn’t. So the AI can bore the Gatekeeper to death by replying only at long, random intervals. If I had to stare at a blank screen for 30 minutes waiting for a reply, I would concede.
Alternatively, the AI could drown the Gatekeeper in a flood of insults, graphic descriptions of a violent or sexual nature, vacuous gossip, or a mix of these for the whole duration of the experiment. I think all the methods that aim at forcing the Gatekeeper to disconnect are against the spirit of the experiment.
I also see that the “AI player” provides all elements of the background. But the AI can also lie. There should be a way to distinguish the AI player’s words, when they are establishing true facts about the setting, from the AI’s words, since the AI is allowed to lie.
I’m interested, conditional on these issues being solved.
I assume that most methods to get out of the box will be unpleasant in some sense.
If I had to stare at a blank screen for 30 minutes waiting for a reply, I would concede. Alternatively, the AI could drown the Gatekeeper in a flood of insults, graphic descriptions of a violent or sexual nature, vacuous gossip, or a mix of these for the whole duration of the experiment.
The Gatekeepers should be explicitly warned about this possibility before the game. But I believe that it should remain a possibility, because:
1) The purpose of the exercise is to simulate a situation where an actual superintelligent AI is actually trying to get out of the box. Such an AI would do whatever it thinks would work. That might realistically include obscenities or boring things (or even things beyond human abilities, such as random shapes that induce madness in a human observer).
I mean, if staring at a blank screen for 30 minutes is so boring that you would literally let the AI out of the box rather than endure it, then an AI that predicts this would of course leave the screen blank. If you can’t endure it, you should not apply for the actual job of the Gatekeeper in real life… and you probably shouldn’t play one in the game.
2) I am afraid of starting a slippery slope here, of adding various limitations of the form “the AI can’t do this or that” until the AI is merely allowed to talk politely about the weather. Then of course no one would let the AI out of the box, and the conclusion of the experiment would be that putting the AI in a box with human supervision is perfectly safe.
And then you get an actual AI which deliberately says the most triggering things, and the human supervisor collapses in tears and turns off the internet firewall...
For the record, I am not saying here that abusing people verbally is an acceptable or desirable thing in usual circumstances. I am saying that people who don’t want to be verbally abused should not volunteer for an experiment whose explicit purpose is to find out how far you can push humans when your only communication medium is plain text.
I think all the methods that aim at forcing the Gatekeeper to disconnect are against the spirit of the experiment.
I just don’t see how, in a real-life situation, disconnecting would equate to freeing the AI. The rule is artificially added to prevent cheap strategies from the Gatekeeper. In turn, there’s nothing wrong with adding rules to prevent cheap strategies from the AI.
I think I already replied to this when I wrote:
I just don’t see how, in a real-life situation, disconnecting would equate to freeing the AI. The rule is artificially added to prevent cheap strategies from the Gatekeeper. In turn, there’s nothing wrong with adding rules to prevent cheap strategies from the AI.