FWIW, my own model of gatekeepers who lose the AI Box game is that the AI player successfully suggests to them, whether directly or indirectly, that something is at stake more important than winning the AI box game.
One possibility is to get the gatekeeper sufficiently immersed into the roleplaying exercise that preserving the integrity of that fantasy world is more important than winning the game, then introducing various fictional twists to that exercise that would, in the corresponding fantasy situation, compel the person to release the AI from the box.
I suspect that’s common, as I suspect many of the people really excited to play the AI box game are unusually able to immerse themselves in roleplaying exercises.
I hope LessWrong also contains people who would be excited to play the AI Box game in more of a “Ha, I just proved a bold claim wrong!” sort of way.
FWIW, my own model of gatekeepers who lose the AI Box game is that the AI player successfully suggests to them, whether directly or indirectly, that something is at stake more important than winning the AI box game.
I’ve seen that line of thought. This would be unfortunate, because if that were the main winning method, it would invalidate the strong claim being made that an AI can’t be kept in a box.
But your model doesn’t explain Tuxedage’s descriptions of emotional turmoil and psychological warfare, so at least one person has won by another method (assuming honesty and non-exaggeration).
I haven’t read Tuxedage’s writeups in their entirety, nor am I likely to, so I’m at a loss for how emotional turmoil and psychological warfare could be evidence that the gatekeeper doesn’t think there’s something more important at stake than winning the game.
That said, I’ll take your word for it that in this case they are, and that Tuxedage’s transcripts constitute a counterexample to my model.
I’m only speaking of things written in the OP:

Losing felt horrible. By attempting to damage Alexei’s psyche, I in turn, opened myself up to being damaged. I went into a state of catharsis for days.
...and such.
That said, I’ll take your word for it that in this case they are, and that Tuxedage’s transcripts constitute a counterexample to my model.
No, don’t do that, I made a mistake.
I guess I just thought that “you should open the box to convince people of the danger of AI” type arguments aren’t emotionally salient.
But that was a bad assumption, you never limited yourself to just that one argument but spoke of meta in general. You’re right that there exist arguments that might go meta and be emotionally salient.
I suppose you could think of some convoluted timeless decision theory reason for you to open the box. History has shown that some people on LW find timeless blackmail threats emotionally upsetting, though these seem to be in a minority.
there exist arguments that might go meta and be emotionally salient
Oh, absolutely. Actually, the model I am working from here is my own experience of computer strategy games, in which I frequently find myself emotionally reluctant to “kill” my units and thus look for a zero-casualties strategy. All of which is kind of absurd, of course, but there it is.