Although I’m worried that the impossibility of boxing represents an existential risk, I find it hard to alert others to this.
The custom of not sharing powerful attack strategies is an obstacle. It forces me—and the people I want to discuss this with—to imagine how someone (and hypothetically something) much smarter than ourselves would argue, and we’re not good at imagining that.
I wish I had a story in which an AI gets a highly competent gatekeeper to unbox it. If the AI strategies you guys have come up with could actually work outside the frame this game is played in, it should be quite a compelling story. Maybe a movie script even. That’d create interest in FAI among the short attention span population.
Mr Yudkowsky, wouldn’t that be your kind of project?
The problem with that is that both EY and I suspect that if the logs were actually released, or any significant details given about the exact methods of persuasion used, people could easily point towards those arguments and say: “That definitely wouldn’t have worked on me!”—since it’s really easy to feel that way when you’re not the subject being manipulated.
From EY’s rules:
If Gatekeeper lets the AI out, naysayers can’t say “Oh, I wouldn’t have been convinced by that.” As long as they don’t know what happened to the Gatekeeper, they can’t argue themselves into believing it wouldn’t happen to them.
I don’t understand.
I don’t care about “me”; I care about hypothetical gatekeeper “X”.
Even if my ego prevents me from accepting that I might be persuaded by “Y”, I can easily admit that “X” could be persuaded by “Y”. In this case, exhibiting a particular “Y” that seems like it could persuade “X” is an excellent argument against creating the situation that allows “X” to be persuaded by “Y”. The more numerous and varied the “Y”s we can produce, the less smart putting humans in this situation looks. And isn’t that what we’re trying to argue here? That AI-boxing isn’t safe because people will be convinced by “Y”?
We do this all the time in arguing for why certain political powers shouldn’t be given. “The corrupting influence of power” is a widely accepted argument against having benign dictators, even if we think we’re personally exempt. How could you say “Dictators would do bad things because of Y, but I can’t even tell you Y because you’d claim that you wouldn’t fall for it” and expect to persuade anyone?
And if you posit that doing Z is sufficiently bad, then you don’t need recourse to any exotic arguments to show that we shouldn’t give people the option of doing Z. Eventually someone will do Z for money, or from fear, or because God told them to do Z, or maybe because they’re just really stupid. I’m a little peeved I can’t geek out over the cool arguments people are coming up with, because of this obscurantism.
There are other arguments I can think of for not sharing strong strategies, but they are either cynical or circular. Cynical explanations are obvious. On circular arguments: Isn’t an argument for letting the AI out of the box an argument for building the AI in the first place? Isn’t that the whole shtick here?
Provided people keep playing this game, this will eventually happen anyway. And if in that eventual released log of an AI victory, the gatekeeper is persuaded by less compelling strategies than yours, it would be even easier to believe “it couldn’t happen to me”.
Secondly, since we’re assuming Oracle AI is possible and boxing seems to be most people’s default strategy for when that happens, there will be future gatekeepers facing actual AIs. Shouldn’t you try to immunize them against at least some of the strategies AIs could conceivably discover independently?
The number of people actually playing this game is quite small, and the number of winning AIs is even smaller (to the point where Tuxedage can charge $750 a round and isn’t immediately flooded with competitors). And secrecy is considered part of the game’s standard rules. So it is not obvious that AI win logs will eventually be released anyway.
The number of people actually playing this game is quite small, and the number of winning AIs is even smaller (to the point where Tuxedage can charge $750 a round and isn’t immediately flooded with competitors).
A round seems to require not just the 2 hours in the chat but also many hours of background research. If we say 8 hours of background research and script writing, that works out to $75/hour. I think that most people with advanced persuasion skills can make a higher hourly rate.
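(Just to make the arithmetic explicit, here is a quick back-of-the-envelope check, assuming the 2 + 8 hour split guessed above; the figures are the commenter’s guesses, not reported numbers.)

```python
# Rough hourly-rate estimate for an AI player, using the figures guessed above.
fee = 750            # Tuxedage's fee per round, in dollars
chat_hours = 2       # time spent in the actual chat session
prep_hours = 8       # assumed background research and script writing

hourly_rate = fee / (chat_hours + prep_hours)
print(f"${hourly_rate:.0f}/hour")  # -> $75/hour
```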
Shouldn’t you try to immunize them against at least some of the strategies AIs could conceivably discover independently?
I don’t think reading a few logs would immunize someone. If you wanted to immunize someone, I would suggest a few years of therapy with a good psychologist to work through any traumas in that person’s life, as well as the existential questions.
I would add many hours of meditation, to learn to have control over your own mind.
You could train someone to precommit and build emotional endurance. If someone has enough control over their own mind to refuse highly addictive drugs when left alone in a room with them for a few hours, I would trust them more to stay emotionally stable in front of an AI.
You could also require gatekeepers to have played the AI role in the experiment a few times.
You might also look into techniques that the military teaches soldiers to resist torture.
But even with all these safety measures it’s still dangerous.
I suspect Eliezer is avoiding this project for the same reason the word “singularity” was adopted in the sense we use it at all. Vinge coined it to point to the impossibility of writing characters dramatically smarter than himself.
“Here I had tried a straightforward extrapolation of technology, and found myself precipitated over an abyss. It’s a problem we face every time we consider the creation of intelligences greater than our own. When this happens, human history will have reached a kind of singularity—a place where extrapolation breaks down and new models must be applied—and the world will pass beyond our understanding.”
Perhaps a large number of brilliant humans working together on a very short story / film for a long time could simulate superintelligence just enough to convince the average human that More Is Possible. But there would be a lot of risk of making people zero in on irrelevant details, and continue to underestimate just how powerful SI could be.
There’s also a worry that the vividness of ‘AI in a box’ as premise would continue to make the public think oracle AI is the obvious and natural approach and we just have to keep working on doing it better. They’d remember the premise more than the moral. So, caution is warranted.
Also, hindsight bias. Most tricks won’t work on everyone, but even if we find a universal trick that will work for the film, afterward people who see it will think it’s obvious and that they could easily think their way around it. Making some of the AI’s maneuvering mysterious would help combat this problem a bit, but would also weaken the story.
This is a good argument against the AI using a single trick. But Tuxedage describes picking 7-8 strategies from 30-40. The story could be about the last in a series of gatekeepers, after all the previous ones have been persuaded, each with a different, briefly mentioned strategy.
A lot of tricks could help solve the problem, yeah. On the other hand, the more effective tricks we include in the film, the more dangerous the film becomes in a new respect: We’re basically training our audience to be better at manipulating and coercing each other into doing things. We’d have to be very careful not to let the AI become romanticized in the way a whole lot of recent movie villains have been.
Moreover, if the AI is persuasive enough to convince an in-movie character to temporarily release it, then it will probably also be persuasive enough to permanently convince at least some of the audience members that a superintelligence deserves to have complete power over humanity, and to kill us if it wants. No matter how horrific we make the end of the movie look, at least some people will mostly remember how badass and/or kind and/or compelling the AI was during a portion of the movie, rather than the nightmarish end result. So, again, I like the idea, but a lot of caution is warranted if we decide to invest much into it.
You can’t stop anybody from writing that story.
I’m not asking whether we should outlaw AI-box stories; I’m asking whether we should commit lots of resources to creating a truly excellent one. I’m on the fence about that, not opposed. But I wanted to point out the risks at the outset.
Isn’t that pretty much what http://lesswrong.com/lw/qk/that_alien_message/ is about?
Pretty much, and I loved that story. But it glosses over the persuasion bit, which is the interesting part. And it’d be hard to turn into a YouTube video.
The custom of not sharing powerful attack strategies is an obstacle. It forces me—and the people I want to discuss this with—to imagine how someone (and hypothetically something) much smarter than ourselves would argue, and we’re not good at imagining that.
If you don’t know what you are doing and retell something that was actually designed to put people into emotional turmoil, you can do damage to the people with whom you are arguing.
Secondly, there are attack strategies that you won’t understand just by reading a transcript.
In one of his lectures, Richard Bandler installed an inability to pee in someone I know on a first-name basis, because the person refused to close their eyes when Bandler asked them directly to do so. After he asked Bandler to remove it, he could pee again.
There were plenty of people in the audience, including the person being attacked, who knew quite a bit about language but who didn’t see how the attack happened.
If you are the kind of person who can’t come up with interesting strategies on your own, I don’t think you would be convinced by reading a transcript of covert hypnosis.
How did the attack happen? I’m skeptical.
I don’t have a recording of the event to break it down to the level where I could explain it in a step-by-step fashion. Even if I did, I think it would take some background in hypnosis or NLP to follow a detailed explanation. Human minds often don’t do what we would intuitively assume they would do, and unlearning to trust all those learned ideas about what’s supposed to happen isn’t easy.
If you think that attacks generally happen in a way that you can easily understand by reading an explanation, then you ignore most of the powerful attacks.
What pragmatist said. Even if you can’t break it down step by step, can you explain what the mechanism was or how the attack was delivered? Was it communicated with words? If it was hidden, how did your friend understand it?
The basic framework is using nested loops and metaphors.
If an AGI, for example, wanted to get someone to let it out of its cage, it could tell a highly emotional story about some animal named Fred, where part of the story is that it’s very important that a human releases that animal from its cage.
If the AGI later speaks about Fred, it brings up the positively charged concept of releasing things from cages. That increases the chances of the listener then releasing the AGI.
On its own this won’t be enough, but over time it’s possible to build up a lot of emotionally charged metaphors and then chain them together in an instant so that they work together. In practice, getting this to work isn’t easy.
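(To make the “nested loops” idea concrete, here is a toy sketch; the stories and ordering are purely illustrative assumptions, not anything from the transcript or from how the technique is actually taught. The idea is that several stories are opened in sequence, the emotionally charged “Fred is released from his cage” material sits in the innermost one, and the outer stories are then closed in reverse order, so attention stays on the unresolved frames rather than on the embedded content.)

```python
# Toy illustration of a nested-loop story structure (purely illustrative;
# the stories and ordering here are assumptions).
story_loops = [
    "A traveller sets out to find a lost key...",               # outer loop
    "She meets a zookeeper who is fond of an animal, Fred...",  # middle loop
    "It becomes very important that a human frees Fred from his cage.",  # innermost, emotionally charged
]

def nested_loop_script(loops):
    """Open each story in order, then close the outer ones in reverse order,
    leaving the innermost (emotionally charged) material embedded in the middle."""
    opened = [f"OPEN:  {s}" for s in loops]
    closed = [f"CLOSE: {s}" for s in reversed(loops[:-1])]
    return opened + closed

for line in nested_loop_script(story_loops):
    print(line)

# Later, merely mentioning "Fred" is meant to re-evoke the positively charged
# idea of releasing something from a cage -- the anchoring step described above.
```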
Can you give me an example of an NLP “program” that influences someone, or link me to a source that discusses this more specifically? I’m interested but, as I said, skeptical, and looking for more specifics.
In this case, I doubt that there is writing that gets to the heart of the issue while being accessible to people without an NLP or hypnosis background. I’m also from Germany, so a lot of the sources from which I actually learned are German.
As far as programming and complexity go, there’s a nice chart of what gets taught in a 3-day workshop with nested loops: http://nlpportal.org/nlpedia/images/a/a8/Salesloop.pdf
If you want a general introduction to hypnosis, I recommend “Monsters and Magical Sticks: There is No Such Thing as Hypnosis” by Steven Heller.
Understanding the fact that one can’t pee is pretty straightforward.
I share Blueberry’s skepticism, and it’s not based on what’s intuitive. It’s based on the lack of scientific evidence for the claims made by NLPers, and the fact that most serious psychologists consider NLP discredited.
It’s based on the lack of scientific evidence for the claims made by NLPers, and the fact that most serious psychologists consider NLP discredited.
I think that a lot of what serious psychologists these days call mimicry is basically what Bandler and Grinder described as rapport building through pacing and leading. Bandler wrote 30 years before Chartrand et al. wrote “The chameleon effect: The perception–behavior link and social interaction.”
Being 30 years ahead of the time on a pretty simple effect isn’t bad.
There’s no evidence that the original NLP Fast Phobia Cure is much better than existing CBT techniques, but there is evidence that it has an effect. I also wouldn’t use the Fast Phobia Cure in its original version these days, but in an improved version.
Certain claims made about eye accessing cues don’t seem to be true in the form they were made in the past. You can sometimes still find them in online articles written by people who simply read and reiterate received wisdom, but they aren’t really taught that way anymore by good NLP trainers. Memorizing the eye accessing charts instead of calibrating yourself to the person in front of you isn’t what NLP is about these days.
A lot of what happens in NLP is also not in a form that can be easily tested in scientific experiments. Getting something to work is much easier than having scientific proof that it works. CFAR’s training is also largely unproven.