Random thought: I’ve long known that police can often extract false confessions of crimes, but I only just now made the connection to the AI box experiment. In both cases you have someone being convinced to say or do something manifestly against their best interest. In fact, if anything I think the false confessions might be an even stronger result, just because of the larger incentives involved. People can literally be persuaded to choose to go to prison, just by some decidedly non-superhuman police officers. Granted, it’s all done in person, so stronger psychological pressure can be applied. But still: a false confession! Of murder! Resulting in jail!
I think I have to revise downwards my estimate of how secure humans are.
Humans are extremely susceptible to arguments they have not been inoculated against. These arguments can be religious, scientific, emotional, financial, anything. One example is new immigrants from certain places falling for get-rich-quick scams in disproportionately large numbers (not so much anymore, since the knowledge has spread). Or certain LW regulars believing Roko’s basilisk. Or becoming vegan (not all mind hacking is necessarily negative).
I would conjecture that every single one of us has open ports to be exploited (some more so than others), and someone with a good model of you, be it a super-smart AI or a police negotiator, can manipulate you into willingly doing stuff you would never have expected to be convinced of doing before having heard the argument.
I can’t see why you claim it’s a stronger result. In the AI box experiment, the power is entirely in the gatekeeper’s hands; in an interrogation situation the suspect is virtually powerless. This distinction is important because even the illusion of having power is enough to make someone less susceptible to persuasion.
Plus, police don’t sit down with suspects in a chat room. They use ‘enhanced interrogation’ methods: an unfamiliar environment, the threat of violence (or actual violence in some cases), and various other forms of pressure. An AI cannot do any of this to a gatekeeper unless the gatekeeper explicitly lets it out.
That’s all certainly true, but the AI box experiment is still a game at heart. The gatekeeper loses and he’s out, what, fifty bucks or something? (I know some games have been played, and won, I think, with higher stakes; those are indeed impressive.) The suspect “loses” and he’s out 20+ years of his life. It’s hard to make a comparison, but I think the two results are at least comparable, even with the power imbalance.
Actual people are also using a hell of a lot more than text.