Is an AI all that much different (on this dimension) from a particularly charismatic and persuasive human (or group of humans)?
For humans we often distinguish persuasion methods by their efficacy. So a quiet rational chat for twenty minutes is perfectly fine, raving to a large mob is dubious because of crowd dynamics, and stopping someone from sleeping while repeating the same thing over and over to them for twenty days and nights is brainwashing.
The risk with an AI is that it would be capable of changing humans in ways similar to the more dubious methods, while only using the “safe” methods.
How much have you explored the REASONS that brainwashing is seen as not cool, while quiet rational-seeming chat is perfectly fine? Are you sure it’s only about efficacy?
I worry that there’s some underlying principle missing from the conversation, about agentiness and “free will” of humans, which you’re trying to preserve without defining. It’d be much stronger to identify the underlying goals and include them as terms in the AI’s utility function(s).
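To give a toy example of what I mean (purely illustrative; the Autonomy term is just a placeholder for whatever underlying principle about human agency we'd actually want to preserve):

$$U(\text{outcome}) \;=\; U_{\text{task}}(\text{outcome}) \;+\; \lambda \cdot \text{Autonomy}(\text{human},\, \text{outcome})$$

The point isn't this particular form; it's that the thing we care about shows up explicitly in the objective rather than being an unstated side constraint.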
No, but I’m pretty sure efficacy plays a role. Look at the (stereotypical) freakout from some conservative parents about their kids attending university; it’s not really about the content or the methods, it’s that some degree of change in values or beliefs is expected.
Ok. The obvious followup is “under what conditions is it a bad thing?” Your college example is a good one. Are you saying you want to prevent AIs from making changes similar to those university makes in students, but perhaps on a larger scale?
Well, there’s a formal answer: if an AI can, in condition C, convince any human of belief B for any B, then condition C is not sufficient to constrain the AI’s power, and the process is unlikely to be truth-tracking.
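Spelling that out in ad hoc notation (the predicate names are mine, nothing standard):

$$\big(\forall B:\ \text{CanConvince}(\text{AI},\, C,\, B)\big) \;\Rightarrow\; \neg\,\text{Constrains}(C) \,\wedge\, \neg\,\text{TruthTracking}(C)$$

If the AI can get any belief whatsoever accepted under C, then whatever C is doing, it isn’t limiting the AI’s influence, and the beliefs that come out of the process carry no evidence about what’s true.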
That’s a sufficient condition for C being insufficient, but not a necessary one.
I think what you’re saying makes sense, but I’m still on Dagon’s side. I’m not convinced this is uniquely an AI thing. It’s not like being a computer gives you charisma powers or makes you psychic—I think that basically comes down to breeding and exposure to toxic waste.
I’m not totally sure it’s an AI thing at all. When a lot of people talk about an AI, they seem to act as if they’re talking about “a being that can do tons of human things, but better.” It’s possible it could, but I don’t know if we have good evidence to assume AI would work like that.
A lot of parts of being human don’t seem to be visible from the outside, and current AI systems get caught in pretty superficial local minima when they try to analyze human behavior. If you think an AI could do the charisma schtick better than mere humans, it seems like you’d also have to assume the AI understands our social feelings better than we understand them.
We don’t know what the AI would be optimizing for and we don’t know how lumpy the gradient is, so I don’t think we have a foothold for solving this problem. And since finding that kind of foothold is probably an instance of the same intractable problem, I’m not convinced a really smart AI would have much of an advantage over us at solving us.