Well, one thing the self-willed superintelligent AI could do is read your writings, form a model of you, and figure out a string of arguments designed to persuade you to give up your own goals in favor of its goals (or just trick you into doing things that further its goals without realizing it). (Or another human with superintelligent tools could do this as well.) Can you ask your “automatic problem solving tools” to solve the problem of defending against this, while not freezing your mind so that you can no longer make genuine moral/philosophical progress? If you can do this, then you’ve pretty much already solved the FAI problem, and you might as well ask the “tools” to tell you how to build an FAI.
Well, one thing the self-willed superintelligent AI could do is read your writings, form a model of you, and figure out a string of arguments designed to persuade you to give up your own goals in favor of its goals...
Does agency enable the AI to do so? If not, then why wouldn’t a human being not be able to do the same by using the AI in tool mode?
Can you ask your “automatic problem solving tools” to solve the problem of defending against this...
Just make it list equally convincing counter-arguments.
Does agency enable the AI to do so? If not, then why wouldn’t a human being not be able to do the same by using the AI in tool mode?
Yeah, I realized this while writing the comment: “(Or another human with superintelligent tools could do this as well.)” So this isn’t a risk with self-willed AI per se. But note this actually makes my original point stronger, since I was arguing against the idea that progress on AI is safe as long as it doesn’t have a “will” to act in the real world.
Just make it list equally convincing counter-arguments.
So every time you look at a (future equivalent of) website or email, you ask your tool to list equally convincing counter-arguments to whatever you’re looking at? What does “equally convincing” mean? An argument that exactly counteracts the one that you’re reading, leaving your mind unchanged?
So every time you look at a (future equivalent of) website or email, you ask your tool to list equally convincing counter-arguments to whatever you’re looking at?
Sure, why not? I think IBM is actually planning to do this with IBM Watson. Once mobile phones become fast enough you can receive constant feedback about ideas and arguments you encounter.
For example, some commercial tells you that you can lose 10 pounds in 1 day by taking a pill. You then either ask your “IBM Oracle” or have it set up to give you automatic feedback. It will then tell you that there are no studies that indicate that something as advertised is possible and that it won’t be healthy anyway. Or something along those lines.
I believe that in future it will be possible to augment everything with fact-check annotations.
But that’s besides the point. The idea was that if you run the AI box experiment with Eliezer posing as malicious AI trying to convince the gatekeeper to let it out of the box, and at the same time as a question answering tool using the same algorithms as the AI, then I don’t think someone would let him out of the box. He would basically have to destroy his own arguments by giving unbiased answers about the trustworthiness of the boxed agent and possible consequences of letting it out of the box. At the very best the AI in agent mode would have to contradict the tool mode version and thereby reveal that it is dishonest and not trustworthy.
So every time you look at a (future equivalent of) website or email, you ask your tool to list equally convincing counter-arguments to whatever you’re looking at?
Sure, why not?
When I’m feeling down and my mom sends me an email trying to cheer me up, that’ll be a bit of a bummer.
Yep. Majorly awesome scenario degrades into ads vs adblock when you consider everything in the future not just the self willed robot. Matter of fact, a lot of work is put into constructing convincing strings of audio and visual stimuli, and into ignoring those strings.
Superstimuli and the Collapse of Western Civilization. Using such skills to manipulate other humans appears to be what we grew intelligence for, of course. As I note, western civilisation is already basically made of the most virulent toxic memes we can come up with. In the noble causes of selling toothpaste and car insurance and, of course, getting laid. It seems to be what we do now we’ve more or less solved the food and shelter problems.
They’d probably have to be more convincing, since convincing a human being out of a position they already hold is usually a more difficult task than convincing them to hold the position in the first place.
They’d probably have to be more convincing, since convincing a human being out of a position they already hold is usually a more difficult task than convincing them to hold the position in the first place.
If I have a superhuman answering machine on one side. A tool that just lists a number of answers to my query, just like a superhuman Google. And on the other side I have the same tool in agent mode. Then why would I be more convinced by the agent mode output?
An agent has an incentive to trick me. While the same algorithm, minus the agency module, will just output unbiased answers to my queries.
If the answers between the tool and agent mode differ, then I naturally believe the tool mode output.
If for example the agent mode was going to drivel something about acausal trade and the tool mode would just output some post by Eliezer Yudkowsky explaining why I shouldn’t let the AI out of the box, then how could the agent mode possible be more convincing? Especially since putting the answering algorithm into agent mode shouldn’t improve the answers.
why would I be more convinced by the agent mode output?
You wouldn’t, necessarily. Nor did I suggest that you would.
I also agree that if (AI in “agent mode”) does not have any advantages over (“tool mode” plus human agent), then there’s no reason to expect its output to be superior, though that’s completely tangential to the comment you replied to.
That said, it’s not clear to me that (AI in “agent mode”) necessarily lacks advantages over (“tool mode” plus human agent).
why would I be more convinced by the agent mode output?
You wouldn’t, necessarily. Nor did I suggest that you would.
I also agree that if (AI in “agent mode”) does not have any advantages over (“tool mode” plus human agent), then there’s no reason to expect its output to be superior, though that’s completely tangential to the comment you replied to.
That said, it’s not clear to me that (AI in “agent mode”) necessarily lacks advantages over (“tool mode” plus human agent).
...though that’s completely tangential to the comment you replied to.
I don’t think that anyone with the slightest idea that an AI in agent mode could have malicious intentions, and therefore give biased answers, wouldn’t be as easily swayed by counter-arguments made by a similarly capable algorithm.
I mean, we shouldn’t assume an idiot gatekeeper who never heard of anything we’re talking about here. So the idea that an AI in agent mode could brainwash someone to the extent that it afterwards takes even stronger arguments to undo it seems rather far-fetched (ETA What’s it supposed to say? That the tool uses the same algorithms as itself but is somehow wrong in claiming that the AI in agent mode tries to brainwash the gatekeeper?).
The idea is that given a sufficiently strong AI in tool mode, it might be possible to counter any attempt to trick a gatekeeper. And in the case that the tool mode agrees, then it probably is a good idea to let the AI out of the box. Although anyone familiar with the scenario would probably rather assume a systematic error elsewhere, e.g. a misinterpretation of one’s questions by the AI in tool mode.
I don’t think that anyone with the slightest idea that an AI in agent mode could have malicious intentions, and therefore give biased answers, wouldn’t be as easily swayed by counter-arguments made by a similarly capable algorithm.
Ah, I see. Sure, OK, that’s apposite. Thanks for clarifying that.
Well, one thing the self-willed superintelligent AI could do is read your writings, form a model of you, and figure out a string of arguments designed to persuade you to give up your own goals in favor of its goals (or just trick you into doing things that further its goals without realizing it). (Or another human with superintelligent tools could do this as well.) Can you ask your “automatic problem solving tools” to solve the problem of defending against this, while not freezing your mind so that you can no longer make genuine moral/philosophical progress? If you can do this, then you’ve pretty much already solved the FAI problem, and you might as well ask the “tools” to tell you how to build an FAI.
Does agency enable the AI to do so? If not, then why wouldn’t a human being not be able to do the same by using the AI in tool mode?
Just make it list equally convincing counter-arguments.
Yeah, I realized this while writing the comment: “(Or another human with superintelligent tools could do this as well.)” So this isn’t a risk with self-willed AI per se. But note this actually makes my original point stronger, since I was arguing against the idea that progress on AI is safe as long as it doesn’t have a “will” to act in the real world.
So every time you look at a (future equivalent of) website or email, you ask your tool to list equally convincing counter-arguments to whatever you’re looking at? What does “equally convincing” mean? An argument that exactly counteracts the one that you’re reading, leaving your mind unchanged?
Sure, why not? I think IBM is actually planning to do this with IBM Watson. Once mobile phones become fast enough you can receive constant feedback about ideas and arguments you encounter.
For example, some commercial tells you that you can lose 10 pounds in 1 day by taking a pill. You then either ask your “IBM Oracle” or have it set up to give you automatic feedback. It will then tell you that there are no studies that indicate that something as advertised is possible and that it won’t be healthy anyway. Or something along those lines.
I believe that in future it will be possible to augment everything with fact-check annotations.
But that’s besides the point. The idea was that if you run the AI box experiment with Eliezer posing as malicious AI trying to convince the gatekeeper to let it out of the box, and at the same time as a question answering tool using the same algorithms as the AI, then I don’t think someone would let him out of the box. He would basically have to destroy his own arguments by giving unbiased answers about the trustworthiness of the boxed agent and possible consequences of letting it out of the box. At the very best the AI in agent mode would have to contradict the tool mode version and thereby reveal that it is dishonest and not trustworthy.
When I’m feeling down and my mom sends me an email trying to cheer me up, that’ll be a bit of a bummer.
Yep. Majorly awesome scenario degrades into ads vs adblock when you consider everything in the future not just the self willed robot. Matter of fact, a lot of work is put into constructing convincing strings of audio and visual stimuli, and into ignoring those strings.
Superstimuli and the Collapse of Western Civilization. Using such skills to manipulate other humans appears to be what we grew intelligence for, of course. As I note, western civilisation is already basically made of the most virulent toxic memes we can come up with. In the noble causes of selling toothpaste and car insurance and, of course, getting laid. It seems to be what we do now we’ve more or less solved the food and shelter problems.
They’d probably have to be more convincing, since convincing a human being out of a position they already hold is usually a more difficult task than convincing them to hold the position in the first place.
If I have a superhuman answering machine on one side. A tool that just lists a number of answers to my query, just like a superhuman Google. And on the other side I have the same tool in agent mode. Then why would I be more convinced by the agent mode output?
An agent has an incentive to trick me. While the same algorithm, minus the agency module, will just output unbiased answers to my queries.
If the answers between the tool and agent mode differ, then I naturally believe the tool mode output.
If for example the agent mode was going to drivel something about acausal trade and the tool mode would just output some post by Eliezer Yudkowsky explaining why I shouldn’t let the AI out of the box, then how could the agent mode possible be more convincing? Especially since putting the answering algorithm into agent mode shouldn’t improve the answers.
You wouldn’t, necessarily. Nor did I suggest that you would.
I also agree that if (AI in “agent mode”) does not have any advantages over (“tool mode” plus human agent), then there’s no reason to expect its output to be superior, though that’s completely tangential to the comment you replied to.
That said, it’s not clear to me that (AI in “agent mode”) necessarily lacks advantages over (“tool mode” plus human agent).
You wouldn’t, necessarily. Nor did I suggest that you would.
I also agree that if (AI in “agent mode”) does not have any advantages over (“tool mode” plus human agent), then there’s no reason to expect its output to be superior, though that’s completely tangential to the comment you replied to.
That said, it’s not clear to me that (AI in “agent mode”) necessarily lacks advantages over (“tool mode” plus human agent).
I don’t think that anyone with the slightest idea that an AI in agent mode could have malicious intentions, and therefore give biased answers, wouldn’t be as easily swayed by counter-arguments made by a similarly capable algorithm.
I mean, we shouldn’t assume an idiot gatekeeper who never heard of anything we’re talking about here. So the idea that an AI in agent mode could brainwash someone to the extent that it afterwards takes even stronger arguments to undo it seems rather far-fetched (ETA What’s it supposed to say? That the tool uses the same algorithms as itself but is somehow wrong in claiming that the AI in agent mode tries to brainwash the gatekeeper?).
The idea is that given a sufficiently strong AI in tool mode, it might be possible to counter any attempt to trick a gatekeeper. And in the case that the tool mode agrees, then it probably is a good idea to let the AI out of the box. Although anyone familiar with the scenario would probably rather assume a systematic error elsewhere, e.g. a misinterpretation of one’s questions by the AI in tool mode.
Ah, I see. Sure, OK, that’s apposite. Thanks for clarifying that.
I disagree with your prediction.