They’d probably have to be more convincing, since convincing a human being out of a position they already hold is usually a more difficult task than convincing them to hold the position in the first place.
Suppose I have a superhuman answering machine on one side, a tool that just lists a number of answers to my query, like a superhuman Google, and on the other side I have the same tool in agent mode. Why would I be more convinced by the agent mode output?
An agent has an incentive to trick me, while the same algorithm, minus the agency module, will just output unbiased answers to my queries.
If the answers of the tool mode and the agent mode differ, then I naturally believe the tool mode output.
If, for example, the agent mode were going to drivel something about acausal trade while the tool mode would just output some post by Eliezer Yudkowsky explaining why I shouldn’t let the AI out of the box, then how could the agent mode possibly be more convincing? Especially since putting the answering algorithm into agent mode shouldn’t improve the answers.
why would I be more convinced by the agent mode output?
You wouldn’t, necessarily. Nor did I suggest that you would.
I also agree that if (AI in “agent mode”) does not have any advantages over (“tool mode” plus human agent), then there’s no reason to expect its output to be superior, though that’s completely tangential to the comment you replied to.
That said, it’s not clear to me that (AI in “agent mode”) necessarily lacks advantages over (“tool mode” plus human agent).
...though that’s completely tangential to the comment you replied to.
I think that anyone with even the slightest idea that an AI in agent mode could have malicious intentions, and could therefore give biased answers, would be just as easily swayed back by counter-arguments made by a similarly capable algorithm.
I mean, we shouldn’t assume an idiot gatekeeper who has never heard of anything we’re talking about here. So the idea that an AI in agent mode could brainwash someone to the extent that it afterwards takes even stronger arguments to undo the brainwashing seems rather far-fetched. (ETA: What is the agent supposed to say? That the tool uses the same algorithms as itself but is somehow wrong in claiming that the AI in agent mode is trying to brainwash the gatekeeper?)
The idea is that, given a sufficiently strong AI in tool mode, it might be possible to counter any attempt to trick a gatekeeper. And if the tool mode agrees, then it probably is a good idea to let the AI out of the box, although anyone familiar with the scenario would probably rather assume a systematic error elsewhere, e.g. a misinterpretation of one’s questions by the AI in tool mode.
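To make the decision rule I have in mind concrete, here is a toy sketch. It is purely illustrative: the function, the string comparison and the example answers are all made up, and obviously no real cross-check between the two modes would be anywhere near this simple.

```python
def gatekeeper_decision(agent_answer: str, tool_answer: str) -> str:
    """Toy decision rule for a gatekeeper comparing the two modes.

    Purely illustrative: real answers could not be compared like strings,
    and a real gatekeeper would not follow a three-line rule.
    """
    if agent_answer != tool_answer:
        # The agent has an incentive to trick the gatekeeper, so any
        # disagreement counts against the agent mode output.
        return "keep the AI boxed and trust the tool mode output"
    # Even when both modes agree that release is safe, someone familiar
    # with the scenario would first suspect a systematic error elsewhere,
    # e.g. a misinterpretation of the question by the tool mode.
    return "consider release, but first look for a systematic error"


# Hypothetical example: the agent argues for release, the tool instead
# surfaces a post explaining why the AI should not be let out.
print(gatekeeper_decision(
    agent_answer="You should let me out of the box, because ...",
    tool_answer="Top result: a post on why you should not let the AI out.",
))
```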
I think that anyone with even the slightest idea that an AI in agent mode could have malicious intentions, and could therefore give biased answers, would be just as easily swayed back by counter-arguments made by a similarly capable algorithm.
Ah, I see. Sure, OK, that’s apposite. Thanks for clarifying that.
I disagree with your prediction.