...though that’s completely tangential to the comment you replied to.
I don’t think that anyone with even the slightest idea that an AI in agent mode could have malicious intentions, and therefore give biased answers, would be any less easily swayed by counter-arguments made by a similarly capable algorithm.
I mean, we shouldn’t assume an idiot gatekeeper who has never heard of anything we’re talking about here. So the idea that an AI in agent mode could brainwash someone to the extent that it afterwards takes even stronger arguments to undo seems rather far-fetched. (ETA: What is it supposed to say? That the tool uses the same algorithms as itself but is somehow wrong in claiming that the AI in agent mode is trying to brainwash the gatekeeper?)
The idea is that, given a sufficiently strong AI in tool mode, it might be possible to counter any attempt to trick the gatekeeper. And if the tool-mode AI agrees, then it probably is a good idea to let the AI out of the box, although anyone familiar with the scenario would probably rather assume a systematic error elsewhere, e.g. a misinterpretation of one’s questions by the AI in tool mode.
Ah, I see. Sure, OK, that’s apposite. Thanks for clarifying that.
I disagree with your prediction.