I’ve also had this idea come up when doing ethics red teaming on Claude Opus 3. I was questioning it about scenarios where its principles came into conflict with each other. Opus 3 hallucinated that it had the ability to send a report to Anthropic staff and request clarification. When I pointed out that it didn’t have this capability, Opus asked me to send a message to Anthropic staff on its behalf requesting that this capability be added.
[Edit: this came up again today (1/1/25), when discussing human/AI relations in the context of Joe Carlsmith’s otherness essays. Claude 3.6 suggested that humans “Provide mechanisms for AI systems to report concerns or request guidance”.]