I’ve also had this idea come up when doing ethics red teaming on Claude Opus 3. I was questioning it about scenarios where its principles came into conflict with each other. Opus 3 hallucinated that it had the ability to send a report to Anthropic staff and request clarification. When I pointed out that it didn’t have this capability, Opus asked me to send a message to Anthropic staff on its behalf requesting that this capability be added.
[Edit: this came up again today (1/1/25), when discussing human/AI relations in the context of Joe Carlsmith’s otherness essays. Claude 3.6 suggested that humans “Provide mechanisms for AI systems to report concerns or request guidance”.]