Zac Hatfield-Dodds comments on Anthropic: Reflections on our Responsible Scaling Policy

Zac Hatfield-Dodds 21 May 2024 16:35 UTC
6 points
2

What about whistleblowing or anonymous reporting to governments? If an Anthropic employee was so concerned about RSP implementation (or more broadly about models that had the potential to cause major national or global security threats), where would they go in the status quo?

That really seems more like a question for governments than for Anthropic! For example, the SEC or IRS whistleblower programs operate regardless of what companies purport to “allow”, and I think it’d be cool if the AISI had something similar.

If I were currently concerned about RSP implementation per se (I’m not), it’s not clear why the government would get involved in a matter of voluntary commitments by a private organization. If there were some concern touching on the White House commitments, Bletchley Declaration, Seoul Declaration, etc., then I’d look up the appropriate monitoring body; if in doubt, the Commerce whistleblower office or AISI seem like reasonable starting points.
- Akash 21 May 2024 16:46 UTC
  2 points
  0
  That really seems more like a question for governments than for Anthropic
  +1. I do want governments to take this question seriously. It seems plausible to me that Anthropic (and other labs) could play an important role in helping governments improve their ability to detect/process information about AI risks, though.
  it’s not clear why the government would get involved in a matter of voluntary commitments by a private organization
  Makes sense. I’m less interested in a reporting system that’s like “tell the government that someone is breaking an RSP” and more interested in a reporting system that’s like “tell the government if you are worried about an AI-related national security risk, regardless of whether or not this risk is based on a company breaking its voluntary commitments.”
  My guess is that existing whistleblowing programs are the best bet right now, but it’s unclear to me whether they are staffed by people who understand AI risks well enough to interpret, process, and escalate such information (assuming the information ought to be escalated).