I’m glad you’re doing this, and I support many of the ideas already suggested. Some additional ideas:
Interview program. Work with USAISI or UKAISI (or DHS/NSA) to pilot an interview program in which officials can ask questions about AI capabilities, safety and security threats, and national security concerns. (If it’s not feasible to do this with a government entity yet, start a pilot with a non-government group – perhaps METR, Apollo, Palisade, or the new AI Futures Project.)
Clear communication about RSP capability thresholds. I think the RSP could do a better job of outlining the kinds of capabilities that Anthropic is worried about and what sorts of thresholds would trigger a reaction. The OpenAI preparedness framework tables are a good example of this kind of clear, concise communication: it’s easy for a naive reader to quickly get a sense of “oh, this is the kind of capability that OpenAI is worried about.” (Clarification: I’m not suggesting that Anthropic should abandon the ASL approach or that OpenAI has necessarily identified the right capability thresholds. I’m saying that the tables are a good example of the kind of clarity I’m looking for: someone could skim them and easily get a sense of what thresholds OpenAI is tracking, and I think OpenAI’s PF currently achieves this much better than Anthropic’s RSP.)
Emergency protocols. Publish an emergency protocol that specifies how Anthropic would react if it needed to quickly shut down a dangerous AI system. (See some specific prompts in the “AI developer emergency response protocol” section here.) Some information can be redacted from the public version, but I think it’s important to have a public version: partly to help government stakeholders understand how to handle emergency scenarios, partly to raise the standard for other labs, and partly to get feedback from external groups.
RSP surveys. Evaluate how well Anthropic employees understand the RSP, what their attitudes toward it are, and how it affects their work. More on this here.
More communication about Anthropic’s views on AI risks and AI policy. Some specific examples of hypothetical posts I’d love to see:
“How Anthropic thinks about misalignment risks”
“What the world should do if the alignment problem ends up being hard”
“How we plan to achieve state-proof security before AGI”
Encouraging more employees to share their views on various topics, e.g., Sam Bowman’s post.
AI dialogues/debates. It would be interesting to see Anthropic employees have discussions/debates with other folks thinking about advanced AI. Hypothetical examples:
“What are the best things the US government should be doing to prepare for advanced AI?” with Jack Clark and Daniel Kokotajlo.
“Should we have a CERN for AI?” with [someone from Anthropic] and Miles Brundage.
“How difficult should we expect alignment to be?” with [someone from Anthropic] and [someone who expects alignment to be harder; perhaps Jeffrey Ladish or Malo Bourgon].
More ambitiously, I feel like I don’t really understand Anthropic’s plan for how to manage race dynamics in worlds where alignment ends up being “hard enough to require a lot more than RSPs and voluntary commitments.”
From a policy standpoint, several of the most interesting open questions seem to be along the lines of “under what circumstances should the USG get considerably more involved in overseeing certain kinds of AI development” and “conditional on the USG wanting to get way more involved, what are the best things for it to do?” It’s plausible that Anthropic is limited in how much work it could do on these kinds of questions (particularly in a public way). Nonetheless, it could be interesting to see Anthropic engage more with questions like the ones Miles raises here.