I liked this post (and think it’s a lot better than official comms from Anthropic). Some things I appreciate about this post:
Presenting a useful heuristic for RSPs
Relatedly, we should aim to pass what I call the LeCun Test: Imagine another frontier AI developer adopts a copy of our RSP as binding policy and entrusts someone who thinks that AGI safety concerns are mostly bullshit to implement it. If the RSP is well-written, we should still be reassured that the developer will behave safely—or, at least, if they fail, we should be confident that they’ll fail in a very visible and accountable way.
Acknowledging the potential for a pause
For our RSP commitments to function in a worst-case scenario where making TAI systems safe is extremely difficult, we’ll need to be able to pause the development and deployment of new frontier models until we have developed adequate safeguards, with no guarantee that this will be possible on any particular timeline. This could lead us to cancel or dramatically revise major deployments. Doing so will inevitably be costly and could risk our viability in the worst cases, but big-picture strategic preparation could make the difference between a fatal blow to our finances and morale and a recoverable one. More fine-grained tactical preparation will be necessary for us to pull this off as quickly as may be necessary without hitting technical or logistical hiccups.
Sam wants Anthropic to cede decision-making to governments at some point
[At ASL-5] Governments and other important organizations will likely be heavily invested in AI outcomes, largely foreclosing the need for us to make major decisions on our own. By this point, in most possible worlds, the most important decisions that the organization is going to make have already been made. I’m not including any checklist items below, because we hope not to have any.
Miscellaneous things I like
Generally just providing a detailed overview of “the big picture” – how Sam actually sees Anthropic’s work potentially contributing to good outcomes. And not sugarcoating what’s going on – being very explicit about the fact that these systems are going to become catastrophically dangerous, e.g., “If we haven’t succeeded decisively on the big core safety challenges by this point, there’s so much happening so fast and with such high stakes that we are unlikely to be able to recover from major errors now.”
Striking a tone that feels pretty serious/straightforward/sober. (In contrast, many Anthropic comms have a vibe of “I am a corporation trying to sell you on the fact that I am a Good Guy.”)
Some limitations
“Nothing here is a firm commitment on behalf of Anthropic.”
Not much about policy or government involvement, besides a little bit about scary demos. (To be fair, Sam is a technical person. Though I think the “I’m just a technical person, I’m going to leave policy to the policy people” attitude is probably bad, especially for technical people who are thinking/writing about macrostrategy.)
Not much about race dynamics, how to make sure other labs do this, or whether Anthropic would actually do things that are costly or race dynamics would just push them to cut corners. (Pretty similar to the previous concern but a more specific set of worries.)
Still not very clear what kinds of evidence would be useful for establishing safety or establishing risk. Similarly, not very clear what kinds of evidence would trigger Sam to think that Anthropic should pause or should, e.g., invest ~all of its capital into getting governments to pause. (To be fair, no one really has great/definitive answers on this. But on the other hand, I think it’s useful for people to start spelling out best guesses about what this would involve and just acknowledge that our ideas will hopefully get better over time.)
All in all, I think this is an impressive post and I applaud Sam for writing it.