This is great. Some quotes I want to come back to:
a thing that I like that both the Anthropic RSP and the ARC Evals RSP post point to is basically a series of well-operationalized conditional commitments. One way an RSP could work is basically as a contract between AI labs and the public that concretely specifies “when X happens, then we commit to do Y”, where X is some capability threshold and Y is some pause commitment, with maybe some end condition.
instead of an RSP I would much prefer a bunch of frank interviews with Dario and Daniela where someone is like “so you think AGI has a decent chance of killing everyone, then why are you building it?”. And in general to create higher-bandwidth channels that people can use to understand what people at leading AI labs believe about the risks from AI, and when the benefits are worth it.
It seems pretty important to me to have some sort of written down and maintained policy on “when would we stop increasing the power of our models” and “what safety interventions will we have in place for different power levels”.
I do feel like there is a substantial tension here between two different types of artifacts:
A document that is supposed to accurately summarize what decisions the organization is expecting to make in different circumstances
A document that is supposed to bind the organization to make certain decisions in certain circumstances
Like, the current vibe that I am getting is that RSPs are a “no-take-backsies” kind of thing. You don’t get to publish an RSP saying “yeah, we aren’t planning to scale” and then later on be like “oops, I changed my mind, we are actually going to go full throttle”.
And my guess is this is the primary reason why I expect organizations to not really commit to anything real in their RSPs, and to not really capture what an organization’s leadership thinks the tradeoffs are. Like, that’s why the Anthropic RSP has a big IOU where the most crucial decisions are actually supposed to be.
Like, here is an alternative to “RSP”s. Call them “Conditional Pause Commitments” (CPC if you are into acronyms).
Basically, we just ask AGI companies to tell us under what conditions they will stop scaling or otherwise stop trying to develop AGI. And then also some conditions under which they would resume. [Including implemented countermeasures.] Then we can critique those.
This seems like a much clearer abstraction that’s less philosophically opinionated about whether the thing is trying to be an accurate map of an organization’s future decisions, or to what degree it’s supposed to seriously commit an organization, or whether the whole thing is “responsible”.
people working at AI labs should think through and write down the conditions under which they would loudly quit (this can be done privately, but maybe should be shared between like-minded employees for common knowledge). Then, people can hopefully avoid getting frog-boiled.