Thoughts on responsible scaling policies and regulation

I am excited about AI developers implementing responsible scaling policies; I’ve recently been spending time refining this idea and advocating for it. Most people I talk to are excited about RSPs, but there is also some uncertainty and pushback about how they relate to regulation. In this post I’ll explain my views on that:

I think that sufficiently good responsible scaling policies could dramatically reduce risk, and that preliminary policies like Anthropic’s RSP meaningfully reduce risk by creating urgency around key protective measures and increasing the probability of a pause if those measures can’t be implemented quickly enough.
I don’t think voluntary implementation of responsible scaling policies is a substitute for regulation. Voluntary commitments are unlikely to be universally adopted or to have adequate oversight, and I think the public should demand a higher degree of safety than AI developers are likely to voluntarily implement.
I think that developers implementing responsible scaling policies now increases the probability of effective regulation. If I instead thought it would make regulation harder, I would have significant reservations.
Transparency about RSPs makes it easier for outside stakeholders to understand whether an AI developer’s policies are adequate to manage risk, and creates a focal point for debate and for pressure to improve.
I think the risk from rapid AI development is very large, and that even very good RSPs would not completely eliminate that risk. A durable, global, effectively enforced, and hardware-inclusive pause on frontier AI development would reduce risk further. I think this would be politically and practically challenging and would have major costs, so I don’t want it to be the only option on the table. I think implementing RSPs can get most of the benefit, is desirable according to a broader set of perspectives and beliefs, and helps facilitate other effective regulation.

Why I’m excited about RSPs

I think AI developers are not prepared to work with very powerful AI systems. They don’t have the scientific understanding to deploy superhuman AI systems without considerable risk, and they do not have the security or internal controls to even safely train such models.

If protective measures didn’t improve then I think the question would be when rather than if development should be paused. I think the safest action in an ideal world would be pausing immediately until we were better prepared (though see the caveats in the next section). But the current level of risk is low enough that I think it is defensible for companies or countries to continue AI development if they have a sufficiently good plan for detecting and reacting to increasing risk.

If AI developers make these policies concrete and state them publicly, then I believe it puts the public and policymakers in a better place to understand what those policies are and to debate whether they are adequate. And I think the case for companies taking this action is quite strong—AI systems may continue to improve quickly, and a vague promise to improve safety at some unspecified future time isn’t enough.

I think that a good RSP will lay out specific conditions under which further development would need to be paused. Even though the goal is to avoid ever ending up in that situation, I think it’s important for developers to take the possibility seriously, to plan for it, and to be transparent about it with stakeholders.

Thoughts on an AI pause

If the world were unified around the priority of minimizing global catastrophic risk, I think that we could reduce risk significantly further by implementing a global, long-lasting, and effectively enforced pause on frontier AI development—including a moratorium on the development and production of some types of computing hardware. The world is not unified around this goal; this policy would come with other significant costs and currently seems unlikely to be implemented without much clearer evidence of serious risk.

A unilateral pause on large AI training runs in the West, without a pause on new computing hardware, would have more ambiguous impacts on global catastrophic risk. The main ways it could increase risk are by causing faster catch-up growth in a later period, when more hardware is available, and by driving AI development into laxer jurisdictions.

However, if governments shared my perspective on risk then I think they should already be implementing domestic policies that will often lead to temporary pauses or slowdowns in practice. For example, they might require frontier AI developers to implement additional protective measures before training larger models than those that exist today, and some of those protective measures may take a fairly long time to implement (such as major improvements in risk evaluations or information security). Or governments might aim to limit the rate at which effective training compute of frontier models grows, in order to provide a smoother ramp for society to adapt to AI and to limit the risk of surprises.

I expect RSPs to help facilitate effective regulation

Regardless of whether risk mitigation takes the form of responsible scaling policies or something else, I think voluntary action by companies isn’t enough. If the risk is large then the most realistic approach is regulation and eventually international coordination. In reality I think the expected risk is large enough (including some risk of a catastrophe surprisingly soon) that a sufficiently competent state would implement regulation immediately.

I believe that AI developers implementing RSPs will make it easier rather than harder to implement effective regulation. RSPs provide a clear path to iteratively improving policy; they provide information about existing practices that can inform or justify regulation; and they build momentum around and legitimize the idea that serious precautions can be necessary for safe development. They are also a step towards building out the procedures and experience that would be needed to make many forms of regulation effective.

I’m not an expert in this area, and my own decisions are mostly guided by a desire to offer my honest assessments of the effects of different policies. That said, my impression from interacting with people who have more policy expertise is that they broadly agree that RSPs are likely to help rather than hurt efforts to implement effective regulation. I have mostly seen voluntary RSPs discussed, and have advocated for them, in contexts where it appears the most likely alternative is less rather than more action.

Anthropic’s RSP

I believe that Anthropic’s RSP is a significant step in the right direction. I would like to see pressure on other developers to implement policies that are at least this good, though I think there is a long way to go from there to an ideal RSP.

Some components I found particularly valuable:

Specifying a concrete set of evaluation results that would cause them to move to ASL-3. I think having concrete thresholds by which concrete actions must be taken is important, and I think the proposed threshold is early enough to trigger before an irreversible catastrophe with high probability (well over 90%).
Making a concrete statement about security goals at ASL-3—“non-state actors are unlikely to be able to steal model weights, and advanced threat actors (e.g. states) cannot steal them without significant expense”—and describing security measures they expect to take to meet this goal.
Requiring a definition and evaluation protocol for ASL-4 to be published and approved by the board before scaling past ASL-3.
Providing preliminary guidance about conditions that would trigger ASL-4 and the necessary protective measures to operate at ASL-4 (including security against motivated states, which I expect to be extremely difficult to achieve, and an affirmative case for safety that will require novel science).

Some components I hope will improve over time:

The flip side of specifying concrete evaluations right now is that they are extremely rough and preliminary. I think it is worth working towards better evaluations with a clearer relationship to risk.
For external stakeholders to have confidence in Anthropic’s security, I think more work is needed to lay out appropriate audits and red teaming. To my knowledge this work has not been done by anyone and will take time.
The process for approving changes to the RSP is publication and approval by the board. I think this ensures a decision will be made deliberately and is much better than nothing, but it would be better to have effective independent oversight.
To the extent that it’s possible to provide more clarity about ASL-4, doing so would be a major improvement by giving people a chance to examine and debate conditions for that level. To the extent that it’s not, it would be desirable to provide more concreteness about a review or decision-making process for deciding whether a given set of safety, security, and evaluation measures is adequate.

I’m excited to see criticism of RSPs that focuses on concrete ways in which they fail to manage risk. Such criticism can help (i) push AI developers to do better, and (ii) argue to policymakers that we need regulatory requirements stronger than existing RSPs. That said, I think it is significantly better to have an RSP than to not have one, and don’t think that point should be lost in the discussion.

On the name “responsible scaling”

I believe that a very good RSP (of the kind I’ve been advocating for) could cut risk dramatically if implemented effectively, perhaps a 10x reduction. In particular, I think we will probably have stronger signs of dangerous capabilities before something catastrophic happens, and that realistic requirements for protective measures can probably lead to us either managing that risk or pausing when our protective measures are more clearly inadequate. This is a big enough risk reduction that my primary concern is about whether developers will actually adopt good RSPs and implement them effectively.

That said, I believe that even cutting risk by 10x still leaves us with a lot of risk; I think it’s reasonable to complain that private companies causing a 1% risk of extinction is not “responsible.” I also think the basic idea of RSPs should be appealing to people with a variety of views about risk, and a more pessimistic person might think that even if all developers implement very good RSPs there is still a 10%+ risk of a global catastrophe.
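
To make the arithmetic behind these figures explicit, here is a minimal sketch. The 10% baseline is an assumption: the post does not state a baseline, but it is the value that makes "a 10x reduction" line up with "a 1% risk." The function name `residual_risk` is hypothetical and exists only for illustration.

```python
def residual_risk(baseline: float, reduction_factor: float) -> float:
    """Return the risk left after dividing a baseline risk by a reduction factor."""
    if not 0.0 <= baseline <= 1.0:
        raise ValueError("baseline must be a probability in [0, 1]")
    if reduction_factor < 1.0:
        raise ValueError("reduction_factor must be at least 1")
    return baseline / reduction_factor


# ASSUMPTION: a 10% baseline is implied by the post's numbers, not stated in it.
assumed_baseline = 0.10
remaining = residual_risk(assumed_baseline, 10.0)
print(f"{remaining:.1%}")  # prints 1.0%
```

The same division shows why a more pessimistic reader lands somewhere very different: anyone who expects 10% or more risk to remain even with very good RSPs is starting from a much higher baseline, or expects a smaller reduction factor, or both.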

On the one hand, I think it’s good for AI developers to make and defend the explicit claim that they are developing the technology in a responsible way, and to be vulnerable to pushback when they can’t defend that claim. On the other hand, I think it’s bad if calling scaling “responsible” gives (or looks like an attempt to give) a false sense of security, whether about the remaining catastrophic risk or about social impacts beyond catastrophic risk.

So “responsible scaling policy” may not be the right name. I think the important thing is the substance: developers should clearly lay out a roadmap for the relationship between dangerous capabilities and necessary protective measures, should describe concrete procedures for measuring dangerous capabilities, and should lay out responses if capabilities pass dangerous limits without protective measures meeting the roadmap.
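
The three pieces named above (a roadmap from capabilities to protective measures, a measurement of capabilities, and a response when measures fall short) can be sketched as a small decision procedure. This is an illustration under stated assumptions, not any developer's actual policy: the capability levels, the measure names, and the functions `required_measures` and `decide` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoadmapEntry:
    """One row of a hypothetical roadmap: measures required from a capability level up."""
    capability_level: int
    required_measures: frozenset


# ASSUMPTION: levels and measure names are invented for illustration only.
ROADMAP = (
    RoadmapEntry(1, frozenset({"baseline_security"})),
    RoadmapEntry(2, frozenset({"hardened_security", "deployment_safeguards"})),
)


def required_measures(measured_level: int) -> set:
    """Collect every measure required at or below the measured capability level."""
    needed = set()
    for entry in ROADMAP:
        if entry.capability_level <= measured_level:
            needed |= entry.required_measures
    return needed


def decide(measured_level: int, measures_in_place: set) -> tuple:
    """Return ("continue", []) or ("pause", missing measures in sorted order)."""
    missing = required_measures(measured_level) - set(measures_in_place)
    if missing:
        return ("pause", sorted(missing))
    return ("continue", [])


print(decide(1, {"baseline_security"}))
print(decide(2, {"baseline_security"}))
```

In this sketch the first call continues, because the level-1 requirement is met. The second call pauses, because the measured capability has reached level 2 while the level-2 measures are missing. The hard parts in practice are the ones the sketch takes as given: how the capability level is measured, and whether the listed measures are actually adequate.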