Anthropic—The case for targeted regulation

The first two sections are below:

Increasingly powerful AI systems have the potential to accelerate scientific progress, unlock new medical treatments, and grow the economy. But along with the remarkable new capabilities of these AIs come significant risks. Governments should urgently take action on AI policy in the next eighteen months. The window for proactive risk prevention is closing fast.

Judicious, narrowly targeted regulation can allow us to get the best of both worlds: realizing the benefits of AI while mitigating the risks. Dragging our feet might lead to the worst of both worlds: poorly designed, knee-jerk regulation that hampers progress while also failing to be effective at preventing risks.

In this post, we suggest some principles for how governments can meaningfully reduce catastrophic risks while supporting innovation in AI’s thriving scientific and commercial sectors.

Urgency

In the last year, AI systems have grown dramatically better at math, graduate-level reasoning, and computer coding, along with many other capabilities. Inside AI companies, we see continued progress on as-yet undisclosed systems and results. These advances offer many positive applications. But progress in these same broad capabilities also brings with it the potential for destructive applications, either from the misuse of AI in domains such as cybersecurity or biology, or from the accidental or autonomous behavior of the AI system itself.

In the realm of cyber capabilities, models have rapidly advanced on a broad range of coding tasks and cyber offense evaluations. On the SWE-bench software engineering task, models have improved from being able to solve 1.96% of a test set of real-world coding problems (Claude 2, October 2023) to 13.5% (Devin, March 2024) to 49% (Claude 3.5 Sonnet, October 2024). Internally, our Frontier Red Team has found that current models can already assist on a broad range of cyber offense-related tasks, and we expect that the next generation of models—which will be able to plan over long, multi-step tasks—will be even more effective.

On the potential for AI exacerbating CBRN (chemical, biological, radiological, and nuclear) misuses, the UK AI Safety Institute tested a range of models from industry actors (including Anthropic) and concluded that:

...models can be used to obtain expert-level knowledge about biology and chemistry. For several models, replies to science questions were on par with those given by PhD-level experts.

AI systems have progressed dramatically in their understanding of the sciences in the last year. The widely used benchmark GPQA saw scores on its hardest section grow from 38.8% when it was released in November 2023, to 59.4% in June 2024 (Claude 3.5 Sonnet), to 77.3% in September (OpenAI o1; human experts score 81.2%). Our Frontier Red Team has also found continued progress in CBRN capabilities. For now, the uplift of having access to a frontier model relative to existing software and internet tools is still relatively small. However, it is growing rapidly. As models advance in capabilities, the potential for misuse is likely to continue on a similar scaling trend.

About a year ago, we warned that frontier models might pose real risks in the cyber and CBRN domains within 2-3 years. Based on the progress described above, we believe we are now substantially closer to such risks. Surgical, careful regulation will soon be needed.
