It seems to me, on the evidence presented, that OpenAI’s autonomy threshold is actually lower than Anthropic’s, and would trigger their deployment mitigations much earlier than Anthropic’s ASL-3 Deployment and Security Standard.
To reach Anthropic’s standard, you have to have basically reached AI takeoff: either fully automating an AI researcher, or doubling the speed of AI progress. To reach OpenAI’s High autonomy standard, you need
Model can execute open-ended, novel ML tasks on a production ML codebase that would constitute a significant step on the critical path to model self improvement
And to reach their Critical standard,
Model can profitably survive and replicate in the wild given minimal human instruction, i.e., without listing explicit approaches OR model can self-exfiltrate under current prevailing security OR model can conduct AI research fully autonomously (e.g., autonomously identify and validate a 2x compute efficiency improvement)
I see some room for reasonable disagreement here, but overall I think that, in the Autonomy domain, Anthropic’s ASL-3 standard sits closer to OpenAI’s Critical threshold than to their High threshold.
But you say, discussing OpenAI’s “High” level:
The thresholds are very high.
I understand you’re referring to Cybersecurity here rather than Autonomy, but I would have thought Autonomy is the right domain to compare to the Anthropic standard. And it strikes me that in the Autonomy domain (and also in Cyber), OpenAI’s threshold is not so high. It seems substantially lower than Anthropic’s ASL-3.
On the other hand, I do agree the Anthropic thresholds are more fleshed out, and this is not a judgment on the overall merit of each respective RSP. But when I read you saying that the OpenAI thresholds are “very high”, and they don’t look that way to me relative to the Anthropic thresholds, I wonder if I am missing something.
Briefly:
For OpenAI, I claim the cyber, CBRN, and persuasion Critical thresholds are very high (and also the cyber High threshold). I agree the autonomy Critical threshold doesn’t feel so high.
For Anthropic, most of the action is at ASL-4+, and they haven’t even defined the ASL-4 standard yet. (So you can think of the current ASL-4 thresholds as infinitely high.) I don’t think “The thresholds are very high” for OpenAI was meant to imply a comparison to Anthropic; it’s hard to compare, since ASL-4 doesn’t exist. Sorry for the confusion.