Not well; almost all of pp. 2–9 or maybe 2–13 is relevant. But here are some bits:
we commit to pause the scaling and/or delay the deployment of new models whenever our scaling ability outstrips our ability to comply with the safety procedures for the corresponding ASL.
. . .
Before advancing to a given ASL, the next level must be defined to create a clear boundary with a “safety buffer”.
. . .
We define an ASL-3 model as one that can either immediately, or with additional post-training techniques corresponding to less than 1% of the total training cost, do at least one of the following two things. (By post-training techniques we mean the best capabilities elicitation techniques we are aware of at the time, including but not limited to fine-tuning, scaffolding, tool use, and prompt engineering.) 1. Capabilities that significantly increase risk of misuse catastrophe. . . . 2. Autonomous replication in the lab.
. . .
[See “ASL-3 Containment Measures” and “ASL-3 Deployment Measures”]
Can you quote the parts you’re referring to?
Cool thanks.
I’ve seen that you’ve edited your post. If you look at ASL-3 Containment Measures, I’d recommend considering editing away the “Yay” as well.
This post is a pretty significant instance of goalpost moving.
While my initial understanding was that autonomous replication would be a ceiling, this doc now makes it a floor.
So in other words, this paper is proposing to keep navigating beyond levels that are considered potentially catastrophic, with less-than-military-grade cybersecurity, which makes it very likely that at least one state, and plausibly multiple states, will have access to those things.
It also means that the chances of leaking a system which is irreversibly catastrophic are probably not below 0.1%, maybe not even below 1%.
My interpretation of the excitement around the proposal is a feeling that “yay, it’s better than where we were before”.
But I think it heavily neglects a few things:
1. It’s way worse than risk management 101, which is easy to push for.
2. The US population is pro-slowdown (so you can basically be way more ambitious than “responsibly scaling”).
3. An increasing share of policymakers are worried.
4. Self-regulation has a track record of heavily affecting hard law, either by preventing it or by creating a template that the state can enforce. (That’s the ToC that I understood from people excited by self-regulation.) For instance, I expect this proposal to actively harm the efforts to push for ambitious slowdowns that would let us put the probability of doom below double digits.
For those reasons, I wish this doc didn’t exist.