Akash’s statement that the Anthropic RSP basically doesn’t specify any real conditions that would cause them to stop scaling seems right to me.
They have some deployment measures, which are not related to the question of when they would stop scaling, and then they have some security-related measures, but those don’t have anything to do with the behavior of the models and are the kind of thing that Anthropic can choose to do at any time, independent of how the facts play out.
I think Akash is right that the Anthropic RSP concretely does not answer the two questions you quote him on:
The RSP does not specify the conditions under which Anthropic would stop scaling models (it only says that in order to continue scaling Anthropic will implement some safety measures, but that’s not an empirical condition, since Anthropic is confident it can implement the listed security measures)
The RSP does not specify under what conditions Anthropic would scale to ASL-4 or beyond, though they have promised they will give those conditions.
I agree the RSP says a bunch of other things, and that there are interpretations of what Akash is saying that are inaccurate, but I do think that on this (IMO most important) question the RSP seems quiet.
I do think the deployment measures are real, though I don’t currently think much of the risk comes from deploying models, so they don’t seem that relevant to me (and I think the core question is what prevents organizations from scaling models up in the first place).
those don’t have anything to do with the behavior of the models and are the kind of thing that Anthropic can choose to do at any time, independent of how the facts play out.
I mean, they are certainly still conditions under which Anthropic would stop scaling. The sentence
the Anthropic RSP basically doesn’t specify any real conditions that would cause them to stop scaling
is clearly false. If you instead said
the Anthropic RSP doesn’t yet detail the non-security-related conditions that would cause them to stop training new models
then I would agree with you. I think it’s important to be clear here, though: the security conditions could trigger a pause all on their own, and there is a commitment to develop conditions that will halt scaling after ASL-3 by the time ASL-3 is reached.
the security conditions could trigger a pause all on their own
I don’t understand how this is possible. The RSP appendix has the list of security conditions, and they are just a checklist of things that Anthropic is planning to do and can just implement whenever they want. It’s not cheap for them to implement, but I don’t see any real circumstance in which they would fail to implement the security conditions in a way that would force them to pause.
Like, I agree that some of these commitments are costly, but I don’t see how there is any world where Anthropic would like to continue scaling but finds itself incapable of doing so, which is what I would consider a “pause” to mean. Like, they can just implement their checklist of security requirements and then go ahead.
Maybe this is quibbling over semantics, but it really does feel quite qualitatively different to me. When OpenAI said that they would spend some substantial fraction of their compute on “Alignment Research” while they train their next model, I think it would be misleading to say “OpenAI has committed to conditionally pausing model scaling”.
I mean, I agree that humanity theoretically knows how to implement these sorts of security commitments, so with enough time and effort it should always be possible for Anthropic to unblock the current conditions, but the commitment to sequencing (that the security measures have to be in place before Anthropic has an ASL-3 model) means that there are situations where Anthropic commits to pause scaling until the security commitments are met. I agree with you that this is a relatively weak commitment in terms of a scaling pause, though to be fair I don’t actually think that simply having (but not deploying) a just-barely-ASL-3 model poses much of a risk, so I think it makes sense from a risk-based perspective why most of the commitments are around deployment and security. That being said, even if a just-barely-ASL-3 model doesn’t pose an existential risk, so long as ASL-3 is defined only with a lower bound rather than also an upper bound, the category will obviously eventually contain models that pose a potential existential risk, so I agree that a lot is tied up in the upcoming definition of ASL-4. Regardless, it is still the case that Anthropic has already committed to a scaling pause under certain circumstances.
Regardless, it is still the case that Anthropic has already committed to a scaling pause under certain circumstances.
I disagree that this is an accurate summary; or rather, it’s only barely denotatively true, but not connotatively.
I do think it’s probably best to let this discussion rest, not because it’s not important, but because actually resolving this kind of semantic dispute in public comments like this is really hard, it’s unlikely either of us will change our mind here, and we’ve both made our points. I appreciate you responding to my comments.
I think that there’s a reasonable chance that the current security commitments will lead Anthropic to pause scaling (though I don’t know whether Anthropic would announce publicly if they paused internally). Maybe a Manifold market on this would be a good idea.
That seems cool! I made a market here:
Feel free to suggest edits about the operationalization or other things before people start trading.
Looks good—the only thing I would change is that I think this should probably resolve in the negative only once Anthropic has reached ASL-4, since only then will it be clear whether at any point there was a security-related pause during ASL-3.
That seems reasonable. Edited the description (I can’t change when trading on the market closes, but I think that should be fine).