It’s quite possible that Anthropic has some internal definition of “not meaningfully advancing the capabilities frontier” that is compatible with this release. But imo they shouldn’t get any credit unless they explain it.
I explicitly asked Anthropic whether they had a policy of not releasing models significantly beyond the state of the art. They said no, and that they believed Claude 3 was noticeably beyond the state of the art at the time of its release.
And to elaborate a little bit (based on my own understanding, not what they told me): their RSP sort of says the opposite. To avoid a “race to the bottom”, they base the decision to deploy a model on what harm it can cause, regardless of what models other companies have released. So if someone else releases a model with potentially dangerous capabilities, Anthropic can’t/won’t use that as cover to release something similar that they wouldn’t have released otherwise. I’m not certain whether this is the best approach, but I do think it’s coherent.
Note that ASLs are defined by risk relative to baseline, excluding other advanced AI systems. This means that a model that initially merits ASL-3 containment and deployment measures for national security reasons might later be reduced to ASL-2 if defenses against national security risks (such as biological or cyber defenses) advance, or if dangerous information becomes more widely available. However, to avoid a “race to the bottom”, the latter should not include the effects of other companies’ language models; just because other language models pose a catastrophic risk does not mean it is acceptable for ours to.
I can definitely imagine them plausibly believing they’re sticking to that commitment, especially with a sprinkle of motivated reasoning. It’s “only” incrementally nudging the publicly available SOTA, rather than taking bigger steps like GPT-2 → GPT-3 → GPT-4.