which case they’ve misled people by suggesting that they would not do this.
Neither of your examples seems super misleading to me. I feel like there was an atmosphere of “Anthropic intends to stay behind the frontier” when the actual statements were closer to “stay on the frontier”.
Also worth noting that Claude 3 does not substantially advance the LLM capabilities frontier! Aside from GPQA, it doesn’t do much better on benchmarks than GPT-4 (and in fact does worse than gpt-4-1106-preview). Releasing models that are comparable to models OpenAI released a year ago seems compatible with “staying behind the frontier”, given that OpenAI has continued its scale-up and will no doubt soon release even more capable models.
That being said, I agree that Anthropic benefited from this impression within the EA community. So compared to the impression many EAs got from Anthropic, this is indeed a different stance.
In any case, whether or not Claude 3 already surpasses the frontier, soon will, or doesn’t, I request that Anthropic explicitly clarify whether their intention is to push the frontier.
The main (only?) limit on scaling is their ability to implement containment/safety measures for ever more dangerous models. E.g.:
That is, they won’t go faster than they can scale up safety procedures, but they’re otherwise fine pushing the frontier.
It’s worth noting that their ASL-3 commitments seem pretty likely to trigger in the next few years, and probably will be substantially difficult to meet:
If one of the effects of instituting a responsible scaling policy was that Anthropic moved from the stance of not meaningfully pushing the frontier to “it’s okay to push the frontier so long as we deem it safe,” this seems like a pretty important shift that was not well communicated. I, for one, did not interpret Anthropic’s RSP as a statement that they were now okay with advancing the state of the art, nor did many others; I think that’s because the RSP did not make it clear that they were updating this position. Like, with hindsight I can see how the language in the RSP is consistent with pushing the frontier. But I think the language is also consistent with not pushing it. E.g., when I was operating under the assumption that Anthropic had committed to staying behind the frontier, I interpreted the RSP as saying “we’re aiming to scale responsibly to the extent that we scale at all, which will remain at or behind the frontier.”
Attempting to be forthright about this would, imo, look like a clear explanation of Anthropic’s previous stance relative to the new one they were adopting, and their reasons for the change. To the extent that they didn’t feel the need to do this, I worry that it’s because their previous stance was more of a vibe, and therefore non-binding. But if they were using that vibe to gain resources (funding, talent, etc.), then this seems quite bad to me. It shouldn’t be the case that they benefit from the ambiguity but then aren’t held to any of the consequences of breaking it. And indeed, this makes me pretty wary of other trust/deferral-based support that people currently give to Anthropic. I think that if the RSP in fact indicates a departure from their previous stance of not meaningfully pushing the frontier, then this is a negative update about Anthropic holding to the spirit of their commitments.
As Evan says, I think they clarified their intentions in their RSP: https://www.anthropic.com/news/anthropics-responsible-scaling-policy