I interpreted you, in your previous comment, as claiming that Anthropic’s RSP is explicit in its compatibility with meaningfully pushing the frontier. Dustin is under the impression that Anthropic verbally committed otherwise. Whether or not Claude 3 pushed the frontier seems somewhat orthogonal to this question—did Anthropic commit and/or heavily imply that they weren’t going to push the frontier, and if so, does the RSP quietly contradict that commitment? My current read is that the answer to both questions is yes. If this is the case, I think that Anthropic has been pretty misleading about a crucial part of their safety plan, and this seems quite bad to me.
I think that you’re correct that Anthropic at least heavily implied that they weren’t going to “meaningfully advance” the frontier (even if they have not made any explicit commitments about this). I’d be interested in hearing when Dustin had this conversation w/ Dario—was it pre or post RSP release?
And as far as I know, the only commitments they’ve made explicitly are in their RSP, which commits to limiting their ability to scale to the rate at which they can advance and deploy safety measures. It’s unclear if the “sufficient safety measures” limitation is the only restriction on scaling, but I would be surprised if anyone senior Anthropic was willing to make a concrete unilateral commitment to stay behind the curve.
My current story based on public info is, up until mid 2022, there was indeed an intention to stay at the frontier but not push it forward significantly. This changed sometime in late 2022-early 2023, maybe after ChatGPT released and the AGI race became somewhat “hot”.
I feel some kinda missing mood in these comments. It seems like you’re saying “Anthropic didn’t make explicit commitments here”, and that you’re not weighting as particularly important whether they gave people different impressions, or benefited from that.
(AFAICT you haven’t explicitly stated “that’s not a big deal”, but, it’s the vibe I get from your comments. Is that something you’re intentionally implying, or do you think of yourself as mostly just trying to be clear on the factual claims, or something like that?)
I interpreted you, in your previous comment, as claiming that Anthropic’s RSP is explicit in its compatibility with meaningfully pushing the frontier. Dustin is under the impression that Anthropic verbally committed otherwise. Whether or not Claude 3 pushed the frontier seems somewhat orthogonal to this question—did Anthropic commit and/or heavily imply that they weren’t going to push the frontier, and if so, does the RSP quietly contradict that commitment? My current read is that the answer to both questions is yes. If this is the case, I think that Anthropic has been pretty misleading about a crucial part of their safety plan, and this seems quite bad to me.
I think that you’re correct that Anthropic at least heavily implied that they weren’t going to “meaningfully advance” the frontier (even if they have not made any explicit commitments about this). I’d be interested in hearing when Dustin had this conversation w/ Dario—was it pre or post RSP release?
And as far as I know, the only commitments they’ve made explicitly are in their RSP, which commits to limiting their ability to scale to the rate at which they can advance and deploy safety measures. It’s unclear if the “sufficient safety measures” limitation is the only restriction on scaling, but I would be surprised if anyone senior Anthropic was willing to make a concrete unilateral commitment to stay behind the curve.
My current story based on public info is, up until mid 2022, there was indeed an intention to stay at the frontier but not push it forward significantly. This changed sometime in late 2022-early 2023, maybe after ChatGPT released and the AGI race became somewhat “hot”.
I feel some kinda missing mood in these comments. It seems like you’re saying “Anthropic didn’t make explicit commitments here”, and that you’re not weighting as particularly important whether they gave people different impressions, or benefited from that.
(AFAICT you haven’t explicitly stated “that’s not a big deal”, but, it’s the vibe I get from your comments. Is that something you’re intentionally implying, or do you think of yourself as mostly just trying to be clear on the factual claims, or something like that?)