I mostly believe y’all about 2—I didn’t know 2 until people asserted it right after the Claude 3 release, but I haven’t been around the community, much less well-connected in it, for long—but that feels like an honest miscommunication to me.
For the record, I have been around the community for a long time (since before Anthropic existed), in a very involved way, and I had also basically never heard of this before the Claude 3 release. I can recall only one time when I ever heard someone mention something like this: a non-Anthropic person said they had heard it from someone else who was also a non-Anthropic person, asked me if I had heard the same thing, and I said no. So it certainly seems clear, given all the reports, that this was a real rumour that was going around, but it was definitely not the case that this was just an obvious thing that everyone in the community knew about or that Anthropic senior staff were regularly saying (I talked regularly to a lot of Anthropic senior staff before I joined Anthropic, and I never heard anyone say this).
That seems concerning! Did you follow up with the leadership of your organization to understand to what degree they seem to have been making different (and plausibly contradictory) commitments to different interest groups?
It seems like it’s quite important to know what promises your organization has made to whom, if you are trying to assess whether your working there will positively or negatively affect how AI will go.
(Note: I talked with Evan about this in private at other times, so the above comment is more me bringing a private conversation into the public realm than me starting a whole new conversation about this. I’ve already poked Evan privately, asking him to please try to get better confirmation of the nature of the commitments made here, but he wasn’t interested at the time, so I am making the same bid publicly.)
I think it was an honest miscommunication coupled with a game of telephone—the sort of thing that inevitably happens sometimes—but not something that I feel particularly concerned about.
I would take pretty strong bets that that isn’t what happened based on having talked to more people about this. Happy to operationalize and then try to resolve it.
Here are three possible scenarios:
Scenario 1, Active Lying: Anthropic staff were actively spreading the idea that they would not push the frontier.
Scenario 2, Allowing misconceptions to go unchecked: Anthropic staff were aware that many folks in the AIS world thought that Anthropic had committed to not pushing the frontier, and they allowed this misconception to go unchecked, perhaps because they realized that it was a misconception that favored their commercial/competitive interests.
Scenario 3, Not being aware: Anthropic staff were not aware that many folks had this belief. Maybe they heard it once or twice, but it never really seemed like a big deal.
Scenario 1 is clearly bad. Scenarios 2 and 3 are more interesting. To what extent does Anthropic have the responsibility to clarify misconceptions (avoid scenario 2) and even actively look for misconceptions (avoid scenario 3)?
I expect this could matter tangibly for discussions of RSPs. My opinion is that the Anthropic RSP is written in such a way that readers can come away with rather different expectations of what kinds of circumstances would cause Anthropic to pause/resume.
It wouldn’t be very surprising to me if we end up seeing a situation where many readers say “hey look, we’ve reached an ASL-3 system, so now you’re going to pause, right?” And then Anthropic says “no no, we have sufficient safeguards– we can keep going now.” And then some readers say “wait a second– what? I’m pretty sure you committed to pausing until your safeguards were better than that.” And then Anthropic says “no… we never said exactly what kinds of safeguards we would need, and our leadership’s opinion is that our safeguards are sufficient, and the RSP allows leadership to determine when it’s fine to proceed.”
In this (hypothetical) scenario, Anthropic never lied, but it benefitted from giving off a more cautious impression, and it didn’t take steps to correct this impression.
I think avoiding these kinds of scenarios requires some mix of:
Clear, specific, falsifiable statements from labs.
Some degree of proactive attempts to identify and alleviate misconceptions.
One counterargument is something like “Anthropic is a company, and there are lots of things to do, and this is demanding an unusually high amount of attention to detail and proactive communication that is not typically expected of companies.” To which my response is something like “yes, but I think it’s reasonable to hold companies to such standards if they wish to develop AGI. I think we ought to hold Anthropic and other labs to this standard, especially insofar as they want the benefits associated with being perceived as the kind of safety-conscious lab that refuses to push the frontier or commits to scaling policies that include tangible/concrete plans to pause.”
I have been around the community for a long time (since before Anthropic existed), in a very involved way, and I had also basically never heard of this before the Claude 3 release.… So… it was definitely not the case that this was just an obvious thing that everyone in the community knew about or that Anthropic senior staff were regularly saying
For the record, a different Anthropic staffer told me confidently that it was widespread in 2022, the year before you joined, so I think you’re wrong here.
(The staffer preferred that I not quote them verbatim in public so I’ve DM’d you a direct quote.)
Summarizing from the private conversation: the information there is not new to me and I don’t think your description of what they said is accurate.
As I’ve said previously, Anthropic people certainly went around saying things like “we want to think carefully about when to do releases and try to advance capabilities for the purpose of doing safety”, but it was always extremely clear, at least to me, that these were not commitments, just general thoughts about strategy, and I am very confident that this is what was being referred to here as being widespread in 2022.