That seems concerning! Did you follow up with the leadership of your organization to understand to what degree they seem to have been making different (and plausibly contradictory) commitments to different interest groups?
It seems like it’s quite important to know what promises your organization has made to whom, if you are trying to assess whether your working there will positively or negatively affect how AI will go.
(Note: I talked with Evan about this in private on other occasions, so the above comment is more me bringing a private conversation into the public realm than me starting a whole new conversation about this. I’ve already poked Evan privately, asking him to please try to get better confirmation of the nature of the commitments made here, but he wasn’t interested at the time, so I am making the same bid publicly.)
I think it was an honest miscommunication coupled with a game of telephone—the sort of thing that inevitably happens sometimes—but not something that I feel particularly concerned about.
I would take pretty strong bets that that isn’t what happened based on having talked to more people about this. Happy to operationalize and then try to resolve it.
Here are three possible scenarios:
Scenario 1, Active lying: Anthropic staff were actively spreading the idea that they would not push the frontier.
Scenario 2, Allowing misconceptions to go unchecked: Anthropic staff were aware that many folks in the AIS world thought that Anthropic had committed to not pushing the frontier, and they allowed this misconception to go unchecked, perhaps because they realized that it was a misconception that favored their commercial/competitive interests.
Scenario 3, Not being aware: Anthropic staff were not aware that many folks had this belief. Maybe they heard it once or twice, but it never really seemed like a big deal.
Scenario 1 is clearly bad. Scenarios 2 and 3 are more interesting. To what extent does Anthropic have the responsibility to clarify misconceptions (avoid scenario 2) and even actively look for misconceptions (avoid scenario 3)?
I expect this could matter tangibly for discussions of RSPs. My opinion is that the Anthropic RSP is written in such a way that readers can come away with rather different expectations of what kinds of circumstances would cause Anthropic to pause scaling or to resume it.
It wouldn’t be very surprising to me if we end up seeing a situation where many readers say “hey look, we’ve reached an ASL-3 system, so now you’re going to pause, right?” And then Anthropic says “no no, we have sufficient safeguards; we can keep going now.” And then some readers say “wait a second, what? I’m pretty sure you committed to pausing until your safeguards were better than that.” And then Anthropic says “no… we never said exactly what kinds of safeguards we would need, and our leadership’s opinion is that our safeguards are sufficient, and the RSP allows leadership to determine when it’s fine to proceed.”
In this (hypothetical) scenario, Anthropic never lied, but it benefited from giving off a more cautious impression, and it didn’t take steps to correct that impression.
I think avoiding these kinds of scenarios requires some mix of:
Clear, specific, falsifiable statements from labs.
Some degree of proactive effort to identify and correct misconceptions.
One counterargument is something like “Anthropic is a company, and there are lots of things to do, and this is demanding an unusually high amount of attention to detail and proactive communication that is not typically expected of companies.” To which my response is something like “yes, but I think it’s reasonable to hold companies to such standards if they wish to develop AGI. I think we ought to hold Anthropic and other labs to this standard, especially insofar as they want the benefits associated with being perceived as the kind of safety-conscious lab that refuses to push the frontier or commits to scaling policies that include tangible/concrete plans to pause.”