evhub comments on Anthropic AI made the right call

evhub 15 Apr 2024 20:36 UTC
5 points
−17
I think it was an honest miscommunication coupled to a game of telephone—the sort of thing that inevitably happens sometimes—but not something that I feel particularly concerned about.
- habryka 15 Apr 2024 20:57 UTC
  15 points
  −7
  Parent
  I would take pretty strong bets that that isn’t what happened based on having talked to more people about this. Happy to operationalize and then try to resolve it.
  - Akash 15 Apr 2024 23:02 UTC
    12 points
    10
    Parent
    Here are three possible scenarios:
    Scenario 1, Active Lying: Anthropic staff were actively spreading the idea that they would not push the frontier.
    Scenario 2, Allowing misconceptions to go unchecked: Anthropic staff were aware that many folks in the AIS world thought that Anthropic had committed to not pushing the frontier, and they allowed this misconception to go unchecked, perhaps because they realized that it was a misconception that favored their commercial/competitive interests.
    Scenario 3, Not being aware: Anthropic staff were not aware that many folks had this belief. Maybe they heard it once or twice but it never really seemed like a big deal.
    Scenario 1 is clearly bad. Scenarios 2 and 3 are more interesting. To what extent does Anthropic have the responsibility to clarify misconceptions (avoid scenario 2) and even actively look for misconceptions (avoid scenario 3)?
    I expect this could matter tangibly for discussions of RSPs. My opinion is that the Anthropic RSP is written in such a way that readers can come away with rather different expectations of what kinds of circumstances would cause Anthropic to pause/resume.
    It wouldn’t be very surprising to me if we end up seeing a situation where many readers say “hey look, we’ve reached an ASL-3 system, so now you’re going to pause, right?” And then Anthropic says “no no, we have sufficient safeguards; we can keep going now.” And then some readers say “wait a second, what? I’m pretty sure you committed to pausing until your safeguards were better than that.” And then Anthropic says “no… we never said exactly what kinds of safeguards we would need, and our leadership’s opinion is that our safeguards are sufficient, and the RSP allows leadership to determine when it’s fine to proceed.”
    In this (hypothetical) scenario, Anthropic never lied, but it benefitted from giving off a more cautious impression, and it didn’t take steps to correct this impression.
    I think avoiding these kinds of scenarios requires some mix of:
    Clear, specific, falsifiable statements on behalf of labs.
    Some degree of proactive attempts to identify and alleviate misconceptions.
    One counterargument is something like “Anthropic is a company, and there are lots of things to do, and this is demanding an unusually high level of attention to detail and proactive communication that is not typically expected of companies.” To which my response is something like “yes, but I think it’s reasonable to hold companies to such standards if they wish to develop AGI. I think we ought to hold Anthropic and other labs to this standard, especially insofar as they want the benefits associated with being perceived as the kind of safety-conscious lab that refuses to push the frontier or commits to scaling policies that include tangible/concrete plans to pause.”