Anthropic AI made the right call
I’ve seen a number of people criticize Anthropic for releasing Claude 3 Opus, with arguments along the lines of:
Anthropic said they weren’t going to push the frontier, but this release is clearly better than GPT-4 in some ways! They’re betraying their mission statement!
I think that criticism takes too narrow a view. Consider the position of investors in AI startups. If OpenAI has a monopoly on the clearly-best version of a world-changing technology, that gives them a lot of pricing power on a large market. However, if there are several groups with comparable products, investors don’t know who the winner will be, and investment gets split between them. Not only that, but if they stay peers, then there will be more competition in the future, meaning less pricing power and less profitability.
The comparison isn’t just “GPT-4 exists” vs “GPT-4 and Claude Opus exist”—it’s more like “investors give X billion dollars to OpenAI” vs “investors give X/3 billion dollars to OpenAI and Anthropic”.
Now, you could argue that “more peer-level companies makes an agreement to stop development less likely”—but that wasn’t happening anyway, so any pauses would be driven by government action. If Anthropic was based in a country that previously had no notable AI companies, maybe that would be a reasonable argument, but it’s not.
If you’re concerned about social problems from widespread deployment of LLMs, maybe you should be unhappy about more good LLMs and more competition. But if you’re concerned about ASI, especially if you’re only concerned about future developments and not LLM hacks like BabyAGI, I think you should be happy about Anthropic releasing Claude 3 Opus.
Did you not read the discussion from when this came out? Your post doesn’t engage with the primary criticism, which is about Anthropic staff lying about the company’s plans, including evidence suggesting the CEO misled one of their investors, leaving the investor with the belief that Anthropic had a “commitment to not meaningfully advance the frontier with a launch”. In contrast to those private communications with an investor, here’s what Anthropic’s launch post says of their new model (emphasis added):
I’m coming to this late, but this seems weird. Do I understand correctly that many people were saying that Anthropic, the AI research company, had committed never to advance the state of the art of AI research, and they believed Anthropic would follow this commitment? That is just… really implausible.
This is the sort of commitment which very few individuals are psychologically capable of keeping, and which ~zero commercial organizations of more than three or four people are institutionally capable of keeping, assuming they actually do have the ability to advance the state of the art. I don’t know whether Anthropic leadership ever said they would do this, and if they said it then I don’t know whether they meant it earnestly. But even imagining they said it and meant it earnestly there is just no plausible world in which a company with hundreds of staff and billions of dollars of commercial investment would keep this commitment for very long. That is not the sort of thing you see from commercial research companies in hot fields.
If anyone here did believe that Anthropic would voluntarily refrain from advancing the state of the art in all cases, you might want to check if there are other things that people have told you about themselves, which you would really like to be true of them, but you have no evidence for other than their assertions, and would be very unusual if they were true.
I also strongly expected them to violate this commitment, though my understanding is that various investors and early collaborators did believe they would keep this commitment.
I think it’s important to understand that Anthropic was founded before the recent post-ChatGPT hype and explosion of AI interest. Just as OpenAI’s charter seemed, to people early on, like something OpenAI could plausibly adhere to, it also seemed possible that commercial pressures would not produce a full-throated arms race between all the top companies, with billions to trillions of dollars for the taking for whoever got to AGI first. I do agree that those pressures made violating this commitment a relatively likely outcome.
To be blunt, this is why I believe a game of telephone happened: I agree that this commitment was unlikely to be kept, so I don’t think they ever actually promised it (though Anthropic’s comms are surprisingly unclear on such an important point).
This comment is inspired by Raemon’s comment below this paragraph. I’ll elaborate on a problematic/toxic dynamic in criticizing orgs that might be in the right but also have a reasonable probability of being shady: the people who do criticize such orgs are often selected for being more conflict-seeking than is optimal, and more paranoid than a correctly calibrated person would be. This makes it too easy for even shady organizations to validly argue away any criticism, no matter how serious, and honest organizations will almost certainly respond the same way. The result is that you get much less evidence and data for your calibration, which can easily spiral into being far too paranoid about an organization, to the point of constructing false conspiracy theories about it:
https://www.lesswrong.com/posts/wn5jTrtKkhspshA4c/?commentId=9qfPihnpHoESSCAGP
My own take on the entire affair is that Anthropic comms definitely needs to be more consistent and clear, but also we should try to be much more careful around the qualifiers, and importantly to treat 2 similar sounding sentences as potentially extremely different, because every word does matter for these sorts of high-stakes situations.
More generally, it’s important to realize early when a telephone game is happening, so that you can stop the spread of misconceptions.
Most of us agree with you that deploying Claude 3 was reasonable, although I for one disagree with your reasoning. The criticism was mostly about the release being (allegedly) inconsistent with Anthropic’s past statements/commitments on releases.
[Edit: the link shows that most of us don’t think deploying Claude 3 increased AI risk, not think deploying Claude 3 was reasonable.]
I at least didn’t interpret this poll to mean that deploying it was reasonable. I think given past Anthropic commitments it was pretty unreasonable (violating your deployment commitments seems really quite bad, and is IMO one of the most central things that Anthropic should be judged on). It’s just not really clear whether it directly increased risk. I would be quite sad if that poll result would be seen as something like “approval of whether Anthropic made the right call”.
Sorry for using the poll to support a different proposition. Edited.
To make sure I understand your position (and Ben’s):
1. Dario committed to Dustin that Anthropic wouldn’t “meaningfully advance the frontier” (according to Dustin)
2. Anthropic senior staff privately gave AI safety people the impression that Anthropic would stay behind/at the frontier (although nobody has quotes)
3. Claude 3 Opus meaningfully advanced the frontier? Or slightly advanced it, but Anthropic marketed it as a substantial advance, so they’re being similarly low-integrity?
...I don’t think Anthropic violated its deployment commitments. I mostly believe y’all about 2—I didn’t know 2 until people asserted it right after the Claude 3 release, but I haven’t been around the community, much less well-connected in it, for long—but that feels like an honest miscommunication to me. If I’m missing “past Anthropic commitments” please point to them.
For the record, I have been around the community for a long time (since before Anthropic existed), in a very involved way, and I had also basically never heard of this before the Claude 3 release. I can recall only one time where I ever heard someone mention something like this, it was a non-Anthropic person who said they heard it from someone else who was a non-Anthropic person, they asked me if I had heard the same thing, and I said no. So it certainly seems clear given all the reports that this was a real rumour that was going around, but it was definitely not the case that this was just an obvious thing that everyone in the community knew about or that Anthropic senior staff were regularly saying (I talked regularly to a lot of Anthropic senior staff before I joined Anthropic and I never heard anyone say this).
That seems concerning! Did you follow up with the leadership of your organization to understand to what degree they seem to have been making different (and plausibly contradictory) commitments to different interest groups?
It seems like it’s quite important to know what promises your organization has made to whom, if you are trying to assess whether your working there will positively or negatively affect how AI will go.
(Note, I talked with Evan about this in private some other times, so the above comment is more me bringing a private conversation into the public realm than me starting a whole conversation about this. I’ve already poked Evan privately asking him to please try to get better confirmation of the nature of the commitments made here, but he wasn’t interested at the time, so I am making the same bid publicly.)
I think it was an honest miscommunication coupled to a game of telephone—the sort of thing that inevitably happens sometimes—but not something that I feel particularly concerned about.
I would take pretty strong bets that that isn’t what happened based on having talked to more people about this. Happy to operationalize and then try to resolve it.
Here are three possible scenarios:
Scenario 1, Active Lying: Anthropic staff were actively spreading the idea that they would not push the frontier.
Scenario 2, Allowing Misconceptions to Go Unchecked: Anthropic staff were aware that many folks in the AIS world thought that Anthropic had committed to not pushing the frontier, and they allowed this misconception to go unchecked, perhaps because they realized that it was a misconception that favored their commercial/competitive interests.
Scenario 3, Not Being Aware: Anthropic staff were not aware that many folks had this belief. Maybe they heard it once or twice, but it never really seemed like a big deal.
Scenario 1 is clearly bad. Scenarios 2 and 3 are more interesting. To what extent does Anthropic have the responsibility to clarify misconceptions (avoid scenario 2) and even actively look for misconceptions (avoid scenario 3)?
I expect this could matter tangibly for discussions of RSPs. My opinion is that the Anthropic RSP is written in such a way that readers can come away with rather different expectations of what kinds of circumstances would cause Anthropic to pause/resume.
It wouldn’t be very surprising to me if we end up seeing a situation where many readers say “hey look, we’ve reached an ASL-3 system, so now you’re going to pause, right?” And then Anthropic says “no no, we have sufficient safeguards– we can keep going now.” And then some readers say “wait a second– what? I’m pretty sure you committed to pausing until your safeguards were better than that.” And then Anthropic says “no… we never said exactly what kinds of safeguards we would need, and our leadership’s opinion is that our safeguards are sufficient, and the RSP allows leadership to determine when it’s fine to proceed.”
In this (hypothetical) scenario, Anthropic never lied, but it benefitted from giving off a more cautious impression, and it didn’t take steps to correct this impression.
I think avoiding these kinds of scenarios requires some mix of:
Clear, specific falsifiable statements on behalf of labs.
Some degree of proactive attempts to identify and alleviate misconceptions.
One counterargument is something like “Anthropic is a company, and there are lots of things to do, and this is demanding an unusually high amount of attention-to-detail and proactive communication that is not typically expected of companies.” To which my response is something like “yes, but I think it’s reasonable to hold companies to such standards if they wish to develop AGI. I think we ought to hold Anthropic and other labs to this standard, especially insofar as they want the benefits associated with being perceived as the kind of safety-conscious lab that refuses to push the frontier or commits to scaling policies that include tangible/concrete plans to pause.”
For the record, a different Anthropic staffer told me confidently that it was widespread in 2022, the year before you joined, so I think you’re wrong here.
(The staffer preferred that I not quote them verbatim in public so I’ve DM’d you a direct quote.)
Summarizing from the private conversation: the information there is not new to me and I don’t think your description of what they said is accurate.
As I’ve said previously, Anthropic people certainly went around saying things like “we want to think carefully about when to do releases and try to advance capabilities for the purpose of doing safety”, but it was always extremely clear at least to me that these were not commitments, just general thoughts about strategy, and I am very confident that was what was being referred to as being widespread in 2022 here.
I updated somewhat over the following weeks that Opus had meaningfully advanced the frontier, but I don’t know how much that is true for other people.
It seems like Anthropic’s marketing is in direct contradiction with the explicit commitment they made to many people, including Dustin, which seems to have quite consistently been the “meaningfully advance the frontier” line. I think it’s less clear whether their actual capabilities are, as opposed to their marketing statements. I think if you want to have any chance of enforcing commitments like this, the enforcement needs to happen at the latest when the organization publicly claims to have done something in direct contradiction to it, so I think the marketing statements matter a bunch here.
Anthropic has also continued to publish ads claiming that Claude 3 has meaningfully pushed the state of the art and is the smartest model on the market since the discussion around this happened, so it’s not just a one-time oversight by their marketing department.
Separately, multiple Anthropic staffers seem to think themselves no longer bound by their previous commitment and expect that Anthropic will likely unambiguously advance the frontier if they get the chance.
Thanks.
I guess I’m more willing to treat Anthropic’s marketing as not-representing-Anthropic. Shrug. [Edit: like, maybe it’s consistent-with-being-a-good-guy-and-basically-honest to exaggerate your product in a similar way to everyone else. (You risk the downsides of creating hype but that’s a different issue than the integrity thing.)]
It is disappointing that Anthropic hasn’t clarified its commitments after the post-launch confusion, one way or the other.
I feel sympathetic to this, but when I think of the mess of trying to hold an organization accountable when I literally can’t take the public statements of the organization itself as evidence, then that feels kind of doomed to me. It feels like it would allow Anthropic to weasel itself out of almost any commitment.
Like, when OpenAI marketing says “GPT-4 is our most aligned model yet!” you could say this shows that OpenAI deeply misunderstands alignment, but I tend to ignore it. Even mostly when Sam Altman says it himself.
[Edit after habryka’s reply: my weak independent impression is that often the marketing people say stuff that the leadership and most technical staff disagree with, and if you use marketing-speak to substantially predict what-leadership-and-staff-believe you’ll make worse predictions.]
Oh, I have indeed used this to update that OpenAI deeply misunderstands alignment, and this IMO has allowed me to make many accurate predictions about what OpenAI has been doing over the last few years, so I feel good about interpreting it that way.
Your argument would imply that competition begets worse products?