I think part of the disappointment is the lack of communication regarding violating the commitment or violating the expectations of a non-trivial fraction of the community.
If someone makes a promise to you or even sets an expectation for you in a softer way, there is of course always some chance that they will break the promise or violate the expectation.
But if they violate the commitment or the expectation, and they care about you as a stakeholder, I think there’s a reasonable expectation that they should have to justify that decision.
If they break the promise or violate the soft expectation, and then they say basically nothing (or they say "well, I never technically made a promise – there was no contract!"), then I think you have the right to be upset with them not only for violating your expectation but also for essentially trying to gaslight you afterward.
I think a Responsible Lab would have issued some sort of statement along the lines of "hey, we’re hearing that some folks thought we had made commitments to not advance the frontier, and that some of our employees were saying this to safety-focused members of the AI community. We’re sorry about this miscommunication, and here are some steps we’ll take to avoid such miscommunications in the future," or "We did in fact intend to follow through on that, but here are some of the extreme events or external circumstances that caused us to change our mind."
In the absence of such a statement, it seems like Anthropic does not really care about honoring its commitments/expectations, or about defending its reasoning on important safety-relevant issues more generally. I find it reasonable that this disposition harms Anthropic’s reputation among safety-conscious people and makes safety-conscious people less excited about voluntary commitments from labs in general.
See my comment below. Basically I think this depends a lot on the extent to which a commitment was made.
Right now it seems like the entire community is jumping to conclusions based on a couple of “impressions” people got from talking to Dario, plus an offhand line in a blog post. With that little evidence, if you have formed strong expectations, that’s on you. And trying to double down by saying “I have been bashing you because I formed an unreasonable expectation, now it’s your job to fix that” seems pretty adversarial.
I do think it would be nice if Anthropic did make such a statement, but seeing how adversarially everyone has treated the information they do release, I don’t blame them for not doing so.
Right now it seems like the entire community is jumping to conclusions based on a couple of “impressions” people got from talking to Dario, plus an offhand line in a blog post.
No, many people had the impression that Anthropic had made such a commitment, which is why they were so surprised when they saw the Claude 3 benchmarks/marketing. Their impressions were derived from a variety of sources; those are merely the few bits of “hard evidence”, gathered after the fact, of anything that could be thought of as an “organizational commitment”.
Also, if Dustin Moskovitz and Gwern—two dispositionally pretty different people—both came away from talking to Dario with this understanding, I do not think that is something you just wave off. Failures of communication do happen. But it’s pretty strange for this many people to pick up the same misunderstanding over the course of several years, from many different people (including Dario, but also others), in a way that’s beneficial to Anthropic, and then for middle management to start telling you that maybe there was a vibe but they’ve never heard of any such commitment (never mind what Dustin and Gwern heard, or anyone else who might’ve heard similar from other Anthropic employees).
I do think it would be nice if Anthropic did make such a statement, but seeing how adversarially everyone has treated the information they do release, I don’t blame them for not doing so.
I really think this is assuming the conclusion. I would be… maybe not happy, but definitely much less unhappy, with a response like, "Dang, we definitely did not intend to communicate a binding commitment to not release frontier models that are better than anything else publicly available at the time. In the future, you should not assume that any verbal communication from any employee, including the CEO, is ever a binding commitment that Anthropic, as an organization, will respect, even if they say the words 'This is a binding commitment.' It needs to be in writing on our website, etc, etc."

Could you clarify how binding "OpenAI’s mission is to ensure that artificial general intelligence benefits all of humanity" is?
Right now it seems like the entire community is jumping to conclusions based on a couple of “impressions” people got from talking to Dario, plus an offhand line in a blog post. With that little evidence, if you have formed strong expectations, that’s on you.
Like Robert, the impressions I had were based on what I heard from people working at Anthropic. I cited various bits of evidence because those were the ones available, not because they were the most representative. The most representative were those from Anthropic employees who concurred that this was indeed the implication, but it seemed bad form to cite particular employees (especially when that information was not public by default) rather than, e.g., Dario. I think Dustin’s statement was strong evidence of this impression, though, and I still believe Anthropic to have at least insinuated it.
I agree with you that most people are not aiming for as much stringency with their commitments as rationalists expect. Separately, I do think that what Anthropic did would constitute a betrayal, even in everyday culture. And in any case, I think that when you are making a technology which might drive humanity extinct, the bar should be significantly higher than "normal discourse." When you are doing something with that much potential for harm, you owe it to society to make firm commitments that you stick to. Otherwise, as kave noted, how are we supposed to trust your other "commitments"? Your RSP? If all you can offer is a vague "we’ll figure it out when we get there," then any ambiguous statement should be interpreted as a vibe rather than a real plan. And in the absence of unambiguous statements, which all the labs have failed to provide, this is looking very much like "trust us, we’ll do the right thing." That, to my mind, is nowhere close to the assurances society ought to be provided, given the stakes.
I do think it would be nice if Anthropic did make such a statement, but seeing how adversarially everyone has treated the information they do release, I don’t blame them for not doing so.
This reasoning seems to imply that Anthropic should only be obliged to convey information when the environment is sufficiently welcoming to them. But Anthropic is creating a technology which might drive humanity extinct—they have an obligation to share their reasoning regardless of what society thinks. In fact, if people are upset by their actions, there is more reason, not less, to set the record straight. Public scrutiny of companies, especially when their choices affect everyone, is a sign of healthy discourse.
The implicit bid for people not to discourage them—because that would make it less likely for a company to be forthright—seems incredibly backwards, because then the public is unable to point out when they feel Anthropic has made a mistake. And if Anthropic is attempting to serve the public, which they at least pay lip service to through their corporate structure, then they should be grateful for this feedback, and attempt to incorporate it.
So I do blame them for not making such a statement—it is on them to show to humanity, the people they are making decisions for, why those decisions are justified. It is not on society to make the political situation sufficiently palatable such that they don’t face any consequences for the mistakes they have made. It is on them not to make those mistakes, and to own up to them when they do.
The most representative were those from Anthropic employees who concurred that this was indeed the implication, but it seemed bad form to cite particular employees (especially when that information was not public by default) rather than, e.g., Dario. I think Dustin’s statement was strong evidence of this impression, though, and I still believe Anthropic to have at least insinuated it.
This makes sense, and does update me. Though I note “implication”, “insinuation” and “impression” are still pretty weak compared to “actually made a commitment”, and still consistent with the main driver being wishful thinking on the part of the AI safety community (including some members of the AI safety community who work at Anthropic).
I think that when you are making a technology which might drive humanity extinct, the bar should be significantly higher than "normal discourse." When you are doing something with that much potential for harm, you owe it to society to make firm commitments that you stick to.
...
So I do blame them for not making such a statement—it is on them to show to humanity, the people they are making decisions for, why those decisions are justified. It is not on society to make the political situation sufficiently palatable such that they don’t face any consequences for the mistakes they have made. It is on them not to make those mistakes, and to own up to them when they do.
I think there are two implicit things going on here that I’m wary of. The first one is an action-inaction distinction. Pushing them to justify their actions is, in effect, a way of slowing down all their actions. But presumably Anthropic thinks that them not doing things is also something which could lead to humanity going extinct. Therefore there’s an exactly analogous argument they might make, which is something like “when you try to stop us from doing things you owe it to the world to adhere to a bar that’s much higher than ‘normal discourse’”. And in fact criticism of Anthropic has not met this bar—e.g. I think taking a line from a blog post out of context and making a critical song about it is in fact unusually bad discourse.
What’s the disanalogy between you and Anthropic telling each other to have higher standards? That’s the second thing that I’m wary about: you’re claiming to speak on behalf of humanity as a whole. But in fact, you are not; there’s no meaningful sense in which humanity is in fact demanding a certain type of explanation from Anthropic. Almost nobody wants an explanation of this particular policy; in fact, the largest group of engaged stakeholders here are probably Anthropic customers, who mostly just want them to ship more models.
I don’t really have a strong overall take. I certainly think it’s reasonable to try to figure out what went wrong with communication here, and perhaps people poking around and asking questions would in fact lead to evidence of clear commitments being made. I am mostly against the reflexive attacks based on weak evidence, which seems like what’s happening here. In general my model of trust breakdowns involves each side getting many shallow papercuts from the other side until they decide to disengage, and my model of productive criticism involves more specificity and clarity.
if Anthropic is attempting to serve the public, which they at least pay lip service to through their corporate structure, then they should be grateful for this feedback, and attempt to incorporate it.
I don’t know if you’ve ever tried this move on an interpersonal level, but it is exactly the type of move that tends to backfire hard. And in fact a lot of these things are fundamentally interpersonal things, about who trusts whom, etc.