Right now it seems like the entire community is jumping to conclusions based on a couple of “impressions” people got from talking to Dario, plus an offhand line in a blog post. With that little evidence, if you have formed strong expectations, that’s on you.
Like Robert, I based my impressions on what I heard from people working at Anthropic. I cited various bits of evidence because those were the ones available, not because they were the most representative. The most representative were those from Anthropic employees who concurred that this was indeed the implication, but it seemed bad form to cite particular employees (especially when that information was not public by default) rather than, e.g., Dario. I think Dustin’s statement was strong evidence of this impression, though, and I still believe Anthropic to have at least insinuated it.
I agree with you that most people are not aiming for as much stringency with their commitments as rationalists expect. Separately, I do think that what Anthropic did would constitute a betrayal, even in everyday culture. And in any case, I think that when you are making a technology which might drive humanity extinct, the bar should be significantly higher than “normal discourse.” When you are doing something with that much potential for harm, you owe it to society to make firm commitments that you stick to. Otherwise, as kave noted, how are we supposed to trust your other “commitments”? Your RSP? If all you can offer are vague “we’ll figure it out when we get there” assurances, then any ambiguous statement should be interpreted as a vibe rather than a real plan. And in the absence of unambiguous statements, which all the labs have failed to provide, this looks very much like “trust us, we’ll do the right thing.” Which, to my mind, is nowhere close to the assurances society ought to be given, considering the stakes.
I do think it would be nice if Anthropic did make such a statement, but seeing how adversarially everyone has treated the information they do release, I don’t blame them for not doing so.
This reasoning seems to imply that Anthropic should only be obliged to convey information when the environment is sufficiently welcoming to them. But Anthropic is creating a technology which might drive humanity extinct—they have an obligation to share their reasoning regardless of what society thinks. In fact, if people are upset by their actions, there is more reason, not less, to set the record straight. Public scrutiny of companies, especially when their choices affect everyone, is a sign of healthy discourse.
The implicit bid for people to hold back criticism, on the grounds that criticism would make a company less likely to be forthright, seems incredibly backwards: it leaves the public unable to say when they feel Anthropic has made a mistake. And if Anthropic is attempting to serve the public, which they at least pay lip service to through their corporate structure, then they should be grateful for this feedback, and attempt to incorporate it.
So I do blame them for not making such a statement—it is on them to show humanity, the people they are making decisions for, why those decisions are justified. It is not on society to make the political situation sufficiently palatable that they don’t face any consequences for the mistakes they have made. It is on them not to make those mistakes, and to own up to them when they do.
The most representative were those from Anthropic employees who concurred that this was indeed the implication, but it seemed bad form to cite particular employees (especially when that information was not public by default) rather than, e.g., Dario. I think Dustin’s statement was strong evidence of this impression, though, and I still believe Anthropic to have at least insinuated it.
This makes sense, and does update me. Though I note that “implication”, “insinuation” and “impression” are still pretty weak compared to “actually made a commitment”, and still consistent with the main driver being wishful thinking on the part of the AI safety community (including some of its members who work at Anthropic).
I think that when you are making a technology which might drive humanity extinct, the bar should be significantly higher than “normal discourse.” When you are doing something with that much potential for harm, you owe it to society to make firm commitments that you stick to.
...
So I do blame them for not making such a statement—it is on them to show humanity, the people they are making decisions for, why those decisions are justified. It is not on society to make the political situation sufficiently palatable that they don’t face any consequences for the mistakes they have made. It is on them not to make those mistakes, and to own up to them when they do.
I think there are two implicit things going on here that I’m wary of. The first is an action-inaction distinction. Pushing them to justify their actions is, in effect, a way of slowing down all their actions. But presumably Anthropic thinks that not doing things could also lead to humanity going extinct. Therefore there’s an exactly analogous argument they might make, which is something like “when you try to stop us from doing things, you owe it to the world to adhere to a bar that’s much higher than ‘normal discourse’”. And in fact criticism of Anthropic has not met this bar—e.g. I think taking a line from a blog post out of context and making a critical song about it is unusually bad discourse.
What’s the disanalogy between you and Anthropic telling each other to have higher standards? That’s the second thing I’m wary of: you’re claiming to speak on behalf of humanity as a whole. But in fact you are not; there’s no meaningful sense in which humanity is demanding a certain type of explanation from Anthropic. Almost nobody wants an explanation of this particular policy; indeed, the largest group of engaged stakeholders here is probably Anthropic customers, who mostly just want them to ship more models.
I don’t really have a strong overall take. I certainly think it’s reasonable to try to figure out what went wrong with communication here, and perhaps people poking around and asking questions would in fact turn up evidence of clear commitments being made. I am mostly against reflexive attacks based on weak evidence, which is what seems to be happening here. In general my model of trust breakdowns involves each side getting many shallow papercuts from the other until they decide to disengage, and my model of productive criticism involves more specificity and clarity.
if Anthropic is attempting to serve the public, which they at least pay lip service to through their corporate structure, then they should be grateful for this feedback, and attempt to incorporate it.
I don’t know if you’ve ever tried this move on an interpersonal level, but it is exactly the type of move that tends to backfire hard. And in fact a lot of these things are fundamentally interpersonal things, about who trusts whom, etc.