The most representative were those from Anthropic employees who concurred that this was indeed the implication, but it seemed bad form to cite particular employees (especially when that information was not public by default) rather than, e.g., Dario. I think Dustin’s statement was strong evidence of this impression, though, and I still believe Anthropic to have at least insinuated it.
This makes sense, and does update me. Though I note “implication”, “insinuation” and “impression” are still pretty weak compared to “actually made a commitment”, and still consistent with the main driver being wishful thinking on the part of the AI safety community (including some members of the AI safety community who work at Anthropic).
I think that when you are making a technology which might drive humanity extinct, the bar should be significantly higher than “normal discourse.” When you are doing something with that much potential for harm, you owe it to society to make firm commitments that you stick to.
...
So I do blame them for not making such a statement—it is on them to show to humanity, the people they are making decisions for, why those decisions are justified. It is not on society to make the political situation sufficiently palatable such that they don’t face any consequences for the mistakes they have made. It is on them not to make those mistakes, and to own up to them when they do.
I think there are two implicit things going on here that I’m wary of. The first one is an action-inaction distinction. Pushing them to justify their actions is, in effect, a way of slowing down all their actions. But presumably Anthropic thinks that not doing things could also lead to humanity going extinct. Therefore there’s an exactly analogous argument they might make, which is something like “when you try to stop us from doing things you owe it to the world to adhere to a bar that’s much higher than ‘normal discourse’”. And in fact criticism of Anthropic has not met this bar—e.g. I think taking a line from a blog post out of context and making a critical song about it is in fact unusually bad discourse.
What’s the disanalogy between you and Anthropic telling each other to have higher standards? That’s the second thing that I’m wary about: you’re claiming to speak on behalf of humanity as a whole. But in fact, you are not; there’s no meaningful sense in which humanity is in fact demanding a certain type of explanation from Anthropic. Almost nobody wants an explanation of this particular policy; in fact, the largest group of engaged stakeholders here are probably Anthropic customers, who mostly just want them to ship more models.
I don’t really have a strong overall take. I certainly think it’s reasonable to try to figure out what went wrong with communication here, and perhaps people poking around and asking questions would in fact lead to evidence of clear commitments being made. I am mostly against the reflexive attacks based on weak evidence, which seems like what’s happening here. In general my model of trust breakdowns involves each side getting many shallow papercuts from the other side until they decide to disengage, and my model of productive criticism involves more specificity and clarity.
if Anthropic is attempting to serve the public, which they at least pay lip service to through their corporate structure, then they should be grateful for this feedback, and attempt to incorporate it.
I don’t know if you’ve ever tried this move on an interpersonal level, but it is exactly the type of move that tends to backfire hard. And in fact a lot of these things are fundamentally interpersonal things, about who trusts whom, etc.