This post confuses me.
Am I correct that the implication here is that assurances from a non-rationalist are essentially worthless?
I also think it is wrong to imply that Anthropic have violated their commitment simply because they didn’t rationally think through its implications when they made it.
I think you can understand Anthropic’s actions as purely rational, just not very ethical.
They made an unenforceable commitment to not push capabilities when it directly benefited them. Now that it is more beneficial to drop the facade, they are doing so.
I think “don’t trust assurances from non-rationalists” is not a good takeaway. Rather, it should be “don’t trust unenforceable assurances from people who stand to benefit greatly from violating your trust at a later date”.
The intended implication is something like “rationalists have a bias towards treating statements as much firmer commitments than intended, then getting very upset when they are violated”.
For example, unless I’m missing something, the “we do not wish to advance the rate of AI capabilities” claim is just one offhand line in a blog post. It’s not a firm commitment, it’s not even a claim about what their intentions are. As stated, it’s just one consideration that informs their actions—and in fact the “wish” terminology is often specifically not a claim about intended actions (e.g. “I wish I didn’t have to do X”).
Yet rationalists are hammering them on this one sentence—literally making songs about it, tweeting it to criticize Anthropic, etc. It seems like there is a serious lack of metacognition about where a non-adversarial communication breakdown could have occurred, or what the charitable interpretations of this are.
(I am open to people considering those interpretations and then dismissing them, but I’m not even seeing that. Like, if people were saying “I understand the difference between Anthropic actually making an organizational commitment and just offhand mentioning a fact about their motivations, but here’s why I’m disappointed anyway”, that seems reasonable. But a lot of people seem to be treating it as a Very Serious Promise being broken.)
That makes sense.
I guess the followup question is “how were Anthropic able to cultivate the impression that they were safety-focused if they had only made an extremely loose offhand commitment?”
Certainly the impression I had from how integrated they are in the EA community was that they had made a more serious commitment.
Everyone is afraid of the AI race, and hopes that one of the labs will actually end up doing what they think is the most responsible thing to do. Hope and fear is one hell of a drug cocktail, makes you jump to the conclusions you want based on the flimsiest evidence. But the hangover is a bastard.
Imo I don’t know that we have evidence that Anthropic deliberately cultivated, or significantly benefited from, the appearance of a commitment. However, if an investor or employee felt they had made substantial commitments based on this impression and later felt betrayed, that would be more serious. (The story here is, I think, importantly different from other cases where there were substantial benefits from the appearance of a commitment followed by its violation.)
The intended implication is something like “rationalists have a bias towards treating statements as much firmer commitments than intended, then getting very upset when they are violated”.
That sounds suspiciously similar to “autists have a bias towards interpreting statements literally”.
I mean, yes, they’re closely related.