I think that overall it’s good on the margin for staff at companies risking human extinction to be sharing their perspectives on criticisms and moving towards having dialogue at all
No disagreement.
your implicit demand for Evan Hubinger to do more work here is marginally unhelpful
The community seems to be quite receptive to the opinion, so it doesn’t seem unreasonable to voice an objection. If you’re saying it is primarily the way I’ve written it that makes it unhelpful, that seems fair.
I originally felt that either question I asked would be reasonably easy to answer, given time to evaluate the potential for harm.
However, given that Hubinger might have to run any reply by Anthropic staff, I understand that it might be negative to demand further work. This is pretty obvious, but didn’t occur to me earlier.
I will add: It’s odd to me, Stephen, that this is your line for (what I read as) disgust at Anthropic staff espousing extremely convenient positions while doing things that seem to you to be causing massive harm.
Ultimately, the original quicktake was only justifying one facet of Anthropic’s work, so that’s all I’ve engaged with. It would seem less helpful to bring up my wider objections.
I wouldn’t expect them or their employees to have high standards for public defenses of far less risky behavior
I don’t expect them to have a high standard for defending Anthropic’s behavior, but I do expect the LessWrong community to have a high standard for arguments.
This is bad, but perhaps there is a silver lining.
If internal communication within the scaffold appears to be in plain English, it will tempt humans to assume the meaning coincides precisely with the semantic content of the message.
If the chain of thought contains seemingly nonsensical content, it will be impossible to make this assumption.