FWIW, as a frequent critic of Anthropic, I think I agree with this. I am a bit worried that engaging with the DoD could be bad for Anthropic’s epistemics and its ability to be held accountable by the government and public, but the basics of engaging on defense issues seem fine to me, and I don’t think risks from AI route much at all through AI being used to build military technology or do intelligence analysis.
I would guess it does somewhat exacerbate risk. I think it’s unlikely (~15%) that alignment is easy enough that prosaic techniques could even suffice, but in those worlds I expect things go well mostly because the behavior of powerful models is non-trivially influenced/constrained by their training. In that case, I’d expect more room for things to go wrong the more that training is aimed at lethality/adversariality.
Given the present state of atheoretical confusion about alignment, I feel wary of confidently dismissing these sorts of basic, obvious-at-first-glance arguments about risk—e.g., “all else equal, we should probably expect more killing-people-type problems from models trained to kill people”—without decently strong countervailing arguments.
I mostly agree. But I think some kinds of autonomous weapons would make loss of control and coups easier. On the other hand, boosting US security is good, so the net effect is unclear. And that’s all very far from the recent news (and Anthropic has a Usage Policy, with exceptions, that disallows various uses — my guess is this is too strong on weapons).
I think usage policies should not be read as commitments, so I think it would be reasonable to expect Anthropic to allow weapons development if it becomes highly profitable (and, in contrast to other things Anthropic has promised, for that not to be interpreted as a broken promise if they do so).