FWIW, as a common critic of Anthropic, I think I agree with this. I am a bit worried that engaging with the DoD will be bad for Anthropic’s epistemics and its ability to be held accountable by the government and the public, but the basics of engaging on defense issues seem fine to me, and I don’t think risks from AI route much at all through AI being used for building military technology or for intelligence analysis.
I would guess it does somewhat exacerbate risk. I think it’s unlikely (~15%) that alignment is easy enough that prosaic techniques could even suffice, but in those worlds I expect things to go well mostly because the behavior of powerful models is non-trivially influenced/constrained by their training. In that case, I do expect there’s more room for things to go wrong the more that training is for lethality/adversariality.
Given the state of atheoretical confusion about alignment, I feel wary of confidently dismissing these sorts of basic, obvious-at-first-glance arguments about risk (e.g., “all else equal, we should probably expect more killing-people-type problems from models trained to kill people”) without decently strong countervailing arguments.
I mostly agree. That said, I think some kinds of autonomous weapons would make loss-of-control and coups easier; on the other hand, boosting US security is good, so the net effect is unclear. And that’s very far from the recent news (and Anthropic has a Usage Policy, with exceptions, which disallows various uses; my guess is it is too strong on weapons).
I think usage policies should not be read as commitments, and so I think it would be reasonable to expect that Anthropic will allow weapons development if it becomes highly profitable (and, in contrast to other things Anthropic has promised, for this not to be interpreted as a broken promise when they do so).
If you are in any way involved in this project, please remember you may end up with the blood of millions of people on your hands. You will erode the moral inhibitions people in San Francisco have against building this sort of thing, and eventually SF will ship the best surveillance tools to dictators worldwide.
This is not hyperbole; this sort of thing has already happened. Zuckerberg basically ignored the genocide in Myanmar that his app enabled, because maintaining his image of political neutrality was more important to him. Saudi Arabia has already executed people for social media posts found using tools written by Western software developers.
Sure, xrisk may be more important than genocide, but please remember you will need to sleep at night knowing what you’ve done and you may not have any motivation to work on xrisk after this.