COI: I work at Anthropic and I ran this by Anthropic before posting, but all views are exclusively my own.
I got a question about Anthropic’s partnership with Palantir using Claude for U.S. government intelligence analysis and whether I support it and think it’s reasonable, so I figured I would just write a shortform here with my thoughts. First, I can say that Anthropic has been extremely forthright about this internally, and it didn’t come as a surprise to me at all. Second, my personal take would be that I think it’s actually good that Anthropic is doing this. If you take catastrophic risks from AI seriously, the U.S. government is an extremely important actor to engage with, and trying to just block the U.S. government out of using AI is not a viable strategy. I do think there are some lines that you’d want to think about very carefully before considering crossing, but using Claude for intelligence analysis seems definitely fine to me. Ezra Klein has a great article on “The Problem With Everything-Bagel Liberalism” and I sometimes worry about Everything-Bagel AI Safety where e.g. it’s not enough to just focus on catastrophic risks, you also have to prevent any way that the government could possibly misuse your models. I think it’s important to keep your eye on the ball and not become too susceptible to an Everything-Bagel failure mode.
FWIW, as a common critic of Anthropic, I think I agree with this. I am a bit worried about engaging with the DoD being bad for Anthropic’s epistemics and ability to be held accountable by the government and public, but I think the basics of engaging on defense issues seem fine to me, and I don’t think risks from AI route basically at all through AI being used for building military technology or intelligence analysis.
I would guess it does somewhat exacerbate risk. I think it’s unlikely (~15%) that alignment is easy enough that prosaic techniques could even suffice, but in those worlds I expect things to go well mostly because the behavior of powerful models is non-trivially influenced/constrained by their training. In which case I do expect there’s more room for things to go wrong, the more that training is for lethality/adversariality.
Given the state of atheoretical confusion about alignment, I feel wary of confidently dismissing these sorts of basic, obvious-at-first-glance arguments about risk—e.g., “all else equal, probably we should expect more killing-people-type problems from models trained to kill people”—without decently strong countervailing arguments.
I mostly agree. But I think some kinds of autonomous weapons would make loss of control and coups easier. On the other hand, boosting US security is good, so the net effect is unclear. And that’s very far from the recent news (and Anthropic has a Usage Policy, with exceptions, which disallows various uses — my guess is this is too strong on weapons).
(and Anthropic has a Usage Policy, with exceptions, which disallows weapons stuff — my guess is this is too strong on weapons).
I think usage policies should not be read as commitments, and so I think it would be reasonable to expect that Anthropic will allow weapons development if it becomes highly profitable (and, in contrast to other things Anthropic has promised, that this should not be interpreted as a broken promise when they do so).
If you are in any way involved in this project, please remember you may end up with the blood of millions of people on your hands. You will erode the moral inhibitions people in San Francisco have against building this sort of thing, and eventually SF will ship the best surveillance tools to dictators worldwide.
This is not hyperbole; this sort of thing has already happened. Zuckerberg basically ignored the genocide in Myanmar which his app enabled, because maintaining his image of political neutrality was more important to him. Saudi Arabia has already executed people for social media posts found using tools written by Western software developers.
Sure, xrisk may be more important than genocide, but please remember that you will need to sleep at night knowing what you’ve done, and you may not have any motivation to work on xrisk after this.

Another potential benefit of this is that Anthropic might get more experience deploying their models in high-security environments.
Nothing in that announcement suggests that this is limited to intelligence analysis.
U.S. intelligence and defense agencies do run misinformation campaigns, such as the anti-vax campaign in the Philippines, and everything that’s public suggests there’s no barrier to Claude being used offensively in that fashion.
If Anthropic has gotten promises that Claude is not being used offensively under this agreement, they should be public about those promises and the mechanisms that regulate the use of Claude by U.S. intelligence and defense agencies.
I confirmed internally (which felt personally important for me to do) that our partnership with Palantir is still subject to the same terms outlined in the June post “Expanding Access to Claude for Government”:
For example, we have crafted a set of contractual exceptions to our general Usage Policy that are carefully calibrated to enable beneficial uses by carefully selected government agencies. These allow Claude to be used for legally authorized foreign intelligence analysis, such as combating human trafficking, identifying covert influence or sabotage campaigns, and providing warning in advance of potential military activities, opening a window for diplomacy to prevent or deter them. All other restrictions in our general Usage Policy, including those concerning disinformation campaigns, the design or use of weapons, censorship, and malicious cyber operations, remain.
The contractual exceptions are explained here (very short, easy to read): https://support.anthropic.com/en/articles/9528712-exceptions-to-our-usage-policy

The core of that page is as follows, emphasis added by me:
For example, with carefully selected government entities, we may allow foreign intelligence analysis in accordance with applicable law. All other use restrictions in our Usage Policy, including those prohibiting use for disinformation campaigns, the design or use of weapons, censorship, domestic surveillance, and malicious cyber operations, remain.
This is all public (in Anthropic’s up-to-date support.anthropic.com portal). Additionally it was announced when Anthropic first announced its intentions and approach around government in June.
The United States has laws that prevent the US intelligence and defense agencies from spying on their own population. The Snowden revelations showed us that the US intelligence and defense agencies did not abide by those limits.
Facebook has a usage policy that forbids running misinformation campaigns on their platform. That did not stop US intelligence and defense agencies from running disinformation campaigns on their platform.
Instead of just trusting contracts, Anthropic could add oversight mechanisms, so that a few Anthropic employees can look over how the models are used in practice and whether they are used within the bounds that Anthropic expects them to be used in.
If all usage of the models is classified and out of reach of checking by Anthropic employees, there’s no good reason to expect the contract to limit US intelligence and defense agencies if they find it important to use the models outside of how Anthropic expects them to be used.
For example, with carefully selected government entities, we may allow foreign intelligence analysis in accordance with applicable law. All other use restrictions in our Usage Policy, including those prohibiting use for disinformation campaigns, the design or use of weapons, censorship, domestic surveillance, and malicious cyber operations, remain.
This sounds to me like a very carefully worded non-denial denial.
If you say that one example of how you can make an exception to your terms is to allow a select government entity to do foreign intelligence analysis in accordance with applicable law and not run disinformation campaigns, you are not denying that another example of an exception you could make is to allow disinformation campaigns.
If Anthropic were sincere about this being the only exception that’s made, it would be easy to add a promise to “Exceptions to our Usage Policy” that Anthropic will publish all exceptions that they make, for the sake of transparency.
Don’t forget that probably only a tiny number of Anthropic employees have seen the actual contracts, and there’s a good chance that those employees are barred by classification from talking with other Anthropic employees about what’s in the contracts.
At Anthropic you are a bunch of people who are supposed to think about AI safety and alignment in general. You could think of this as a test case of how to design mechanisms for alignment, and the “Exceptions to our Usage Policy” page seems like a complete failure in that regard, because it contains neither a mechanism to make all exceptions public nor any mechanism to make sure that the policies are followed in practice.

I’m kind of against it. There’s a line and I draw it there; it’s just too much power waiting to fall into a bad actor’s hands...
I likely agree that Anthropic<->Palantir is good, but I disagree about whether blocking the US government out of AI is a viable strategy. It seems to me like many military projects get blocked by inefficient bureaucracy, and it seems plausible to me that some legacy government contractors could get exclusive deals that delay US military AI projects for 2+ years.

Building in California is bad for congresspeople! Better to build across all 50 states like United Launch Alliance.
I think people opposing this have a belief that the counterfactual is “USG doesn’t have LLMs” instead of “USG spins up its own LLM development effort using the NSA’s no-doubt-substantial GPU clusters”.
Needless to say, I think the latter is far more likely.
The NSA building it is arguably better because at least they won’t sell it to countries like Saudi Arabia, and they have a better ability to prevent people from quitting or diffusing knowledge and code to outside companies.
Also, most people in SF agree working for the NSA is morally grey at best, and Anthropic won’t be telling everyone this is morally okay.

I don’t have much trouble with you working with the US military. I’m more worried about the ties to Peter Thiel.
This explanation seems overly convenient. When faced with evidence which might update your beliefs about Anthropic, you adopt a set of beliefs which, coincidentally, means you won’t risk losing your job.
How much time have you spent analyzing the positive or negative impact of US intelligence efforts prior to concluding that merely using Claude for intelligence “seemed fine”?
What future events would make you re-evaluate your position and state that the partnership was a bad thing?
Example:
-- A pro-US despot rounds up and tortures to death tens of thousands of pro-union activists and their families. Claude was used to analyse social media and mobile data, building a list of people sympathetic to the union movement, which the US then gave to their ally.
EDIT: The first two sentences were overly confrontational, but I do think either question warrants an answer.
As a highly respected community member and prominent AI safety researcher, your stated beliefs and justifications will be influential to a wide range of people.
Personally, I think that overall it’s good on the margin for staff at companies risking human extinction to be sharing their perspectives on criticisms and moving towards having dialogue at all, so I think (what I read as) your implicit demand for Evan Hubinger to do more work here is marginally unhelpful; I weakly think quick takes like this are marginally good.
I will add: It’s odd to me, Stephen, that this is your line for (what I read as) disgust at Anthropic staff espousing extremely convenient positions while doing things that seem to you to be causing massive harm. To my knowledge the Anthropic leadership has ~never engaged in public dialogue about why they’re getting rich building potentially-omnicidal-minds with worthy critics like Hinton, Bengio, Russell, Yudkowsky, etc, so I wouldn’t expect them or their employees to have high standards for public defenses of far less risky behavior like working with the US military.[1]
As an example of the low standards for Anthropic’s public discourse, notice how a recent essay about what’s required for Anthropic to succeed at AI Safety by Sam Bowman (a senior safety researcher at Anthropic) flatly states “Our ability to do our safety work depends in large part on our access to frontier technology… staying close to the frontier is perhaps our top priority in Chapter 1” with ~no defense of this claim, no engagement with the sorts of reasons why I consider adding a marginal competitor to the suicide race to be an atrocity, and no acknowledgement that this makes him personally very wealthy (i.e. he and most other engineers at Anthropic will make millions of dollars due to Anthropic acting on this claim).
I think that overall it’s good on the margin for staff at companies risking human extinction to be sharing their perspectives on criticisms and moving towards having dialogue at all
No disagreement.
your implicit demand for Evan Hubinger to do more work here is marginally unhelpful
The community seems to be quite receptive to the opinion, so it doesn’t seem unreasonable to voice an objection. If you’re saying it is primarily the way I’ve written it that makes it unhelpful, that seems fair.
I originally felt that either question I asked would be reasonably easy to answer, if time was given to evaluating the potential for harm.
However, given that Hubinger might have to run any reply by Anthropic staff, I understand that it might be negative to demand further work. This is pretty obvious, but didn’t occur to me earlier.
I will add: It’s odd to me, Stephen, that this is your line for (what I read as) disgust at Anthropic staff espousing extremely convenient positions while doing things that seem to you to be causing massive harm.
Ultimately, the original quicktake was only justifying one facet of Anthropic’s work so that’s all I’ve engaged with. It would seem less helpful to bring up my wider objections.
I wouldn’t expect them or their employees to have high standards for public defenses of far less risky behavior
I don’t expect them to have a high standard for defending Anthropic’s behavior, but I do expect the LessWrong community to have a high standard for arguments.
Thanks for the responses, I have a better sense of how you’re thinking about these things.
I don’t feel much desire to dive into this further, except I want to clarify one thing, on the question of any demands in your comment.
I originally felt that either question I asked would be reasonably easy to answer, if time was given to evaluating the potential for harm.
However, given that Hubinger might have to run any reply by Anthropic staff, I understand that it might be negative to demand further work. This is pretty obvious, but didn’t occur to me earlier.
That actually wasn’t primarily the part that felt like a demand to me. This was the part:
How much time have you spent analyzing the positive or negative impact of US intelligence efforts prior to concluding that merely using Claude for intelligence “seemed fine”?
I’m not quite sure what the relevance of the time was if not to suggest it needed to be high. I felt that this line implied something like “If your answer is around ‘20 hours’, then I want to say that the correct standard should be ‘200 hours’.” I felt like it was a demand that Hubinger would have to spend 10x the time thinking about this question before he met your standards for being allowed to express his opinion on it.
But perhaps you just meant you wanted him to include an epistemic status, like “Epistemic status: <Here’s how much time I’ve spent thinking about this question>”.
We live in an information society → “You” are trying to build the ultimate dual-use information tool/thing/weapon → The government requires your services. No news there. So why the need to whitewash this? What about this is actually bothering you?