The United States has laws that prohibit the US intelligence and defense agencies from spying on their own population. The Snowden revelations showed us that those agencies did not abide by those limits.
Facebook has a usage policy that forbids running misinformation campaigns on its platform. That did not stop US intelligence and defense agencies from running disinformation campaigns on it.
Instead of just trusting contracts, Anthropic could add oversight mechanisms, so that a few Anthropic employees can review how the models are actually used and whether that usage stays within the bounds Anthropic expects.
If all usage of the models is classified and out of reach of checking by Anthropic employees, there's no good reason to expect the contract to constrain US intelligence and defense agencies if they find it important to use the models outside of how Anthropic expects them to be used.
Anthropic's stated exception reads: "For example, with carefully selected government entities, we may allow foreign intelligence analysis in accordance with applicable law. All other use restrictions in our Usage Policy, including those prohibiting use for disinformation campaigns, the design or use of weapons, censorship, domestic surveillance, and malicious cyber operations, remain."
This sounds to me like a very carefully worded non-denial denial.
If you say that one example of an exception to your terms is allowing a select government entity to do foreign intelligence analysis in accordance with applicable law, while not running disinformation campaigns, you are not denying that another exception could allow disinformation campaigns.
If Anthropic were sincere about this being the only exception it makes, it would be easy to add a promise to the Exceptions to our Usage Policy that Anthropic will publish all exceptions it grants, for the sake of transparency.
Don't forget that probably only a tiny number of Anthropic employees have seen the actual contracts, and there's a good chance that those employees are barred by classification from talking with other Anthropic employees about what's in the contracts.
At Anthropic, you are a group of people who are supposed to think about AI safety and alignment in general. You could treat this as a test case of how to design mechanisms for alignment, and the Exceptions to our Usage Policy looks like a complete failure in that regard, because it contains neither a mechanism to make all exceptions public nor any mechanism to ensure the policies are followed in practice.