Miles Brundage: Finding Ways to Credibly Signal the Benignness of AI Development and Deployment is an Urgent Priority

Link post

Miles Brundage has a new substack and I like this post. Here’s the introduction.

A simplified view of AI policy is that there are two “arenas” with distinct challenges and opportunities. In the domestic arena, governments aim to support AI innovation within their borders and ensure that it is widely beneficial to their citizens, while simultaneously preventing that innovation from harming citizens’ interests (safety, human rights, economic well-being, etc). In the international arena, governments aim to leverage AI to improve their national security, while minimizing the extent to which these efforts cause other governments to fear for their security (unless this is the explicit intent in a given context, e.g., deterrence or active conflict).

For example, in the domestic arena, a government might require that companies assess the degree to which a language model is capable of aiding in the creation of biological weapons, and share information about the process of red teaming that model and mitigating such risks. And governments could require documentation of known biases in the model. In each case, the government would be attempting to ensure, and produce evidence, that an AI system is benign, in the sense of not posing threats to the citizens of that country (although in some cases the government may also be mitigating risks to citizens of other countries, particularly for catastrophic risks that spill across borders, or companies with a global user base). Citizens may also demand evidence that a government’s own use of AI is benign (e.g., not infringing on citizens’ privacy) or that appropriate precautions are taken in high-stakes use cases, or one part of government might demand this of another part of government.

In the international arena, a military might build an autonomous weapon system and attempt to demonstrate that it will be used for defensive purposes only. A military might also state that it will not use AI in certain contexts like nuclear command and control.

In all of these cases, outside observers might be justifiably skeptical of these claims being true, or staying true over time, without evidence. The common theme here is that many parties would benefit from it being possible to credibly signal the benignness of AI development and deployment. Those parties include organizations developing and deploying AI (who want to be trusted), governments (who want citizens to trust them and the services they use), commercial or national competitors of a given company/country (who want to know that precautions are being taken and that cutting corners in order to keep up can be avoided), etc. By credibly signaling benignness, I mean demonstrating that a particular AI system, or an organization developing or deploying AI systems, is not a significant danger to third parties. Domestically, governments should seek to ensure that their agencies’ use of AI as well as private actors’ use of AI is benign. Internationally, militaries should seek to credibly signal benignness and should demand the same of their adversaries, again at least outside of the context of deterrence or active conflict.

When I say that an AI system or organization is shown to not pose a significant danger to third parties, I mean that outsiders should have high confidence that:

  • The AI developer or deployer will not accidentally cause catastrophic harms, enable others to cause catastrophic harms, or have lax enough security to allow others to steal their IP and then cause catastrophic harms.

  • The AI developer or deployer’s statements about evaluation and mitigation of risks, including catastrophic and non-catastrophic risks, are complete and accurate.

  • The AI developer or deployer’s statements regarding how their technology is being used (and ways in which its use is restricted) are complete and accurate.

  • The AI developer or deployer is not currently planning to use their capabilities in a way that intentionally causes harm to others, and if this were to change, there would be visible signs of the change far enough in advance for appropriate actions to be taken.

Unfortunately, it is inherently difficult to credibly signal the benignness of AI development and deployment (at a system level or an organization level) due to AI’s status as a general purpose technology, and because the information required to demonstrate benignness may compromise security, privacy, or intellectual property. This makes the research, development, and piloting of new ways to credibly signal the benignness of AI development and deployment, without causing other major problems in the process, an urgent priority.

Miles recommends research to determine what information labs should share, government monitoring of compute “to ensure that large-scale illicit projects aren’t possible,” verification for demonstrating benignness internationally, and more.

(Context: Miles recently left OpenAI; see his Why I’m Leaving OpenAI and What I’m Doing Next and Garrison Lovely’s Miles Brundage resigned from OpenAI, and his AGI readiness team was disbanded.)

Edit: as mentioned in the comments, this post mostly sets aside the alignment/control problem to focus on a different problem: credibly signaling benignness.