Verification methods for international AI agreements

Link post

TLDR: A new paper summarizes some verification methods for international AI agreements. See also summaries on LinkedIn and Twitter.

Several co-authors and I are currently planning some follow-up projects about verification methods. There are also at least 2 other groups planning to release reports on verification methods. If you have feedback or are interested in getting involved, please feel free to reach out.

Overview

There have been many calls for potential international agreements around the development or deployment of advanced AI. If governments become more concerned about AI risks, there might be a short window of time in which ambitious international proposals are seriously considered. If this happens, I expect many questions will be raised, such as:

  • Can compliance with international AI agreements be robustly verified?

  • What tactics could adversaries use to try to secretly develop unauthorized AI projects or unauthorized data centers?

  • What assumptions do various verification methods rely on? Under what circumstances could they be deployed?

Our paper attempts to get readers thinking about these questions and considering the kinds of verification methods that nations could deploy. The paper is not conclusive: its main goal is to provide framings, concepts, descriptions, and examples that can help readers orient to this space and inspire future research.

I’d be especially interested in feedback on the following questions:

  • New verification methods. What are verification methods that are missing from our existing list?

  • Evasion strategies. What kinds of things could adversaries do to hide AI development or data centers?

  • Technical advances. What kinds of technical advances could make verification easier or harder? (For example, distributed training could make verification harder; LLMs that could securely scan code could make verification easier.)

Abstract

What techniques can be used to verify compliance with international agreements about advanced AI development? In this paper, we examine 10 verification methods that could detect two types of potential violations: unauthorized AI training (e.g., training runs above a certain FLOP threshold) and unauthorized data centers. We divide the verification methods into three categories: (a) national technical means (methods requiring minimal or no access from suspected non-compliant nations), (b) access-dependent methods (methods that require approval from the nation suspected of unauthorized activities), and (c) hardware-dependent methods (methods that require rules around advanced hardware). For each verification method, we provide a description, historical precedents, and possible evasion techniques. We conclude by offering recommendations for future work related to the verification and enforcement of international AI governance agreements.

Executive summary

Efforts to maximize the benefits and minimize the global security risks of advanced AI may lead to international agreements. This paper outlines methods that could be used to verify compliance with such agreements. The verification methods we cover are focused on detecting two potential violations:

  • Unauthorized AI development (for example, AI development that goes beyond a FLOP threshold set by an international agreement, or the execution of a training run that has not received a license).

  • Unauthorized data centers (for example, data centers that go beyond a maximum computing capacity limit or networking limit set by an international agreement).
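To make the first violation type concrete, here is a minimal sketch of how a compute estimate might be compared against a treaty FLOP threshold. All numbers below (chip count, peak throughput, utilization, duration, and the threshold itself) are illustrative assumptions, not figures from the paper.

```python
# Sketch: estimate total training FLOP for a hypothetical cluster and
# compare it against an assumed treaty threshold. Illustrative only.

def training_flop(num_chips: int, peak_flops_per_chip: float,
                  utilization: float, seconds: float) -> float:
    """Total FLOP = chips x peak FLOP/s x utilization x wall-clock time."""
    return num_chips * peak_flops_per_chip * utilization * seconds

THRESHOLD = 1e26  # hypothetical threshold set by an agreement, in FLOP

# e.g. 10,000 accelerators at 1e15 FLOP/s peak, 40% utilization, 90 days
total = training_flop(10_000, 1e15, 0.4, 90 * 24 * 3600)
print(f"{total:.2e} FLOP ->", "exceeds" if total > THRESHOLD else "within", "threshold")
```

Under these made-up inputs the run comes in around 3.1e25 FLOP, below the assumed 1e26 threshold; the point is only that the arithmetic a verifier would need is simple once chip counts and runtimes can be established.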

Verification methods

We identify 10 verification methods and divide them into three categories:

  1. National technical means. Methods that can be used by nations unilaterally.

  2. Access-dependent methods. Methods that require a nation to grant access to national or international inspectors.

  3. Hardware-dependent methods. Methods that require agreements pertaining to advanced hardware.

National technical means

  1. Remote sensing: Detect unauthorized data centers and semiconductor manufacturing via visual and thermal signatures.

  2. Whistleblowers: Incentivize insiders to report non-compliance.

  3. Energy monitoring: Detect power consumption patterns that suggest the potential presence of large GPU clusters.

  4. Customs data analysis: Track the movement of critical AI hardware and raw materials.

  5. Financial intelligence: Monitor large financial transactions related to AI development.

Access-dependent methods

  1. Data center inspections: Conduct inspections of sites to assess the size of a data center, verify compliance with hardware agreements, and verify compliance with other safety and security agreements.

  2. Semiconductor manufacturing facility inspections: Conduct inspections of sites to determine the quantity of chip production and verify that chip production conforms to any agreements around advanced hardware.

  3. AI developer inspections: Conduct inspections of AI development facilities via interviews, document and training transcript audits, and potential code reviews.

Hardware-dependent methods

  1. Chip location tracking: Automatic location tracking of advanced AI chips.

  2. Chip-based reporting: Automatic notification if chips are used for unauthorized purposes.

Limitations and considerations

The verification methods we propose have some limitations, and there are many complicated national and international considerations that would influence if and how they are implemented. Some of these include:

  • Invasiveness: Some methods (especially on-site inspections) may be seen as intrusive and could raise concerns about privacy and sovereignty. Several factors could influence a nation’s willingness to accept invasive measures (e.g., the amount of international tension or distrust between nations, the degree to which nations are concerned about risks from advanced AI, and the exact types of risks that nations find most concerning).

  • Imperfect detection: No single method is foolproof. However, the combination of multiple methods could create a “Swiss cheese” model, where the weaknesses of one method are covered by the strengths of others.

  • Developmental stage: Some methods (especially the hardware-dependent ones) may require additional R&D. Furthermore, unlike methods that have been used for decades in other areas, the real-world effectiveness of some hardware-dependent methods has not yet been determined.
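The “Swiss cheese” intuition above can be sketched numerically: if several verification methods each catch a violation with some probability, the chance that at least one succeeds is 1 minus the product of the individual miss probabilities. The probabilities below are made up for illustration, and real methods are neither independent nor this easy to quantify.

```python
# Sketch of layered ("Swiss cheese") detection: assuming independent
# methods with detection probabilities p_i, the combined detection
# probability is 1 - prod(1 - p_i). Probabilities are illustrative.
import math

def combined_detection(probs: list[float]) -> float:
    miss = math.prod(1 - p for p in probs)  # chance every layer misses
    return 1 - miss

# e.g. remote sensing 30%, whistleblowers 20%, energy monitoring 25%
p = combined_detection([0.30, 0.20, 0.25])
print(f"combined detection probability: {p:.2f}")
```

With these toy numbers, three individually weak layers together detect the violation 58% of the time, which is the basic argument for deploying multiple overlapping methods rather than relying on any single one.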

Future directions

Our work provides a foundation for discussions on AI governance verification, but several key areas require further research:

  • Red-teaming exercises for verification regimes. Future work could examine how adversaries might attempt to circumvent a verification regime, describe potential evasion methods, and develop robust countermeasures to improve the effectiveness of the verification regime.

  • Design of international AI governance institutions. Future work could examine how international AI governance institutions should be designed, potentially drawing lessons from existing international bodies. Such work could explore questions such as: (a) what specific powers should be granted to the international institution, (b) how the institution should make core decisions, (c) how power is distributed between nations, and (d) how to handle potential violations or instances of non-compliance.

  • Enforcement strategies. Future work could examine what kinds of responses could be issued if non-compliance is discovered. This includes examining how such responses can be proportionate to the severity of the violation.

  • Development of tamper-proof and privacy-preserving hardware-enabled verification mechanisms. Future R&D efforts could improve the effectiveness, feasibility, robustness, or desirability of various hardware-dependent verification methods.

Crossposted to EA Forum