Most of the sections are just lists of interesting questions for further research; the lists of questions seem fairly comprehensive. The section on avoiding arms races has something more in the way of conceptually breaking up the space—in particular, the third and fourth paragraphs distill the basic models around these topics in a way I found useful. My guess is that this section is most representative of the future work of Allan Dafoe.
6.4 Avoiding or Ending the Race
Given the likely large risks from an AI race, it is imperative to examine possible routes for avoiding races or ending one underway. The political solutions to global public bads are, in increasing explicitness and institutionalization: norms, agreements (“soft law”), treaties, or institutions. These can be bilateral, multilateral, or global. Norms involve a rough mutual understanding about what (observable) actions are unacceptable and what sanctions will be imposed in response. Implicit norms have the advantage that they can arise without explicit consent, but the disadvantage that they tend to be crude, and are thus often inadequate and may even be misdirected. A hardened form of international norms is customary law, though absent [109] a recognized international judiciary this is not likely relevant for great-power cooperation.[110]
Diplomatic agreements and treaties involve greater specification of the details of compliance and enforcement; when well specified these can be more effective, but require greater levels of cooperation to achieve. Institutions, such as the WTO, involve establishing a bureaucracy with the ability to clarify ambiguous cases, verify compliance, facilitate future negotiations, and sometimes the ability to enforce compliance. International cooperation often begins with norms, proceeds to (weak) bilateral or regional treaties, and consolidates with institutions.
Some conjectures about when international cooperation in transformative AI will be more likely: (1) when the parties mutually perceive a strong interest in reaching a successful agreement (great risks from non-cooperation or gains from cooperation, low returns on unilateral steps); (2) when the parties otherwise have a trusting relationship; (3) when there is sufficient consensus about what an agreement should look like (what compliance consists of), which is more likely if the agreement is simple, appealing, and stable; (4) when compliance is easily, publicly, and rapidly verifiable; (5) when the risks from being defected on are low, such as if there is a long “breakout time”, a low probability of a power transition because technology is defense dominant, and near-term future capabilities are predictably non-transformative; (6) when the incentives to defect are otherwise low. Compared to other domains, AI appears in some ways less amenable to international cooperation, namely on conditions (3), (4), (5), and (6), but in other ways could be more amenable: on (1), if the parties come to perceive existential risks from unrestricted racing and tremendous benefits from cooperating; on (2), because China and the West currently have a relatively cooperative relationship compared to other international arms races; and on (4) and (5), because there may be creative technical possibilities for enhancing them. We should actively pursue technical and governance research today to identify and craft potential agreements.
Third-Party Standards, Verification, Enforcement, and Control
One set of possibilities for avoiding an AI arms race is the use of third-party standards, verification, enforcement, and control. What are the prospects for cooperation through third-party institutions? The first model, almost certainly worth pursuing and feasible, is an international “safety” agency responsible for “establishing and administering safety standards.” This is crucial to achieve common knowledge about what counts as compliance. The second [111] “WTO” or “IAEA” model builds on the first by also verifying and ruling on non-compliance, after which it authorizes states to impose sanctions for non-compliance. The third model is stronger still, endowing the institution with sufficient capabilities to enforce cooperation itself. The fourth, “Atomic Development Authority” model, involves the agency itself controlling the dangerous materials; this would involve building a global AI development regime sufficiently outside the control of the great powers, with a monopoly on this (militarily) strategic technology. Especially in the fourth case, but also for the weaker models, great care will need to go into their institutional design to assure powerful actors and to ensure competence and good motivation.
Such third-party models entail a series of questions about how such institutions could be implemented. What are the prospects that great powers would give up sufficient power to a global inspection agency or governing body? What possible scenarios, agreements, tools, or actions could make that more plausible? What do we know about how to build government that is robust against sliding into totalitarianism and other malignant forms (see section 4.1)? What can we learn from similar historical episodes, such as the failure of the Acheson-Lilienthal Report and Baruch Plan, the success of arms control efforts that led towards the 1972 Anti-Ballistic Missile (ABM) Treaty, and episodes of attempted state formation? [112]
There may also be other ways to escape the race. Could one side form a winning or encompassing coalition? Could one or several racers engage in unilateral “stabilization” of the world, without risking catastrophe? The section AI Ideal Governance discusses the desirable properties of a candidate world hegemon.
[110] Cf. Williamson, Richard. “Hard Law, Soft Law, and Non-Law in Multilateral Arms Control: Some Compliance Hypotheses.” Chicago Journal of International Law 4, no. 1 (April 1, 2003). https://chicagounbound.uchicago.edu/cjil/vol4/iss1/7.