Great writeup! I recently wrote a brief summary and review of the same paper.
Alaga & Schuett (2023) propose a framework for frontier AI developers to manage potential risks from advanced AI systems by coordinating pauses in response to models that are assessed to have dangerous capabilities, such as the capacity to develop biological weapons.
The scheme has five main steps:
1. Frontier AI models are evaluated by developers or third parties to test for dangerous capabilities.
2. If a model is shown to have dangerous capabilities (“fails evaluations”), the developer pauses training and deployment of that model, restricts access to similar models, and delays related research.
3. Other developers are notified whenever a dangerous model is discovered, and also pause similar work.
4. The failed model’s capabilities are analyzed and safety precautions are implemented during the pause.
5. Developers only resume paused work once adequate safety thresholds are met.
The report discusses four versions of this coordination scheme:
1. Voluntary – developers face public pressure to evaluate and pause but make no formal commitments.
2. Pausing agreement – developers collectively commit to the process in a contract.
3. Mutual auditor – developers hire the same third party to evaluate models and require pausing.
4. Legal requirements – laws mandate evaluation and coordinated pausing.
The authors of the report prefer the third and fourth versions (mutual auditor and legal requirements), which they consider the most effective.
Strengths and weaknesses
The report addresses the important and underexplored question of what AI labs should do in response to evaluations finding dangerous capabilities. Coordinated pausing is a valuable contribution to this conversation. The proposed scheme seems relatively effective and potentially feasible, as it aligns with the efforts of the dangerous-capability evaluation teams of OpenAI and the Alignment Research Center.
A key strength is the report’s thorough description of multiple ways to implement coordinated pausing. These range from voluntary participation relying on public pressure, through contractual agreements among developers and shared auditing arrangements, to government regulation. Having flexible options makes the framework adaptable and realistic to put into practice, rather than a rigid, one-size-fits-all proposal.
The report acknowledges several weaknesses of the proposed framework, including potential harms from its implementation. For example, coordinated pausing could provide time for competing countries (such as China) to “catch up,” which may be undesirable from a US policy perspective. Pausing could also cause capabilities to increase rapidly once work resumes, as developers apply algorithmic improvements discovered during the pause, which may be less safe than a “slow takeoff.”
Additionally, the paper acknowledges concerns with feasibility, such as the potential that coordinated pausing may violate US and EU antitrust law. As a countermeasure, it suggests making “independent commitments to pause without discussing them with each other,” with no retaliation against non-participating AI developers, but defection would seem to be an easy option under such a scheme. It recommends further legal analysis and consultation regarding this topic, but the authors are not able to provide assurances regarding the antitrust concern. The other feasibility concerns – regarding enforcement, verifying that post-deployment models are the same as evaluated models, potential pushback from investors, and so on – are adequately discussed and appear possible to overcome.
One weakness of the report is that the motivation for coordinated pausing is not presented in a compelling manner. The report provides twelve pages of implementation details before explaining the benefits. These benefits, such as “buying more time for safety research,” are indirect and may not be persuasive to a skeptical reader. AI lab employees and policymakers often take the stance that technological innovation, especially in AI, should not be hindered unless the need to do so is clearly demonstrated. Even if the report intends to take a balanced perspective rather than advocating for the proposed framework, the arguments provided in favor of the framework seem weaker than they could be.
It seems intuitive that deployment of a dangerous AI system should be halted, though it is worth clearly noting that “failing” a dangerous-capability evaluation does not necessarily mean that the AI system has dangerous capabilities in practice. It is less clear why the development of such a system must also be paused. As long as the dangerous AI system is not deployed, further pretraining of the model does not appear to pose risks. AI developers may be worried about falling behind competitors, so the costs incurred from this requirement must be clearly justified for them to be on board.
While the report makes a solid case for coordinated pausing, it has gaps: it does not consider some additional weaknesses of the framework, explain its benefits persuasively, or resolve key feasibility issues. More work could be done to strengthen the argument and to make coordinated pausing more feasible.
Were you able to check the prediction in the section “Non-sourcelike references”?