Lab governance reading list
What labs should do
The table/list in Towards best practices in AGI safety and governance: A survey of expert opinion (GovAI: Schuett et al. 2023): a collection of many briefly-described safety practices
Responsible Scaling Policies and Key Components of an RSP (METR 2023)
Model evals for dangerous capabilities (Stein-Perlman 2024): just the summary
Safety cases (arguments that an AI system is safe to deploy): Clymer Twitter thread
Structured access (GovAI: Shevlane 2022): structured access is about deploying systems to get some of the benefits of open-sourcing without all of the costs. Releasing via API is a simple example of structured access.
More: Deployment Corrections (IAPS: O’Brien et al. 2023); Open-Sourcing Highly Capable Foundation Models (GovAI: Seger et al. 2023)
Yet more: Assessing AI Foundation Model Risk Along a Gradient of Access (Institute for Security and Technology: Brammer et al. 2023); The Gradient of Generative AI Release (Solaiman 2023)
Control[1]
The case for ensuring that powerful AIs are controlled (Redwood: Greenblatt and Shlegeris 2024): intro and “Appendix: Control techniques from our paper”
Threat model: this quote[2] and The prototypical catastrophic AI action is getting root access to its datacenter (Shlegeris 2022)
Strategy/planning
Racing through a minefield: the AI deployment problem (Karnofsky 2022): describing the dilemma labs face when deciding how to deploy powerful AI and proposing high-level actions for a lab to improve safety
The Checklist: What Succeeding at AI Safety Will Involve (Bowman 2024)
What AI companies should do: Some rough ideas (Stein-Perlman 2024)
OpenAI[3]
Preparedness Framework (Beta): OpenAI’s version of an RSP, describing its risk assessment process and how it plans to respond to the results
Optional commentary: Zach Stein-Perlman, Zvi Mowshowitz
o1 System Card: in the PDF, sections 3.3 and 4 (or in the blog post, “External Red Teaming” and “Preparedness Framework Evaluations”)
Integrity incidents/issues/imperfections (AI Lab Watch): OpenAI section
Resources
Newsletters
Transformer: aggregation and commentary on news relevant to AI safety; also, journalism and analysis; often aimed at normies
AI Lab Watch: what labs are doing and should do, both research and distillation, for high-context AI safety people
Garrison Lovely: journalism and analysis, often aimed at normies
Zvi: everything, too long
Center for AI Safety: occasional AI safety news & analysis
AI Lab Watch
Other scorecards & evaluation—AI Lab Watch
Commitments—AI Lab Watch
Suggestions are welcome. You can put suggestions that don’t deserve their own LW comment in this doc.
[1] Source: The case for ensuring that powerful AIs are controlled (Redwood: Greenblatt and Shlegeris 2024).
[2] [Maybe a lot of early AI risk—risk from AIs that are just powerful enough to be extremely useful—]comes from the lab using AIs internally to do AI development (by which I mean both research and engineering). This is because the AIs doing AI development naturally require access to compute and model weights that they can potentially leverage into causing catastrophic outcomes—in particular, those resources can be abused to run AIs unmonitored.
Using AIs for AI development looks uniquely risky to me among applications of early-transformative AIs, because unlike all other applications I know about:
It’s very expensive to refrain from using AIs for this application.
There’s no simple way to remove affordances from the AI such that it’s very hard for the AI to take a small sequence of actions which plausibly lead quickly to loss of control. In contrast, most other applications of AI probably can be controlled just by restricting their affordances.
Source: Shlegeris 2024.
[3] I wrote this post because I’m helping BlueDot create/run a lab governance session. One constraint they impose is focusing on OpenAI, so I made an OpenAI section. Other than that, this doc is just my recommendations.