Lab governance reading list

What labs should do

OpenAI[3]

Resources

Suggestions are welcome. You can put suggestions that don’t warrant their own LW comment in this doc.

  1. ^

    There are two main lines of defense you could employ to prevent schemers from causing catastrophes.

    • Alignment: Ensure that your models aren’t scheming.

    • Control: Ensure that even if your models are scheming, you’ll be safe, because they are not capable of subverting your safety measures.

    Source: The case for ensuring that powerful AIs are controlled (Redwood: Greenblatt and Shlegeris 2024).

  2. ^

    [Maybe a lot of early AI risk—risk from AIs that are just powerful enough to be extremely useful—]comes from the lab using AIs internally to do AI development (by which I mean both research and engineering). This is because the AIs doing AI development naturally require access to compute and model weights that they can potentially leverage into causing catastrophic outcomes—in particular, those resources can be abused to run AIs unmonitored.

    Using AIs for AI development looks uniquely risky to me among applications of early-transformative AIs, because unlike all other applications I know about:

    • It’s very expensive to refrain from using AIs for this application.

    • There’s no simple way to remove affordances from the AI such that it’s very hard for the AI to take a small sequence of actions which plausibly lead quickly to loss of control. In contrast, most other applications of AI probably can be controlled just by restricting their affordances.

    Source: Shlegeris 2024.

  3. ^

    I wrote this post because I’m helping BlueDot create/run a lab governance session. One constraint they impose is focusing on OpenAI, so I made an OpenAI section. Other than that, this doc is just my recommendations.