Planned summary for the Alignment Newsletter:

One reason that researchers might disagree about which approaches to take for alignment is that they might be solving different versions of the alignment problem. This post identifies two axes along which the “type” of alignment problem can differ. First, you may be considering AI systems with differing levels of capability, ranging from subhuman to wildly superintelligent, with human-level somewhere in the middle. Second, you might be thinking about different mechanisms by which the AI leads to bad outcomes, where possible mechanisms include <@the second species problem@>(@AGI safety from first principles@) (where AIs seize control of the future from us), the “missed opportunity” problem (where we fail to use AIs as well as we could have, even though the AIs aren’t themselves threatening us), and a grab bag of other possibilities (such as misuse of AI systems by bad actors).
Depending on where you land on these axes, you will get to rely on different assumptions that change what solutions you would be willing to consider:
1. **Competence.** If you assume that the AI system is human-level or superintelligent, you probably don’t have to worry about the AI system causing massive problems through incompetence (at least, not to a greater extent than humans do).
2. **Ability to understand itself.** With wildly superintelligent systems, it seems reasonable to expect that they can introspect and answer questions about their own cognition, which could be a useful ingredient in a solution that wouldn’t work in other regimes.
3. **Inscrutable plans or concepts.** With sufficiently competent systems, you might be worried about the AI system making dangerous plans you can’t understand, or reasoning with concepts you will never comprehend. Your alignment solution must be robust to this.
Planned opinion:
When I talk about alignment, I am considering the second species problem, with AI systems whose capability level is roughly human-level or above (including “wildly superintelligent”).