This post recommends that we think about AI alignment research in the following framework:
1. Defining the problem and its terms: for example, we might want to define “agency”, “optimization”, “AI”, and “well-behaved”.
3. Exploring these definitions to see what they entail.
3. Solving the now well-defined problem.
This is explicitly _not_ a paradigm, but rather a framework in which we can think about possible paradigms for AI safety. A specific paradigm would commit to a particular problem formulation and set of definitions (or at least to something significantly more concrete than “solve AI safety”). However, we are not yet sufficiently deconfused to commit to a specific paradigm; hence this overarching framework.
Planned summary for the Alignment Newsletter: