Personally, I like mentally splitting the space into AI safety (emphasis on measurement and control), AI alignment (getting it to align with the operator's purposes and actually do what the operator desires), and AI value-alignment (getting the AI to understand and care about what people need and want).
Feels like a Venn diagram with a lot of overlap, and yet some distinct non-overlap spaces.
By my framing, Redwood Research and METR are more centrally AI safety. ARC/Paul's research agenda is more of a mix of AI safety and AI alignment. MIRI's work to fundamentally understand and shape agents is a mix of AI alignment and AI value-alignment. Obviously, success there would have the downstream effect of robustly improving AI safety (reducing the need for careful evals and control), but it's a more difficult approach in general with less immediate applicability.
I think we need all these things!