Mostly agree. For some more starting points, see posts with the AI-assisted alignment tag. I recently did a rough categorization of strategies for AI-assisted alignment here.
If this strategy is promising, it likely recommends fairly different prioritisation from what the alignment community is currently doing.
Not totally sure about this; my impression (see chart here) is that much of the community already considers some form of AI-assisted alignment to be our best shot. But I’d still be excited for more in-depth categorization and prioritization of strategies (e.g. I’d be interested in “AI-assisted alignment” benchmarks that different strategies could be tested against). I might work on something like this myself.