Linking to a post I wrote on a related topic, where I sketch a process (see diagram) for using this kind of red-teaming to iteratively improve your oversight process. (I’m more focussed on a scenario where you’re trying to offload as much as possible of the work of evaluating and improving your oversight process to AIs.)