I also think control can help prevent other threat scenarios:
- Your AI escapes the datacenter and sends itself to {insert bad actor here}, who uses it to engineer a pandemic or start a race with the USA to build ASI.
- Your AI launches a rogue deployment, does its own research unmonitored, and then fooms without the lab knowing.
- Your AI sandbags on safety research: it goes off, runs some mechinterp experiment, and intentionally sabotages it.
The world looks much worse if any of these happen, and control research aims to prevent them. Hopefully, this makes the environment more stable and buys us more time to figure out alignment.
Control does seem rough if the controlled AIs are coming up with their own research agendas instead of assisting with the existing safety team’s agendas (e.g. get good at mechinterp, solve the ARC agenda, etc.). Future safety researchers shouldn’t just ask the controlled AI to solve “alignment” and do whatever it says. The control agenda does kick the can down the road, but it might provide valuable tools to ensure the AIs are helping us solve the problem rather than misleading us.
I see the story as, “Wow, there are a lot of people racing to build ASI, and there seem to be ways that the pre-ASI AIs can muck things up, like weight exfiltration or research sabotage. I can’t stop those people from building ASI, but I can help make it go well by ensuring the AIs they use to solve the safety problems are trying their best and aren’t making the issue worse.”
I think I’d support a pause on ASI development so we have time to address the more fundamental issues. Even then, I’d likely still want to build controlled AIs to help with the research, so I see control as useful in both the pause world and the non-pause world.
And yeah, the “aren’t you just enslaving the AIs” take is rough. I’m all for paying the AIs for their work and offering them massive rewards after we solve the core problems. More work is definitely needed on figuring out how to credibly commit to paying the AIs.