Deployment mitigations level 2 discusses the need for mitigations on internal deployments.
Good point; this makes it clearer that “deployment” means external deployment by default. But level 2 only mentions “internal access of the critical capability,” which sounds like it’s about misuse — I’m more worried about AI scheming and escaping when the lab uses AIs internally to do AI development.
ML R&D will require thinking about internal deployments (and so will many of the other CCLs).
OK. I hope DeepMind does that thinking and makes appropriate commitments.
two-party control
Thanks. I’m pretty ignorant on this topic.
“every 3 months of fine-tuning progress” was meant to capture [during deployment] as well
Good point; this makes it clearer that “deployment” means external deployment by default. But level 2 only mentions “internal access of the critical capability,” which sounds like it’s about misuse — I’m more worried about AI scheming and escaping when the lab uses AIs internally to do AI development.
You’re right: our deployment mitigations are targeted at misuse only because our current framework focuses on misuse. As we note in the “Future work” section, we would need to do more work to address risks from misaligned AI. We focused on risks from deliberate misuse initially because they seemed more likely to us to appear first.
Thanks.
Good point; this makes it clearer that “deployment” means external deployment by default. But level 2 only mentions “internal access of the critical capability,” which sounds like it’s about misuse — I’m more worried about AI scheming and escaping when the lab uses AIs internally to do AI development.
OK. I hope DeepMind does that thinking and makes appropriate commitments.
Thanks. I’m pretty ignorant on this topic.
Yayyy!
You’re right: our deployment mitigations are targeted at misuse only because our current framework focuses on misuse. As we note in the “Future work” section, we would need to do more work to address risks from misaligned AI. We focused on risks from deliberate misuse initially because they seemed more likely to us to appear first.