Some commentary (e.g. here) also highlighted (accurately) that the FSF doesn’t include commitments. This is because the science is in early stages and best practices will need to evolve.
I think it’s really alarming that this safety framework contains no commitments, and I’m frustrated that concern about this is brushed aside. If DeepMind is aiming to build AGI soon, I think it’s reasonable to expect them to have a mitigation plan more concrete than a list of vague IOUs. For example, consider this description of a plan from the FSF:
When a model reaches evaluation thresholds (i.e. passes a set of early warning evaluations), we will formulate a response plan based on the analysis of the CCL and evaluation results.
This isn’t a baseline of reasonable practices upon which better safety practices can evolve, so much as a baseline of zero—the fact that DeepMind will take some unspecified action, rather than none, if an evaluation triggers does not appear to me to be much of a safety strategy at all.
Which is especially concerning, given that DeepMind might create AGI within the next few years. If it’s too early to make commitments now, when do you expect that will change? What needs to happen before it becomes reasonable to expect labs to make safety commitments? What sort of knowledge are you hoping to obtain, such that scaling becomes a demonstrably safe activity? Without any sense of when or how these “best practices” might emerge, the situation looks alarmingly like the labs requesting a blank check for as long as they deem fit, and that seems pretty unacceptable to me.
AI is a nascent field. But to my mind its nascency is exactly what’s so concerning: we don’t know how to ensure these systems are safe, and the results could be catastrophic. That seems like a much stronger reason not to build AGI than a reason not to offer safety commitments. If the labs are going to build AGI anyway, then I think they have a responsibility to provide more than a document of empty promises—they should be able to state, exactly, what would cause them to stop building these potentially lethal machines. So far no lab has succeeded at this, and I think that’s pretty unacceptable. If labs would like to take up the mantle of governing their own safety, then they owe humanity the service of meaningfully doing it.