Two more disclaimers, one from each policy, that worry me:
Meta writes:
Security Mitigations—Access is strictly limited to a small number of experts, alongside security protections to prevent hacking or exfiltration insofar as is technically feasible and commercially practicable.
“Commercially practicable” is so load-bearing here. With a disclaimer like this, why not publicly commit to writing million-dollar checks to anyone who asks for one? It basically means “We’ll do this if it’s in our interest, and we won’t if it’s not.” Which is like, duh. That’s the decision procedure for everything you do.
I do think the public intention setting is good, and it might support the codification of these standards, but it does not a commitment make.
Google, on the other hand, has this disclaimer:
The safety of frontier AI systems is a global public good. The protocols here represent our current understanding and recommended approach of how severe frontier AI risks may be anticipated and addressed. Importantly, there are certain mitigations whose social value is significantly reduced if not broadly applied to AI systems reaching critical capabilities. These mitigations should be understood as recommendations for the industry collectively: our adoption of them would only result in effective risk mitigation for society if all relevant organizations provide similar levels of protection, and our adoption of the protocols described in this Framework may depend on whether such organizations across the field adopt similar protocols.
I think it’s funny these are coming out this week as an implicit fulfillment of the Seoul commitments. Did I miss some language in the Seoul commitments saying “we can abandon these promises if others are doing worse or if it’s not in our commercial interest”?
And, if not, do policies with disclaimers like these really count as fulfilling a commitment? Or is it more akin to Anthropic’s non-binding anticipated safeguards for ASL-3? If it’s the latter, then fine, but I wish they’d label it as such.
It’s the first official day of the AI Action Summit, and thus it’s also the day that the Seoul Commitments (made by sixteen companies last year to adopt an RSP/safety framework) have come due. I’ve made a tracker/report card for each of these policies at www.seoul-tracker.org.
I’ll plan to keep this updated for the foreseeable future as policies get released or modified. Don’t take the grades too seriously — think of them as one opinionated take on the quality of the commitments as written and, where there’s evidence, as implemented. Do feel free to share feedback if anything you see surprises you, or if you think the report card misses something important.
My personal takeaway is that both compliance and quality for these policies are much worse than I would have hoped. Many people’s theories of change for these policies gesture at a race to the top, where companies are eager to outcompete each other on safety to win talent and public trust, but I don’t sense much urgency or rigor here. Another theory of change is that this is a sort of laboratory for future regulation, where companies can experiment now with safety practices and the best ones could be codified. But most of the variation between the policies here is in how vague they can be while still claiming to manage risks :/
I’m really hoping this changes as AGI gets closer and companies feel they need to do more to prove to governments and the public that they can be trusted. Part of my hope is that this report card makes clear to outsiders that not all voluntary safety frameworks are equally credible.