Chris_Leong comments on Chris_Leong’s Shortform

Chris_Leong 23 May 2024 5:23 UTC
10 points
2
I’ll post some extracts from the Seoul Summit. I can’t promise that this will be a particularly good summary, I was originally just writing this for myself, but maybe it’s helpful until someone publishes something that’s more polished:
Frontier AI Safety Commitments, AI Seoul Summit 2024

The major AI companies have agreed to Frontier AI Safety Commitments. In particular, they will publish a safety framework focused on severe risks: “internal and external red-teaming of frontier AI models and systems for severe and novel threats; to work toward information sharing; to invest in cybersecurity and insider threat safeguards to protect proprietary and unreleased model weights; to incentivize third-party discovery and reporting of issues and vulnerabilities; to develop and deploy mechanisms that enable users to understand if audio or visual content is AI-generated; to publicly report model or system capabilities, limitations, and domains of appropriate and inappropriate use; to prioritize research on societal risks posed by frontier AI models and systems; and to develop and deploy frontier AI models and systems to help address the world’s greatest challenges”

″Risk assessments should consider model capabilities and the context in which they are developed and deployed”—I’d argue that the context in which it is deployed should account for whether it is open or closed source/weights

”They should also be accompanied by an explanation of how thresholds were decided upon, and by specific examples of situations where the models or systems would pose intolerable risk.”—always great to make policy concrete”

In the extreme, organisations commit not to develop or deploy a model or system at all, if mitigations cannot be applied to keep risks below the thresholds.”—Very important that when this is applied the ability to iterate on open-source/weight models is taken into account
https://www.gov.uk/government/publications/frontier-ai-safety-commitments-ai-seoul-summit-2024/frontier-ai-safety-commitments-ai-seoul-summit-2024

Seoul Declaration for safe, innovative and inclusive AI by participants attending the Leaders’ Session
Signed by Australia, Canada, the European Union, France, Germany, Italy, Japan, the Republic of Korea, the Republic of Singapore, the United Kingdom, and the United States of America.

”We support existing and ongoing efforts of the participants to this Declaration to create or expand AI safety institutes, research programmes and/or other relevant institutions including supervisory bodies, and we strive to promote cooperation on safety research and to share best practices by nurturing networks between these organizations”—guess we should now go full-throttle and push for the creation of national AI Safety institutes
“We recognise the importance of interoperability between AI governance frameworks”—useful for arguing we should copy things that have been implemented overseas.
“We recognize the particular responsibility of organizations developing and deploying frontier AI, and, in this regard, note the Frontier AI Safety Commitments.”—Important as Frontier AI needs to be treated as different from regular AI.

https://www.gov.uk/government/publications/seoul-declaration-for-safe-innovative-and-inclusive-ai-ai-seoul-summit-2024/seoul-declaration-for-safe-innovative-and-inclusive-ai-by-participants-attending-the-leaders-session-ai-seoul-summit-21-may-2024
Seoul Statement of Intent toward International Cooperation on AI Safety Science

Signed by the same countries.
“We commend the collective work to create or expand public and/or government-backed institutions, including AI Safety Institutes, that facilitate AI safety research, testing, and/or developing guidance to advance AI safety for commercially and publicly available AI systems”—similar to what we listed above, but more specifically focused on AI Safety Institutes which is a great.

”We acknowledge the need for a reliable, interdisciplinary, and reproducible body of evidence to inform policy efforts related to AI safety”—Really good! We don’t just want AIS Institutes to run current evaluation techniques on a bunch of models, but to be actively contributing to the development of AI safety as a science.
“We articulate our shared ambition to develop an international network among key partners to accelerate the advancement of the science of AI safety”—very important for them to share research among each other
https://www.gov.uk/government/publications/seoul-declaration-for-safe-innovative-and-inclusive-ai-ai-seoul-summit-2024/seoul-statement-of-intent-toward-international-cooperation-on-ai-safety-science-ai-seoul-summit-2024-annex

Seoul Ministerial Statement for advancing AI safety, innovation and inclusivity
Signed by: Australia, Canada, Chile, France, Germany, India, Indonesia, Israel, Italy, Japan, Kenya, Mexico, the Netherlands, Nigeria, New Zealand, the Philippines, the Republic of Korea, Rwanda, the Kingdom of Saudi Arabia, the Republic of Singapore, Spain, Switzerland, Türkiye, Ukraine, the United Arab Emirates, the United Kingdom, the United States of America, and the representative of the European Union
“It is imperative to guard against the full spectrum of AI risks, including risks posed by the deployment and use of current and frontier AI models or systems and those that may be designed, developed, deployed and used in future”—considering future risks is a very basic, but core principle

”Interpretability and explainability”—Happy to interpretability explicitly listed

”Identifying thresholds at which the risks posed by the design, development, deployment and use of frontier AI models or systems would be severe without appropriate mitigations”—important work, but could backfire if done poorly

”Criteria for assessing the risks posed by frontier AI models or systems may include consideration of capabilities, limitations and propensities, implemented safeguards, including robustness against malicious adversarial attacks and manipulation, foreseeable uses and misuses, deployment contexts, including the broader system into which an AI model may be integrated, reach, and other relevant risk factors.”—sensible, we need to ensure that the risks of open-sourcing and open-weight models are considered in terms of the ‘deployment context’ and ‘foreseeable uses and misuses’

”Assessing the risk posed by the design, development, deployment and use of frontier AI models or systems may involve defining and measuring model or system capabilities that could pose severe risks,”—very pleased to see a focus beyond just deployment

”We further recognise that such severe risks could be posed by the potential model or system capability or propensity to evade human oversight, including through safeguard circumvention, manipulation and deception, or autonomous replication and adaptation conducted without explicit human approval or permission. We note the importance of gathering further empirical data with regard to the risks from frontier AI models or systems with highly advanced agentic capabilities, at the same time as we acknowledge the necessity of preventing the misuse or misalignment of such models or systems, including by working with organisations developing and deploying frontier AI to implement appropriate safeguards, such as the capacity for meaningful human oversight”—this is massive. There was a real risk that these issues were going to be ignored, but this is now seeming less likely.

”We affirm the unique role of AI safety institutes and other relevant institutions to enhance international cooperation on AI risk management and increase global understanding in the realm of AI safety and security.”—“Unique role”, this is even better!

”We acknowledge the need to advance the science of AI safety and gather more empirical data with regard to certain risks, at the same time as we recognise the need to translate our collective understanding into empirically grounded, proactive measures with regard to capabilities that could result in severe risks. We plan to collaborate with the private sector, civil society and academia, to identify thresholds at which the level of risk posed by the design, development, deployment and use of frontier AI models or systems would be severe absent appropriate mitigations, and to define frontier AI model or system capabilities that could pose severe risks, with the ambition of developing proposals for consideration in advance of the AI Action Summit in France”—even better than above b/c it commits to a specific action and timeline

https://www.gov.uk/government/publications/seoul-ministerial-statement-for-advancing-ai-safety-innovation-and-inclusivity-ai-seoul-summit-2024