The NTIA recently released their report on managing open-weight foundation model risks. If you just want a quick take-away, the fact sheet seems pretty good[1]. There are two brief allusions to accident/takeover risk[2], one of which is in the glossary:
C. permitting the evasion of human control or oversight through means of deception or obfuscation[7]
I personally was grimly amused by footnote 7 (bolding mine):
This provision of Executive Order 14110 refers to the as-yet speculative risk that AI systems will evade human control, for instance through deception, obfuscation, or self-replication. Harms from loss of control would likely require AI systems which have capabilities beyond those known in current systems and have access to a broader range of permissions and resources than current AI systems are given. However, open models introduce unique considerations in these risk calculations, as actors can remove superficial safeguards that prevent model misuse. They can also customize, experiment with, and deploy models with more permissions and in different contexts than the developer originally intended. Currently, AI agents who can interact with the world lack capabilities to independently perform complex or open-ended tasks, which limits their potential to create loss of control harms. Hague, D. (2024). Multimodality, Tool Use, and Autonomous Agents: Large Language Models Explained, Part 3. Center for Security and Emerging Technology. https://cset.georgetown.edu/article/multimodality-tool-use-and-autonomous-agents/ (“While LLM agents have been successful in playing Minecraft and interacting in virtual worlds, they have largely not been reliable enough to deploy in real-life use cases. . . Today, research often focuses on getting autonomous LLMs to perform specific, defined tasks like booking flights.”). Developing capable AI agents remains an active research goal in the AI community. Xi, Z., et al. (2023). The Rise and Potential of Large Language Model Based Agents: A Survey. ArXiv.org. https://doi.org/10.48550/arXiv.2309.07864. However, given the nascent stage of these efforts, this Report cannot yet meaningfully discuss these risks in greater depth.
Given the report’s focus on misuse risk (CBRN & cyber), the first category of which I think can fairly be described as an “as-yet speculative risk”, I wonder to what extent that footnote was the author taking a dig at whatever process caused takeover risk to be omitted from the report. (It probably wasn’t, but it’s fun to imagine.)
[1] Epistemic status: read some chunk of the report in depth, skimmed the rest.
[2] Both brief mentions are of deception.