You can still have the RSP commitment rule be a foundation for actually effective policies down the line
+1. I do think it’s worth noting, though, that RSPs might not be a sensible foundation for effective policies.
One of my colleagues recently mentioned that the voluntary commitments from labs are much weaker than some of the things that the G7 Hiroshima Process has been working on.
More tangibly, it’s quite plausible to me that policymakers who think about AI risks from first principles would produce things that are better and stronger than “codify RSPs.” Some thoughts:
It’s plausible to me that when the RSP concept was first being developed, it was a meaningful improvement on the status quo, but the Overton window and awareness of AI risk have moved a lot since then.
It’s plausible to me that RSPs set a useful “floor” – like, hey, this is the bare minimum.
It’s plausible to me that RSPs are useful for raising awareness about risk – like, hey, look, OpenAI and Anthropic are acknowledging that models might soon have dangerous CBRN capabilities.
But there are a lot of implicit assumptions in the RSP frame like “we need to have empirical evidence of risk before we do anything” (as opposed to an affirmative safety frame), “we just need to make sure we implement the right safeguards once things get dangerous” (as opposed to a frame that recognizes we might not have time to develop such safeguards once we have clear evidence of danger), and “AI development should roughly continue as planned” (as opposed to a frame that considers alternative models, like public-private partnerships).
More concretely, I would rather see policy based on things like the recent Bengio paper than RSPs. Examples:
Despite evaluations, we cannot consider coming powerful frontier AI systems “safe unless proven unsafe.” With present testing methodologies, issues can easily be missed. Additionally, it is unclear whether governments can quickly build the immense expertise needed for reliable technical evaluations of AI capabilities and societal-scale risks. Given this, developers of frontier AI should carry the burden of proof to demonstrate that their plans keep risks within acceptable limits. By doing so, they would follow best practices for risk management from industries, such as aviation, medical devices, and defense software, in which companies make safety cases.
Commensurate mitigations are needed for exceptionally capable future AI systems, such as autonomous systems that could circumvent human control. Governments must be prepared to license their development, restrict their autonomy in key societal roles, halt their development and deployment in response to worrying capabilities, mandate access controls, and require information security measures robust to state-level hackers until adequate protections are ready. Governments should build these capacities now.
Sometimes advocates of RSPs say “these are things that are compatible with RSPs”, but overall I have not seen RSPs/PFs/FSFs (preparedness frameworks and frontier safety frameworks) that are nearly this clear about the risks, this clear about the limitations of model evaluations, or this clear about the need for tangible regulations.
I’ve feared previously (and continue to fear) that there are some motte-and-bailey dynamics at play with RSPs, where proponents of RSPs say privately (and to safety people) that RSPs are meant to have strong commitments and inspire strong regulation, but then in practice the RSPs are very weak and end up conveying an overly rosy picture to policymakers.
One of my colleagues recently mentioned that the voluntary commitments from labs are much weaker than some of the things that the G7 Hiroshima Process has been working on.
Are you able to say more about this?