I feel like a lot of the issues in this post are that the published RSPs are not very detailed and most of the work to flesh them out is not done. E.g. the comparison to other risk policies highlights lack of detail in various ways.
I think it takes a lot of time and work to build out something with lots of analysis and detail, potentially years of work to really do it right. And yes, much of that work hasn’t happened yet.
But I would rather see labs post the work they are doing as they do it, so people can give feedback and input. If labs do so, the frameworks will necessarily be much less detailed than they would if we waited until they were complete.
So it seems to me that we are in a messy process that’s still very early days. Feedback about what is missing and what a good final product would look like is super valuable, thank you for your work doing that. I hope the policy folks pay close attention.
But I think your view that RSPs are the wrong direction is misguided, or at least I don’t find your reasons to be persuasive—there’s much more work to be done before they’re good and useful, but that doesn’t mean they’re not valuable. Honestly I can’t think of anything much better that could have been reasonably done given the limited time and resources we all have.
I think your comments on the name are well taken. I think your ideas about disclaimers and such are basically impossible for a modern corporation, unfortunately. I think your suggestion about pushing for risk management in policy is the clear next step, and one that’s only enabled by the existence of an RSP in the first place.
Thanks for the detailed and thoughtful effortpost about RSPs!
I feel like a lot of the issues in this post are that the published RSPs are not very detailed and most of the work to flesh them out is not done.
I strongly disagree with this. In my opinion, a lot of the issue is that RSPs have been thought up from first principles, without much consideration for everything the risk management field has done, and hence get things wrong without noticing.
It’s not a matter of how detailed they are; they get the broad principles wrong. As I argued (the entire table is about this) I think that the existing principles of other existing standards are just way better and so no, it’s not a matter of details.
As I said, the details & evals of RSPs are actually the one thing that I’d keep and include in a risk management framework.
Honestly I can’t think of anything much better that could have been reasonably done given the limited time and resources we all have
Well, I recommend looking at Section 3 and the source links. Starting from those frameworks and incorporating evals into them would be a Pareto improvement.
But most of the deficiencies you point out in the third column of that table are about missing and insufficient risk analysis. E.g.:
“RSPs doesn’t argue why systems passing evals are safe”.
“the ISO standard asks the organization to define risk thresholds”
“ISO proposes a much more comprehensive procedure than RSPs”
“RSPs don’t seem to cover capabilities interaction as a major source of risk”
“imply significant chances to be stolen by Russia or China (...). What are the risks downstream of that?”
If people took your proposal as a minimum bar for how thorough a risk management proposal would be, before publishing, it seems like that would interfere with labs being able to “post the work they are doing as they do it, so people can give feedback and input”.
This makes me wonder: Would your concerns be mostly addressed if ARC had published a suggestion for a much more comprehensive risk management framework, and explicitly said “these are the principles that we want labs’ risk-management proposals to conform to within a few years, but we encourage less-thorough risk management proposals before then, so that we can get some commitments on the table ASAP, and so that labs can iterate in public. And such less-thorough risk management proposals should prioritize covering x, y, z.”
Would your concerns be mostly addressed if ARC had published a suggestion for a much more comprehensive risk management framework, and explicitly said “these are the principles that we want labs’ risk-management proposals to conform to within a few years, but we encourage less-thorough risk management proposals before then, so that we can get some commitments on the table ASAP, and so that labs can iterate in public. And such less-thorough risk management proposals should prioritize covering x, y, z.”
Great question! A few points:
Yes, many of the things I point out are “how to do things well”, and I would in fact much prefer something that contains a section saying “we are striving towards that and our current effort is insufficient” to the current RSP communication, which is more “here’s how to responsibly scale”.
That said, I think we disagree on the reference class of the effort (you say “a few years”). I think that you could do a very solid MVP of what I suggest with like 5 FTEs over 6 months.
As I wrote in “How to move forward” (worth skimming to understand what I’d change) I think that RSPs would be incredibly better if they:
had a different name
said that they are insufficient
linked to a post which says “here’s the actual thing which is needed to make us safe”.
Answer to your question: if I were optimizing within the paradigm of voluntary lab commitments, as ARC is, then yes, I would much prefer that. I flagged early on, though, that labs are definitely not allies on this (because an actual risk assessment is likely to output “stop”), so I think the “ask labs kindly” strategy is pretty doomed. I would much prefer a version of ARC that tries to acquire bargaining power one way or another (policy, PR threat, etc.) rather than adapting its framework until labs agree to sign on to it.
Regarding
If people took your proposal as a minimum bar for how thorough a risk management proposal would be, before publishing, it seems like that would interfere with labs being able to “post the work they are doing as they do it, so people can give feedback and input”.
I don’t think that’s necessarily right. E.g. “the ISO standard asks the organization to define risk thresholds” could be a very simple task, much simpler than developing a full eval. The tricky thing is just to ensure we comply with such levels (and the inability to do that obviously reveals a lack of safety).
“ISO proposes a much more comprehensive procedure than RSPs”: it’s not right either that this would take longer. There exist risk management tools that you can run in a few days and that help achieve very broad coverage of the scenario set.
“imply significant chances to be stolen by Russia or China (...). What are the risks downstream of that?” Once again, you can cover the most obvious things in a couple of pages, e.g. by writing “Maybe they would give the weights to their team of hackers, which substantially increases the chances of a leak and of an increase in global cyberoffence”. And I would be totally fine with half-baked things if they were communicated as such, which is not how RSPs are communicated.
Thanks for your comment.
Has anyone made such a credible, detailed, and comprehensive list?
If not, what would it look like in your opinion?