Thanks for leaving this comment on the doc and posting it.
But I feel like that’s mostly just a feature of the methodology, not a feature of the territory. Like, if you applied the same methodology to computer security, or financial fraud, or any other highly complex domain you would end up with the same situation where making any airtight case is really hard.
You are right that safety cases are not typically applied to security. Some of the reasons for this are explained in this paper, but I think the main reason is this:
“The obvious difference between safety and security is the presence of an intelligent adversary; as Anderson (Anderson 2008) puts it, safety deals with Murphy’s Law while security deals with Satan’s Law… Security has to deal with agents whose goal is to compromise systems.”
My guess is that most safety evidence will come down to claims like “smart people tried really hard to find a way things could go wrong and couldn’t.” This is part of why I think ‘risk cases’ are very important.
I share the intuitions behind some of your other reactions.
I feel like the framing here tries to shove a huge amount of complexity and science into a “safety case”, and then the structure of a “safety case” doesn’t feel to me like it is the kind of thing that would help me think about the very difficult and messy questions at hand.
Making safety cases is probably hard, but I expect explicitly enumerating their claims and assumptions would be quite clarifying. To be clear, I’m not claiming that decision-makers should only communicate via safety and risk cases. But I think that relying on less formal discussion would be significantly worse.
Part of why I think this is that my intuitions have been wrong over and over again. I’ve often figured this out only after eventually asking myself, “What claims and assumptions am I making? How confident am I that these claims are correct?”
it also feels more like it just captures “the state of fashionable AI safety thinking in 2024” more than it is the kind of thing that makes sense to enshrine into a whole methodology.
To be clear, the methodology is separate from the enumeration of arguments (which I probably could have done a better job signaling in the paper). The safety cases + risk cases methodology shouldn’t depend too much on what arguments are fashionable at the moment.
I agree that the arguments will evolve to some extent in the coming years. I’m more optimistic about the robustness of the categorization, but that’s maybe minor.