I’ve published in this area, so I have some meta comments about this work.
First, the positive:
1. Assurance cases are the state of the art for making sure things don’t kill people in a regulated environment. Ever wonder why planes are so safe? Safety cases. Because actually making one is so unsexy (Goal Structuring Notation (GSN) diagrams make me want to cry), people tend to ignore them, so you deserve lots of credit for somehow getting ex-risk people to upvote this. More LessWronger types should be thinking about safety cases.
2. I do think your arguments are good and defensible overall, minus minor quibbles that don’t matter much.
Some things that bother me:
1. Since I used to be a little involved, I am perhaps a bit too aware of the absolutely insane amount of relevant literature that was not mentioned. To me, the introduction made it sound a little bit like the specifics of applying safety cases to AI systems have not been studied. That is very, very, very not true.
That’s not to say you don’t have a contribution! I just don’t think it was well placed in the relevant literature. Many have done safety cases for AI, but usually as part of concrete applied work on drones or autonomous vehicles, not ex-risk pie-in-the-sky stuff. I think your arguments would be greatly improved by referencing back to this work.
I was extremely surprised to see so few of the (to me) obvious suspects referenced, particularly more work from York. Some labs with people who publish a lot in this area:
University of York Institute for Safe Autonomy
NASA Intelligent Systems Division
Waterloo Intelligent Systems Engineering Lab
Anything funded by the DARPA Assured Autonomy program
2. The second issue is a little more specific and relates to this paragraph:
To mitigate these dangers, researchers have called on developers to provide evidence that their systems are safe (Koessler & Schuett, 2023; Schuett et al., 2023); however, the details of what this evidence should look like have not been spelled out. For example, Anderljung et al. vaguely state that this evidence should be “informed by evaluations of dangerous capabilities and controllability” (Anderljung et al., 2023). Similarly, a recently proposed California bill asserts that developers should provide a “positive safety determination” that “excludes hazardous capabilities” (California State Legislature, 2024). These nebulous requirements raise questions: what are the core assumptions behind these evaluations? How might developers integrate other kinds of evidence?
The reason the “nebulous requirements” aren’t explicitly stated is that when you make a safety case you assure the safety of a system against specific relevant hazards for the system you’re assuring. These are usually identified by performing a HAZOP analysis or similar. Not all AI systems have the same list of hazards, so it’s obviously dubious to expect you can list requirements a priori. This should have been stated, imo.
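To make the "hazards are system-specific" point concrete, here is a toy Python sketch of the core HAZOP move: pairing standard guide words with the parameters of one particular system to generate candidate deviations, which an analyst team would then review. The two deployments and their parameter names are hypothetical illustrations, not taken from any real analysis, and a real HAZOP is a structured team exercise rather than a script.

```python
from itertools import product

# A common set of basic HAZOP guide words.
GUIDE_WORDS = ["no", "more", "less", "as well as", "part of", "reverse", "other than"]


def candidate_deviations(system_name, parameters):
    """Pair every guide word with every parameter of one specific system.

    Each pair is only a *candidate* deviation; in a real HAZOP the team decides
    which ones are credible hazards and records causes and consequences.
    """
    return [
        f"{system_name}: {word.upper()} {param}"
        for param, word in product(parameters, GUIDE_WORDS)
    ]


# Hypothetical deployments of the same underlying model (illustrative only).
customer_service = ["refund amount", "customer data disclosed"]
code_agent = ["shell commands executed", "network access", "file writes"]

cs = candidate_deviations("customer service bot", customer_service)
agent = candidate_deviations("coding agent", code_agent)

print(len(cs), len(agent))  # 14 21
print(agent[0])             # coding agent: NO shell commands executed
```

The inputs, and therefore the hazard list, come from the deployed system being assured; change the deployment and the list changes, which is why a fixed a priori requirements list is hard to write.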
To me, the introduction made it sound a little bit like the specifics of applying safety cases to AI systems have not been studied
This is a good point. In retrospect, I should have written a related work section to cover these. My focus was mostly on AI systems that have only existed for about a year and on future AI systems, so I didn’t spend much time reading the safety case literature specifically related to AI systems (though perhaps there are useful insights that transfer over).
The reason the “nebulous requirements” aren’t explicitly stated is that when you make a safety case you assure the safety of a system against specific relevant hazards for the system you’re assuring. These are usually identified by performing a HAZOP analysis or similar. Not all AI systems have the same list of hazards, so it’s obviously dubious to expect you can list requirements a priori.
My impression is that there is still a precedent for fairly detailed guidelines that describe how safety cases are assessed in particular industries and how hazards should be analyzed. For example, see the UK’s Safety Assessment Principles for Nuclear Facilities. I don’t think anything like this exists for evaluating risks from advanced AI agents.
I agree, however, that not everyone who says developers should provide ‘safety evidence’ needs to specify in detail what that evidence could look like.
I hear what you’re saying. I probably should have made the following distinction:
1. A technology in the abstract (e.g. nuclear fission, LLMs)
2. A technology deployed to do a thing (e.g. nuclear in a power plant, LLM used for customer service)
As I understand it, the question you’re asking is essentially: how do we make safety cases for AI agents in general? I would argue that’s more situation 1 than situation 2, and as far as I know, safety cases are basically only ever applied to situation 2. The nuclear facilities document you linked is definitely situation 2.
So yeah, admittedly the document you were looking for doesn’t exist, but that doesn’t really surprise me. If you start looking for narrowly scoped safety principles for AI systems, you start finding them everywhere. For example, a search for “artificial intelligence” on the ISO website returns 73 standards.
Just a few relevant standards, though I admit standards are exceptionally boring (and many aren’t public, which is dumb):
UL 4600: standard for autonomous vehicles
ISO/IEC TR 5469: AI safety stuff generally (this one is decently interesting)
ISO/IEC 42001: covers what you do if you set up a system that uses AI
You might also find this paper a good read: https://ieeexplore.ieee.org/document/9269875
This makes sense. Thanks for the resources!