This academic note was previously linked but not discussed in AN#35. I finally got around to reading it, really liked it, and wanted to highlight it with a post.
The abstract:
Every AI system is deployed by a human organization. In high risk applications, the combined human plus AI system must function as a high-reliability organization in order to avoid catastrophic errors. This short note reviews the properties of high-reliability organizations and draws implications for the development of AI technology and the safe application of that technology.
The gist is that AI, including narrow AI, can be a dangerously powerful technology, similar to nuclear power and weapons, and that the organizations which best control such systems share certain features that characterize them as High Reliability Organizations (HROs). The features of an HRO are (quoted from Wikipedia, which in turn quotes this book by the originators of the HRO concept):
Preoccupation with failure: HROs treat anomalies as symptoms of a problem with the system. The latent organizational weaknesses that contribute to small errors can also contribute to larger problems, so errors are reported promptly so problems can be found and fixed.
Reluctance to simplify interpretations: HROs take deliberate steps to comprehensively understand the work environment as well as a specific situation. They are cognizant that the operating environment is very complex, so they look across system boundaries to determine the path of problems (where they started, where they may end up) and value a diversity of experience and opinions.
Sensitivity to operations: HROs are continuously sensitive to unexpected changed conditions. They monitor the systems’ safety and security barriers and controls to ensure they remain in place and operate as intended. Situational awareness is extremely important to HROs.
Commitment to resilience: HROs develop the capability to detect, contain, and recover from errors. Errors will happen, but HROs are not paralyzed by them.
Deference to expertise: HROs follow typical communication hierarchy during routine operations, but defer to the person with the expertise to solve the problem during upset conditions. During a crisis, decisions are made at the front line and authority migrates to the person who can solve the problem, regardless of their hierarchical rank.
The note argues for finding ways to incorporate AI into an HRO not only as a technology to be controlled by one (that’s just the baseline), but also as a functional member of the HRO, with the same rights and responsibilities toward the organization as any human member: for example, the right to halt operations if it believes danger is imminent, and the responsibility to report anomalies it discovers.
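To make that idea slightly more concrete, here is a minimal, purely illustrative sketch (not from the note) of what treating an AI component as an HRO member, rather than just a tool, might look like in software. Every name here (`HROMember`, `report_anomaly`, `request_halt`, `operations_channel`) is hypothetical; the point is only that the "member" interface encodes the same obligations a human member would have.

```python
# Illustrative sketch only: an interface that gives an AI component the same
# HRO-member obligations (report anomalies, request a halt) as a human member.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Anomaly:
    """A promptly reported symptom of a possible systemic problem."""
    description: str
    observed_at: datetime
    severity: str  # e.g. "minor", "major", "critical"


class HROMember(ABC):
    """Responsibilities shared by human and AI members of the organization."""

    @abstractmethod
    def report_anomaly(self, anomaly: Anomaly) -> None:
        """Preoccupation with failure: surface anomalies as soon as they are seen."""

    @abstractmethod
    def request_halt(self, reason: str) -> None:
        """Right to ask that operations stop if danger seems imminent."""


class MonitoringAI(HROMember):
    """A narrow AI component treated as a member of the HRO, not just a tool."""

    def __init__(self, operations_channel):
        # operations_channel is a hypothetical stand-in for however the
        # organization actually receives reports and halt requests.
        self.operations_channel = operations_channel

    def report_anomaly(self, anomaly: Anomaly) -> None:
        self.operations_channel.publish("anomaly", anomaly)

    def request_halt(self, reason: str) -> None:
        self.operations_channel.publish("halt-request", reason)
```

The design choice the sketch is gesturing at: the AI's ability to halt or report is part of its contract with the organization, not an optional add-on, mirroring how a human crew member in an HRO is expected (and empowered) to stop the line.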
An interesting extension of this line of thinking would be to combine the HRO Safety-I approach with newer Safety-II approaches (an overly short summary: Safety-I, the traditional approach to safety, is about taking steps to avoid errors, while Safety-II is about creating robust conditions for success in which errors are less likely to happen).
I’ve seen good outcomes from applying these kinds of organizational-design changes to achieve safety within my primary field of work (system operations, site reliability engineering, devops, or whatever you want to call it). Extending this kind of thinking to how organizations work with AI seems similarly valuable: it should reduce, even if not theoretically eliminate, the risk of AI accidents that would otherwise look preventable in retrospect.
How does “Safety-II” compare with Eliezer’s description of security mindset? On the surface they sound very similar, and I would expect highly reliable organizations to value a security mindset in some form.
I don’t recall exactly how Eliezer frames security mindset, but in its original context it’s more about thinking like an adversary: designing things knowing that they will be attacked from multiple angles, and that you might fail to anticipate every angle of attack, so you had better be ready for the unexpected.