zeshen comments on [AN #112]: Engineering a Safer World

zeshen 2 Nov 2022 11:08 UTC
LW: 3 AF: 3
AF
I got the book (thanks to Conjecture) after doing the Intro to ML Safety Course where the book was recommended. I then browsed through the book and thought of writing a review of it—and I found this post instead, which is a much better review than I would have written, so thanks a lot for this!

Let me just put down a few thoughts that might be relevant for someone else considering picking up this book.
Target audience: Right at the beginning of the book, the author says “This book is written for the sophisticated practitioner rather than the academic researcher or the general public.” I think this is relevant, as the book goes to a level of detail way beyond what’s needed to get a good overview of engineering safety.

Relevance to AI safety: I feel like most engineering safety concepts are not applicable to alignment, firstly because an AGI would likely not have any human involvement in its optimization process, and secondly the basic underlying STAMP constructs of safety constraints, hierarchical safety control structures, and process models are simply more applicable to engineering systems. As stated in p100, ““STAMP focuses particular attention on the role of constraints in safety management.“ and I highly doubt an AGI can be bounded by constraints. ” Nevertheless, Chapter 8 STPA: A New Hazard Analysis Technique that describes STPA (System Theoretic Process Analysis) may be somewhat relevant to designing safety interlocks. Also, the final chapter (13) on Managing Safety and the Safety Culture, is broadly applicable to any field that involves safety.
Criticisms on conventional techniques: The book often mentions that techniques like STAMP and STPA is superior than other conventional techniques like HAZOP and gives quotes by reviewers that attest to their superiority. I don’t know if those criticisms are really fair, given how it is not really adopted at least in the oil and gas industry that, for all its flaws, takes safety very seriously. Perhaps the criticisms could be fair for very outdated safety practices. Nevertheless, the general concepts of engineering safety feels quite similar whether it uses ‘conventional’ techniques or the ‘new’ techniques described in the book.
Overall, I think this book provides a good overview of engineering safety concepts, but for the general audience (or alignment researchers) it goes into too much detail on specific case studies and arguments.