I wasn’t sure at the time whether the effort I put into this post would be worth it. I think I spent around 8 hours, and I didn’t end up with a clear, gearsy model of how High Reliability tends to work.
I did end up following up on this in “Carefully Bootstrapped Alignment” is organizationally hard. The main way this post applied there was that I included the graph from the vague “hospital Reliability-ification process” paper, and argued:
The report is from Genesis Health System, a healthcare service provider in Iowa that services 5 hospitals. No, I don’t know what “Serious Safety Event Rate” actually means; the report is vague on that. But my point here is that, when I optimistically interpret this graph as making a serious claim about Genesis improving, the improvements took a comprehensive management/cultural intervention over the course of 8 years.
If you’re working at an org that’s planning a Carefully Aligned AGI strategy, and your org does not already seem to hit the Highly Reliable bar, I think you need to begin that transition now. If your org is currently small, take proactive steps to preserve a safety-conscious culture as you scale. If your org is large, you may have more people who will actively resist a cultural change, so it may be more work to reach a sufficient standard of safety.
I don’t know whether it’s reasonable to use the graph in this way (i.e. I assume the graph is exaggerated and confused, but it still seems suggestive of a lower bound on how long it might take a culture/organizational practice to shift towards high reliability).
After writing “Carefully Bootstrapped Alignment” is organizationally hard, I spent a couple of months putting some effort into trying to understand why the AI-safety-focused members of Deepmind, OpenAI and Anthropic weren’t putting more emphasis on High Reliability. My own efforts there petered out, and I don’t know that they were particularly counterfactually helpful.
But later on, Anthropic did announce their Scaling Policy, which included language that seems informed by biosecurity practices (since writing this post, I went on to interview someone about High Reliability practices in bio, and they described a schema that seems to roughly map onto the Anthropic security levels). I’m currently kind of on the fence about whether Anthropic’s policy has teeth or is more like elaborate Safetywashing, but I think it’s at least plausibly a step in the right direction.