However, I’m skeptical of systems that require 99.99% reliability to work. Heuristically, I expect complex systems to be stable only if they are highly fault-tolerant and degrade gracefully.
On the other hand… look at what happens when you simply demand that level of reliability, put in the effort, and get it. From my engineering perspective, that difference looks huge. And it doesn’t stop at 99.99%; the next couple nines are useful too! The level of complexity and usefulness you can build from those components is breathtaking. It’s what makes the 21st century work.
I’d be really curious to see what happens when that same level of uncompromising reliability is demanded of social systems. Maybe it doesn’t work, maybe the analogy fails. But I want to see the answer!
to see what happens when that same level of uncompromising reliability is demanded of social systems
Who exactly will be doing the demanding, and what would be the price for not delivering?
Authoritarian systems are often capable of delivering short-term reliability by demanding the head of everyone who fails (“making the trains run on time”). Of course pretty soon they are left without any competent professionals.
Do you have examples of systems that reach this kind of reliability internally?
Most high-9 systems work by taking lots of low-9 components and relying on not all of them failing at the same time. I.e. if you have 10 95%-reliable systems that fail completely independently, and you only need one of them to work, the whole thing fails only when all 10 fail at once: 0.05^10 ≈ 10^-13, which gets you about thirteen nines of reliability.
Expecting a person to be 99% reliable is ridiculous. That's like two sick days per year, ignoring every other possible reason for failing to complete a task. Instead you should build systems and organisations that have slack, so that one person failing at a particular point in time doesn't make the project/org fail.
Well, in general, I’d say achieving that reliability through redundant means is totally reasonable, whether in engineering or people-based systems.
At a component level? Lots of structural components, for example. Airplane wings stay attached at fairly high reliability, and my impression is that while there is plenty of margin in the strength of the attachment, it’s not like the underlying bolts are being replaced because they failed with any regularity.
I remember an aerospace discussion about a component (a pressure switch, I think?). NASA wanted documentation for 6 9s of reliability, and expected some sort of very careful fault tree analysis and testing plan. The contractor instead used an automotive component (brake system, I think?), and produced documentation of field reliability at a level high enough to meet the requirements. Definitely an example where working to get the underlying component that reliable was probably better than building complex redundancy on top of an unreliable component.