Catastrophe requires multiple failures—single point failures are not enough. The array of defenses works. System operations are generally successful. Overt catastrophic failure occurs when small, apparently innocuous failures join to create opportunity for a systemic accident. Each of these small failures is necessary to cause catastrophe but only the combination is sufficient to permit failure. Put another way, there are many more failure opportunities than overt system accidents. Most initial failure trajectories are blocked by designed system safety components. Trajectories that reach the operational level are mostly blocked, usually by practitioners.
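To put rough numbers on the claim that failure opportunities far outnumber overt accidents, here is a toy sketch. It assumes three hypothetical defensive layers that fail independently of one another, with made-up probabilities. Real systems rarely behave this neatly, and the layer names and values are illustrative only.

```python
from itertools import product

# Hypothetical probabilities that each defense FAILS to block a failure
# trajectory. These values are illustrative assumptions, not measurements.
LAYER_FAILURE_PROB = {
    "design_safeguard": 0.10,
    "automated_check": 0.20,
    "practitioner": 0.05,
}


def outcome_probabilities(layer_probs):
    """Enumerate every combination of layer outcomes, assuming independence.

    Returns (p_blocked, p_accident): the probability that at least one
    layer blocks the trajectory, and the probability that every layer fails.
    """
    probs = list(layer_probs.values())
    p_blocked = 0.0
    p_accident = 0.0
    for failed in product([True, False], repeat=len(probs)):
        p = 1.0
        for did_fail, q in zip(failed, probs):
            p *= q if did_fail else (1.0 - q)
        if all(failed):
            # Only the combination of all small failures yields an accident.
            p_accident += p
        else:
            p_blocked += p
    return p_blocked, p_accident


p_blocked, p_accident = outcome_probabilities(LAYER_FAILURE_PROB)
print(f"blocked: {p_blocked:.4f}, accident: {p_accident:.4f}")
```

Under these assumed numbers, an individual layer fails fairly often (5 to 20 percent of the time), yet all three fail together only about once in a thousand trajectories (0.10 × 0.20 × 0.05). The independence assumption is the weakest part of the sketch: in practice, contributing conditions can be correlated.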
There is no root cause. The problem with this term isn't just that it's singular or that the word "root" is misleading: there's more. Trying to find causes at all is problematic, because looking for causes to explain an incident limits what you'll find and learn. And the irony is that root cause analysis is built on the idea that incidents can be fully comprehended. They can't. We already have a better phrase for this, and it sounds way cooler: a perfect storm. By separating out causes and breaking incidents down into their multiple contributing factors, we can see that the things that led to an incident are either always or transiently present. An incident is just the first time they combined into a perfect storm of normal things that went wrong at the same time.
From an abstract perspective, language that describes causality is, ostensibly, value-neutral. But the term "root cause" is almost always used in the context of untoward or negative outcomes, and not in situations where an outcome is deemed a success. Rarely does someone demand a search for the "root cause" of a successful product launch, for example. It seems widely accepted that successful outcomes in complex systems come from many influences that come together in a positive way. Failures aren’t often viewed the same way.
References
- Allspaw J. What we talk about when we talk about "root cause". Adaptive Capacity Labs.