Reliability Risk
Part Of
Reduced By Practices
- Debugging: Improves the reliability and stability of the software.
- Monitoring: Identifies and addresses potential issues before they impact system reliability.
- Redundancy: Minimizes operational disruptions by providing backup components.
Reliability Risk reflects a simple problem: when we use an external dependency, we are at the mercy of its reliability.
"... Reliability describes the ability of a system or component to function under stated conditions for a specified period of time." - Reliability Engineering, Wikipedia
It's easy to think about reliability for something like a bus: sometimes, it's late due to weather, or cancelled due to driver sickness, or the route changes unexpectedly due to road works.
In software, it's no different: unreliability is the flip-side of Feature Implementation Risk. It arises in the gap between the real behaviour of the software and the expectations for it.
There is an upper bound on the reliability of the software you write, and this is based on the dependencies you use and (in turn) the reliability of those dependencies:
- If a component A depends on component B, unless there is some extra redundancy around B, then A can't be more reliable than B.
- Is A or B a Single Point Of Failure in a system?
- Are there bugs in B that are going to prevent it working correctly in all circumstances?
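To make the upper bound concrete, here is a minimal sketch of the arithmetic. It assumes that failures are independent and that reliability can be expressed as a probability of working correctly; both are simplifications, and the function names and figures are made up for illustration.

```python
from math import prod


def series_reliability(reliabilities):
    """Reliability of a chain in which every component must work."""
    return prod(reliabilities)


def redundant_reliability(reliability, copies):
    """Reliability of `copies` independent replicas, where one working replica is enough."""
    return 1 - (1 - reliability) ** copies


# Illustrative figures only: A's own code works 99.9% of the time, B 99%.
a_own = 0.999
b = 0.99

# A depends on a single B: A can't be more reliable than B.
a_with_b = series_reliability([a_own, b])

# A depends on two redundant copies of B: the bound set by a single B is lifted.
a_with_redundant_b = series_reliability([a_own, redundant_reliability(b, 2)])
```

With a single B, `a_with_b` comes out below `b`. With two copies of B, `a_with_redundant_b` rises above `b`, which is the "extra redundancy around B" case from the first bullet. Real dependencies rarely fail independently, so treat numbers like these as optimistic.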
Questions like these are the subject of Reliability Engineering. For example, Failure Mode and Effects Analysis (FMEA):
"...was one of the first highly structured, systematic techniques for failure analysis. It was developed by reliability engineers in the late 1950s to study problems that might arise from malfunctions of military systems." - FMEA, Wikipedia
This technique was applied on NASA missions, and then in the 1970s to car design following the Ford Pinto exploding car affair. But establishing the reliability of software dependencies like this would be hard and expensive. We are more likely to mitigate Reliability Risk in software using testing, redundancy and reserves, as shown in the diagram above.
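As a rough illustration of redundancy in code, the sketch below tries a list of interchangeable providers in order and returns the first successful result. The function and provider names are hypothetical, and a real system would catch narrower exception types and probably add timeouts and logging.

```python
def call_with_redundancy(providers):
    """Return the result of the first provider that succeeds."""
    last_error = None
    for provider in providers:
        try:
            return provider()
        except Exception as error:  # assumption: any exception counts as a failure
            last_error = error
    raise RuntimeError("all providers failed") from last_error


def flaky_primary():
    # Hypothetical primary dependency that is currently down.
    raise ConnectionError("primary unavailable")


def steady_backup():
    # Hypothetical backup component providing the same service.
    return "timetable from backup"


result = call_with_redundancy([flaky_primary, steady_backup])
```

Here the caller still gets an answer even though the primary dependency failed. The cost is an extra component to build, run and keep consistent with the first.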
Additionally, we often rely on proxies for reliability. We'll look at these proxies (and the way in which software projects signal their reliability) in much more detail in the section on Software Dependency Risk.