series, parallel, and mixed component systems effect on reliability|healthcare|Tao's Tips

Links within this page:

Reliability

The system is a collection of n identifiable components performing some function. Two operating states that relate to the system's ability to perform its function are defined:

Success: The system performs its function satisfactorily for a given period of time,where the criterion for success is clearly defined.
Failure: The system fails to perform its function satisfactorily.

The system reliability ( `R` ) is the probability ( `bar (P_S)` ) that a system performs its function; if the system operates normally, the system reliability = 1.
Text=Barring terms ( ` bar P ` ) denotes consideration of their success properties.

Series System

A system in which all components must be operating for the system to be successful is called a series system.

` Pr{A∩B} : R = bar (P_S) ` `= \prod_{i=1}^n bar (P_i)` `= bar (P_1) × bar (P_2) × bar (P_3) … × bar (P_n) `

Stereo system example: series ` Pr{A∩B}`

For example, a stereo system with a compact disk (CD) player, an amplifier, and two speakers (A and B). If sucessful operation requires all four components to work (CD → amplifier → A → B), this is a series system. The failure of any one component will cause the system to fail.
If the reliabilities of the components are: CD ( `bar (P_1) = 0.97` ), amplifier ( `bar (P_2) = 0.99` ), and each speaker ( `bar (P_3) = bar (P_4) = 0.98` ), then the reliability of the product is:

`R = bar (P_1) × bar (P_2) × bar (P_3) × bar (P_4) = 0.9222`

Parallel System

A system for which the success of any one component is equivalent to the success of the system is a parallel system. Alternatively, all the components must fail before the parallel system fails. The reliability of a parallel system is the probability that all of the components do not fail.

`Pr{A∪B} :` `R = ∐_{i=1}^n P_i = 1 - \prod_{i=1}^n (1-bar P_i) `
`R = 1 - (1- bar(P_1))(1- bar (P_2))…(1-bar (P_n)) `

Stereo system example: parallel ` Pr{A∪B}`

Example: if the stereo system only needs at least one of the speakers to work to function successfully, the two speakers comprise a parallel system:

` bar P_x = 1 - (1-bar (P_3)) × (1-bar (P_4)) `
` bar P_x = 1 - (0.02)(0.02) = 0.9996`

The speaker combination forms a series system with the CD player and amplifier, so the total reliability is:

`R = bar (P_1) × bar (P_2) × bar P_x`
`R = (0.97)(0.99)(0.9996) = 0.9599`

Space vehicle computers: parallel ` Pr{A∪B}`

A space vehicle has three identical computers operating simultaneously and solving the same problems. The outputs of the three computers are compared, and if two or three of them are identical, that result is used. In this mode one of the three computers can fail without causing the system to fail. This is a two out of three system. Identifying the success or failure of each of the computers with the variables

`R = 1 - (1- P_1P_2)(1-P_1P_3)(1-P_2P_3) `
`R = 1 - (1-0.9^2)(1-0.9^2)(1-0.9^2)`
`R = 1 - (0.19)^3`
`R = 0.993`

The reliability of the combination of three computers ( `0.993` ) is much greater than that of an individual computer ( `0.9` ). This is an example of the use of redundancy to increase reliability. Since only one computer is required to perform the function, the other two are redundant from a functional point of view. They do play an important role, however, in increasing the reliability of the system.

Common Cause of Failure

The independence of failures is important here. If some failure mechanism causes all three computers to fail simultaneously, the reliability improvement will not be realized. For example, if all three computers used the same program and the program had an error, the combination of results will certainly be no better than any one of the individual results. The use of redundancy is common in systems for which failure has particularly severe consequences, such as in the space program or for very complex systems with many components.

A Common Cause is an event or a phenomenon which, if it occurs, will induce the occurrence of two or more fault tree elements.
Oversight of Common Causes is a frequently found flaw in fault tree analysis!

Burglar Alarm Example: four wholly independent alarm systems are provided to detect and annunciate intrusion: microwave, electro-optical, seismic footfall, acoustic. No two of them share a common operating principle. Redundancy appears to be absolute.
BUT
suppose the four systems share a single source of operating power, and that source fails, and there are no backup sources?

Common Cause Fault/Failure Sources

Utility outage: electricity, cooling water, pneumatic pressure, steam, etc
Moisture, Corrosion, Dust/grit
Seismic disturbance
Temperature effects (freezing/ overheat)
Electromagnetic disturbance
Single operator oversight

Common Cause Suppression Methods

Separation / isolation / insulation / sealing / shielding of system elements
Using redundant elements having differing operating principles
Separately powering /servicing /maintaining redundant elements
Using independent operators / inspectors

Reliability as a Function of Time: Bathtub

Most system elements have constant fault rates (λ) over long periods of useful life. During these periods, faults occur at random times.

Fault probability is modeled acceptably well as a function of exposure interval (T) by the exponential. For esposure intervals that are brief (T ≤ 0.2 MTBF), `P_F` is approximated within 2% of `λ`T.

The parameter `λ` is called the failure rate of the component and is given in units of failures per unit time.

`λ` = Fault Rate = `1/(MTBF)`

For more on reliability, cut sets, path sets see Reliability (reference #13 below from University of Austin).

`S` = Successes, `F` = Failures `S` = Successes, `F` = Failures
"Barring" terms ( ` bar P ` ) denotes consideration of their success properties. "Barring" terms ( ` bar P ` ) denotes consideration of their <u>success</u> properties.

`S` = Successes, `F` = Failures
`bar (P_S)` = `P_S` = Success Probability = `R` = Reliability
`P_F` = Failure Probability
`bar (P_S) + P_F = S / ((S+F)) + F / ((S+F)) ≡ 1 `
` P_F = 1 - bar (P_S) `
`λ` = Fault Rate = `1/(MTBF)`

` Pr{A∩B} : P_S `
`= ∏_(i=1)^n P_i`
`= P_1 × P_2 × P_3 … × P_n `

` Pr{A∪B} : P_F `
`= ∐_(i=1)^n P_i `
` = 1 - ∏_(i=1)^n (bar P_i) `
` = 1 - ∏_(i=1)^n (1 - P_i) `
` = 1 - [(1-P_1)(1-P_2)…(1-P_n)] `

Sources of Probability Data

The real power of fault trees is in performing probabilistic analyses. In practice, most healthcare systems do not have actual rate data for the underlying events. Furthermore, the teams often have limited information on human error and equipment failure rates available to them.

In addition, the top level events can often be benign or masked by the patient's illness and thereby tend to be underestimated in occurrence data.

Nonetheless, the risk modelling team must estimate the rates of occurrence based upon the experience of the team and/or the published rates in the literature. Probability estimates grounded in the experience of the team, while highly variable, are better than no probability estimates at all

When there is clearly no consensus related to estimated rates, the team's discussions can be facilitated by "anchoring" the probability estimate around a starting point such as one error per 1000 attempts ( `1×10^(-3)` ). The team will then adjust its estimate in an upward or downward direction through an iterative process before deciding on a final estimate. In practice, teams quickly gain comfort in the task of estimating error rates and at risk behavior rates [#6].

Example: Patient Identification by Armband

As one example of the probability estimation task performed in teams, consider the challenge of arriving at a rate for failure for checking armbands when dispensing medications. This is a commonplace at risk behavior that is not easy to identify in post-event investigations, particularly in terms of a normative rate for a group as a whole.

Nurses spend long shifts getting to know their patients, their patients' diagnoses, and their patients' medications. Despite policies and procedures to direct the checking of a patient identification band prior to medication administration nurses admit that, in practice, for a variety of reasons they fail universally to accomplish this safety check.

The team can be questioned about whether they fail to check identification in 1 in 100 doses, 5 in 100 doses, or 50 in 100 doses. Through this repetitive process the interdisciplinary team will arrive at an estimate for the local cultural norm.

Experience indicates that these team estimates are more accepted than rates derived from event data and, unfortunately, are often more accurate than the rates predicted by senior management within the hospital [#6]

Log average method of estimating probability

If probability is not estimated easily, but upper and lower credible bounds can be judged:

Estimate upper and lower credible bounds of probability for the phenomenon in question.
Average the logarithms of the upper and lower bounds.
The antilogarithm of the average of the logarithms of the upper and lower bounds is less than the upper bound and greater than he lower bound by the same factor. Thus, it is geometrically midway between the limits of estimation.
For the example in Figure 3, the arithmetic average would be:
` (0.01 + 0.1) / 2 = 0.055 `
that is, 5.5 times the lower bound and 0.55 times the upper bound.
In EXCEL:
Lower Probability Bound = `P_L` = 0.01
Upper Probability Bound = `P_U` = 0.1
Average of upper and lower bound `= avgB`
`=(LOG(P_L) + LOG(P_U))/2`

`=((-2) + (-1))/2`

`= ((-3))/2`

`= -1.5`
Log Average
`= POWER(10,AvgB) `

`= POWER(10, (-1.5)) `

`= 0.0316228`

Sources of Probability Data

Manufacturer's data
Industry consensus standards
MIL standards
Historical evidence — same or similar systems
Simulation / testing
Delphi estimates
WASH-1400 (NUREG-75/014) (several publications relating to nuclear power plant applications)
IEEE Standard 500
References listed below with [Probability data] after the reference

Table 1. Error rates for common events
Activity	Error Rate
Error of omission / item embedded in procedure	` 3 × 10^(-3) `
Simple arithmetic error with self-checking	`3 × 10^(-2) `
Inspector error of operator oversight	`10^(-1)`
General rate / high stress /dangerous activity	`2 × 10^(-1) ~ 3 × 10^(-1)`
Checkoff provision improperly used	`10^(-1) ~ 9 × 10^(-1) ` `(5 × 10^(-1) avg)`
Error of omission / 10-item checkoff list	`10^(-4) ~ 5 × 10^(-3)` `(10^(-3) avg)`
Carry out plant policy / no check on operator	`5 × 10^(-3) ~ 5 × 10^(-2)` `(10^(-2) avg)`
Select wrong control / group of identical, labeled, controls	`10^(-3) ~ 10^(-2)` `(3 × 10^(-3) avg)`

Some factors influencing human operator failure probability

Experience
Stress
Training
Individual self discipline / conscientiousness
Fatigue
Perception of error consequences (… to self /others)
Use of guides and checklists
Realization of failure on prior attempt
Character of task — complexity / repetitiveness

References

陳曉惠 Chen Xiaohui 集合的基本概念 Basic concepts of Set Theory www.slideserve.com 2010-03-30 (22 slides)
Clements PL. Fault Tree Analysis www.me.utexas.edu 1993-05-30 4^th Edition (96pp)
Abecassis ZA, McElroy LM, Patel RM, Khorzad R, Carroll C, Mehrotra S. Applying fault tree analysis to the prevention of wrong site surgery.
2014; 193(1): 88-94.
OK 陳曉惠
2010-03-30 (22 slides) 集合的基本概念 467KB (22 slides)
Clements PL.
邏輯閘概率計算 3,055KB (11 slides)
XXXX Clements PL.
1990-06-30 2^nd Edition (13pp)
OK Clements PL.
1993-05-30 4^th Edition (96pp)
OK Abecassis ZA, McElroy LM, Patel RM, Khorzad R, Carroll C, Mehrotra S. Applying fault tree analysis to the prevention of wrong site surgery.
2014; 193(1): 88-94.
XXXX Hyman WA, Johnson E. Fault tree analysis of clinical alarms.
2014; 193(1): 88-94.
XXXX Marx DA, Slonim AD. Assessing patient safety risk before the injury occurs: an introduction to sociotechnical probabilistic risk modelling in health care.
2003; 12 (Suppl II): ii33-ii38. [probability data]
XXXX Wreathall J, Nemeth C. Assessing risk: the role of probabilistic risk assessment (PRA) in patient safety improvement.
2004; 13: 206-212. [probability data]
OK (I alreay used it for Event tree?( NEBOSG National Diploma. Fault Tree Analysis (FTA) and Event Tree Analysis (ETA)
2004; 13: 206-212. [bow-tie model]
OK Lyons M, Adams S, Woloshynowych M, Vincent C. Human reliability analysis in healthcare: a review of techniques.
2004; 16: 223-237. [probability data] VIP REVIEW
XXXX McElroy LM, Khorzad R, Rowe TA, Abecassis ZA, Apley DW, Barnard C, Holl JL. Fault Tree Analysis: assessing the adequacy of reporting efforts to reduce postoperative bloodstream infection.
2017; 32(1): 80-86.
OK Charters DA, Barnett JR, Shannon M, Harrod P, Shea D, Morin K. Assessment of the probabilities that staff and/or patients will detect fires in hospitals.
Proceedings of the fifth international symposium. pp. 747-758. [probability data]
address same as previous one Rice WP. Medical Device Risk Based Evaluation and Maintenance Using Fault Tree Analysis.
2007; 41(1): 76-82.
XXXX Department of Mechanical Engineering (UT Austin) Reliability
2002; May 28.
OK Precalculus: Find the First Derivative of a Function
陳曉惠
2010-03-30 (22 slides) 集合的基本概念 467KB (22 slides)
Clements PL.
邏輯閘概率計算 3,055KB (11 slides)
Clements PL.
1990-06-30 2^nd Edition (13pp)
Clements PL.
1993-05-30 4^th Edition (96pp)
Abecassis ZA, McElroy LM, Patel RM, Khorzad R, Carroll C, Mehrotra S. Applying fault tree analysis to the prevention of wrong site surgery.
2014; 193(1): 88-94.
Hyman WA, Johnson E. Fault tree analysis of clinical alarms.
2014; 193(1): 88-94.
Marx DA, Slonim AD. Assessing patient safety risk before the injury occurs: an introduction to sociotechnical probabilistic risk modelling in health care.
2003; 12 (Suppl II): ii33-ii38. [probability data]
Wreathall J, Nemeth C. Assessing risk: the role of probabilistic risk assessment (PRA) in patient safety improvement.
2004; 13: 206-212. [probability data]
NEBOSG National Diploma. Fault Tree Analysis (FTA) and Event Tree Analysis (ETA)
2004; 13: 206-212. [bow-tie model]
Lyons M, Adams S, Woloshynowych M, Vincent C. Human reliability analysis in healthcare: a review of techniques.
2004; 16: 223-237. [probability data] VIP REVIEW
McElroy LM, Khorzad R, Rowe TA, Abecassis ZA, Apley DW, Barnard C, Holl JL. Fault Tree Analysis: assessing the adequacy of reporting efforts to reduce postoperative bloodstream infection.
2017; 32(1): 80-86.
Charters DA, Barnett JR, Shannon M, Harrod P, Shea D, Morin K. Assessment of the probabilities that staff and/or patients will detect fires in hospitals.
Proceedings of the fifth international symposium. pp. 747-758. [probability data]
Rice WP. Medical Device Risk Based Evaluation and Maintenance Using Fault Tree Analysis.
2007; 41(1): 76-82.
Department of Mechanical Engineering (UT Austin) Reliability
2002; May 28.
Precalculus: Find the First Derivative of a Function
Idaho National Laboratory PRA Technology and Regulatory Perspectives — VI Module N Importance Measures
Calculate values for four types of importance measures given Level 1 PRA results. www.nrc.gov/docs/ 22pp

Series, Parallel, and Mixed Component Systems effect on Reliability

How to calculate the reliability and failure probability of series systems, parallel systems and hybrid component systems

Reliability

Series System

Parallel System

Common Cause of Failure

Common Cause Fault/Failure Sources

Common Cause Suppression Methods

Reliability as a Function of Time: Bathtub

Sources of Probability Data

Example: Patient Identification by Armband

Log average method of estimating probability

Sources of Probability Data

Some factors influencing human operator failure probability

References