Barrier analysis

Assess where barriers and controls were unused or inadequate in protecting target

Barrier analysis starts from the assumption that a hazard comes into contact with a target because barriers or controls were unused or inadequate. It can be used proactively (e.g. risk assessment, FMEA) to evaluate proposed actions, or retrospectively (e.g. RCA) to identify missing or failed barriers, and is most often used to extend an initial ECF chart to consider a broader range of potential root causes.

The results of barrier analysis should be applied to the ECF diagrams. Two points matter here. First, many of the barriers relate to distal factors. Second, barrier analysis typically helps to identify additional events that ought to be introduced into an ECF diagram. This is particularly important because primary investigations often focus on catalytic events rather than events that weakened particular barriers.

Figure 1. Multiple-Directional Defences

Target: mouse
Hazard: mousetrap
Barrier: helmet

Target: cheese
Hazard: mouse
Barrier: mousetrap

Basic Definitions
Figure 2. Person at beach getting sunburned

Hazard: A hazard is usually thought of as an unwanted energy transfer such as the transfer of electricity from an item of equipment to an unprotected worker. Energy can be kinetic, biological, acoustical, chemical, electrical, mechanical potential, electro-magnetic, thermal or radiation. It is the potential or threat to cause harm or an adverse outcome.

Target: The target is the person, equipment or other object that can be harmed by a hazard. The list of targets may include multiple items. In health care facilities, targets are generally the people, either individually or collectively, who can be damaged or harmed by an unwanted incident. However, targets can also be material things such as buildings and equipment; non-material things such as goodwill, friendship, and stature; or the environment.

Barrier: Barriers represent the diverse physical and organisational measures that are taken to prevent a target from being affected by a potential hazard. Barriers are those things that should have prevented or could prevent the undesired event. Many barrier analysis techniques identify:

  1. Controls: Control barriers direct wanted or 'desired' energy flows. They include conductors, disconnect switches, pressure vessels and approved work methods.
  2. Safety Devices: Safety devices are barriers to unwanted energy flows. These include protective equipment, guard rails, safety training and emergency plans.

Such distinctions can be difficult to make because the same energy flow might be both wanted and unwanted at different times during a process.


Figure 3. Moat (barrier) around castle

To analyze the barriers, ask the questions:

  1. Were barriers in place to minimize threats/hazards to the target?
  2. Were such barriers adequate (capable of handling the threat)?
  3. Were there backups for each barrier?
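
The three questions above can be sketched as a small checklist over barrier records. This is only an illustration, not part of any standard method: the `Barrier` fields and the `assess` function are hypothetical names chosen for the example, and the moat record is made up.

```python
from dataclasses import dataclass, field


@dataclass
class Barrier:
    """One barrier between a hazard and a target (hypothetical record layout)."""
    name: str
    in_place: bool    # Question 1: was the barrier in place?
    adequate: bool    # Question 2: was it capable of handling the threat?
    backups: list = field(default_factory=list)  # Question 3: names of backup barriers


def assess(barrier):
    """Return the findings raised by the three questions for one barrier."""
    findings = []
    if not barrier.in_place:
        findings.append("not in place")
    elif not barrier.adequate:
        findings.append("inadequate")
    if not barrier.backups:
        findings.append("no backup")
    return findings


# Illustrative record: a moat that exists but is too shallow, with no backup.
moat = Barrier("moat", in_place=True, adequate=False)
print(assess(moat))
```

A barrier that is in place, adequate and backed up returns an empty list, which makes it easy to filter a long list of barriers down to the ones that need attention.
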
Three Forms of Barriers

1. People

Figure 4. Head in mosquito net hat
  1. Physical Barriers: (the most reliable in terms of providing failsafe solutions)
    • Material technology has produced physical barriers that directly prevent a hazard from affecting a target. They include:
      • guards, gloves and goggles, protective clothing, shields, insulation on hot pipes. These devices are often rated to be effective within certain tolerances; for example, a fireguard may provide protection against a fire within particular heat and time limitations.
      • bar coding, keypad controlled doors, computer programs which prevent the inputter from going further if a field is not completed, controlled drug cupboards (locked)
    • Natural Barriers i.e. barriers of distance, time or placement e.g. isolating SARS patients to specific hospitals; asking travellers who return home with a fever to voluntarily isolate themselves at home for 10 days before mixing with other people.
  2. Dynamic Barriers: include warning devices and alarms. These are not continually apparent but are only issued when the system detects that there may be a potential hazard. This can include physical interlocks that restrict access or actions during critical phases of an operation. Limitations: The limitations of this approach stem from the dynamic nature of these warnings.
    • Operators may fail to notice information about a potential hazard.
    • Operators may also choose to disregard or circumvent warnings, especially if they have been presented with a succession of false alarms.
    • Conversely, warnings may not be invoked even though a hazard may be present. This poses a particular threat if operators grow accustomed to the additional protection afforded by these barriers.
Figure 5. Swiss cheese theory

In terms of Reason's Swiss Cheese theory, the cheese in each slice consists of all the functional barriers that are in place, and continually preventing potential hazards from reaching the target.

The holes are the active and latent failures that occur in this environment, allowing opportunities for hazards to breach that layer and reach the target. Hence, Reason's suggestion of adding

  • more cheese (barriers) to each layer
  • more layers (of barriers) to each process
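
A rough calculation shows why both suggestions help. The sketch below assumes, purely for illustration, that each layer lets a hazard through independently with a known probability. Real latent failures are often correlated, so the numbers are not a prediction. `breach_probability` is a hypothetical helper, not part of Reason's model.

```python
from math import prod


def breach_probability(hole_probs):
    """Chance that a hazard passes every layer, assuming independent layers.

    hole_probs holds one value per layer: the assumed probability that the
    hazard finds a hole in that layer.
    """
    return prod(hole_probs)


two_layers = breach_probability([0.1, 0.1])
three_layers = breach_probability([0.1, 0.1, 0.1])   # more layers
more_cheese = breach_probability([0.05, 0.1])        # smaller holes in one layer

print(two_layers, three_layers, more_cheese)
```

With no layers at all the function returns 1, which matches the intuition that an unprotected target is always reached.
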

Human action and administrative barriers are the least reliable barriers, in terms of failsafe, because they rely on human action and behavior, leading to error. In order to strengthen human and administrative barriers, we need to implement multiple barriers at different stages of the process to minimize the chance of system failure in the future. For example, if supervision of staff (administrative barrier) is recommended as a solution, the following issues should be considered and implemented at the same time:

  • mentors are specifically identified, have the appropriate skill set and are trained as competent supervisors
  • competency of the supervisor is assessed periodically, against a pre-designed competency framework, to ensure they are capable of supervising effectively
  • education/awareness raising for would-be supervisees so that they understand what to expect and can engage appropriately
  • guidelines on how and where supervisory meetings should take place
  • time availability to enable the supervision to occur
  • audit of the supervision (using both supervisees, and supervisors as contributors) process so that early detection of any system failure can be addressed in a timely way

People barriers: lack of staff, changes in management, inadequate training, poor communication.

2. Process: process barriers include the use of:

  • Training
  • Checklists
  • Standard operating procedures (SOP)
  • Other forms of workplace regulation that are intended to protect operators and their equipment from potential hazards:
    • explicit: supported by line management
    • implicit: arise over time as the result of everyday working practices. This can be unreliable if new employees fail to observe the way in which existing employees follow these unwritten rules.
  • People: human action is often associated with a procedural barrier. Examples of human action serving to control a hazard are controlling and extinguishing a fire, evacuating a building in response to a fire etc.
  • Managerial and administrative policies: can also act as a form of meta-level barrier. These constraints do not directly protect any particular target from any particular hazard; managerial and administrative barriers help to ensure that the acquisition, development, installation and maintenance of a system ensure the adequate provision of more direct barriers to protect potential targets.
  • Risk management:
    NASA guidelines for an 'ideal' approach to risk management:
    To reduce risk, projects need to be managed systematically. The Risk Management Process efficiently identifies, analyses, plans, tracks, controls, communicates, and documents risk to increase the likelihood of achieving program/project goals.
    Risk List: Every project should have a prioritized list of its risks at any point in the life cycle, along with the programmatic impacts. The list should indicate which risks have the highest probability, which have the highest consequences, and which need to be worked now.
    Team Members: All members of the project team should have access to the risk list so that everyone knows what the risks are. The project team members are responsible for the risks. The team should work to reduce or eliminate the risks that exist and develop contingency plans, so that they are prepared should a risk become a real problem.
    Risk Signature: From the beginning of a project, the Project Manager and team should have an idea of what the 'risk signature' of the project will be. The risk signature will identify expected risks over the course of the project and when the project risks are expected to increase and decrease.
    Tracking: During the project, risks should be tracked to determine if mitigation efforts are working.
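
A prioritized risk list of the kind described above can be sketched in a few lines. The scoring rule (probability multiplied by consequence), the scales and the example entries are assumptions made for illustration. The guidelines quoted here do not prescribe a formula.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """One entry in a project risk list (hypothetical layout)."""
    description: str
    probability: float   # estimated likelihood, 0.0 to 1.0 (assumed scale)
    consequence: int     # severity score, 1 (minor) to 5 (severe) (assumed scale)


def prioritise(risks):
    """Sort risks so the highest probability x consequence score comes first."""
    return sorted(risks, key=lambda r: r.probability * r.consequence, reverse=True)


# Made-up entries, only to show the ordering.
risk_list = [
    Risk("inadequate testing", 0.3, 5),
    Risk("staff shortage", 0.6, 3),
    Risk("late delivery", 0.2, 2),
]
ranked = prioritise(risk_list)
print([r.description for r in ranked])
```

Re-running `prioritise` after each tracking review keeps the list ordered as mitigation changes the estimates.
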

Process barriers: separation of development (e.g. writing SOP) and operations (e.g. executing) teams, no systematic hazard analysis, inadequate testing, lack of oversight.

3. Technology

Ways Barriers Can Fail
Figure 6. Mars Climate orbiter (surveyor robot)
  1. Barrier is impractical - impossible. These are situations in which it is impossible to provide adequate barriers against a potential energy transfer. Ideally, such situations are identified during a safety analysis. If the hazard could not be prevented or mitigated, regulators should ensure that the process fails to gain necessary permissions. For example, it is impossible to protect the public once chemicals have been released into the environment.
  2. Barrier is impractical - uneconomic. It may be technically feasible to develop appropriate barriers but their cost may prevent them from being deployed. The perceived benefits that are associated with particular barriers can change in response to public anxiety over particular incidents.
  3. Barrier fails - partially. A barrier that has been successfully introduced into an application process may, however, fail to fully protect the target from a potential hazard. This is an important class of failure in many incident reporting systems.
  4. Barrier fails - totally. The distinction between partial and total protection depends upon the nature of the application. Success or failure of a barrier must be interpreted with respect to the overall safety objectives of the system as a whole.
  5. Barrier is not used - not provided. This describes a situation in which a barrier might have protected a target had it been available.
  6. Barrier is not used - by error. Barriers may not be used during an incident even though they are available and might prevent a target being exposed to a hazard.
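
The six failure modes above form a fixed vocabulary, so an analysis tool could record them as an enumeration. The sketch below is one possible encoding. The class name `FailureMode` and the `category` property are hypothetical, and the wording of each value follows the list above.

```python
from enum import Enum


class FailureMode(Enum):
    """The six ways a barrier can fail, as listed in the text."""
    IMPRACTICAL_IMPOSSIBLE = "Impractical: impossible"
    IMPRACTICAL_UNECONOMIC = "Impractical: uneconomic"
    FAILS_PARTIALLY = "Fails: partially"
    FAILS_TOTALLY = "Fails: totally"
    NOT_USED_NOT_PROVIDED = "Not used: not provided"
    NOT_USED_BY_ERROR = "Not used: by error"

    @property
    def category(self):
        """Broad class of failure: the text before the colon."""
        return self.value.split(":")[0]


# Group the modes by their broad category, in definition order.
by_category = {}
for mode in FailureMode:
    by_category.setdefault(mode.category, []).append(mode.name)
print(by_category)
```

Using an enumeration rather than free text means a misspelt failure mode raises an error when the record is created instead of silently creating a seventh category.
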

Barrier Tables

A barrier table is a mechanism for exploring how the high-level concepts (barriers, targets and hazards) and the ways in which barriers fail can support the causal analysis of adverse occurrences. Relatively high-level barrier analysis using classifications such as people/procedures/technology is designated "Level 1 analysis"; however, barrier tables that capture more detailed observations about particular areas of the causation process can also be drawn up and designated with the appropriate level of analysis.

Figure 7. Barrier table for loss of Mars Climate orbiter
Table 1. Level 1 Barrier Table for the Loss of the Climate Orbiter
Hazard: Impact/Re-Entry
Target: Mars Climate Orbiter

Barrier      Reason for Failure?
People       Lack of staff
             Changes in management
             Inadequate training/skills
             Poor communication
Process      Separation of development and operations teams
             No systematic hazard analysis
             Inadequate testing
             Lack of oversight
Technology   Incorrect trajectory modelling → Table 2
             Tracking problems
             Rejection of barbecue mode
             Rejection of TCM-5


Table 2. Level 2 Technology: Incorrect Trajectory Modeling
Hazard: Impact/Re-Entry
Target: Mars Climate Orbiter

Barrier failure mode selected from the following choices:
  1. Impractical: impossible
  2. Impractical: uneconomic
  3. Fails: partially
  4. Fails: totally
  5. Not used: not provided
  6. Not used: by error

Barrier                            Barrier Failure Mode   Reason for Failure?
Software Interface Specification                          No software audit to ensure SIS conformance
                                                          Poor navigation spacecraft team communication
                                                          Inadequate training on importance of SIS
Software Testing and Validation                           Unclear if independent tests conducted
                                                          Failure to recognise mission critical software
                                                          Poor understanding of interface issues
Incident Reporting Systems                                Team members did not use ISA scheme
                                                          Leaders fail to encourage reporting
                                                          Domain experts not consulted
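
The link between the two tables (a Level 1 reason that expands into its own Level 2 table) can be sketched as nested dictionaries. The layout and the `expand` helper are hypothetical. The entries are copied from Tables 1 and 2, and the failure mode column is omitted because no values are recorded for it here.

```python
# Level 2 table: barriers behind one Level 1 reason, each with its reasons for failure.
level2_trajectory = {
    "Software Interface Specification": [
        "No software audit to ensure SIS conformance",
        "Poor navigation spacecraft team communication",
        "Inadequate training on importance of SIS",
    ],
    "Software Testing and Validation": [
        "Unclear if independent tests conducted",
        "Failure to recognise mission critical software",
        "Poor understanding of interface issues",
    ],
    "Incident Reporting Systems": [
        "Team members did not use ISA scheme",
        "Leaders fail to encourage reporting",
        "Domain experts not consulted",
    ],
}

# Level 1 table: technology barrier only, to keep the sketch short.
level1 = {
    "hazard": "Impact/Re-Entry",
    "target": "Mars Climate Orbiter",
    "barriers": {
        "Technology": [
            "Incorrect trajectory modelling",
            "Tracking problems",
            "Rejection of barbecue mode",
            "Rejection of TCM-5",
        ],
    },
    # Reasons that have been expanded into a more detailed table.
    "details": {"Incorrect trajectory modelling": level2_trajectory},
}


def expand(table, reason):
    """Return the more detailed table for a reason, or an empty dict if none exists."""
    return table["details"].get(reason, {})


for barrier, reasons in expand(level1, "Incorrect trajectory modelling").items():
    print(barrier, "-", len(reasons), "reasons")
```
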

Use Results of Barrier Analysis To Update ECF Charts

The results of barrier analysis should be incorporated in the development of Events and Causal Factors (ECF) diagrams. Many of the barriers relate to distal factors, so barrier analysis typically helps to identify additional events that ought to be introduced into an ECF diagram. This matters because primary investigations often focus on catalytic events rather than events that weakened particular barriers.

Mars Climate Orbiter Incident

People Barriers

Lack of Staff

Firstly, there were insufficient staff. The primary investigation found that the staffing of the operations navigation team was inadequate. In particular, the … was responsible for running the … and the … in addition to the …. This loading had a particular effect on the … team.

The two individuals who led this group found it very difficult to provide the twenty-four hour a day coverage that was recommended during critical phases of a mission, such as …. The loss of … led to an increase in the number of staff who were assigned to the …. In terms of the earlier …, however, this lack of personnel may have prevented the navigation team from sustaining their investigation into the anomalies that they found between the … and the … systems. This, in turn, reduced the navigation team's ability to operate as an effective barrier to any navigational problems that might ultimately threaten the success of the mission.

Changes in Management

Changes in management prevented an effective response to the navigation problems. During the months leading up to MOI, the investigators found that the … team had "some key personnel vacancies and a change in top management". A number of further problems reduced management effectiveness in combating particular hazards. For example, there was a perceived "lack of ownership" by some … personnel who felt that the mission had simply been passed onto them by the … teams. A key management failure in this process was that the operations team had no systems engineering or mission assurance personnel who might have monitored the implementation of the process. This, in turn, might have helped to improve communication between these different phases of the mission.

Communication Issues

Poor communication appears as a separate explanation for the way in which human barriers failed to prevent mission failure. The investigators concluded that "the spacecraft operations team did not understand the concerns of the operations navigation team". The operations navigation team appeared to be isolated from the development team and from their colleagues in other areas of operations. Other problems stemmed from the nature of group communications during the cruise phase. For example, the navigation team relied on email to coordinate their response once the conflicts were identified in the navigation data. The investigators were concerned that this use of technology enabled some of the problems to "slip through the cracks".

Training Issues

Primary and secondary investigations also identified inadequate training as a potential reason why staff failed to identify the potential hazard to the mission. This was connected to the lack of key personnel because there was no adequate means of ensuring that new team members acquired necessary operational skills. In particular, there was no explicit mentoring system. The investigators argued that the "failure to use metric units in the coding of the … software used in trajectory modeling might have been uncovered with proper training." One particularly important area for concern was that the … team was not familiar with the attitude control system on-board the …: "these functions and their ramifications for … navigation were fully understood by neither the operations navigation team nor the spacecraft team, due to inexperience and miscommunication". This lack of familiarity with spacecraft characteristics had considerable consequences throughout the incident. In particular, it may have prevented the operational navigation team from appreciating the full significance of the discrepancies that were identified.

Adding Barrier Analysis Events

Figure ? integrates our analysis of the human barriers to mission failure into an ECF chart. This diagram introduces a new event into the primary sequence. This denotes the decision not to isolate the damaged wheelchairs. It was introduced because the previous barrier analysis identified … as an important opportunity for preventing the hazard from affecting the target. Figure ? also uses the insights from the barrier analysis to explain why this opportunity was not acted upon. Lack of staff, inadequate training, management changes and poor communication between the operational navigation and spacecraft teams were all factors in the failure to perceive the significance of the AMD data anomaly. Figure ? also illustrates the way in which barrier analysis helps to identify key event sequences that may not have been identified during the initial analysis of an adverse occurrence.

Process Barriers Fail to Protect



Separation of Teams

The operational staff lacked necessary training about the operating characteristics of the …. One reason for this was that the overall project plan did not provide for a careful hand-over from the development project to the operations staff. The … was also the first mission to be supported by a multi-mission … Project. The operations staff had to assume control of the … without losing track of the A and B missions. These logistical problems were compounded by the fact that the … project was the first Jet Propulsion Laboratory mission in which only a small number of development staff were "transitioned" into the operations team. No navigation personnel made this move from the development of the … to its operation. This had a number of important consequences for subsequent events during the incident. In particular, the navigation team and other operational staff may have made a number of incorrect assumptions about hardware and software similarities between the A and B. The key point here is that the decision not to transition key development staff into the operation phase removed one of the procedural barriers that otherwise protect … missions. The navigational operations team might have realised the potential significance of the … anomaly if they had known more about the decisions that had informed the development of the ….

A number of associated conditions show that the plans for this transition were less than adequate and that this was the first project for the multi-mission Mars project. The decision only to transfer a minimal number of staff helped to create the conditions in which operational teams made inappropriate assumptions about the similarity between the A and B. The erroneous nature of these suppositions is underlined by the changes in the solar array that are also noted on Fig 10.9. Problems arise because although these incorrect assumptions stem from early in the transition from development to operations, they continue to have an influence throughout the incident. The condition that represents the potential for incorrect assumptions is surrounded by a double line (and provides an important starting point for any subsequent attempts to distinguish root causes from contributory factors).



No Systematic Hazard Analysis

The lack of any systematic hazard assessment, for instance using Fault Tree analysis, had numerous consequences for the mission as a whole. This prevented engineers from considering a range of possible failure modes. It also prevented the development and operations teams from conducting a systematic assessment of what were, and what were not, mission critical features. In particular, some form of hazard analysis might have helped to identify that specific elements of the ground software could be "mission critical" for the operations navigation team. Finally, the lack of a coherent hazard analysis may also have led to inadequate contingency planning. The failure to conduct such an analysis had the knock-on effect of removing a number of potential barriers that might have either detected the navigation software as a critical component prior to launch or might, subsequently, have encouraged operations to reconsider contingency plans once the anomaly had been discovered.



Inadequate Testing

Further process barriers were undermined by the lack of any sustained validation at a systems level. Navigation requirements were set at too high a management level. In consequence, programmers and engineers were left to determine how best to satisfy those requirements without detailed guidance from others involved in the development process. These problems might not have been so severe had their consequences been detected by an adequate validation process. Several significant system and subsystem flaws were, however, only uncovered after the A had been launched. For instance, file format errors prevented the navigation team from receiving and interpreting telemetry from the ground system for almost six months. The investigators argued that there was "inadequate independent verification and validation of AA ground software (end-to-end testing to validate the small forces ground software performance and its applicability to the software interface specification did not appear to be accomplished)".



Lack of Oversight

The validation issues and the lack of any system level hazard analysis were exacerbated by a more general lack of oversight during the BB mission. There was little Jet Propulsion Laboratory oversight of Lockheed ... subsystem developments. This created problems as the level of staffing was reduced during the transition from development to operations. Several mission critical functions, including navigation and software validation, received insufficient management oversight. It also became difficult to maintain lines of responsibility and accountability during the project.

Recurring questions in the investigation included "Who is in charge?" and "Who is the mission manager?" The investigators reported repeated examples of "hesitancy and wavering" whenever individuals attempted to answer the latter question. This is not surprising given the feelings of guilt and blame that often shape operators' reactions to adverse occurrences. One interviewee also answered that the flight operations manager was acting like a mission manager without being designated as such.

The lack of oversight had an important effect on many diverse aspects of the … development and operation. If this oversight had been in place then it might have persuaded participants to be more circumspect in their assumptions about the … hardware and software characteristics. More coherent oversight might also have encouraged a systematic hazard analysis, especially if more attention had been paid to the validation of high-level requirements.



Adding Barrier Analysis Events

It is difficult to decide whether or not a particular failure should be represented by the event that triggered the failure or by the conditions that form the consequences of that event. For example, the event labelled [Decision not to perform an a priori analysis of what could go wrong on the MCO] might have been represented by a condition labelled [there was no systematic hazard analysis]. We use events to denote those stages in an incident that might become a focus for subsequent analysis.

Fall from wheelchair

Lack of Staff

The junior nurse is asked to transport the patient to the smoking area because the patient is in a wheelchair, but during the journey she is recalled to help in the ward because nursing staff numbers are reduced (due to recent sick leave absences).



Training Issues

The junior nurse is straight out of nursing school, and has not yet completed the orientation program. No one is involved in assessing whether she is capable/has been trained for the tasks she is asked to undertake.



Communication Issues

The nurse leader very briefly told the junior nurse to take the wheelchair patient, with no explanation of risks or responsibilities.



Changes in Management

The nursing supervisor for quality in this ward was changed recently, and has not supervised any of the training programs for new nurses yet.



Adding Barrier Analysis Events

A non-event/condition: all of the linked conditions resulted in a ward where staff did not appreciate the risk of broken wheelchairs being left where patients could get them.

Wheelchair not locked away: waiting to be sent for repairs, but left in the open.
If all barriers functioned effectively, the yellow background would become transparent, and the sequence would be: well-trained personnel => know the risk of leaving wheelchairs out in the open => make sure wheelchairs are locked away while waiting to be sent to the repair shop.
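
Adding a barrier-derived event to an ECF sequence can be sketched as follows. The `Event` class and `add_barrier_event` function are hypothetical names. The position of the new event and the order of the primary events are assumptions made for illustration, while the labels are taken from the wheelchair case above.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """A step in the ECF event sequence, with the conditions that explain it."""
    label: str
    conditions: list = field(default_factory=list)


def add_barrier_event(sequence, index, label, conditions):
    """Return a new sequence with a barrier-derived event inserted at index."""
    return sequence[:index] + [Event(label, list(conditions))] + sequence[index:]


# Primary sequence from the initial analysis (ordering assumed for illustration).
primary = [
    Event("Junior nurse asked to transport patient"),
    Event("Junior nurse recalled to ward"),
    Event("Patient fell from wheelchair"),
]

# Event and contributing conditions identified by the barrier analysis.
updated = add_barrier_event(
    primary,
    0,
    "Wheelchair not locked away",
    ["Lack of staff", "Training issues", "Communication issues", "Changes in management"],
)
for event in updated:
    print(event.label, event.conditions)
```

The function returns a new list rather than modifying `primary`, so the chart from the initial investigation stays available for comparison.
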

        

Practice Barrier Analysis

Patient fell from wheelchair

References:

  1. C.W. Johnson, Failure in Safety-Critical Systems: A Handbook of Accident and Incident Reporting. University of Glasgow Press, Glasgow, Scotland, October 2003. www.dcs.gla.ac.uk/~johnson/book/
  2. Wikipedia. Mars Climate Orbiter. en.wikipedia.org/wiki/ Accessed March 7, 1997.