A CONCEPTUAL FRAMEWORK FOR RESILIENCE ENGINEERING IN CONSTRUCTION SAFETY

By

Don Wallace Schafer II

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Construction Management - Doctor of Philosophy

2014

ABSTRACT

A CONCEPTUAL FRAMEWORK FOR RESILIENCE ENGINEERING IN CONSTRUCTION SAFETY

By Don Wallace Schafer II

Poor safety performance is a chronic problem that plagues the U.S. construction industry. Some researchers contend that accidents are the result of disruptions. Disruptions are those happenings that interrupt, or disturb, the normal course of work. The overarching goal of this study is to explore schemes and methods to understand, harness, and foresee disturbances that arise from demands placed on the construction operations of project-based organizations that deliver the built environment. This study examines the emerging paradigm of Resilience Engineering (RE) as a means to avoid, mitigate, and recover from disruptions, considering these as the bellwether of untoward happenings on construction projects that endanger worker safety. The ways that RE differs from traditional thinking about dealing with disruptions, the current principles and practices of RE, the elements of RE useful to construction projects, and the simulation of RE were the critical elements addressed in this work.

RE is an emerging discipline that has been described as a paradigm shift about safety. The RE approach recognizes the need for a progressive safety plan that is systems-based, sociotechnical in its outlook, views work as a complex activity, and is concurrently reactive and proactive in its vigilance to prevent accidents. In the RE outlook, safety is a core value of the organization. RE looks for ways to balance the tensions among ongoing production and economic pressures and safety, recognizing the need to pull back or stifle production when safety is threatened. A basic premise of RE maintains that adjustments are needed in systems given that performance conditions are always underspecified. To be considered resilient, a system must possess the abilities of anticipation, monitoring, response, and learning.

This work includes an extensive literature review, development of a framework that describes how RE might be deployed in a construction setting, and a hybrid computer simulation employing agent-based and discrete event modeling that demonstrates the RE principle of the Efficiency-Thoroughness Trade-Off (ETTO). The literature review revealed that RE has a rich history that is solidly built upon and expands previous approaches to system disruption in general and safety management in particular. The framework developed drew upon the literature review and focused on translating RE premises and principles to the construction industry. The main thesis of the work posits that the key features needed for construction companies to act resiliently are for executives to consider resilience as a quality of the system, to consider RE as a definitive positional strategy, to develop a “just” culture to support RE implementation, and to view construction systems functionally as opposed to structurally. Additionally, the framework offers guidance and instruction with regard to the essential abilities of anticipation, monitoring, response, and learning. Finally, the hybrid computer simulation proved to be a worthy exemplar of the possibility for agents with resilient behaviors to populate and act in a simulated discrete-event production setting beset with disruptions.
Given that RE is an emerging paradigm, hybrid computer simulation may provide a useful tool for researchers to scaffold as this concept is carried forward. Copyright by DON WALLACE SCHAFER II 2014 TABLE OF CONTENTS LIST OF TABLES .......................................................................................................... vii LIST OF FIGURES ....................................................................................................................... ix Chapter 1: Introduction ................................................................................................................... 1 1.1 Background ................................................................................................................ 1 1.2 Defining Characteristics of the Construction Industry ...................................................... 4 1.2.1 Construction is Complex ........................................................................................... 4 1.2.2. Construction is Uncertain and Underspecified ............................................................. 4 1.2.3 Construction is Quick and Demanding ........................................................................ 6 1.3 The Construction Safety Problem .................................................................................. 7 1.4 A Brief Introduction to Resilience Engineering ............................................................. 10 1.5 Goals, Research Question, Objectives, and Method ....................................................... 11 1.5.1 Goal...................................................................................................................... 11 1.5.2 Research Questions ................................................................................................ 12 1.5.3 Objectives ............................................................................................................. 12 1.6 Chapter Summary ..................................................................................................... 13 Chapter 2: Literature Review........................................................................................................ 14 2.1 Introduction.............................................................................................................. 14 2.2 Disruption in General Industry and Construction Operations .......................................... 15 2.3 The Need for New Approaches to Safety – Background of RE ....................................... 21 2.3.1 The Evolution of Safety Thought ............................................................................. 22 2.3.1.1 Popular Accident Models ..................................................................................... 23 2.3.1.1.1 Sequential Accident Models .............................................................................. 24 2.3.1.1.2 Epidemiological Accident Models ...................................................................... 24 2.3.1.1.3.1 Normal Accident Theory ................................................................................ 29 2.3.1.1.3.2 High Reliability Organizations (HROs) ............................................................ 32 2.4 The Five Ages of Safety............................................................................................ 37 2.4.1 The First Age of Safety - The Technology Age ......................................................... 
38 2.4.2 The Second Age of Safety - The Human Factors Age................................................. 40 2.4.2.1 Human Error ...................................................................................................... 42 2.4.3 Third Age of Safety – Management Systems and Culture ........................................... 47 2.4.3.1 Culture and Climate ............................................................................................ 48 2.4.3.1 Safety Culture .................................................................................................... 50 2.4.4 Fourth Age of Safety – The Integration Age ............................................................. 51 2.4.5 Fifth Age of Safety – The Adaptive Age ................................................................... 52 2.5 Current Understanding of Resilience Engineering ......................................................... 53 2.5.1 Background and Definitions of Resilience Engineering .............................................. 58 2.5.2 The Four Premises of Resilience Engineering ........................................................... 64 2.5.3 The Four Abilities of a Resilient Organization ........................................................... 65 2.5.3.1 Learning ............................................................................................................ 66 v 2.5.3.2 Monitoring ......................................................................................................... 69 2.5.3.3 Anticipation ....................................................................................................... 72 2.5.3.4 Responding ........................................................................................................ 75 2.6 Managing Performance Variability –“Making Ends Meet” ............................................ 78 2.6.1 Explanation of Performance Variability – The Efficiency Thoroughness Tradeoff (ETTO) .................................................................................................................................... 81 2.7 Functional Resonance Analysis Method (FRAM) ......................................................... 84 2.8 The Resilience Analysis Grid (RAG) .......................................................................... 90 2.9 Other Understandings of Resilience Engineering.......................................................... 91 2.9.1 Stress-Strain Analogy for Resilience Engineering....................................................... 91 2.9.2 Madni and Jackson’s Resilience Engineering Framework .......................................... 94 Chapter 3: Methods ....................................................................................................................... 96 3.1 Introduction.............................................................................................................. 96 3.1.1 Conceptual Frameworks ......................................................................................... 98 3.1.2 Method ............................................................................................................... 100 3.2 Objective 1 ............................................................................................................. 102 3.3 Objective 2 ............................................................................................................. 
103 3.4 Objective 3 ............................................................................................................. 104 3.4.1 Discrete Event Modeling ....................................................................................... 107 3.4.2 Agent Based Modeling .......................................................................................... 108 3.4.3 Multi-Scale Modeling ........................................................................................... 109 3.4.4 Anylogic Software ................................................................................................ 110 Chapter 4: Resilience Engineering Conceptual Framework for Construction Safety ................ 113 4.1 Purpose and Features of the Framework ..................................................................... 113 4.1.2 Linking Disruptions, Resilience Engineering, Safety, and Production ......................... 113 4.1.2.1 Disruptions can/may cause accidents .................................................................... 114 4.1.2.2 RE is proposed as a Formalized Approach to Understand Disturbances .................... 115 4.1.2.3 Two of the four premises of RE deal directly with the relationship between production and safety .................................................................................................................... 115 4.2 The Elements of a Project-Based RE Construction Project ........................................... 116 4.2.1 Perspectives ......................................................................................................... 116 4.2.1.1 Resilience is a Quality of the System ................................................................... 117 4.2.1.2 Resilience is a Strategy ...................................................................................... 118 4.2.1.3 Culture for Resilience Engineering...................................................................... 119 4.2.1.4 View Systems Functionally ................................................................................ 121 4.2.1.5 The Four Abilities ............................................................................................. 124 4.2.1.5.1 Responding ................................................................................................... 124 4.2.1.5.2 Anticipating................................................................................................... 130 4.2.1.5.2.1 Patterns in Anticipation ................................................................................. 130 4.2.1.5.2.2 When to Anticipation .................................................................................... 132 4.2.1.5.2.3 General comments on Anticipation ................................................................. 134 4.2.1.5.3 Monitoring .................................................................................................... 137 4.2.1.5.4 Learning ....................................................................................................... 139 vi 4.3 Developing a Graphical Model for RE in Construction: The Software Development Process and The Sociocognitive Framework for Engineering Work Systems ................................... 141 4.3.1 The Software Development Process ....................................................................... 141 4.3.3 The Resilience Engineering Model for Projects ....................................................... 
142 Chapter 5: Resilience Engineering Conceptual Model Simulation ............................................ 147 5.1: Introduction .......................................................................................................... 147 5.1 Background of Conceptual Modeling........................................................................ 149 5.1.1 Developing the Conceptual Model ........................................................................ 150 5.1.2 Understanding the problem situation ..................................................................... 151 5.1.3 Determining the Modeling and General Project Objectives ...................................... 152 5.1.4 Identifying the Model Outputs.............................................................................. 153 5.1.5 Identify the Model Inputs .................................................................................... 153 5.1.6 Determining the Model Content ........................................................................... 154 5.1.7 Determining the Model Level of Detail ................................................................. 155 5.2 A Conceptual Framework to Describe Disruptions in the Construction Process............. 157 5.2.1 Step 1: Understand the Problem Situation .............................................................. 157 5.2.2 Step 2: Determine the Objectives .......................................................................... 159 5.2.3 Steps 3 and 4: Identifying the Model Outputs and Inputs ........................................ 160 5.2.4 Step 5: Determining the model content .................................................................. 161 5.2.5 Coding the Computer Model ................................................................................ 163 5.2.5.1 Experiment 1 Simulation................................................................................... 163 5.2.5.2 Experiment 2 Simulation................................................................................... 166 5.2.5.3 Experiment 3 Simulation................................................................................... 168 5.2.5.4 Experiment 4 Simulation................................................................................... 170 5.2.6 Understanding of the Experiments ........................................................................ 170 5.3 Verification and Validation ...................................................................................... 174 Chapter 6: Conclusions .............................................................................................................. 179 6.1 Review of the Research Goals, Questions, and Objectives............................................ 179 6.2 Contributions to Knowledge of this Research ............................................................. 180 6.3 Limitations of this Research ..................................................................................... 182 6.4 Future Agenda ........................................................................................................ 182 BIBLIOGRAPHY………………………………………………………………………………184 vii LIST OF TABLES Table 2.1: Probing Questions About the Ability to Learn ........................................................... 69 Table 2.2: Probing Questions about the Ability to Monitor ........................................................ 
72 Table 2.3: Probing Questions about the Ability to Anticipate ..................................................... 75 Table 2.4: Probing Questions About the Ability to Respond ...................................................... 77 Table 4.1: Construction project guidance analysis item - Response…………………………...125 Table 4.2: Construction project guidance analysis item - Anticipate ………………………….134 Table 5.1: Template for consideration of level of detail by component type . ........................... 156 Table 5.2: Model Content………………………………………………………………………161 Table 5.3: Model Assumptions and Simplifications................................................................... 163 Table 5.4: Data Requirements .................................................................................................... 163 Table 5.5: Compiled Average Throughput of the system for each experiment. ........................ 171 Table 5.6: Compiled performance statistics of the model. ......................................................... 173 viii LIST OF FIGURES Figure 1.1: The Hierarchy of Socio-Technical Systems in Organizational Risk Management ..... 7 Figure 1.2: Plateaus in Overall Safety Reached after Interventions ............................................. 10 Figure 2.1: Disturbances registered at a construction site ............................................................ 17 Figure 2.2: A Generic Epidemiological Model............................................................................. 26 Figure 2.3: Reasons Swiss cheese Model ..................................................................................... 28 Figure 2.4: HRO Model that incorporates Collective Mindfullness Process................................ 37 Figure 2.5: Function/Activity representation and aspects............................................................. 87 Figure 2.6: Stress-strain state-space.............................................................................................. 92 Figure 3.1: Research Method ...................................................................................................... 101 Figure 4.1: Time Periods............................................................................................................. 133 Figure 4-2: Resilience Engineering at the Project Workface...................................................... 143 Figure 5.1: Robinson’s Conceptual Model in the Simulation Project Life-Cycle. ................... 151 Figure 5.2 : Listing the Project Aim and Objectives …………………………………………159 Figure 5.3: The model outputs ……………………………………………………………….160 Figure 5.4 : Identifying the model inputs …………………………………………………….160 Figure 5.5: Experiment 1 ............................................................................................................ 164 Figure 5.6: Sample Arrival schedule for Work Entities. ............................................................ 164 Figure 5.7: Resource Schedule ................................................................................................... 166 Figure 5.8: Animation for each Experiment ............................................................................... 166 Figure 5.9: Experiment 2, Type A Disruptions .......................................................................... 167 Figure 5.10: Experiment 2, Type B Disruptions ...................................................................... 1678 ix Figure 5.11: Experiment 3 Java coding for Trade B. 
.................................................................. 169 Figure 5.12: Experiment 4 Java coding for Trade B. .................................................................. 170 Figure 5-13: Graph showing the behavior of the production system subject to disruptions and increasing production pressure. .......................................................................................... 172 x Chapter 1: Introduction 1.1 Background It is not uncommon to learn that a construction project has been delayed or postponed. The reasons proffered vary widely, for instance, needed funding may be withheld or suddenly unavailable, or perhaps the prime contractor or a key subcontractor went bankrupt. Another scenario might involve a shortage of labor or a labor strike that shuts down work temporarily. A common thread among projects is that they are subject to the vagaries of disruptions that interfere with commonly desired project outcomes such as customer satisfaction, on-time completion, profitability, and safety. Disruptions are those happenings that interrupt, or disturb, the normal course of work. Disruptions may assume many different forms and combinations that can bedevil the best planning, scheduling, and risk-aversion efforts. They range from the mundane, such as a key employee calling in sick, to the catastrophic, such as a crane collapse that shuts down the project for several months. In some cases, several seemingly unrelated disruptions may combine in previously unforeseen and unimaginable ways. Disruptions are rarely thought of as fortuitous events and frequently are the precursor to some degree of failure, such as physical injury or financial loss. In the U.S. construction industry there does not appear to be a definitive answer to the question “Why do disruptions occur?” It is the goal of this work to answer that question by exploring schemes and methods to understand, harness, and foresee disruptions that arise from demands placed on the construction operations of project-based organizations that deliver the built environment. The underlying assumption of this goal is that a disruption free construction project is desirable. 1 However, as with many other undertakings that seek to understand complex socio-technical problems, achieving the goal as stated is a tall order. Disruptions may occur in many different aspects of a project as mentioned above. It is prudent therefore to break down the analysis into more focused and manageable portions. For instance, disruptions could be examined with respect to finance, the supply chain, labor, or any combination or multitude of disruptions that have been observed on construction projects. The scope of this work involves understanding production workflow related disturbances. Geographically, this discussion is limited to the United States of America (U.S.A.), although it is envisioned to apply to other countries with similar means and methods of construction. This work posits that the emerging paradigm of Resilience Engineering (RE) is a way to begin to better understand how to handle the observed phenomena of disruptions on construction projects and to begin to answer the question of “Why do disruptions occur?” This will get us closer to the paradigm shift needed to make the next improvements in construction safety. 
In a nutshell, “Resilience Engineering looks for ways to enhance the ability of organisations to create processes that are robust yet flexible, to monitor and revise risk models, and to use resources proactively in the face of disruptions or ongoing production and economic pressures” (Nemeth et al. 2009). RE is based on four major premises (Hollnagel et al. 2011). First, that performance conditions are always underspecified. Second, that adverse events can be understood as the result of unexpected combinations of performance variability. Third, that safety management must be both proactive and reactive. Finally, that safety can neither be isolated from the core business process (e.g., production operations), nor vice versa. A resilient system must possess the intrinsic abilities to respond, anticipate, monitor, and learn at all levels of the organization (Hollnagel et al. 2011).

Many operations and industries are turning to RE to better handle disruptions (and the failures they may bring) as a means to proactively avoid them, to deal with and mitigate disruptions in real time (i.e., as they are occurring), and to recover from them. RE is an idea that is gaining traction in process-centric and high-risk industries such as oil and gas operations, the nuclear industry, air traffic management, and health care, to name but a few. In this work, the concept of RE is utilized to devise a conceptual framework of how RE could be applied to construction operations to combat disruptions. Then the idea that RE might be an avenue to understand, harness, and foresee disturbances is explored via a conceptual modeling approach. The simulation modeling methods of agent-based and discrete event modeling are employed to perform experiments that include disturbances in a production setting populated by agents. Given that a basic tenet of RE is concerned with the relationship between safety and production, the simulation utilizes discrete event modeling in a production setting. The agents simulate the behavior of a trade crew or installers.

The ideas presented in this brief introduction are expanded upon in the following chapters. The remainder of this opening chapter will briefly characterize the nature of the United States construction industry, discuss the chronic construction safety problem, and list and briefly elucidate the goals, research questions, and objectives of the dissertation. Finally, the remaining chapters are summarized. To begin to understand disruptions in the construction industry it is informative to first briefly examine the nature of the construction industry.

1.2 Defining Characteristics of the Construction Industry

Oglesby, Parker, and Howell (1989) described construction projects as complex, uncertain, and quick. On closer examination, a fourth descriptor emerges – that working conditions are underspecified.

1.2.1 Construction is Complex

Construction is complex. Bertelsen (2003) notes that “The general view of the construction process is that it is an ordered, linear phenomenon, which can be organized, planned and managed top down”; however, upon closer inspection he observes that “construction is indeed a complex, nonlinear and dynamic phenomenon, which often exists on the edge of chaos.” A complex system is one which has, within itself, a capacity to respond to its environment in more than one way, and to select among the options in some way (Miller and Page 2007).
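To make this definition concrete, the following minimal Java sketch (Java being the language used for the simulation experiments described in Chapter 5) shows an agent that can respond to the same situation in more than one way and selects among its options. The class, method, and parameter names are hypothetical illustrations for this discussion only and are not taken from the dissertation's AnyLogic model.

```java
import java.util.Random;

/**
 * Illustrative sketch only: a minimal agent that can respond to its
 * environment in more than one way and select among the options, in the
 * sense of Miller and Page (2007). Names and thresholds are hypothetical.
 */
public class CrewAgent {

    enum Response { CONTINUE_AS_PLANNED, SLOW_DOWN_AND_VERIFY, STOP_AND_REPLAN }

    private final Random rng = new Random();

    /** Select a response based on locally observed pressure and disruption. */
    Response respond(double productionPressure, boolean disruptionPresent) {
        if (!disruptionPresent) {
            return Response.CONTINUE_AS_PLANNED;
        }
        // Under high production pressure the agent may trade thoroughness for
        // efficiency and press on despite the disruption (an ETTO-like choice).
        if (productionPressure > 0.8 && rng.nextDouble() < 0.5) {
            return Response.CONTINUE_AS_PLANNED;
        }
        // Otherwise the agent selects between two more thorough options.
        return rng.nextBoolean() ? Response.SLOW_DOWN_AND_VERIFY : Response.STOP_AND_REPLAN;
    }

    public static void main(String[] args) {
        CrewAgent agent = new CrewAgent();
        // Two identical situations may yield different responses, which is the
        // source of performance variability at the workface.
        System.out.println(agent.respond(0.9, true));
        System.out.println(agent.respond(0.9, true));
    }
}
```

When many such agents interact on a shared site, the aggregate behavior of the system cannot be read off from any single agent, which is the sense in which the whole becomes more than the sum of its parts.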
Simon (1997) defined a complex system as one made up of a large number of individual parts that have many interactions. A key concept of complexity is that the system emerges into something that is greater than the sum of its parts.

1.2.2 Construction is Uncertain and Underspecified

Construction projects, like most other human undertakings, have uncertain futures. Uncertainty is defined by the Construction Industry Institute (CII) (1989) as “The gap between the information required to estimate an outcome and the information already possessed by the decision maker.” Uncertainty resides in all human activity but seems to be magnified in the construction industry. Wild (2005) posits that uncertainty arises from asymmetric information, limited information processing capacity, and bounded rationality. In other words, construction managers either cannot gather or do not share all of the information necessary to reduce uncertainty, and this hinders the evaluation of all of the possible outcomes of decisions made for a complex construction project. Even if it were possible to determine all of the possible outcomes resulting from decisions, few firms have the necessary resources (e.g., time and money) to consider all of the possibilities given the available information. Given this state of limited information, firms make the best decisions possible in an opportunistic manner that suits their self-interest. The resulting outcome is a series of individually optimized tasks or phases that add up to a suboptimized project.

Closely allied with uncertainty is the notion that working conditions are underspecified. No amount of pre-planning effort can achieve a complete description of the conditions that the frontline worker will encounter. There are multiple scenarios and situational developments to account for that are nearly impossible to describe beforehand. This is compounded by the interactions of the various trades (plumbing, electrical, carpentry) and their corresponding supply chains and organizational structures that are simultaneously working together on site and may have competing goals for resources. It is the job of these frontline workers to “make ends meet” given the unique conditions encountered on-site and to complete the project according to plans and project specifications, along with the demands outlined above. Underspecification is directly related to the performance variability of workers that is discussed in Chapter 2. In a nutshell, performance conditions are usually underspecified. Individuals and organizations must therefore adjust what they do to match current demands and resources. Because resources and time are finite, such adjustments will inevitably be approximate (Nemeth et al. 2009).

1.2.3 Construction is Quick and Demanding

The mantra of the modern business world is “better, faster, cheaper.” Broadly this might be termed the ‘demands’ placed on the construction industry to rush to completion. However, there are several dimensions to the demands placed on the industry. Rasmussen (1997) captured and abstracted the demands placed on the typical organization in a 1997 paper titled “Risk Management in a Dynamic Society: A Modelling Problem” and in a follow-up report to the Swedish Rescue Services Agency (SRSA) (Rasmussen and Svedung 2000). He succinctly captures the problems facing modern risk analysis by arguing that present models of accident causation are inadequate, narrowly defined, and do not reflect the changing social landscape.
He argues that risk management must be modeled by cross-disciplinary studies, done in a control theoretic manner, and considered on a socio-technical systems basis. Figure 1.1 describes this outlook. Rasmussen argues that there should be “vertical alignment” across the levels. That is, information about what is happening at the workface should be communicated up through the column while decisions should propagate down through the column. The vertical interdependencies are essential to the functioning of the system and should be primary considerations when attempting to reduce risk. However, Rasmussen points out that much of the research is conducted horizontally by the various disciplines. For instance, research is conducted on the management level in isolation from the work level. The environmental stressors listed, such as the fast pace of technological change and changing competency levels, occur at different levels and on various time scales.

Figure 1.1: The Hierarchy of Socio-Technical Systems in Organizational Risk Management (Rasmussen 1997)

Next, the chronic construction safety problem is reviewed, as well as the phenomenon of “plateauing” with respect to improvements in safety.

1.3 The Construction Safety Problem

Although safety is a widely touted priority in the construction industry, in the period from 1992 to 2005 an average of about 1,147 worker deaths per year were reported in the U.S. (CPWR 2008). Construction safety appears to be a chronic problem in the industry. Based on statistics alone, safety appears to be improving somewhat in the construction industry. The most recent version of the “Construction Chart Book” (CPWR 2008) reports that rates of overall work-related fatalities in construction decreased by 22% in the period from 1992 to 2005, and nonfatal injuries and illnesses with days away from work (DFW) dropped by 55% in the same period. Falls and electrocutions, the leading causes of death in the industry, have declined over the past 15 years. The Chart Book attributes these improvements to focused efforts on prevention. Despite the improvements, 16,068 construction fatalities were recorded in the same period, an average of about 1,147 worker deaths per year. Based on 2005 statistics from the Chart Book, only the agriculture, mining, and transportation industries had higher annual fatality rates (per 100,000 workers), and only the transportation industry had a higher rate of nonfatal injuries and illnesses with days away from work (DFW) (per 10,000 workers). These numbers are disproportionately high compared to other industries given that construction workers account for only approximately 7% of the workforce. Other statistics reveal that the incidence of elevated blood lead levels is disproportionately high in construction workers compared to other workforce sectors and that about 41% of construction workers over age 55 were diagnosed with hypertension in 2005 (CPWR 2008). It appears there are many areas for improvement in construction worker safety, health, and well-being. Another discouraging aspect of the construction safety problem is that it is costly. In 2002 dollars, the total (direct and indirect) costs of fatalities and nonfatal injuries were $13 billion.

Construction researchers frequently speak about reaching a “plateau” with regard to construction accidents. Groeneweg (1998) contends that plateauing (his term is “stabilizing”) is a natural occurrence in an effort to reduce accidents.
In a typical scenario, a company recognizes that the number of accidents is a problem and takes measures to decrease that number. The efforts (e.g., audits, inspections, safety meetings) yield positive results and accidents are reduced – for a while. Stabilization often takes place even with more effort being expended. To break the pattern, more measures and countermeasures are implemented; it is as if the former actions have lost their impact on safety. Groeneweg (1998) observes that, after safety improvements are put into place, “Routinenisation and normalizations of programs and initiatives, the aging of systems, and the sometimes intrinsically hazardous environment seem to push the number of failures back up again.”

Taking a historical overview of industrial safety, it is seen that plateaus are not uncommon. From 1937 to 1956 the fatality rate for U.S. workers in all industries decreased from 43 to 23 deaths per 100,000 workers (Groeneweg 1998). This decrease was attributed to the intervention of engineering controls applied to industrial hardware, ergonomics science, and the increased use of personal protective equipment (PPE). Another factor was a decrease in fatigue due to the federally mandated lowering of working hours. In the 1960s and 1970s the number of industrial accidents stabilized. Efforts were expended on employee behavior in the form of motivational programs and improving the quality of associates. The 1980s saw a 21% decline in the accident death rate. The current focus is on improving the fatality rate via sociotechnical organizational change while engineering controls and behavior modifications are ongoing. Figure 1.2 summarizes these trends.

Figure 1.2: Plateaus in Overall Safety Reached after Interventions (adapted from Groeneweg 1998)

This thesis posits that a way to deal with the disruptions that surround the safety problem in the construction industry is found in the emerging discipline of RE, as introduced below.

1.4 A Brief Introduction to Resilience Engineering

Resilience Engineering (RE) is a different way to look at safety and is unique compared to other approaches in its perspective (Hollnagel et al. 2008). In RE, failure and success are viewed as opposite sides of the same coin; both depend on the normal performance variability of the production process of the system. Workers are not viewed as “cogs in a machine” but as humans whose performance naturally varies. Mostly this variance maintains system stability, but on rare occasions it does not. The goal is to dampen the variability that precedes failure and to amplify the variability that produces success (Hollnagel et al. 2008). A resilient system aims to adjust its functioning prior to or following changes and disturbances so that it can continue functioning after a disruption or a major mishap and in the presence of continuous stress (Hollnagel et al. 2006).

The approach of Resilience Engineering is to look for ways to maintain control of a system and harness (but not constrain) performance variability. Additionally, it strives to apply the skills of foresight and imagination to proactively promote needed change and to constantly monitor and adjust the safety model as project conditions vary or may vary. RE is an emerging discipline that has been described as a paradigm shift about safety in the “Kuhnian” sense (Woods and Hollnagel 2006). RE builds upon previous approaches such as those found in the Normal Accident Theory (NAT) and High Reliability Organization (HRO) approach.
The RE approach recognizes the need for a progressive safety plan that is systems-based, sociotechnical in its outlook, views work as a complex activity, and is concurrently reactive and proactive in its vigilance to prevent accidents. In the RE outlook safety is a core value of the organization. In a complex, uncertain, and quick world of business, RE looks for ways to productively balance the tensions among ongoing production and economic pressures and safety, recognizing the need to pull back or stifle production when safety is threatened. The goal, research questions, objectives and method of this work are described below. 1.5 Goals, Research Question, Objectives, and Method 1.5.1 Goal The goal of this study is to explore schemes and methods to understand, harness, and foresee disturbances that arise from demands placed on the construction operations of project-based organizations that deliver the built environment. 11 1.5.2 Research Questions The research questions are: 1. How does RE differ from traditional ways of thinking about how to deal with disruptions? 2. What are the current principles and practices of RE? 3. What elements of RE may help a construction project avoid, survive, and recover from disruptions? 4. How can we begin to simulate disruptions in construction operations and use RE principles? The focus, or scope, of this work is on the disruptions that may create the conditions conducive for an accident. However, the Research Questions could be a springboard for future researchers to examine other scopes of interest that arise in construction operations such as disturbances from financial or logistical difficulties. The first two questions are considered from the literature review. Questions three and four comprise the Conceptual Framework presented. 1.5.3 Objectives The Objectives of this research are: 1. Abstract the concept and underlying theories of RE and explore RE deployment in nonconstruction industries for use in formulating Objectives 2 and 3. 2. To present a RE conceptual framework for construction safety. 12 3. To explore RE implementation in construction production settings that experience disruptions in a formalized way using hybrid computational methods. 1.6 Chapter Summary The research presented is contained in six chapters. This chapter began with the observation that disruptions are endemic to the construction industry and examined the general characteristics of the construction industry and the specific problem of safety. It was hypothesized that the paradigm of RE is an approach to break the plateau that safety efforts seem to be mired in. The Goal and Objectives of this research were discussed as was the proposed method. Chapter 2 presents a background on the relationship between safety and production as well as the evolution of RE. Chapter 3 outlines the methods used in the study. Chapter 4 introduces a new conceptual model, developed by the author, for understanding disturbances in the paradigm of RE. The chapter discusses building the computational model. The chapter concludes by discussing the steps taken to experiment with the computational model. Chapter 5 presents the Conceptual Model and results and discussion of the simulation. Finally, Chapter 6 contains conclusions of the research and its contributions to knowledge. The chapter also suggests possible areas for future research in RE. The Appendix provides the model code and raw data. 13 Chapter 2: Literature Review 2.1 Introduction This literature review has three primary purposes. 
First, to examine the knowledge and research associated with disruptions in industry. Second, to provide the reader with the background necessary to understand how RE evolved from safety thought and approaches throughout the industrial age and how RE diverges from and expands upon popular safety approaches. Third, to present the current state of RE to the reader as well as other understandings of RE.

The review first explores the current state of research on construction disruptions. Given that research on disruptions is sparse, this information is brief and straightforward. The second and third purposes of this literature review are to fully explore the emerging paradigm of RE. This is guided by first focusing a skeptic’s eye on the four premises of RE as presented in Chapter 1. The first three premises are primarily examined by looking at popular models of safety and the evolution, or “ages,” of safety thought in order to examine why the new paradigm of RE may be useful. In short, the first three premises challenge the notions that human variability at the workface is a threat to safety, that malfunctions always occur in a linear fashion and always have a root cause, and finally that the calculation of failure probabilities through techniques such as fault tree analysis, while useful, only partially explains the mechanism of accidents. The background essential to understanding RE is also presented in the context of the first three premises, namely the “Normal Accident Theory” and that of “High Reliability Organizations.” The so-called “softer” factors of culture and climate as they relate to the understanding of RE are also introduced. The concepts and applications of RE are then presented as they are currently understood in the literature. Finally, alternate understandings of and approaches to the RE paradigm are presented.

The review begins by looking at how disruptions are described in the literature.

2.2 Disruption in General Industry and Construction Operations

The purpose of this section is to discuss disruptions in general as well as how they are understood in construction operations. Disruptions are those happenings that interrupt, or disturb, the normal course of work. Disruptions are a fact of life on many construction projects. Additionally, they may contribute to the performance variability of a system. In general, “Resilience represents the ability of a system to adapt or absorb disturbances, disruptions, and changes and especially those that fall outside the textbook operational envelope” (Woods et al. 2007). The Association for the Advancement of Cost Engineering (AACE) defines disruption as “an action or event which hinders a party from proceeding with the work or some portion of the work as planned or as scheduled” (AACE 2004). Ibbs et al. (2007), in the context of discussing owner or contractor change upon a project, note that some studies define disruption as “…the occurrence of events that are acknowledged to negatively impact on labor productivity.” Kuivanen (1996) proposes defining a disturbance in a general way as an “unplanned or undesirable state or function of the system.” Disturbances are stochastic in nature and are difficult to foresee and therefore difficult to plan for (Lindau and Lumsden 1995). They can range from the mildly annoying, having just a slight impact on the project, for instance a crew waiting on a late ready-mix concrete delivery, to more impactful events, such as demolishing and replacing a misplaced concrete wall or a crane collapse.
Traditionally, disturbances have been primarily evaluated from either a legal or contractual point of view but have rarely been examined in the context of the production site or with respect to the assessment of construction processes (Gehbauer et al. 2007). The topic of disturbance management receives little attention in construction research. Gehbauer et al. (2007) define a construction project disturbance as “unexpected occurrences causing an interruption or at least a delay in the execution of tasks; they cause a significant discrepancy between the target and actual data” and note that “target data usually refer to time or cost-related operations.” They observed a construction project for twenty days for disturbances in order to build a disturbance database. The database is categorized into four broad areas: project description, data regarding the observed disturbance, disturbance elimination, and disturbance effects. The results of the observations are shown in Figure 2.1. Execution errors (42%) were observed most frequently, followed by information errors (27%), delivery problems (12%), and then planning errors (8%). The term “disturbance factor” is used to describe a reason for a target-actual discrepancy. They define “construction operations” as the project under consideration and “construction processes” as any sub-section of the project, such as masonry work. Operational disturbances may be external (i.e., those related to natural, legislative, or economic events) or internal, such as those occurring in procurement, sales, and construction site work. “Disturbances are further subdivided into personnel-, material- and area-related disturbances.” Here, area-related refers to the sub-process under examination, such as masonry or formwork. The authors also introduce primary and induced disturbance factors and describe them thusly: “Primary disturbance factors are deviations caused by independent actions within the same area. A primary disturbance arises in construction site work when for example the forms for the pouring of a concrete wall burst because the locks were forgotten during the assembly of the forms. In contrast, an example of an induced disturbance would be the opening of form locks too soon because of an error made when calculating the concrete pressure. Induced disturbances are deviations originating in another area.” Safety is not mentioned in the Gehbauer et al. (2007) paper.

Figure 2.1: Disturbances registered at a construction site (Gehbauer et al. 2007)

Jackson (2010), speaking of industry in general, states that “Accidents are the result of disruptions, and the resilience of the system to disruptions will depend on the nature of the disruption. The goal of resilience is to avoid, survive, and recover from disruptions.” Madni and Jackson (2009), speaking in the context of RE, define disruptions as “conditions or events that interrupt or impede normal operations by creating discontinuity, confusion, disorder or displacement.” They can take the form of operational contingencies, natural disasters, terrorism and political instability, and financial meltdown.

Disruptions can also be described as either Type A or Type B (Jackson 2010, Madni and Jackson 2009). Type A disruptions are external to the system, exemplified by earthquakes, floods, tornadoes, and so forth. Type A disruptions can be caused by the influence of one system upon another.
For instance, an aircraft flying too close to another may exert an aerodynamic wake and damage the structural integrity of a nearby jet. Type B disruptions are systemic in nature and are “a disruption of function, capability or capacity.” The spaceflights of Apollo 13 and the shuttle Columbia could be broadly classified as Type B failures. In the first case, Apollo 13 was resilient and survived; in the second case, Columbia was brittle (i.e., the opposite of resilient) and met with disaster. Type B disruptions arise in socio-technological systems, and their sources include humans, automated systems, and combinations of these (Madni and Jackson 2009, Jackson 2009). These sources are collectively termed “agents” and are classified as human agents, automation agents, and multi-agent combinations. Type B disruptions can also be termed predictable or unpredictable. A predictable disruption is one that has been previously encountered and is usually accounted for in design and operations. “Unpredictable disturbances can occur, either because a phenomenon was unknown to modern science, or because it was unanticipated/unknown to the systems designers” (Madni and Jackson 2009). A special case of a Type B disruption is called a “Disruption of Unreliability.” A condition of this classification is that the component’s reliability has been verified by historical or test data against the “mean time between failure” (MTBF) standards established for the component. For instance, a formwork lock is considered such a disruption if it fails prematurely, before its verified useful life. Otherwise, the component failure is categorized as “management failure to assure proper verification” (Jackson 2009). Finally, disruptions can be caused by latent conditions, as described in Reason’s Swiss Cheese Model (Madni and Jackson 2009, Jackson 2009).

Lindau and Lumsden (1995) examined disturbances in manufacturing with the aim to “…classify safety actions used in manufacturing companies and to evaluate their efficiency in preventing the propagation of disturbances from a holistic perspective.” In their study they classify a disturbance “as an event which affects a planned resource movement in such a way that a deviation from plan occurs.” They classified the actions to prevent the propagation of disturbances as either formal or informal. Formal approaches are actions considered in the planning stage to absorb the effects of a disturbance. They include safety stock, safety capacity, safety lead time, overplanning, expediting, and subcontracting. Safety in this sense means preventing interruptions to the work flow and does not concern the worker. Informal actions are used when the formal actions are not effective in absorbing the disruption. Informal actions include subcontracting, expediting, partial delivery, short-term replanning, and reservation breaking (i.e., ordering materials sooner than needed). To analyze the effects of common disturbances on overall system performance, the authors observed the common disturbances of material shortage, absenteeism, machine breakdown, tool shortage, and technical documentation shortage. They noted the formal and informal actions taken to absorb the disturbances to the system and developed a mean absorption ability, relative absorption ability, and general absorption ability to gauge the effectiveness of the actions while broadly considering the cost and upset to other parts of the system.
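As a purely illustrative sketch of the kind of bookkeeping this implies, the following Java fragment records disturbances together with the formal or informal action used to absorb each one. The class, field, and category names are hypothetical, and the "absorption ability" computed here is a simplified stand-in (the fraction of disturbances absorbed), not the measure published by Lindau and Lumsden (1995).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch only: logging disturbances and the formal or informal
 * actions used to absorb them, in the spirit of the study described above.
 * The absorption measure is a simplified stand-in, not the authors' formula.
 */
public class DisturbanceLog {

    enum Disturbance { MATERIAL_SHORTAGE, ABSENTEEISM, MACHINE_BREAKDOWN, TOOL_SHORTAGE, DOCUMENTATION_SHORTAGE }
    enum ActionType { FORMAL, INFORMAL, NONE }

    static class Record {
        final Disturbance disturbance;
        final ActionType actionTaken;
        final boolean absorbed; // did the action prevent the disturbance from propagating?

        Record(Disturbance disturbance, ActionType actionTaken, boolean absorbed) {
            this.disturbance = disturbance;
            this.actionTaken = actionTaken;
            this.absorbed = absorbed;
        }
    }

    private final List<Record> records = new ArrayList<>();

    void log(Disturbance d, ActionType a, boolean absorbed) {
        records.add(new Record(d, a, absorbed));
    }

    /** Fraction of logged disturbances that were absorbed (illustrative measure only). */
    double absorptionAbility() {
        if (records.isEmpty()) return 0.0;
        long absorbed = records.stream().filter(r -> r.absorbed).count();
        return (double) absorbed / records.size();
    }

    public static void main(String[] args) {
        DisturbanceLog log = new DisturbanceLog();
        log.log(Disturbance.MATERIAL_SHORTAGE, ActionType.FORMAL, true);   // safety stock covered it
        log.log(Disturbance.ABSENTEEISM, ActionType.INFORMAL, true);       // short-term replanning
        log.log(Disturbance.MACHINE_BREAKDOWN, ActionType.NONE, false);    // propagated downstream
        System.out.printf("Absorption ability: %.2f%n", log.absorptionAbility());
    }
}
```

A log of this kind is also the raw material for the RE ability of learning that is discussed later in this chapter.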
The authors concluded that there is no way of preventing the propagation of disturbances that is simultaneously cost efficient and not unsettling to the system. Barroso and Wilson (1999), writing in the manufacturing area, define disturbance as “any event which has not been planned for or which is undesirable and that reduces or has the potential for reducing overall system performance in terms of either production or safety requirements and goals.” They report that a common finding among several authors who have analyzed the process of industrial accident causation is that accidents tend to occur more often in abnormal, conflicting system states than in normal ones, and that the number of accidents occurring while the operator is involved in troubleshooting or disturbance control tasks varies between one-third and two-thirds of the total number of accidents analyzed. In terms of production, they report that between 80% and 94% of the disturbances registered have an effect on production, stressing that 27% of the disturbances resulted in material damage. Barroso and Wilson (1999) summarize the categories for “Consequences of Disturbances” from several researchers; the categories include: no effect, presence of risk factors, hazardous situation, minor and serious accident, catastrophe, fatal, nonfatal lost time, and nonfatal non-lost time. Toulouse (2002) writes about operator intervention in automated batch production systems and defines a disturbance as follows: “A disturbance corresponds to a variation in the state or function of the system that requires operator intervention to avoid production shutdowns, material damage, defects in quality, or to return the automated production system to its operating state following an unanticipated shutdown or the appearance of defects in the product.” Toulouse’s study mainly explored how operators came into contact, through manual intervention, with a source of energy required to run the system after a disturbance. He also reports that research has shown that 50% of disturbances reduce the operators’ safety, and that one accident occurs for approximately 2% of them.

Even though disruptions are not well understood and the literature on the concept is sparse (Toulouse 2002, Barroso and Wilson 1999), they are phenomena that must be anticipated and avoided. RE is proposed as a way to formally understand, harness, and foresee disturbances that arise from demands placed on the construction operations of project-based organizations that deliver the built environment. It is discussed in the following sections.

2.3 The Need for New Approaches to Safety – Background of RE

RE protagonists argue that while historical approaches to safety have been fruitful and improved safety over time, the approaches of the past that model systems as linear and simple do not contribute to the understanding of today’s complex industrial world. The focus has been, as Hale and Hovden (1998) point out, on the “negative,” that is, on the unreliable, weak, and problem-generating areas of a socio-technical system, and on fixing the most recent breakdown or flaw. This adjustment is often made just after a high-profile disaster. Many times this is done by erecting some form of barrier, either physical or rule-based, and by placing blame on technical, human, or organizational features of the system.
However, following Perrow’s ideas that accidents are “normal” and are the “unanticipated interaction of multiple failures” (Perrow 1984), RE seeks to identify how work is performed under “normal” conditions and resource pressures (especially time), and how system variability is handled on a daily basis. The identification of possible interactions of multiple failures of the system, and how to eliminate or mitigate them, is also considered. The following sections examine different approaches to understanding safety from the advent of the industrial age to the present. Researchers have made several attempts to delineate and trace the genealogy of safety thought. There are conflicting opinions about the evolution and trajectory of safety science. In short, there does not seem to be a consensus with regard to how safety is understood and best practiced in either general industry or construction. However, two lines of thought are useful to explain how RE came to be and why it came to be. First, the “Five Ages of Safety” as identified by Borys et al. (2009) provides a direct path to the current need for and understanding of RE. Additionally, it is useful to examine prominent and popular models of safety to help understand the background of RE. Along the way, the concepts behind High Reliability Organizations (HROs) and Normal Accident Theory (NAT) are examined, as are the concepts of climate and culture as they relate to RE.

2.3.1 The Evolution of Safety Thought

The literature review explores the background and justification of RE by examining both the “Ages” of safety as well as popular safety models. It is informative to briefly look at the different ways researchers have divided safety into various epochs or categories for analysis. Hale and Hovden (1998) delineate safety management efforts in terms of three ages: a legislative age, a human factors age, and a management age. Groeneweg (1998) divided safety evolution into periods of engineering, employee, and organizational control, as illustrated in Figure 1.2. Hollnagel (2004) posits that the focus on each area might stem from the “…strong and natural tendency to look for explanations or causes of the systems that fail most frequently or which in some ways are conspicuous.” So, for instance, engineering failures were most prevalent as materials and engineering sophistication struggled to keep pace with changes in the early part of the 20th century; correspondingly, accidents seemed most reasonable to explain in terms of human errors around the 1950s, and organizational struggles seemed a likely target in the 1980s. Finally, Hollnagel (2004) categorizes accident models as sequential, epidemiological, and systemic. These approaches have overlapping theories and models and exhibit the absence of a consistent worldview of how safety is achieved.

Borys et al. (2009) built upon Hale and Hovden’s (1998) three ‘ages of safety’ to better reflect current understanding of modern complex systems. They discuss the fourth age of safety as introduced by Glendon et al. and then introduce a fifth age of safety, which they call the adaptive age and which is informed by Resilience Engineering. It is useful to examine these five ages to learn not only the precepts upon which Resilience Engineering is built but also to unearth how prior attempts at risk management fail to explain and manage some of today’s complex safety problems.
Categorizing the evolution of safety thought chronologically is not ideal; however, reviewing other researchers' attempts to corral safety thought is the best available option for the purposes of this thesis. These attempts are reviewed along with Hollnagel's take on the evolution of accident models, classified as sequential, epidemiological, and systemic, because they highlight the "negative" outlook, represent, even today, popular outlooks on safety among professionals, and allow the logical introduction of the systems approaches of NAT and HROs. Popular accident models are first examined, followed by the "Five Ages of Safety." 2.3.1.1 Popular Accident Models Hovden et al. (2010) write, "Most accident models and theories applied in the field of occupational accidents are still based on the ideas in Heinrich's Domino Model, Gibson's and Haddon's epidemiological models of energy-barriers, and are using a closed system safety mindset with mechanistic metaphors to describe the conditions, barriers and linear chains of an accident process." Because of this popularity they are described below. In general, "Sequential and epidemiological accident models are inadequate to capture the dynamics and nonlinear interactions between system components in complex sociotechnical systems" (Qureshi et al. 2009). Researchers developed systemic socio-technical models to move beyond the linear and epidemiological models in order to better understand modern accidents and to better identify risk. Popular systems models include Normal Accident Theory (NAT) and High Reliability Organizations (HROs). These are described below given that they are crucial to the understanding of RE. Additionally, the Functional Resonance Analysis Method (FRAM) is discussed. The FRAM is described because it "proposes a methodology to identify and assess performance variability. Based on a functional modeling, the FRAM shares Resilience Engineering assumptions about the complex socio-technical systems underspecification and recognizes in it the need for local adjustments" (Macchi and Hollnagel 2011). 2.3.1.1.1 Sequential Accident Models A sequential accident model describes an accident as the result of a linear sequence of events (Hollnagel 2004). Qureshi et al. (2007) point out that the assumption in these types of models is that the cause-effect relationships between consecutive events are linear and deterministic and that a single initiating factor can be found that triggered the offending event. Therefore, if that factor can be removed then the accident can be avoided. However, Qureshi et al. (2007) state that "The reality is that accidents always have more than one contributing factor." Sequential models are useful for describing component failures or human error in simple systems but are an oversimplification in many cases (Hollnagel 2004, Qureshi et al. 2007). "The underlying assumption, as illustrated by the domino model, is that an accident is the result of a sequence of events and that causes, once they have been found, can be eliminated or encapsulated, thereby effectively preventing future accidents" (Hollnagel 2004). An advantage of sequential models is that they are easy to communicate graphically and are easily understood as compared to multicausal reasoning (Hollnagel 2004).
2.3.1.1.2 Epidemiological Accident Models The 1979 investigation into the nuclear core meltdown and radiation release occurring at the Three Mile Island Nuclear Generating Station in Pennsylvania prompted accident researchers to explore more sophisticated accident models. Sequential models, while useful for simple accident scenarios, could not explain this complex accident. Hollnagel (2004) explains that epidemiological accident models were used to explain the accident because sequential models were not powerful enough to explain what happened. The epidemiological approach to industrial accident analysis is borrowed from the medical field. Lingard and Rowlinson (2005) note that Gordon and Suchman pioneered the use of the epidemiological approach for industrial accident prevention in 1949 and 1961, respectively. It was observed that the occurrence of occupational injuries bears a resemblance to the study of infectious and non-infectious diseases (i.e., the science of epidemiology) and that the techniques used in the medical field might be transferred to the study of occupational health. Lingard and Rowlinson (2005) describe accident causation as derived from the combination of at least three sources: the host, agent, and environment. The host is the person to whom the injury or illness occurred. Hosts may have characteristics that promote certain types of injury or illness such as physiological features (e.g., strength, age, and gender), levels of training and competence, and motivation or behavior issues. The agent is the deliverer of the injury or illness and can be physical, chemical, or biological in nature. The environment is the physical, biological, and socio-political aspects of the work environment, although socio-political aspects are rarely considered in construction site accidents. A generic epidemiological model modified for the construction case is shown in Figure 2.2. Figure 2.2: A Generic Epidemiological Model: the Host (e.g., age, strength, gender) maintains barriers and defenses against the Agent (tools, equipment, chemicals, building components), and an unsupportive Environment (physical: site layout, noise levels, housekeeping, temperature, ventilation; biological: sanitary conditions, insect bites) may weaken those defenses (adapted from Hollnagel (2004) and Lingard and Rowlinson (2005)). Epidemiological models differ from sequential models in four main areas (Hollnagel 2004). First, the neutral term "performance deviation" replaced the idea of human error when discussing an unsafe act. Performance deviation can include a component or a human and thus took the focus off of blaming human error entirely for an accident. Second, discussing environmental conditions leaves the door open to discuss whether multiple causes could have contributed to the accident. Lingard and Rowlinson (2005) describe epidemiological models as "…consistent with the concept of multi-causality." Multi-causality simply states that "Contributing causes combine together in a random manner resulting in an accident." Third, barriers that could prevent unintended consequences, and thus the accident, are incorporated into the model. Finally, the concept of latent conditions was introduced into the model. Latent conditions are those that are present in the system prior to the accident sequence; they may not trigger accidents but may become apparent in the course of performance deviation.
They have been described as "resident pathogens" that may lie within the system for years until they combine with other triggering factors to create an accident opportunity. "Latent conditions have two kinds of adverse effect: they can translate into error provoking conditions within the local workplace (for example, time pressure, understaffing, inadequate equipment, fatigue, and inexperience) and they can create long lasting holes or weaknesses in the defenses (untrustworthy alarms and indicators, unworkable procedures, design and construction deficiencies, etc.)" (Reason 1990). Hollnagel (2004) classifies James Reason's "Swiss Cheese Model" (see Figure 2.3) as an epidemiological model. Reason characterizes defenses, barriers, and safeguards as layers of Swiss cheese. He describes his model (2000) as each "slice" being a barrier that could be engineered (e.g., alarms or physical barriers), rely on humans (e.g., pilot, surgeon), or depend on procedures and administrative controls. For a total defense each layer would be intact and prevent the occurrence of an accident. However, holes arise in the barriers ("slices") that allow for the possibility of an unwanted outcome. "The presence of holes in any one "slice" does not normally cause a bad outcome. Usually, this can happen only when the holes in many layers momentarily line up to permit a trajectory of accident opportunity—bringing hazards into damaging contact with victims." The holes arise because of active failures and latent conditions (as discussed previously). Active failures are committed by those in contact with the systems and take the form of "…slips, lapses, fumbles, mistakes, and procedural violations." The move from latent failures to active failures can be thought of as moving from the executive (i.e., "blunt") end of the spectrum to the field (i.e., "sharp") level of operations. It is interesting to note that although Hollnagel considers the Swiss Cheese Model to be an epidemiological model, Reason considers it to be a systems model; he also considers the Domino and other types of sequential models to be systems models (Reason 2008). Figure 2.3: Reason's Swiss Cheese Model (Reason 2000) 2.3.1.1.3 Systems Models Systems models differ from the structural decomposition of linear and epidemiological factors by focusing on the "characteristic performance on the level of the system as a whole" (Hollnagel 2004). Accidents are viewed as emergent phenomena, which is in line with Perrow's view that accidents are "normal" or to be expected. Systems models trace their roots to many different disciplines, including complexity theory (as evidenced by the emergent nature of systems), control theory, chaos theory, systems theory, cognitive science and its branches, and numerous other disciplines that may affect the system (Hollnagel 2004, Qureshi et al. 2008, Leveson 2004). Systems are sometimes termed socio-technical systems. This term was coined from studies at the Tavistock Institute in London and is concerned with the interaction of people and technology with respect to work design (Trist and Bamforth 1951). Because of the multitude of systems approaches and the corresponding disciplines that they draw upon, systems approaches are presented here as they are relevant to the development of Resilience Engineering.
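Before turning to those systems models, the barrier-alignment idea at the heart of the Swiss Cheese Model can be made concrete with a small numerical illustration. The sketch below is not taken from Reason's or Hollnagel's work; it is a minimal Monte Carlo exercise in Python, with assumed and purely illustrative hole probabilities, that estimates how often independent "holes" in several defensive layers line up to let a hazard reach a victim.

```python
import random

def accident_probability(layer_hole_probs, trials=100_000, seed=42):
    """Estimate how often a hazard trajectory passes every defensive layer.

    layer_hole_probs: assumed probability that each barrier ("slice") has a
    hole at the moment the hazard arrives; the values are illustrative only.
    """
    rng = random.Random(seed)
    breaches = 0
    for _ in range(trials):
        # The hazard becomes an accident only if every layer happens to
        # have a hole at the same moment (the holes "line up").
        if all(rng.random() < p for p in layer_hole_probs):
            breaches += 1
    return breaches / trials

# Three hypothetical barriers: an engineered guard, a human check, and a procedure.
layers = [0.05, 0.10, 0.20]
print(f"Estimated accident probability: {accident_probability(layers):.5f}")
# With independent layers the estimate sits near 0.05 * 0.10 * 0.20 = 0.001,
# illustrating why a hole in any single layer rarely produces a bad outcome.
```

In this toy picture, a latent condition can be read as anything that quietly raises one of the layer probabilities long before the triggering event arrives.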
Below, the systems models of Normal Accident Theory (NAT), as devised by Perrow, and High Reliability Organizations (HROs), as developed by researchers at Berkeley and refined by researchers at the University of Michigan, are examined as they relate to RE. The Functional Resonance Analysis Method (FRAM), which could be classified as a systems approach, as developed by Hollnagel, is discussed in the section on the background of RE. All three are important to the understanding of Resilience Engineering. 2.3.1.1.3.1 Normal Accident Theory In 1984 sociologist Charles Perrow published a book titled "Normal Accidents: Living with High-Risk Technologies" in which he examined several high-risk technologies along with the corresponding industries and enterprises that house the technology. He examined systems, which he viewed as "organizations, and the organization of organizations," and the technology these organizations used. These included nuclear power plants, petrochemical plants, air and sea travel, and genetic engineering, among others. These risk-laden undertakings had a common denominator in that each could cause injury and death to an untold number of workers and innocent bystanders as well as to future generations. In fact, the text was inspired by a major accident in 1979 at the Three Mile Island Nuclear Generating Station (TMI) near Harrisburg, Pennsylvania that released radioactivity due to a partial core meltdown. By analyzing this and other catastrophes related to complex technologies and organizations, Perrow came to the conclusion that further disasters were inevitable. This seemingly inevitable and repeating collision course with disaster and catastrophe was labeled as "normal" by Perrow (Perrow 1984) – thus the "Normal Accident Theory" (NAT) was born. The idea behind the NAT is elegantly simple. NAT focuses on the elements of design, equipment, procedures, operators, supplies and materials, and environment, abbreviated as DEPOSE. Perrow abstracted the common elements of several catastrophes that were based in high-technology industries (Perrow 1984). In a nutshell his argument is as follows: something, such as a plane, a factory, or a university, has a lot of interacting components (commonly parts, procedures, and operators), and two or more of them fail in a way unforeseen by the designers of the "thing." Perrow (1984) calls this the "interactive complexity" of the system. In addition, the interaction of the two parts is not obvious while the accident is occurring, and perhaps not till years afterward, if ever. If the system has a lot of "slack" between the interacting components, time to react to the accident, and other resources, the accident may not spread or become dangerous and the system will not destabilize. In "tightly coupled" systems the interactions are swift and have major impacts on one another. Perrow defines an accident as involving some damage to people, objects, or to both. In Normal Accident Theory (NAT) the degree of disturbance is crucial to how we define an accident. In Perrow's thinking, there are degrees of disturbance to a system, and these can help to define what we really mean when one states that an accident occurred. In some respects, Perrow posits, a system is what we make it, and the system definition and boundaries are defined by one's own self-interest and task. To be consistent, Perrow proposes a scheme that can be used across different system boundaries and in different industries.
In this scheme the system is divided into four levels consisting of parts, unit or a collection of parts, a subsystem or an array of units, and finally, a collection of subsystems which together is termed the project. It is then ranked according to the level on which the disruption occurs. Perrow’s (1984) take on humans in the system is worth noting. He considers humans as “mere parts” while admitting that this characterization sounds “heartless.” However, he notes that the focus of his work in high-risk industries is on the systems level and is to ultimately protect humans, “it is the character of the systems that cause that damage.” He is concerned with stopping the catastrophes that have the potential to kill hundreds or thousands, not the individual. Perrow divides interactions into two types, linear and complex. Linear interactions are the most common form of interactiveness in our daily lives and in business, are simple, and comprehensible to people involved in the system. They are formally defined as the “interactions of one component in the DEPOSE system …with one or more components that precede or follow it immediately in the sequence of production.” Complex interactions “are those of unfamiliar sequences, or unplanned and unexpected sequences, and either not visible or not immediately comprehensible” (Perrow 1984). Perrow (1984) contends that complex systems can be universities, research and development firms, and some government bureaucracies; not only those kinds of high-risk undertakings such as nuclear plants. Perrow posits that complexity exists because in most systems designers do not know how to make a production system linear, and thus create “expected sequences.” He also adds that complexity is not intrinsically undesirable as it is welcome in some bureaucracies. 31 Perrow also distinguishes between “loose” and “tight” coupling in systems. Coupling is a word adapted from the engineering field that describes how two things are attached. “Tight coupling” means that “there is no slack or buffer or give between two items” (Perrow 1984). In the “loose coupling” condition slack or buffer exists between two items. Perrow (1984) discusses four main characteristics of coupling in systems. First, tightly coupled systems have more time-dependent processes than those that are loosely coupled. One reason for this is that the production process may not allow for waiting. A second characteristic of coupling is that tightly coupled systems are invariant. This means that there is only one way to make the product that A must precede B (Perrow 1984). A third characteristic is that in tightly coupled systems the “overall design of the process allows only one way to reach the production goal” (Perrow 1984). This is in addition to the invariance described above. Finally, the fourth characteristic is that tightly coupled systems have little slack in terms of time, resources, and equipment. In many instances there are no substitutes available for resources or equipment in a tightly coupled operation. Tightly coupled systems respond quickly to perturbations and the results may be disastrous (Perrow 1984). The reader is referred to Perrow (1984) for further analysis and insight on systems coupling. 2.3.1.1.3.2 High Reliability Organizations (HROs) Some researchers disputed the underlying assumptions of the NAT. There has been and there is a healthy debate in the literature about the merits of the NAT as well as its shortcomings. 
The reader is directed to Sagan (1993) for a full discussion of the NAT versus HROs in the nuclear industry. The basic argument of opponents to the NAT is that there are some industries that are tightly coupled and interactively complex but have excellent safety records. One important branch of this opposing viewpoint is termed "High Reliability Organizations" (HROs). As described by Dr. Karlene Roberts (1993), in 1984 a team of interdisciplinary researchers at the University of California, Berkeley began studying the Federal Aviation Administration's Air Traffic Control System, the U.S. Navy's nuclear powered aircraft carriers, and Pacific Gas and Electric Company's nuclear power plant at Diablo Canyon. Each of these three organizations operates complex and potentially hazardous technologies that have the potential to unleash catastrophe in the event of operational error(s). These organizations, and other complex and tightly coupled ones like them, "somehow seem to avoid the unavoidable" (Boin and Schulman 2008). These three organizations were chosen because of their outstanding safety records in the face of daily danger (i.e., they are reliable). In addition to traditional research methods such as collecting archival data, the team spent considerable time with the organizations in the field and in workshops to understand their inner workings. Unlike traditional academic pursuits, the team strove to keep preconceived notions about the organizations to a minimum as a research strategy for theory building. In doing so they felt that this uninformed approach helped to strengthen the trust between the research team and the operators, given that the systems are difficult to understand by outsiders and that the different operators did not necessarily have a "big picture" of their own organization. In time these and similar organizations became known as High Reliability Organizations (HROs) based on the research and theories developed by the team. HROs can be defined as "organizations which have fewer than normal accidents." Boin and Schulman (2008) define HROs as "those organizations that had successfully avoided such failure while providing operational capabilities under a full range of environmental conditions." Boin and Schulman (2008) elegantly summarize the essence of what sets HROs apart from non-HRO organizations: "What makes HROs special is that they do not treat reliability as a probabilistic property that can be traded at the margins for other organizational values such as efficiency or market competitiveness. An HRO has identified a specific set of events that must be deterministically precluded; they must simply never happen. They must be prevented not by technological design alone, but by organizational strategy and management." The underlying theories of HROs are termed High Reliability Theory (HRT). From the initial three studies and others over time the Berkeley group found two main themes. First, they found that HROs react quickly to any threat to safety. Boin and Schulman (2008) report that HROs are extremely sensitive to safety threats, that an HRO "immediately 'reorders' and reorganizes to deal with that threat," and that "Safety is the chief value against which all decisions, practices, incentives, and ideas are assessed — and remains so under all circumstances." Across the board HROs acted in similar ways to value safety above everything else.
In general HROs had the following features (Boin and Schulman 2008):
• High technical competence throughout the organization
• A constant, widespread search for improvement across many dimensions of reliability
• A careful analysis of core events that must be precluded from happening
• An analyzed set of "precursor" conditions that would lead to a precluded event, as well as a clear demarcation between these and conditions that lie outside prior analysis
• An elaborate and evolving set of procedures and practices, closely linked to ongoing analysis, which are directed toward avoiding precursor conditions
• A formal structure of roles, responsibilities, and reporting relationships that can be transformed under conditions of emergency or stress into a decentralized, team-based approach to problem solving
• A "culture of reliability" that distributes and instills the values of care and caution, respect for procedures, attentiveness, and individual responsibility for the promotion of safety among members throughout the organization.
Roberts (1993) identified four factors that contribute to risk mitigation in HROs. They are:
1. Command by exception or negation: this refers to upper management "pushing" authority to closely monitored subordinates. Decision making is done by (it "migrates" to) the person(s) with the most expertise and can migrate in any direction (i.e., up, down, or laterally).
2. Redundancy in people and technology
3. Procedures and rules as a means to prevent errors
4. The ability of management to "see the big picture" to capture the various migrating decisions and integrate them within the organization.
Researchers at the University of Michigan, primarily Karl Weick and Kathleen Sutcliffe, built on Weick's notions of "sensemaking" in organizations to further clarify and define HROs and the concept of "mindfulness." In organizations (Roberts 2003), "sensemaking" seeks to answer the questions "How does something come to be an event for organizational members?" and "What does an event mean?" It is defined as "…the ongoing, retrospective development of plausible images that rationalize what people are doing" (Weick et al. 2005). "The basic idea of sensemaking is that reality is an ongoing accomplishment that emerges from efforts to create order and make retrospective sense of what occurs…Sensemaking emphasizes that people try to make things rationally accountable to themselves and others" (Weick 1996). Roberts (1993) clarifies and defines sensemaking as "…the importance of various people in the organization correctly perceiving the events before them and artfully tying them together to produce a "big picture" that includes processes through which error is avoided. A representation of the knowledge available might be a Venn diagram or a hologram in which no one has the whole story but different individuals have important parts of the story that then are tied together to represent the whole." Weick and his research team built on the work of the Berkeley HRT researchers, their previous work on sensemaking, the concept of "collective mindfulness," and extensive field studies to develop a popular model of HROs. Weick et al.
(1999) hold that the distinctive nature of HROs lies in how "…diverse but stable cognitive processes interrelate in the service of the discovery and correction of errors." Traditional organizational theory and accident prevention focus on decision-making, while HROs are "…more about inquiry and interpretation grounded in capabilities for action," along with "…a persistent mindset that admits the possibility that any "familiar" event is known imperfectly and is capable of novelty. This ongoing wariness is expressed in active, continuous revisiting and revision of assumptions, rather than in hesitant action" (Weick et al. 1999). Five different cognitive processes are isolated that HROs focus on to achieve the state of "collective mindfulness." These are preoccupation with failure, reluctance to simplify interpretations, sensitivity to operations, commitment to resilience, and underspecification of structures. The first three items deal with anticipation of potential dangers while the latter two are concerned with containment and mitigation of hazards after an incident. As illustrated in Figure 2.4, the coalescence of the five processes results in a collective organizational mindfulness that recognizes and manages unexpected events and "produces" a reliable organization. The reader is directed to Weick et al. (1999) for details of the elements described in Figure 2.4. Figure 2.4: HRO Model that incorporates the Collective Mindfulness Process (adapted from Weick et al. 1999) The preceding sections briefly described popular models and approaches that serve as points of convergence and divergence for the RE paradigm. However, they do not entirely show the bright path that led researchers to believe that there is a superior way, namely via the RE paradigm, to represent today's complex, non-linear, and sociotechnical organizations and corresponding accidents. Borys et al.'s (2009) description of "The Five Ages of Safety" illuminates that path and serves as further background of RE. 2.4 The Five Ages of Safety Hale and Hovden (1998) classified the three ages of industrial safety. Borys et al. (2009) build on work by Hale and Hovden (1998) to classify industrial safety into five "ages" as numbered below. The first three are attributed to Hale and Hovden; the remaining two are established by Borys et al. (2009). This chronology leads directly to the need for the RE approach and allows for the background discussion of the topics of culture and climate, and human factors and human error, that are utilized in RE. The Five Ages are:
1. The Technology Age: lasting from the nineteenth century until after World War II
2. The Human Factors Age (starting around 1919)
3. The Safety Management Age (starting around the late 1980s)
4. The Integration Age
5. The Adaptive Age
2.4.1 The First Age of Safety - The Technology Age The first age of safety that Hale and Hovden (1998) identify coincides roughly with the occurrence of the industrial revolution, from the nineteenth century until just after WW II. In this era the emphasis was on "…the technical measures to guard machinery, stop explosions and prevent structures collapsing," and the attitude of factory inspectors at that time (the late 1800s) was that other causes of accidents could not be "reasonably prevented," meaning that things like personal behavior and management influence with regard to culture and individual shortcomings (e.g., accident proneness) were beyond the influence of safety inspectors.
As the industrial revolution progressed, the worker moved from a position such as craftsman (or blacksmith, or other guild-type occupation), where they controlled their own pace of work and were responsible for their own safety, to a management-controlled (usually by the shop foreman) manufacturing scenario where the emphasis was on little else than producing low unit-cost items that justified the high fixed-cost capital equipment of the day. Aldrich (1997) quotes a 1910 New York State Compensation Commission report that stated "Previous to the introduction of machinery into modern industry industrial accidents were relatively few and unimportant." Aldrich notes that while in the early twentieth century industrial technology reduced the overall number of workers needed in a particular trade, and thus reduced the exposure of workers to unsafe conditions, safety eroded within industries because of the increased pace and unfamiliarity brought on by the technology as well as increasingly complex organizational structures formed by the giant corporations that did not know how to exercise sufficient control to ensure worker safety. The emphasis, Aldrich notes, was almost exclusively on production. Aldrich (1997) reports that production responsibilities of the nineteenth century were strictly placed on the shop foreman. Correspondingly, safety, or the lack of safety, was also left to the discretion of the foreman. This resulted in an uncertain and dangerous situation for workers and led to "…high labor turnover, poor morale, and …occasional…mass strikes." As the twentieth century dawned and companies became economically impacted by the public policy pressures of regulation, poor public image, and higher costs due to accidents, companies turned more to systematic management techniques and the emergence of safety departments. Aldrich (1997) alludes to the safety culture and climate of the late nineteenth and early twentieth century as harsh and mainly relying on the whims of the shop foreman. Furthermore, he states that "Dangerous practices were part of the craftsman's code; they were traditional, and they reflected his manliness." He further notes that some safety devices were not used simply because they were new. Employees were basically on their own with regard to what work procedures to follow and what personal protective equipment to use. However, as manufacturing schemes became more sophisticated and complex, and activities became tightly coupled, it was difficult for employees to coordinate with one another with respect to safety. Aldrich (1997) posits that in the First Age safety initiatives came from management and not from labor. A prime example of this top-down realization that safety was the responsibility of management was the "Safety First" movement started at United States Steel in 1906 (Aldrich 1997). This was an active attempt to get the working man involved in safety and demarcated the beginning of formal safety programs in industry. By the early 1920s the Safety First movement was gaining ground in many industries but lagged in others due to a lack of top management "buy-in" (Aldrich 1997). To garner executive support, safety proponents of the time sought to establish a "…correlation between safety and production." Herbert Hoover chaired a committee on waste in industry that studied this link; its findings were published under the title "Waste in Industry" in 1921.
The book discussed construction waste in a chapter titled "The Building Industry." The study found the chief sources of waste in the building industry to be irregular employment, inefficient management, and wasteful labor regulations. Accidents are listed as a secondary cause of waste and are estimated to "…involve losses up to 10% of the labor cost in addition to the human loss of lives and energy," with the average loss at about 2.25% of labor costs. Accident costs per year totaled around an estimated economic loss of $120,000,000 (in 1920s dollars), taking into account work stoppages, labor replacement, and extended loss of the crew. The report attributes the cause of accidents mainly to carelessness of the workman and lack of ordinary safeguards. The report estimated that overall 12,000,000 labor days could be saved by implementing safety measures. However, the savings in dollars and labor days was based on expert opinion and not scientific studies. Nonetheless, it was a longstanding belief of safety engineers in the early twentieth century that production and safety were inextricably linked and that injuries were symptoms of inefficiency (Aldrich 1997). 2.4.2 The Second Age of Safety - The Human Factors Age Research that defined the second age of safety was conducted around the period between WW I and WW II. Studies revolved around the human component of work, guided by the theory of accident proneness and accident prevention by better "…personnel selection, training and motivation…" (Hale and Hovden 1998). Hale and Hovden (1998) note that the technical and human-based studies operated on separate paths until the 1960s and 1970s when advances in probabilistic risk assessment and ergonomics came into vogue and the two paths, namely the technical and the human factors, merged to create a deeper understanding of the accident phenomenon. The International Ergonomics Association (IEA) defines human factors (n.b., the terms "human factors" and "ergonomics" are used interchangeably although "human factors" appears to be the preferred term in the U.S.) as "…the scientific discipline concerned with the understanding of the interactions among humans and other elements of a system, and the profession that applies theoretical principles, data and methods to design in order to optimize human wellbeing and overall system performance" (IEA 2011). Human Factors is a systems-oriented approach and can be applied to many areas of human activity. Human Factors mainly consists of three distinct areas of inquiry: physical, cognitive, and organizational. "Physical ergonomics is concerned with human anatomical, anthropometric, physiological and biomechanical characteristics as they relate to physical activity." "Cognitive ergonomics is concerned with mental processes, such as perception, memory, reasoning, and motor response, as they affect interactions among humans and other elements of a system." "Organizational ergonomics is concerned with the optimization of sociotechnical systems, including their organizational structures, policies, and processes" (IEA 2011). Just as in the technical age, the focus of the human factors age concentrated on isolating an element of the system and trying to correct or to eliminate its deficiency or hazard to the system. Thus a major thrust of the human factors efforts focused on the elimination of human error on both the system and individual levels. 2.4.2.1 Human Error Aldrich (1997) stresses that safety vastly improved in the First Age.
This was largely due to the safeguarding of equipment and the realization by management that proactive safety management was vital to the financial bottom-line and to avoid labor unrest as well as negative public scrutiny. However, accidents continued to occur and researchers felt the models and methods of the First Age did not adequately explain the continued problems with safety in a rapidly changing technological industrial world. To explain and understand the problems of the mid twentieth -century safety researchers and practitioners turned their attention to the human in the system. On the surface it would seem that defining what is meant by the term “human error” would be a simple task. Hollnagel (2007) gives a simple and commonly cited definition of human error as follows: “…an incorrectly performed human action, particularly in cases where the term is used to denote the cause of an unwanted outcome.” Hollnagel’s colleague Dr. David Woods writes that “Human error is a very elusive concept” (Woods et al 2010). The initial difficulty when discussing human error comes in defining the term across various academic disciplines and among numerous industry practitioners, regulators, and investigators. Hollnagel (2007) discusses some of these differences: From the human factors perspective “…the human operator is viewed as a system component for which successes and failures can be described in much the same way as for equipment”…”In behavioral science…the starting assumption is that human behavior is essentially purposive and that it therefore can be fully understood only by reference to subjective goals and intentions… in social science, the origins of failure are usually 42 ascribed to features of the prevailing socio-technical system so that management style and organizational structure are seen as the mediating variables influencing error rates.” James Reason (2000) approaches the notion of human error from two perspectives. The first is from the “person” approach. Here the errors of individuals with regard to such things as moral weakness and forgetfulness are emphasized. The second (and preferred) is a “systems” approach that focuses on the “… conditions under which individuals work and tries to build defenses to avert errors or mitigate their effects.” Woods et al. (2010) describes common outlooks of human error as two mutually exclusive worlds “colliding.” The first world is populated by “…erratic people who degrade an otherwise safe system,” and safety is created by protecting the system from “unreliable” people. The other (preferred) world consists of “…people who create safety at all levels of the socio-technical system by learning and adapting to information about how we can all contribute to success and failure.” In other words, the latter world helps workers cope with complexity to be safe. Hollnagel (2007) stresses that it is important to distinguish between process and product when defining and discussing human error. On the product side, it is relatively easy to determine if an error was made. For instance, a finger was severed when ripping a piece of lumber. The product here, the loss of a finger is easy to determine. However, the process leading up to that loss, picking up the lumber, placing it on the rip fence, and guiding it through the saw, may have been repeated hundreds of times without an accident. The point here is that identifying a deficient process is not a “go – no go” kind of observation. 
Hollnagel posits that "…whether a process is right or wrong is normally a matter of degree rather than of absolutes." There is no generally agreed upon definition of "human error" in the literature. Some definitions focus on the product, that is, the unwanted event that occurred, some focus on the process, and some on both (Hollnagel 2007). Hollnagel notes that the term "human error" is generally used in practice in three very different ways. In the first, "human error" denotes the cause of something. For example, Saurin, Formoso, and Cambraia (2004) report that human error is attributed as the cause of anywhere from around 30 to 96 percent of all construction accidents. The second meaning focuses on the "…action or process itself, whereas the outcome or the consequence is not considered." For example, a crane operator might forget to check a load chart for a lift. This may or may not result in an unintended outcome, depending on the item lifted. Finally, "human error" is sometimes used to denote the outcome of an action. Woods et al. (2010) defines (or more clearly approaches) human error in the following statement: "the label "human error" is a judgment made in hindsight. After the outcome is clear, any attribution of error is a social and psychological judgment process, not a narrow, purely technical, or objective analysis." This definition conveys a trend by many safety researchers (Dekker 2005, Hollnagel 2004, Perrow 1984) to eliminate the notion of "human error" altogether. They claim it is a false construct built upon false logic and on thinking of humans in mechanistic terms. The manner in which humans reason when investigating an accident is cited as one reason the idea of human error is incorrect. The Law of Causality, that every cause has an effect, is twisted into its reverse when investigators reason from effect to cause. This also assumes that a cause exists and can be found. In complex systems there may be multiple causes or the cause may never be pinpointed. Furthermore, Dekker (2005) points out that after the fact investigators act as if they are able to move backward in time (the "tunnel" as he calls it) and differentiate all of the contextual elements and complex details that the person based their actions on. In other words, the assumption is that the human acted in a rational manner, in the economic sense of the word, and made the optimal decisions while the accident unfolded. In reality, humans make the best possible decisions, which may or may not be optimal given the situation, in the course of "normal" work (Perrow 1984, Simon 1997). Additionally, due to limited resources of time and money, accident investigators typically stop an investigation when the first cause is found. This is known as the "stop rule." Because humans are found at all levels of an accident it is not difficult to find a human error to cast the blame upon and stop the investigation there. Finally, some authors point out that blaming the human in the system, and only the human, takes the pressure off of companies to look further into the deficiencies of the system. For instance, there may be massive retooling necessary to provide a safe working environment. However, it is easier and more cost effective to blame a human error than to make corrections. Another rejection of the term "human error" is that it is borne out of the technical age and is a misnomer. In the technical realm a hazardous component can be identified by risk analysis methods such as an event or fault tree.
Humans, the naysayers state, are not analogous to machines. The view that a human is rational and machine-like was reinforced in early cognitive work that compared human decision-making capabilities to those of the computer. Here the human was treated as an information processing system (IPS) with clearly defined mental processes similar to those of a computer (Hollnagel and Woods 2005). Essentially, this view treated the human in the system in the same manner as a mechanical component. In this view, the human decision-making process and actions should match that of the machine or environment in which they are acting. This thinking is along the lines of the rational decision maker in economics, who has the mental capabilities and information resources to vet every option to make an optimal choice or course of action. In reality, resources are limited and are context-dependent (Hollnagel and Woods 2005). Humans make the best decisions given the available resources (e.g., time and money), and every decision is context dependent. Simon (1997) called this "satisficing." In general, much of the research in safety in the second age assumed that everything always went right and that if a disruption occurred it was due to a malfunctioning or maladaptive component. It did not matter if the component was mechanical or human; either one could be replaced (Hollnagel and Woods 2005). While this worked well for the mechanical side, it neglected that human variability, resourcefulness, and flexibility were key components that made systems successful (Hollnagel and Woods 2005). The Resilience Engineering community does not categorically rule out the possibility of human error occurring. However, on an individual basis human error should only be designated when the following three conditions are present: "…a clearly specified performance standard or criterion against which a deviant response can be measured," "an event or an action that results in a measurable performance shortfall such that the expected level of performance is not met by the acting agent," and finally, "there must be a degree of volition such that the person had the opportunity to act in a way that would not be considered erroneous" (Hollnagel 2007). The Resilience Engineering community would like to see the term "human error" disappear unless the preceding three conditions are met. It is a judgment made in hindsight, as Woods points out. Furthermore, it hinders communication given that there is no common understanding or definition of the term, it is a problem for measurements and statistics because we do not know exactly what we are counting, and finally, it is a hindrance for learning more about the accident given that the search for a cause is abandoned once the erring human is identified (Hollnagel 2007). 2.4.3 Third Age of Safety – Management Systems and Culture Even though the second age of safety research brought about a clearer understanding of safety and the interaction among technology, individuals, and the organizational system, there was still dissatisfaction among safety researchers with the methods developed to assess and investigate accidents. Hale and Hovden (1998) point to the spectacular disasters of the mid 1980s, such as the explosion of the Challenger space shuttle, the meltdown of the Chernobyl nuclear reactor, and the chemical release in Bhopal, India, as heralding the age of management as a focus of safety research.
While these types of accidents had occurred previously, the focus had been on technological and human factors and not on structure, i.e., management factors (Hale and Hovden 1998). The predominant view of the organizations in which these accidents occurred (i.e., NASA and large multi-national and government run agencies) was that these "…well-developed, often highly-bureaucratic, safety systems…had been thought, until then, to be safe-proofed against such major disasters" (Hale and Hovden 1998). That management had influence in the safety process was not a new idea in the 1980s; Heinrich et al. (1980), and later Bird (Heinrich 1980), among others, recognized that management played an important role in worker and organizational safety, but their theories had little or no scientific basis and were characterized by Hale and Hovden as little more than "…accumulated common sense and as general management principles applied to the specific field of safety." The idea that management could influence safety led to the idea that a certain climate and culture could be developed and fostered in the organization and among the organizational associates. The two notions are discussed below. 2.4.3.1 Culture and Climate In the preface of his 2010 book "Organizational Culture and Leadership," Edgar Schein, a leading researcher in the study of organizational culture, discusses his frustration with how complicated the research of culture has become and how he sometimes feels overwhelmed by the amount of research and consulting in the burgeoning field. Guldenmund (2000) observes that "Organisational culture and climate are complex concepts" and cites several authors regarding the elusiveness of finding consensus definitions and categories for the concepts of culture and climate. This occurs despite the fact that the concept of culture is over 100 years old (Schein 1999). Schein (2010) defines the culture of a group "…as a pattern of shared basic assumptions learned by a group as it solved its problems of external adaptation and internal integration, which has worked well enough to be considered valid and, therefore, to be taught to new members as the correct way to perceive, think, and feel in relation to those problems." Here culture is viewed as a product of social learning. Notwithstanding the difficulties described above, the phenomenon of culture is pervasive; everyone is involved in several cultures and sub-cultures throughout their lifetime, and it affects how business is conducted. Schein (1999) states that "Culture matters because it is a powerful, latent, and often unconscious set of forces that determine both our individual and collective behavior, ways of perceiving, thought patterns, and values. Organizational culture in particular matters because cultural elements determine strategy, goals, and modes of operating. The values and thought patterns of leaders and senior managers are partially determined by their cultural backgrounds and their shared experience." Schein (2010) identifies four cultures. The macroculture consists of nations, ethnic and religious groups, and occupations that exist globally. Organizational cultures include private, public, nonprofit, and government organizations (n.b., corporate cultures are a subset of organizational cultures). Subcultures are occupational groups within organizations.
Finally, microcultures are microsystems within or outside organizations, such as "small coherent units within organizations, such as surgical teams or task forces that cut across occupational groups…and are different from occupational subcultures." Being aware of the spectrum of cultures is essential to understanding a particular culture because they are all interconnected. Schein (1999, 2010) warns not to oversimplify the concept of culture by reducing its definition to trite sayings such as it is "the way we do things around here," or "the company climate." Although somewhat valid, such sayings merely reflect manifestations of the culture. To understand culture we must realize that it exists at several levels of the organization and that only by digging deeper will one begin to understand the prevailing cultural outlook. These levels consist of the things we can easily see (e.g., artifacts such as architecture and interpersonal relationships), the espoused values (e.g., literature describing how "safety comes first"), and finally, level three, the basic underlying assumptions that drive the culture. Here, Schein posits, lies the "ultimate source of values and actions" of the firm, which are embedded in the "unconscious, taken-for-granted beliefs, perceptions, thoughts, and feelings" of the employees. To understand the artifacts and espoused values, Schein feels that one must first understand the intricacies of level three; only then will the artifacts and espoused values make sense. Corporate culture is stable and resistant to change because it provides meaning and predictability to daily life. However, he believes that cultures can be transformed if needed. The concept of "climate" in relation to "culture" is also mentioned in the safety literature quite frequently and, just as exposed above, "climate" is as difficult as "culture" to define. Kuenzi and Schminke (2009) adopt Schneider and Reichers' (1983) definition of organizational work climates as "a set of shared perceptions regarding the policies, practices, and procedures that an organization rewards, supports, and expects." They further posit that organizational climate is a property of the unit but lies in individual perceptions. Kuenzi and Schminke (2009) also discuss the similarities and differences between culture and climate. Culture and climate both explore how individuals make sense of their environments and both involve a shared experience. 2.4.3.2 Safety Culture Guldenmund (2007) reports that the term "safety culture" came into use around 1986 and that there are multiple meanings and no universal definition of the phrase "safety culture," as would be expected from the general discussion of culture and climate above. Furthermore, Guldenmund reports that there is no consensus on the dimensions that make up a safety culture, and that they include "commitment by management and workforce, leadership style and communication, individual responsibility, management responsibility, risk awareness and risk-taking" (citing McConnell 2004). Chenhall (2010) identifies some of the components of "safety culture" from various authors as "safety system" (Choudhry et al. 2007), "safety climate" (Choudhry et al. 2007), "safety management system" (Diaz-Cabrera et al. 2007), and "socio-technical system" (Grote & Künzler 2000).
Chenhall (2010) reports that safety culture indicators are classified as either formal or informal and, citing Rao (2007) “The formal norms in a safety culture are characterized as written organizational safety policies and procedures, such as OSHA regulations, whereas the informal norms are not documented.” From this Chenhall concludes that even if a culture has elements of a formal “safety culture” in place and it is lacking the informal portion then safety is 50 “…not likely part of the culture.” She also promotes Schein’s view that one must look beyond artifacts and espoused values to find the underlying culture. Manuele (2008), when discussing ANSI – Z10 also holds this view. Schein sums up his current thinking on the differences between culture and climate in a preface to a compilation of cultural studies thusly, “my advice to readers is to view both climate and culture as abstractions that lead them to taking a useful perspective toward human behavior in complex systems. It is the perspective that is important, not a particular research result or a broad generalization about how important climate or culture is to some practical phenomenon.” 2.4.4 Fourth Age of Safety – The Integration Age In a brief portion of their text “Human Safety and Risk Management” Glendon et al. (2006) build on Hale and Hovden’s (1999) work by suggesting a fourth age of safety. They ponder the then current (2006) state of safety and risk management and conclude that it might be called the “integration age.” They speculated that this age takes on “…some characteristics of HROs...” They base this view on the previous three ages by making an analogy to MacLean’s “triune brain theory” that suggests that the human brain is actually three brains in one and that each part developed according to evolutionary needs and was linked to and retained parts of the previous growth. In a similar way, views on safety progressed according to the needs of the worker and industry. They state “Characteristics of the successive ages of safety may well not supplant earlier ways of thinking and acting (i.e., the cultures) of previous eras; rather they are more likely to build on previous structures, so that the contemporary collage of safety philosophies and practices remains rooted in the technical era, but has suffused this with layers of human factors applications and management systems.” 51 Although they are short on examples of the integrationist approach, Glendon et al (2006) cite Havold (2005), and envision this period (Age) as taking a “safety orientation” outlook that unites safety climate and safety culture and includes the factors of safety rules, management commitment to safety, safety behavior, communication, work situation, job satisfaction, competence, management priorities and organizational risk, satisfaction with safety activities, reporting culture and supportive environment, and fatalism. 2.4.5 Fifth Age of Safety – The Adaptive Age Borys et al. (2009) introduce the possibility that a fifth age has emerged in safety, the ‘adaptive age’ that essentially presents the case for the existence of Resilience Engineering. 
They claim that the adaptive age "…transcends all other ages without discounting them, whilst introducing the concept of 'adaptation', the adaptive age goes beyond simply integrating the past." The adaptive age is meant as a means to "…take us beyond the contemporary ways of thinking about managing OHS that typically focus on OHS management systems (OHSMS), safety culture and safety rules." Like the other shifts in safety thinking, this outlook stems from the limitations of existing approaches to understand and assess existing systems. Borys et al. (2009) cite Robson et al.'s (2005) research on OHSMSs, which found that there is insufficient evidence in the peer-reviewed literature to suggest that they are either effective or ineffective. Borys et al. also cite several researchers who feel that OHSMSs are complex paperwork burdens that do not reflect conditions at the workface and are primarily rule-focused. In the view of Borys et al., OHSMSs (and other safety approaches) are not to be discounted but transcended by an adaptive culture (more specifically, the existence of social construction sub-cultures as discussed in the Age of Safety Management section) along with the concepts of collective mindfulness (as presented in the discussion of HROs) and a new perspective regarding safety as embodied in the RE approach, which is discussed in the following section. 2.5 Current Understanding of Resilience Engineering The notion of RE has evolved in other domains (e.g., aviation, nuclear industries) as a way to overcome the limitations of existing accident analysis and risk assessment models that are used to manage safety. It is a proposal to stop relying primarily on hindsight to explain the cause of accidents and to explore the sources of resilience that prevent and mitigate accidents. Many accident causation models are sequential, such as Heinrich's Domino Model and Reason's Swiss Cheese Model (Hollnagel 2004). The physical structure of devices such as fault trees and event trees (i.e., the structural view) promotes the idea that accident causation is linear. Additionally, regulatory standards are often just bolstered incrementally to cover the latest crack in the regulation as exposed by the latest accident. Safety is often managed by error tabulation and probabilities, for example, setting a goal to reduce falls by 33%. In contrast, RE is more focused on unearthing the positive quality of resilience rather than on managing by error counts such as the number of fatalities and injuries. RE cites research that explains that "Untoward events more often are due to an unfortunate combination of a number of conditions, than to the failure of a single function or component" (Woods and Hollnagel 2006). This outlook promotes the non-linear functional view, as opposed to linear structural views, that looks to the interdependencies among system components. In this outlook control is considered both on the activity level and on the system level, as previously described in the Functional Resonance Analysis Method. A more subtle point of the functional versus structural view is that many accidents rarely re-occur in the same fashion but are a confluence of seemingly unrelated events, each necessary but only jointly sufficient to create an accident. This view is in line with the notion that work is complex and that accidents and performance variability concerns cannot be confined to a single component but emerge from a confluence of demand-induced pressures (Hollnagel 2004).
From this perspective, developing foresight, or trying to imagine what might go wrong and developing strategies to defeat failure, is a more valuable skill than relying exclusively on hindsight when extrapolating accident investigations to accident assessment models. To avoid failure we must anticipate key aspects of the future to imagine what might go wrong. This notion, as conceived by Westrum, is embodied in the idea of “requisite imagination.” This prescient activity “…is a means for the designer to explore what can affect design outcomes in future contexts” (Adamski and Westrum, 2003). The use of this method by designers and front line workers can foretell routes to disaster, as well as to success. In light of the above discussion, safety management must then be reactive and proactive (Hollnagel, 2008), reactive to respond to those threats that have materialized or are imminent and proactive to bolster gaps in the organization’s safety or to compensate for gaps in the design. Resilience engineering recognizes that despite even outstanding planning efforts, performance conditions are always underspecified (Hollnagel 2008). The front-line worker must always make adjustments in the course of operations given the context of underspecification of operational conditions, changing environmental conditions and the intensity of demands. In other words, there will always be performance variability due to the need to respond to demands imposed on the system. Resilience engineering aims to dampen the variability that may contribute to adverse events and to amplify the variability that leads to positive outcomes (Hollnagel 2008). Hollnagel’s (2004) Efficiency-Thoroughness Trade-Off (ETTO) Principle captures an aspect of this notion. In attempts to optimize performance goals people work to be as thorough as they 54 can be (i.e., follow the rules) given the prevailing environmental conditions and circumstances. However, there is also pressure to be efficient. People and organizations that are not efficient may become, respectively, unemployed and unprofitable (or bankrupt). Also, those who are not sufficiently thorough possibly endanger safety and may also cease to exist economically. One reason thoroughness is sometimes shunted is that in the quest to optimize work processes people skip seemingly unnecessary steps in work tasks. In construction this is manifest in the phrase “We have always done it this way” when discussing field operations with, for example, a subcontractor, when in fact, field conditions may necessitate that operations be revised from what has always been done. The shortcut, (e.g,. always “doing it this way”) is the norm in work rather than the exception given that work environments are relatively stable places and that accidents are rare events (Hollnagel, 2004). People may take certain aspects of their work for granted and skip seemingly inefficient steps. This can sometimes lead to an accident. An example might be to neglect to “tie-off” a ladder to save time. Excuses can range from “We never have ladders slip” to “I was only going on the roof for a minute.” The event of an accident and its associated costs can wipe out efficiency gains from shortcuts. Closely allied with the ETTO Principle is the outlook that failure is the temporary inability to effectively cope with complexity under demanding conditions. A situation that RE addresses, and is closely allied to complexity, is the all too common production and efficiency tensions inherent in industrial work. 
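The ETTO principle can also be illustrated computationally, in the spirit of the hybrid simulation presented later in this work. The short Python sketch below is a minimal illustration only; the agent rule, the probability of a ladder slip, and the time costs are assumed values chosen for demonstration and are not drawn from Hollnagel or from any field data.

import random

# Illustrative parameters (assumptions for this sketch, not empirical values)
TIE_OFF_MINUTES = 5          # extra time spent being thorough
INCIDENT_MINUTES = 2400      # time lost if a shortcut ends in a ladder slip
SLIP_PROBABILITY = 0.002     # assumed chance that a shortcut leads to an incident

def etto_choice(time_pressure: float) -> bool:
    """Return True if the worker is thorough (ties off the ladder).

    Higher time pressure (0..1) makes the efficient shortcut more likely,
    mirroring the ETTO idea that people favor efficiency when pressed.
    """
    return random.random() > time_pressure

def simulate(tasks: int, time_pressure: float, seed: int = 1) -> dict:
    random.seed(seed)
    minutes_on_thoroughness = 0
    incidents = 0
    for _ in range(tasks):
        if etto_choice(time_pressure):
            minutes_on_thoroughness += TIE_OFF_MINUTES
        elif random.random() < SLIP_PROBABILITY:
            incidents += 1
    return {
        "thoroughness_minutes": minutes_on_thoroughness,
        "incidents": incidents,
        "incident_minutes": incidents * INCIDENT_MINUTES,
    }

if __name__ == "__main__":
    for pressure in (0.1, 0.5, 0.9):
        print(pressure, simulate(tasks=10_000, time_pressure=pressure))

Under the assumed values, runs at low time pressure spend more minutes on tie-offs but record few or no incidents, while runs at high pressure occasionally lose far more time to a single incident than the shortcuts ever saved, which is the trade-off the preceding paragraphs describe.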
Research has shown that workers implicitly choose production over safety concerns when a trade-off is available and therefore act in a riskier manner than they normally would. An example of RE in construction would involve knowing when to relax production pressures by, for example, reducing overtime hours, adding additional crews, 55 subcontracting extra work in critical periods, or simply knowing when to slow down production so that safety is not endangered. This work explores the RE constructs to better understand how disruptions affect construction projects. RE is a new perspective on safety for complex socio-technical systems. Traditionally, improvements in safety have been based on hindsight – and asked “what went wrong” in accident analysis and “what could go wrong” in risk assessment. For instance, after a major accident involving loss of lives, an incremental change (a tweak to the regulations or building codes) or a barrier is implemented (fall protection if working over 6’) after statistics show a trend of injuries or fatalities. This behavior can be thought of as hindsight thinking and is commonly associated with traditional views of safety. Hindsight thinking colors how we think about failure and safety. In the traditional view, failure is characterized as arising from a breakdown or malfunctioning of normal systems. Safety is defined as “freedom from unacceptable risk.” Both of these approaches require the analyst or investigator to think about how accidents happened and what went or could “go wrong.” In general – it is a reactive approach and the “negative” approach to understanding accidents. This approach has been successful in saving lives and preventing injury. However, we seem to have reached a plateau in the effectiveness of this way of doing business in construction safety as discussed in Chapter One. RE embraces the traditional approach but posits that it is only part of the picture – to get a better understanding of safety we need to ask “what can go right” for risk assessment and “what went right” for accident analysis. A proactive and “positive” approach is needed to gain a complete understanding of safety. RE proposes that we observe work as it is normally performed on a day-to-day basis and look at how humans in the system “make ends meet” in the face of under-specification of operations, constantly changing conditions, and unrelenting 56 demands and stressors placed on the system. We should observe what makes systems resilient, how to engineer resilience, and how to maintain and manage the resilience of a system. Resilience is a quality of the system. It can’t be counted – it is something that the systems does (“acts in a resilient manner”) rather than something a system has (it would be wrong to say a system has “10 units of resilience”). Therefore, managing resilience is a kind of process control. In the RE view, failures arise from adjustments made by people to cope with underspecification of a system or a process, and safety is defined as “the ability to succeed under varying conditions.” Defining safety this way includes the reactive and proactive approaches. The term “performance variability” is used to describe the ways in which individual and collective performances are adjusted to match current demands and resources, in order to ensure that things go right. Thus a key feature of a resilient system is its ability to adjust its performance. 
Adjustments can, in principle, be reactive, concurrent, and proactive and are described below (Hollnagel 2009):
• Reactive adjustments are the most common and happen in the aftermath of an event (i.e., "lessons learned" from a major change or disruption). This is an incomplete approach given that the adjustments made may not be suitable for the unique and uncertain events of the future.
• Concurrent adjustments are basically fast reactive adjustments that take place while the situation is developing.
• Proactive adjustments mean that the system can change from a state of normal operation to a state of heightened readiness, and possibly also act, before something happens.
2.5.1 Background and Definitions of Resilience Engineering
Although the term resilience is well known in various academic domains such as ecology and engineering, the term "Resilience Engineering" is relatively new as applied to safety, beginning to appear in the literature around the end of the last century (Woods et al. 2007). However, the elements that characterize resilience engineering have been brewing for many decades. Resilience engineering borrows from many different areas of organizational and safety research such as High Reliability Organizations (HROs), Normal Accident Theory (NAT), and other systems approaches. Researchers in the field are mainly involved in the areas of healthcare, nuclear power, aviation, and aerospace, among others that are characterized by complex, high-risk and high-visibility industries. Resilience Engineering draws inspiration from work by industrial psychologists, sociologists, anthropologists, and other safety theorists such as James Reason, Jens Rasmussen, Scott Sagan, Donald Norman, and Charles Perrow. It utilizes and respects the effective methods, models and techniques developed over the years by various industries and academic disciplines with the caveat that they "…must be looked at anew and therefore possibly used in a way that may differ from what has traditionally been the case" (Hollnagel, 2008). Resilience engineering is strongly influenced by the discipline of Cognitive Systems Engineering (CSE). CSE is a systems approach that was formulated in the early 1980s to study complex system failures, such as the release of radiation at the Three Mile Island nuclear power generating facility, and as a way to overcome the limitations of previous safety models (Hollnagel and Woods 2005). A cognitive system is defined as one that "…can modify its behavior on the basis of experience so as to achieve specific anti-entropic ends…they [cognitive systems] are able to maintain order in the face of disruptive influences…specifically…to control what it does" (Hollnagel and Woods 2005). CSE focuses on analyzing Joint Cognitive Systems (JCS). JCSs are a human-machine coagency in which humans and machines are described as "equal partners": humans are not described as if they are machines, nor are machines given human attributes; the analysis is on what the JCS does, not what it is, and on how performance is controlled (Hollnagel and Woods 2005). Machines are expressed as artifacts in CSE. An artifact is something devised for a specific purpose; for instance, a screwdriver is an artifact, and so is a corporation. On a construction project one can envision many JCSs, as well as many JCSs nested within others. The largest may be the JCS of the stakeholders and the organization.
With regard to resilience the emphasis is in how the JCS copes with surprise (unexpected events) and error (Woods and Hollnagel 2006). Finally, the CSE outlook considers all work as cognitive; everything we do requires our brain (Hollnagel and Woods 2005). This is especially true in the construction industry. A resilient system is able to maintain control when faced with disruptions in the form of unexpected events. A system is said to be in control if it is able to mitigate or eliminate unwanted internal or external variability with respect to the demands placed on the system, especially pressing time concerns such as schedule acceleration or increased tempo of projects (Hollnagel et al. 2006). The term “Resilience Engineering” was formally applied in a conference of safety scientists and researcher in 2004. Over the last several years the term has been differently understood by researchers and has been updated as the understanding and maturation of the scope focus of the 59 new discipline has emerged. A few of the definitions are provided below and are listed chronologically to illustrate the evolution and different understanding of this discipline. Some definitions include: • “The intrinsic ability of an organization (system) to maintain or regain a dynamically stable state, which allows it to continue operations after a major mishap and/or in the presence of a continuous stress" (Hollnagel et al 2006). • "How well a system can handle disruptions and variations that fall outside of the base mechanisms/model for being adaptive as defined in that system" (Woods and Hollangel 2006). • Westrum (Hollnagel et al. 2006) looks at resilience from three different vantage points: o Resilience is the ability to prevent something bad from happening, o Or the ability to prevent something bad from becoming worse, o Or the ability to recover from something bad once it has happened. • "Its ability effectively to adjust its functioning prior to or following changes and disturbances so that it can continue its functioning after a disruption or major mishap, and in the presence of continuous stresses" (Hollnagel et al 2008) • “RE is the intrinsic ability of a system to adjust its functioning prior to, during, or following changes and disturbances, so that it can sustain required operations under both expected and unexpected conditions” (Hollnagel et al. 2011). The last definition is used for this work. It reflects a key feature of Resilience Engineering, which is for a resilient system to be able to adjust its performance reactively, concurrently, and proactively. In the time continuum of an accident this outlook seems crucial. 60 The use of the term ‘engineering’ is not meant in the traditional engineering sense of the word involving disciplines such as civil, mechanical, or electrical nor does it exclusively mean engineering controls. In the resilience engineering perspective engineering controls are necessary, but not sufficient to ensure safety. A better definition of engineering for resilience purposes is “…to arrange, manage, or carry through by skillful or artful contrivance.” Granted, this could just as well apply to engineering controls but the overwhelming emphasis of resilience engineering is sociotechnical; that is the focus is on people and how they interact with the artifacts (i.e. something devised to help people perform work; machines and/or organizations) found in the working world and on better control of systems. 
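The working definition adopted above, the intrinsic ability of a system to adjust its functioning prior to, during, or following changes and disturbances, can be restated schematically in code. The fragment below is only a sketch of that reading; the class name, the capacity and reserve figures, and the 80 percent trigger are hypothetical choices made for illustration and do not come from the cited definitions.

from dataclasses import dataclass

@dataclass
class SimpleResilientSystem:
    """Schematic restatement of the adopted RE definition (illustrative only)."""
    capacity: float = 10.0   # resources currently available to meet demands (assumed)
    reserve: float = 2.0     # buffer held back for surprises (assumed)

    def in_control(self, demand: float) -> bool:
        # "In control" is expressed relative to the demands placed on the system.
        return demand <= self.capacity

    def adjust(self, forecast_demand: float, actual_demand: float) -> None:
        # Prior to a disturbance: build capacity when the forecast approaches limits.
        if forecast_demand > 0.8 * self.capacity:
            self.capacity += self.reserve * 0.5
        # During or after a disturbance: respond when demand actually exceeds capacity.
        if actual_demand > self.capacity:
            self.capacity += self.reserve   # commit the buffer
            self.reserve = 0.0              # flag that reserves are exhausted

system = SimpleResilientSystem()
system.adjust(forecast_demand=9.0, actual_demand=12.0)
print(system.in_control(demand=12.0), system.reserve)

The point of the sketch is simply that "being in control" is defined relative to demands, and that adjustment can occur both before a disturbance (on a forecast) and during or after it (when demand actually exceeds capacity, at the cost of exhausted reserves).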
Although the term resilience is well known in various academic disciplines the moniker of ‘resilience engineering’ is relatively new, starting to appear in the literature around the end of the last century (Woods et al 2007). However, the elements that characterize resilience engineering have been brewing for many decades. Indeed, resilience engineering borrows from many different areas of organizational and safety research such as High Reliability Organizations and similar approaches. Researchers in the field are mainly involved in the areas of healthcare, nuclear power, aviation, and aerospace, among others characterized by complex, high-risk and high-visibility industries. Resilience engineering draws inspiration from work by industrial psychologists, sociologists, anthropologists, and other safety theorists such as James Reason, Scott Sagan, Donald Norman, and Charles Perrow. Resilience engineering uses and respects the effective methods and techniques developed over the years by various industries with the caveat that they “…must be looked at anew and therefore possibly used in a way that may differ from what has traditionally been the case.” (Hollnagel 2008). 61 Some have described resilience engineering as a paradigm shift in the ‘Kuhnian sense” (Woods and Hollangel 2006), referring to Thomas Kuhn’s well-known "The Structure of Scientific Revolutions" published in 1962. By proposing a new outlook and vocabulary for system safety it may well be, as Kuhn stated “A revolution occurs when a community changes its lexicon.” However, in the course of scientific (or any type of) discovery there are those who, early on, become embroiled in discussions concerning the use of a phrase such as ‘paradigm shift’ and enmeshed in conducting assessments of the validity of the researcher’s use of the term instead of seeing the possibilities in the new paradigm. The concept of Resilience Engineering is presented here in the mildest revelational terms possible, it is left to future generations to ponder if Resilience Engineering is truly a paradigm shift or is to be cast upon the high pile of existing safety models. As Kuhn stated “…if a new candidate for paradigm had to be judged from the start by hard-headed people who examined only relative problem-solving ability, the sciences would experience very few major revolutions." (Kuhn 1996). The primary goal of this paper is to introduce the reader to resilience engineering and explore the possibilities of applying it to construction industry safety. In general, RE is concerned with any sociotechnical aspect of production but automation is of special interest given its prominent use in industries such as aviation and nuclear power production. Woods and Hollnagel (2006 prologue) posit that automation grew out of the ‘error counting’ paradigm where the worker was merely considered an unreliable part of the system and a liability to safety. The system would seemingly perform better without the human mucking up the works. However, as with any man-made object, a human is involved somewhere in the system, either as a designer or to interact in some manner with the automation. The automation may only be as good as the designers’ or developers’ ability to anticipate what 62 disturbances may occur. Automation has been referred to a “team player” in the sociotechnical outlook because it improves the response to disturbances in the work setting and helps the system to adapt (Nemeth et al 2009). 
Paries (2010) explains that automation aims to reduce the uncertainty in the system by reducing variety, diversity, deviation, and instability. For instance, it can reduce fatigue, a kind of performance variability, in airline pilots by handling monotonous tasks. So, in one respect, automation can be seen as an aid to resilience in that it decreases performance variability by standardizing routine or complex functions (McDonald 2006). However, it also has the side effects of reducing autonomy, creativity, and reactivity. Additionally, it may lead to increased reliance on rules and less training than would normally be provided. Wiener coined the phrase "clumsy automation" to refer to the new demands with respect to communication and coordination that automation imposes on a flight crew but does not support well (Sarter et al. 1997). In the AEC sector there is currently no counterpart to the sophisticated autopilot automation found in the airline industry. There are pockets of advanced automation in areas of building safety and security. For instance, sprinkler and other fire-suppression systems employ a great deal of monitoring and reaction without constant supervisory control. The current popularity of green building has spawned an interest in indoor environmental systems that monitor and adjust variables related to building performance. Automation in the construction process extends to administrative practices. Drawing and code review have been automated so as to virtually eliminate paper documents and constant human supervision. Some sophisticated construction fabrication shops can also extend the automation function to interface with factory floor computer-aided manufacturing systems. A full discussion of automation is beyond the scope of this work.
2.5.2 The Four Premises of Resilience Engineering
"In agreement with Perrow's notion of Normal Accidents, the Resilience Engineering approach uses the understanding of normal performance as a premise to explain that accidents emerge from normal system performance, rather than resulting from technical, human or organisational failures" (Macchi and Hollnagel 2010). Correspondingly, Resilience Engineering is based on the following four premises (Hollnagel et al 2011):
1. Performance conditions are always underspecified. Individuals and organizations must therefore adjust what they do to match current demands and resources. Because resources and time are finite, such adjustments will inevitably be approximate.
2. Some adverse events can be attributed to a breakdown or malfunctioning of components and normal system functions, but others cannot. The latter can best be understood as the result of unexpected combinations of performance variability. This is illustrated via the Functional Resonance Analysis Method (FRAM); a brief numerical sketch of this premise follows the list.
3. Safety management cannot be based exclusively on hindsight, nor rely on error tabulation and the calculation of failure probabilities. Safety management must be proactive as well as reactive.
4. Safety and field operations management are inseparable and do not operate independently. No conflict or tension should exist between these functions. Safety must therefore be achieved by improvements to the operations (i.e., by engineering a better operations process) rather than by simply constraining operations (i.e., by barriers, more regulations, etc.).
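Premise 2, that some adverse outcomes emerge from unexpected combinations of everyday performance variability rather than from the failure of any single component, can be given a simple numerical illustration. The Monte Carlo sketch below is illustrative only; the five construction "functions," the size of their day-to-day variability, and the threshold for a day "going wrong" are invented assumptions, not parameters from FRAM or the cited sources.

import random

FUNCTIONS = ["hoisting", "rigging", "signaling", "scheduling", "housekeeping"]
THRESHOLD = 2.5   # combined deviation beyond which the day "goes wrong" (assumed)

def daily_variability() -> float:
    """Each function varies a little around 'normal' (mean 0); none of them fails."""
    return sum(random.gauss(0.0, 0.5) for _ in FUNCTIONS)

def simulate(days: int = 100_000, seed: int = 7) -> float:
    random.seed(seed)
    bad_days = sum(1 for _ in range(days) if daily_variability() > THRESHOLD)
    return bad_days / days

if __name__ == "__main__":
    # No single function ever "fails", yet a small share of days still exceeds
    # the threshold because ordinary variability occasionally lines up.
    print(f"share of days exceeding threshold: {simulate():.4%}")

Even though no individual function in the sketch ever fails, a small fraction of simulated days still exceeds the threshold because ordinary variability occasionally lines up; this is the emergent, combinatorial character of accidents that the premise describes.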
2.5.3 The Four Abilities of a Resilient Organization
In order to be resilient, an organization must possess the four basic abilities of response, anticipation, monitoring, and learning. The mix of the four cornerstones depends on the context of the analysis. Some situations may require more response capability while others may require more anticipation. However, an organization is not considered resilient, in the RE outlook, if all of the abilities are not present in some form. Resilience Engineering thus emphasizes function over structure and ability over capacity. The four abilities (Hollnagel et al. 2011) are explained below.
1. Knowing what to do, or being able to respond to regular and irregular variability, disturbances, and opportunities either by adjusting the way things are done or by activating ready-made responses. This is the capability to address the actual.
2. Knowing what to look for, or being able to monitor that which changes, or may change, so much in the near term that it will require a response. The monitoring must cover the system's own performance as well as changes in the environment. This is the capability to address the critical.
3. Knowing what to expect, or being able to anticipate developments, threats, and opportunities further into the future, such as potential disruptions or changing operating conditions. This is the capability to address the potential.
4. Knowing what has happened, or being able to learn the right lessons from the right experience, successes as well as failures. This is the capability to address the factual.
The analysis is at the organizational level given that an individual cannot reasonably be expected to possess all four abilities on a sustainable basis. The four abilities discussed above can be used to develop a profile of the resilience of an organization. The profile is created by asking "probing" questions about each of the individual resilience abilities and then plotting the results on a star-shaped grid. This is termed the "Resilience Analysis Grid," or RAG. The profile can be used to assess and also to bolster the resilience of an organization, and thus to better understand how disruptions affect safety and resilience.
2.5.3.1 Learning
Individual and organizational learning from experience is the cornerstone for dealing with the factual, or knowing what has happened. Learning should be a continuous activity and involve all levels of the firm, from the CEO to field labor. Because fatalities and injuries are relatively rare events, the opportunity to learn from them is limited. The resilient firm will take opportunities to learn from near misses, which occur more frequently than fatalities (Groeneweg 1998). This outlook assumes that an atmosphere of true collaboration exists and that associates are able to voice legitimate concerns without retribution from others. Hollnagel et al. (2011) think that resilience is about "how systems learn to modulate their adaptive capacities to continuously update their fitness relative to an environment of changing pressures and opportunities." In traditional safety thinking, learning has occurred from that which has gone wrong. This is reasonable given that knowledge of what has gone wrong in the past is essential to prepare for and eliminate the things that contributed to the untoward event so that it does not occur again. In the contrarian approach to safety, the Resilience Engineer asks "Is this the best way to learn?" Accidents are relatively infrequent events.
Additionally, accidents are usually different from one another, and the information learned may not be useful for a one-time event (Hollnagel 2010). The Resilience Engineering perspective is not to discard the learning that occurs post event, but to also learn from what goes right in the normal course of work. Studying that which goes right provides more opportunities for learning given that things go right more often than they go wrong. A basic principle of Resilience Engineering is that failures are the flip side of successes and that both have their origin in performance variability (Hollnagel et al 2006). Learning and responding are related. Response relies on the ability to learn. If the environment remained static and processes stayed constant, in other words if the world remained stable, there would be no need to continue to learn beyond one cycle of pre-defined responses. However, everything is ever-changing and dynamic, and responses must be continually updated. Learning occurs when the efficiency of the responses is observed and evaluated from time to time. Learning and monitoring are related. Learning helps the observer discover and evaluate the indicators for monitoring. The choice of which indicators to use and which to discard is vetted in learning for efficiency. Learning and anticipation are related. Learning helps the organization develop a new model of the future and of what adaptations may be needed (Hollnagel et al. 2011). Knowing what to learn is not an easy task and is an imprecise art. Pitfalls of traditional safety thought, the "negative" view, color investigations and assessments. Woods and Cook (2002) recommend looking for alternative explanations. Learning is also influenced by prevailing safety models and by making situations conform to that narrow view (Hollnagel et al. 2011). Hollnagel notes that it is important to look beneath the surface and gather evidence about how a system functions (born from the "normal" view as championed by Perrow) as well as direct causes, even though this may be a protracted and lengthy process. Incidents are a good source of learning material if the culture supports this approach. In general, for incidents to be effective for learning, a fair and just culture (e.g., a reporting culture) is helpful. Culture and the different occupations are also important to examine, as perceptions of what is safe and what is risky vary among organizations and occupations. The probing questions for learning are presented below in Table 2.1.
Analysis item (ability to learn) – Probing questions
Selection criteria – Is there a clear principle for which events are investigated and which are not (severity, value, etc.)? Is the selection made systematically or haphazardly? Does the selection depend on the conditions (time, resources)?
Learning basis – Does the organisation try to learn from what is common (successes, things that go right) as well as from what is rare (failures, things that go wrong)?
Data collection – Is there any formal training or organisational support for data collection, analysis and learning?
Classification – How are the events described? How are data collected and categorised? Does the categorisation depend on investigation outcomes?
Frequency – Is learning a continuous or discrete (event-driven) activity?
Resources – Are adequate resources allocated to investigation/analysis and to dissemination of results and learning? Is the allocation stable or is it made on an ad hoc basis?
Delay – What is the delay between reporting the event, analysis, and learning? How fast are the outcomes communicated inside and outside of the organisation?
Learning target – On which level does the learning take effect (individual, collective, organisational)? Is there someone responsible for compiling the experiences and making them 'learnable'?
Implementation – How are 'lessons learned' implemented? Through regulations, procedures, norms, training, instructions, redesign, reorganisation, etc.?
Verification/maintenance – Are there means in place to verify or confirm that the intended learning has taken place? Are there means in place to maintain what has been learned?
Table 2.1: Probing Questions About the Ability to Learn
2.5.3.2 Monitoring
The critical coping mechanism refers to the importance of monitoring and assessing the system so that surprises do not catch stakeholders off guard in the near term. A construction company may perform an honest assessment and realize that it is operating dangerously close to a safety breakdown and should relax production goals or garner other resources to alleviate the problem. Surprises are abundant in the construction industry; the trick is to not let them catch you off guard. Organizations need metrics to monitor safety. In traditional safety management the indicators of safety performance have been lagging measurements. For instance, fatalities, days from work (DFW), and near misses, to name but a few, are recorded and used as indicators of how safe or unsafe an organization is. These measurements are used because they are objective and relatively easy to gather and to quantify (Hollnagel et al. 2011). The reader is directed to Hollnagel et al. (2011) for a description of why this method of accounting is not useful in managing safety. In short, these types of measurements are outcomes, a product of a seemingly unsafe condition. In the Resilience Engineering perspective we are interested in the process of work and in understanding what workers do to "make ends meet" in the normal course of operations. Thus, the focus of RE efforts is on understanding the process (not the product) of normal work and on amplifying what "works" and dampening what does not. Wreathall (2009) states that indicators are crucial but underdeveloped at the present time. To reach the goal of being "proactive," data must be gathered from intermediate activities as well as the output in order to develop indicators. This will allow adaptation that may influence an untoward outcome. Indicators should also be developed for changes in the environment that may impact the system; examples include financial crunches or material shortages. Faint signals as indicators should not be ignored in a resilient system. Faint signals are hints of coming trouble in a system that, after the fact, may be recognized as early warnings. Wreathall (2009) proposes a guide to indicator selection. Preferred indicators (in order of preference) are summarized below:
• Objective: They are based on observable and non-manipulative sources.
• Quantitative: They are measurable and can identify when changes in performance occur.
• Available: They can be obtained from existing data.
• Simple to understand/represent worthy goals/possess face validity.
• Related to/compatible with other programs.
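One way to operationalize Wreathall's preference ordering is to score candidate indicators against the criteria just listed. The sketch below is a hypothetical illustration; the two candidate indicators, their attribute values, and the weighting scheme (earlier criteria weighted more heavily to reflect the stated order of preference) are assumptions made for this example and are not prescribed by Wreathall (2009).

from dataclasses import dataclass

# Criteria listed above, ordered by preference; the weights are an assumption.
CRITERIA = ["objective", "quantitative", "available", "simple", "compatible"]
WEIGHTS = {name: weight for weight, name in zip(range(len(CRITERIA), 0, -1), CRITERIA)}

@dataclass
class CandidateIndicator:
    name: str
    attributes: dict  # criterion name -> True/False

    def score(self) -> int:
        return sum(WEIGHTS[c] for c in CRITERIA if self.attributes.get(c, False))

candidates = [
    CandidateIndicator("near-miss reports per 1,000 work hours",
                       {"objective": True, "quantitative": True, "available": True,
                        "simple": True, "compatible": True}),
    CandidateIndicator("supervisor's impression of crew morale",
                       {"objective": False, "quantitative": False, "available": True,
                        "simple": True, "compatible": False}),
]

for c in sorted(candidates, key=lambda c: c.score(), reverse=True):
    print(f"{c.score():>2}  {c.name}")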
Indicators are called leading if they can provide information that forestalls an unwanted outcome such as an accident or financial damage/ruin. Indicators are called lagging if the measure is a reflection of the output and nothing can be done to change the outcome; that is, it is a past performance measure. Wreathall (2009) warns that the levels and time-scales of the system should be considered when labeling an indicator as either leading or lagging. For instance, the lagging indicator of short-term staffing turnover may indicate fatigue in certain situations but may also be a leading indicator for systematic change. The probing questions for monitoring are presented in Table 2.2.
Analysis item (ability to monitor) – Probing questions
Indicator list – How have the indicators been defined? (By analysis, by tradition, by industry consensus, by the regulator, by international standards, etc.)
Relevance – When was the list created? How often is it revised? On which basis is it revised? Is someone responsible for maintaining the list?
Indicator type – How appropriate is the mixture of 'leading', 'current' and 'lagging' indicators? Do indicators refer to single or aggregated measurements?
Validity – For 'leading' indicators, how is their validity established? Are they based on an articulated process model?
Delay – For 'lagging' indicators, what is the duration of the lag?
Measurement type – How appropriate are the measurements? Are they qualitative or quantitative? (If quantitative, is a reasonable kind of scaling used?) Are the measurements reliable?
Measurement frequency – How often are the measurements made? (Continuously, regularly, now and then?)
Analysis/interpretation – What is the delay between measurement and analysis/interpretation? How many of the measurements are directly meaningful and how many require analysis of some kind? How are the results communicated and used?
Stability – Are the effects that are measured transient or permanent? How is this determined?
Organisational support – Is there a regular inspection scheme or schedule? Is it properly resourced?
Table 2.2: Probing Questions about the Ability to Monitor
2.5.3.3 Anticipation
Irregular events, the understood but challenging one-off events that are unexpected but not impossible, are the impetus for dealing with the potential, that is, for anticipating disruptions, pressures, and their consequences. Woods et al. (2010) frames anticipation in terms of adaptive capacity and warns that this must always be on the minds of managers. Failing to do so could result in a vulnerable system that may face sudden collapse. Adaptive capacity may include buffers or reserves of the system. Woods et al. (2010) identifies six patterns that signal that the adaptive capacity of a system is falling in terms of dwindling buffers or reserves, indicating that a shift in operations is in order to avoid failure. They are summarized below. First, "…resilient systems are able to recognize that adaptive capacity is falling or inadequate to the contingencies and squeezes or bottlenecks ahead." An example in natural systems is the pattern of how quickly or slowly the system recovers from disruptions; progressively slower recoveries may mean that the system is near a tipping point. The analyst should always be aware of what kinds of disruptions the system can handle and whether the disruptions influence the interdependencies among functions. This type of vigilance may also indicate that the system has untapped reserves of resilience that may surface if failure is avoided.
The second pattern is that "Resilient systems are able to recognize the threat of exhausting buffers or reserves." Basically, this means that managers should avoid employing all resources, if possible, when meeting a challenging event or other disruption. For instance, urban firefighters avoid "all hands" calls in order to be able to adapt to rapidly changing field conditions. Along the same lines, hospitals keep beds open in emergency wards in case of a bed "crunch." The third pattern identified is "Resilient systems are able to recognize when to shift priorities across goal-trade-offs." An example of a goal trade-off is the ETTO mentioned previously. In this pattern it is important for the analyst to identify where the system is positioned in the goal trade-off space, whether that position is appropriate for the context, and whether the system can migrate to a more favorable location in the continuum. A resilient system will know when to ease, or sacrifice, production goals when safety is endangered. The fourth pattern is "Resilient systems are able to make perspective shifts and contrast diverse perspectives that go beyond their nominal system position." This entails studying, from a system's view, how functional interdependencies, as perhaps identified in FRAM fashion, may impact the system. It also entails examining cross-scale interactions in the system (Hollnagel et al. 2006), that is, how blunt-end decisions, policies, resource allocation, and so forth affect sharp-end behavior. Woods calls this "downward resilience." Sharp-end behavior can affect learning and impact anticipation analysis and needs to be communicated to the strategy level; correspondingly, this is termed "upward resilience." The fifth pattern is "Resilient systems are able to navigate interdependencies across roles, activities, levels." Woods et al. (2010) warns that "Without the ability to carry out this form of anticipation, systems are at risk of the adaptive breakdown pattern of working at cross-purposes or being locally adaptive but globally maladaptive." Finally, the sixth pattern is "Resilient systems are able to recognize the need to learn new ways to adapt." This speaks to the relationship between anticipation and learning. As Woods et al. (2010) points out, it would be difficult to anticipate intelligently without reflecting on how the system works and what has previously gone right and wrong. The probing questions for anticipating are presented in Table 2.3.
Analysis item (ability to anticipate) – Probing questions
Expertise – Is there expertise available to look into the future? Is it in-house or outsourced?
Frequency – How often are future threats and opportunities assessed? Are assessments (and re-assessments) regular or irregular?
Communication – How well are the expectations about future events communicated or shared within the organisation?
Assumptions about the future (model of future) – Does the organisation have a recognisable 'model of the future'? Is this model clearly formulated? Are the models or assumptions about the future explicit or implicit? Is the model articulated or a 'folk' model (e.g., general common sense)?
Time horizon – How far does the organisation look ahead? Is there a common time horizon for different parts of the organisation (e.g. for business and safety)? Does the time horizon match the nature of the core business process?
Acceptability of risks – Is there an explicit recognition of risks as acceptable and unacceptable? Is the basis for this distinction clearly expressed?
Aetiology – What is the assumed nature of future threats? (What are they and how do they develop?) What is the assumed nature of future opportunities? (What are they and how do they develop?)
Culture – To which extent is risk awareness part of the organizational culture?
Table 2.3: Probing Questions about the Ability to Anticipate
2.5.3.4 Responding
Responding is concerned with how the system behaves in "real time." This is the actual capability of the system to deal with the demands of the current disrupting situation. "At the 'sharp end' of the system, responding to the situation includes assessing the situation, knowing what to respond to, finding or deciding what to do, and when to do it." Stakeholders at the 'blunt end' contribute by ensuring resources are available (Paries 2011). Paries (2010) posits that there are two strategies associated with readiness to respond, proactive and reactive. The proactive strategy involves anticipation of potential disruptions and predefined responses. The reactive approach "is to generate, create, invent, or derive ad hoc solutions." He promotes a holistic approach that seeks to "establish (now) and maintain (tomorrow) a readiness to respond (at any time in the future)." The response and anticipation cornerstones are closely related. Paries (2010) points out that the Woods idea of cross-scale interactions is an important piece of the response puzzle. At the global level, analysts "may anticipate occurrences that are too rare to be even thought of at local scales, while local operators will anticipate situations that are much too detailed to be tackled at a larger scale." Hollnagel also recognizes that it is important to identify when situations encountered in 'real time' fall outside the range of anticipated variations; in those cases the system will not adapt properly. To adapt properly, a 'real-time' resilient system has to monitor its boundaries. The sequencing involves monitoring the current degree of control of the system and then anticipating the amount of control needed in the immediate future. Woods et al. (2010) summarizes this by stating "To be resilient, a system always keeps an eye on whether its adaptive capacity, as it is currently configured and performs, is adequate to meet the demands it will or could encounter in the future." Paries (2010) notes that it is impossible to anticipate everything and that a resilient system "must be both prepared and be prepared to be unprepared." Paries feels that something may be lost in the attempt to anticipate every possible or probable event given that no two accidents are alike. Some feel that it may be more beneficial to develop certain competencies to deal with situations encountered rather than domain-specific skills. The probing questions for responding are presented in Table 2.4.
Analysis item (ability to respond) – Probing questions
Event list – Is there a list of events for which the system has prepared responses? Do the events on the list make sense and is the list complete?
Background – Is there a clear basis for selecting the events? Is the list based on tradition, regulatory requirements, design basis, experience, expertise, risk assessment, industry standard, etc.?
Relevance – Is the list kept up-to-date? Are there rules/guidelines for when it should be revised (e.g. regularly or when necessary)? On which basis is it revised (e.g. event statistics, accidents)?
Threshold – Are there clear criteria for activating a response? Do the criteria refer to a threshold value or a rate of change? Are the criteria absolute or do they depend on internal/external factors? Is there a trade-off between safety and productivity?
Response list – How is it determined that the responses are adequate for the situations they refer to? (Empirically, or based on analyses or models?) Is it clear how the responses have been chosen?
Speed – How soon can an effective response begin? How fast can full response capability be established?
Duration – For how long can an effective response be sustained? How quickly can resources be replenished? What is the 'refractory' period?
Resources – Are there adequate resources available to respond (people, materials, competence, expertise, time, etc.)? How many are kept exclusively for the prepared responses?
Stop rule – Is there a clear criterion for returning to a 'normal' state?
Verification – Is the readiness to respond maintained? How and when is the readiness to respond verified?
Table 2.4: Probing Questions About the Ability to Respond
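Taken together, Tables 2.1 through 2.4 supply the probing questions from which a Resilience Analysis Grid profile, introduced in Section 2.5.3, can be drawn. The fragment below sketches one plausible way to turn scored answers into the star-shaped plot; the four example scores, the 0 to 4 scale, and the use of matplotlib polar axes are illustrative choices rather than a published RAG procedure.

import math
import matplotlib.pyplot as plt

# Hypothetical scores (0 = deficient, 4 = excellent) aggregated from the
# probing-question answers for each of the four abilities.
abilities = ["Respond", "Monitor", "Anticipate", "Learn"]
scores = [3.1, 2.4, 1.8, 2.9]

# Close the polygon by repeating the first point.
angles = [2 * math.pi * i / len(abilities) for i in range(len(abilities))]
angles += angles[:1]
values = scores + scores[:1]

ax = plt.subplot(polar=True)
ax.plot(angles, values, marker="o")
ax.fill(angles, values, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(abilities)
ax.set_ylim(0, 4)
ax.set_title("Illustrative Resilience Analysis Grid profile")
plt.show()

In practice the scores would be aggregated from answers to the probing questions above, and repeated assessments could be overlaid to show whether an organization's profile is improving or eroding over time.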
2.6 Managing Performance Variability – "Making Ends Meet"
The Resilience Engineering outlook "…stresses the role of performance variability to ensure the normal functioning of socio-technical systems" and holds that "To improve system safety it is necessary to understand and to manage performance variability" (Macchi and Hollnagel 2011). Hollnagel (2004) argues that performance variability stems not from the complexity or demands themselves but rather from the adaptations required of humans and organizations (social entities) to control the complexity and meet the demands placed on the system; in other words, to "make ends meet." Performance variability is not found in machines and technology (Hollnagel 2009). Looking at the remark by Woods et al. (2007) that "Resilience represents the ability of a system to adapt or absorb disturbances, disruptions, and changes and especially those that fall outside the textbook operational envelope," it is clear that some parts of a socio-technical system are equipped to handle some disturbances and perturbations. However, many systems and sub-systems are underspecified, and it is impossible to identify all potential working scenarios or working conditions. Additionally, variability will also ensue from unpredictable activities such as inputs, resources, and late or incomplete instructions (Hollnagel 2004, 2009; Macchi and Hollnagel 2010). In discussing performance variability, Hollnagel (2004, 2009) draws upon the work of Perrow, who argued in "Normal Accidents" that complexity cannot be significantly reduced; the alternative is to try to manage the variability of the system. From the Resilience Engineering perspective, accident prevention is about managing performance variability; "Managing something requires being able to observe or detect it, being able to determine when it is getting out of hand, and being able effectively to introduce countermeasures or mitigating actions" (Hollnagel 2004). In the Resilience Engineering view performance variability has two sides. In the traditional (or engineering) sense, safety was ensured by "designing and enforcing barriers to reduce the number of human errors and to mitigate their consequences, or in other terms to reduce discretion and variability" (Re and Macchi 2010). Here the discretion of the worker or crew to adapt and react to demands is constrained. However, Re and Macchi (2010) point out that Hollnagel recognized that, in the norm, discretion and variability are also the sources of success and continued functioning.
Hollnagel et al (2008) state that “…failures represent the flip side of the adaptations necessary to cope with the real world complexity rather than a failure of normal system functions. Success depends on the ability of organisations, groups and individuals to anticipate risks and critical situations, to recognise them in time, and to take appropriate action; failure is due to the temporary or permanent absence of that ability.” Re and Macchi (2010) argue that recognizing the duality of performance variability by Hollnagel is a “transition point from an approach where humans are considered the weak and unreliable components of a socio-technical system to an approach where humans’ contribution to the functioning and to the safety of a system is mainly positive.” Hollnagel (2004) notes that the spectrum of performance variability differs given the system under consideration and the goals of the analyst. In, general, a system is variable if it changes over time. Hence, the rate of change of the system is important. Systems can consist of several subsystems, as evidenced by construction work where work is distributed to different trades and specialists, therefore, variability can take place on several time scales simultaneously. Hollnagel breaks down the variability in the subsystems as typical moment-to-moment, working environment, and organization variability. Typical moment-to-moment variability (i.e., short term fluctuations of resources, demands, and working conditions) takes place on the time scale of 79 a second or minute level. The working environment provides the next level of variability. This is concerned with the demands placed on those in areas such as the military, open-heart surgery, a construction site. This variability may be slower than moment-to-moment variability but can occur rapidly. The final category of performance variability is in the organization and is metaphorically described as the “slow drifting to new norms and emerging, tacit standards for performance” (Hollnagel 2004). Macchi and Hollnagel (2010) and Hollnagel (2009) describes the reasons for performance variability, they are presented below as summarized from Macchi. The reader is directed to Macchi’s dissertation for a full discussion of the reasons. 1. Physiological and/or fundamental Psychological factors. This class of factors have an influence on perception and vigilance. 2. Higher level Psychological factors. Ingenuity, creativity, adaptability, etc. and their effect on human performance have been investigated by Human Resources management and selection studies. To improve safety it is necessary to choose the right people. 3. Contextual factors. Extensive lists of contextual factors (such as Hollnagel’s Common Performance Factors (CPCs)) have been compiled to account for the detrimental effect context may have on human reliability. 4. Social factors. Meeting personal or social expectations, and complying with informal work standards, are examples of how organizational culture influences human performance at work. 5. Systemic factors. The need to stretch resources in order to meet performance demands or the need to substitute goals when dealing with unpredictable events is reasons why performance variability is influenced by systemic factors. 
80 The Efficiency – Thoroughness Trade Off principle constitutes a useful framework to understand performance variability induced by systemic factors (Hollnagel 2004, 2009, Macchi and Hollnagel 2010), and the focus is on this systematic factors as discussed in the next section. 2.6.1 Explanation of Performance Variability – The Efficiency Thoroughness Tradeoff (ETTO) Hollnagel (2009) postulates that the principle of the efficiency-thoroughness trade-off (ETTO), is a way to describe human and organizational performance variability. The ETTO principle brings to light the fundamental human condition that, because resources are limited, people and organizations act in ways that favor efficiency. The resource of “time,” because it underlies all that is done, is especially stressed in ETTO conversations and considerations. Conversely, thoroughness is required because accident events impact efficiency and perturb the system. The ETTO occurs because of certain rules (discussed later) that concern individual, organizations and social behavior in a work context. The ETTO can be used to reasonably explain how work is done and success is achieved and failure sometimes occurs as a failure to adapt to the demands of a system. At its core it represents the heuristics (or rules of thumb) that people use to make decisions needed to complete their work. The trade-off or choice between being thorough or efficient is a decision that the frontline worker is tacitly asked to make every day in order to complete work given that it is impossible to maximize both simultaneously. The ETTO principles aim to find the balance in context of the task at hand. Macchi and Hollnagel (2011) posit that this balance is found by taking into account a subjective evaluation of available resources and time, individual personality traits, social habits, practices, safety culture etc., social and organizational pressure, and the tendency to save time and resources in case of unexpected events. Hollangel (2009) states “For a recurrent work situation most people will naturally 81 choose the more efficient mode of operation as long as it, in their experience, is just as safe as the alternative.” Woods and Hollnagel (2006) cite research into production/safety trade-offs in laparoscopic surgery that has found that the decision to value production (i.e. efficiency) over safety (thoroughness) is implicit and unrecognized. Macchi and Hollnagel (2011) states that “From the ETTO perspective the understanding of human performance requires the acknowledgement that humans take sacrificing decisions, use mental models and apply heuristics.” These take into account the time pressures involved in most situations along with the scarcity of other resources, such as information. This point of view is summarized below and the reader is directed Macchi’s and Hollnagel and Hollnagel (2011, 2009) texts for a full discussion. Sacrificing decisions are a revision of Herbert Simon (1997) hypothesis of satisficing. Satisficing is a portmanteau that combines “satisfy” with “suffice” and is “understood as the attempt to achieve a minimum level of a particular variable when making a decision” (Macchi and Hollnagel 2011). 
With regards to the ETTO principle and in a Resilience Engineering perspective, “The sacrificing decision maker is unable to maximise the benefit due to the complexity or intractability of the working environment” and “The intractable nature of complex socio-technical not have a complete understanding of the situation, the potential consequences of their actions and they have not cognitively explored all the available alternatives” (Macchi and Hollnagel 2011). Mental models are used by individuals to make sense and simplify their interactions with the world. Macchi and Hollnagel (2011) cites Johnson-Laird’s theory of mental models and “how a person holds a mental working model of the phenomenon he/she interacts with. To encompass the scope required to support the human understanding of a situation, mental models must be 82 simpler than the real-world phenomenon they represent. In this way a person can base his/her understanding on a check of salient characteristics rather than checking every detail. Mental models therefore provide an effective way to cope with the complexity of the world based on knowledge and experience of already encountered situations. Problems arise if a situation is misjudged and a response plan is implemented for a situation which is not as it was thought. An important contribution of the mental models theory is the acknowledgement that people’s reasoning and behaviour is primarily influenced by the content-relatedness and form of the information presented rather than a logic reasoning.” Heuristics, or “rules of thumb” are used by people to reduce the complexity of the world around them. In the ETTO jargon, they save time and are sufficiently thorough by relying on past events and reasoning. Heuristics are used to quickly recognize similar situation and to judge uncertainty. The heuristics of similarity matching and frequency gambling are used in the former. The heuristics of representativeness, availability, and anchoring and adjustment are used for the latter. By promoting the ETTO principle, Hollnagel intends to shift the fundamental outlook of risk assessment. In particular, instead of researchers and investigators trying to determine how a component or subsystem may fail or an unwanted outcome may occur; they should be asking “How and when the variability of normal performance, i.e., the adjustments that people must make to accomplish their work, can lead to adverse outcomes” (Hollnagel 2009). Also, instead of asking how human error might occur the question should be how likely is it that a person or an organization will make an ETTO? As established previously, ETTOs will always occur. The concern should be with how humans in different parts of the subsystems are practicing ETTOs and how they may combine to cause unintended and unwanted outcomes (i.e., an accident). 83 2.7 Functional Resonance Analysis Method (FRAM) FRAM is a RE based safety assessment and accident analysis method that builds on and complements traditional risk analysis methods by providing new insights and a deeper understanding of the actual functioning of the system (Herrera et al 2010). The premise behind adopting the FRAM is that the current methods, models of safety risk assessment and accident analysis are insufficient to gain a further understanding of complex socio-technical systems. 
The FRAM assumes that some accidents result from unexpected combinations of normal performance variability and that accidents are prevented by monitoring and damping this variability (Hollnagel 2004, 2009, Macchi 2011). The FRAM provides a way for the analyst to identify and visualize the dynamic interactions within a socio-technical systems approach and to gain a better understanding on non-linear dependencies, performance conditions and variability, and their resonance across important functions or activities. The FRAM and Resilience Engineering applications are utilized in complex socio-technical modeling and have the common assumptions that systems are always underspecified and local adjustments are needed by those at the workface to “make ends meet” (Macchi 2011). The FRAM is based on four principles (Hollangel 2009,Macchi 2011, Herrera et al 2010): 1. The principle of equivalences of successes and failures : Hollnagel (2009) quotes the philosopher Ernest Mach, who stated in 1905 that “Knowledge and error flow from the same mental sources, only success can tell one from the other.” 2. The principle of approximate adjustments: In the discussion above regarding performance variability it discussed that underspecification always occurs and are therefore unpredictable. Procedure and resources must be adapted to the situation. From the perspective of Resilience Engineering performance variability is both normal and necessary. 84 3. The principle of emergence : “The variability of normal performance is rarely large enough to be the cause of an accident in itself or even to constitute a malfunction. But the variability from multiple functions may combine in unexpected ways, leading to consequences that are disproportionally large, hence produce a non-linear effect. Both failures and normal performance are emergent rather than resultant phenomena, because neither can be attributed to or explained only by referring to the (mal) functions of specific components or parts” (Hollnagel 2010) 4. The principle of functional resonance: “FRAM replaces the traditional cause and effect relation by the principle of resonance. This means that the variability of a number of functions every now and then may resonate, i.e., reinforce each other and thereby cause the variability of one function to exceed normal limits. The outcome may, of course, be advantageous as well as detrimental, although the study of safety has naturally focused on the latter.) The consequences may spread through tight couplings rather than via identifiable and enumerable cause-effect links” …”The resonance analogy emphasizes that this is a dynamic phenomenon, hence not attributable to a simple combination of causal links. This principle makes it possible to capture the real dynamics of the system’s functioning (Woltjer & Hollnagel 2007), hence to identify emergent system properties that cannot be understood if the system is decomposed in isolated components” (Macchi 2010). The FRAM model describes a system’s functions and the potential couplings among functions. The model does not describe or depict an actual sequence of events (i.e., a scenario). A scenario can be described by an instantiation of the model. The instantiation is a “map” of how functions are coupled under given – favorable or unfavorable - conditions. The approaches differ slightly for risk assessment and accident analysis. For risk assessment, the steps consist of the following, which is greatly condensed (Macchi 2011, Hollnagel 2004, 2008, 2012): 85 1. 
Clarify the purpose of modeling and describe the situation being analyzed. In the prospective use of the FRAM, the purpose is to develop an overall understanding of the couplings and dependencies among the (foreground and background) functions of the system. 2. Identify the essential functions that are necessary (and sufficient) for the intended performance to occur (when 'things go right'). Characterize using the six basic aspects (Input, Output, Preconditions, Resources, Time, and Control). Taken together, the functions are sufficient to describe what should happen (i.e., the everyday or successful performance of a task or an activity). The foreground basically means that which is occurring at the workface, i.e. “normal work.” “A function is an activity of the socio-technical system towards a specific object” “The principle that guides the identification of functions is the need to achieve a description of the normal activities performed by the socio-technical system being analysed” (Macchi and Hollnagel 2010). The function is represented by a hexagon that is sometimes called a “snowflake” and is illustrated in Figure 2.5. The function is described by six aspects, time available, input, preconditions, resources, output, control and are described in the Figure. 86 Figure 2.5: Function/Activity representation and aspects (Hollnagel 2008) 3. Characterize the variability, first as the potential of the functions described by the model, and then as the (possible) actual variability for a set of instantiations of the model. Consider whether the actual variability will be what one should expect ('normal') or whether it will be unusually large ('abnormal'). 4. Identify the dynamic couplings (functional resonance) that likely will play a role during an event. These comprise an instantiation of the model which can be used to predict how an event will develop and whether control can be lost. “The aim of this step is to determine the possible ways in which the variability from one function could spread in the system and how it may combine with the variability of other functions. 87 5. Propose ways to monitor and dampen performance variability (indicators, barriers, design / modification, etc.) In the case of unexpected positive outcome, one should look for ways to amplify, in a controlled manner, the variability rather than for ways to dampen it. It is worthwhile to consider how the FRAM differs from how the production and safety combination is traditionally handled on construction projects. The authors experience in the areas of residential, commercial, military, and municipal construction is that, in general, the two are discussed separately, if at all. The venue for discussing production and safety includes preconstruction or preparatory meetings (in the case of the USACOE QC/QA system) meetings where production means, methods and materials are discussed in terms of the CPM schedule and how staffing and resources will be utilized to meet the production goals of the general contractor and ultimately the owner. The specifications are also reviewed as applicable to the trade. Safety is then discussed primarily in terms of compliance with regulations. The group produces a document such as an “activity hazard analysis” (AHA) that has a singular focus on that particular trade and usually pays little attention to the work of other trades. Safety plans are often generic and not site-specific. 
Additionally, safety plans are often non-existent or outdated and sitespecific plans are not drafted unless specified in the contract. Then, even with such specification, they may not be completed unless requested by the GC or owner’s representative. The attendees of these meetings typically consist of the owner’s representative (the QA personnel), the general contractor’s QC representative, the general superintendent, and the trade foreman and project manager. Occasionally, a dedicated safety officer from the trade or GC will attend. The discussion is heavily tilted toward “work-as-imagined” and not “work-as-done” as the meeting participants are managers and not those who actually perform the work, with the 88 exception of the field superintendent or foreman. Additionally, only the trade under scrutiny, for example the steel erector, is included in the conversation. In the traditional approach to safety there are a few important underlying assumptions. One is that accidents will occur in the same way that they occurred before, hence, the emphasis on regulatory compliance (e.g. OSHA). Safety regulations do help individuals and firms avoid repeating scenarios that have led to past accidents. However, they are very focused in nature (e.g. extend the ladder three feet above the roof line and secure it to the structure) and do not account for the occurrence of interactive unwanted events that may lead to an accident. For instance, the concurrent events of a ladder falling and a trench cave-in on the project may be a possibility. Additionally, accidents rarely happen in the same manner and in many instances the conditions surrounding the event are quite different. Another underlying assumption is that by removing the component (or potential hazard) that may fail, for example the “tied-off” ladder, safety is ensured. This harkens back to the linear thinking associated with Heinrich’s Domino Model whereby if the offending component (domino) is removed the system will be safe. The overwhelming emphasis of the safety portion of the meeting is on what can go wrong and not on what goes right and how we can amplify and learn from the adjustments that workers make in the course of normal work to avoid accidents. The FRAM offers a structured format in which to improve upon and add value to the traditional way of discussing the production and safety interaction of construction activities. First, it offers the opportunity for a richer discussion of how work will be performed in a safe manner. This necessarily entails inviting those associates that will actually be performing the work to select and instantiate the functions and activities. For example, in steel erection the crane operator, riggers, connectors, and foreman would be involved in the production and safety 89 conversation in addition to the traditional attendees. They would discuss the means and methods for production in terms of the six aspects of input, output, preconditions, resources, control, and time. The emphasis is on what should happen in the course of everyday ‘normal’ case and then how performance variability may affect that either positively or negatively. Unwanted events are discussed in terms of how they can be eliminated or dampened. This discussion emphasizes how the team may act proactively to affect safety by changing a production aspect, for example allowing more time to rig a structural steel member, and to realize how the bounded system interacts and may resonate out of control. 
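To make the functional description above more concrete, a minimal sketch in Python follows. It records the six aspects for two hypothetical steel-erection functions and lists potential couplings by matching one function’s output to another function’s input, precondition, or resource. The class, the aspect strings, and the coupling rule are illustrative assumptions for discussion only; they are not the published FRAM method or any FRAM software.

from dataclasses import dataclass, field
from typing import List

@dataclass
class FramFunction:
    """One function/activity described by the six FRAM aspects."""
    name: str
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    preconditions: List[str] = field(default_factory=list)
    resources: List[str] = field(default_factory=list)
    time: List[str] = field(default_factory=list)
    control: List[str] = field(default_factory=list)

def potential_couplings(functions):
    """List (upstream, downstream, aspect) links where one function's
    output appears as an input, precondition, or resource of another."""
    links = []
    for up in functions:
        for down in functions:
            if up is down:
                continue
            for item in up.outputs:
                if item in down.inputs:
                    links.append((up.name, down.name, "input"))
                if item in down.preconditions:
                    links.append((up.name, down.name, "precondition"))
                if item in down.resources:
                    links.append((up.name, down.name, "resource"))
    return links

# Hypothetical steel-erection functions, for illustration only.
rig = FramFunction("Rig steel member",
                   inputs=["member staged"],
                   outputs=["member rigged"],
                   resources=["rigging crew", "slings"],
                   control=["rigging plan"])
hoist = FramFunction("Hoist member",
                     inputs=["member rigged"],
                     outputs=["member at connection point"],
                     preconditions=["exclusion zone clear"],
                     resources=["crane", "operator"],
                     time=["lift window"])

for link in potential_couplings([rig, hoist]):
    print(link)  # e.g. ('Rig steel member', 'Hoist member', 'input')

In an actual FRAM session, the crews performing the work would supply and refine these descriptions, and the group would then judge where variability could resonate across the resulting couplings.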
Discussing regulatory requirements and contract specifications is still an important part of the conversation and included in the FRAM because it represents prior learning of how a system might fail. The author knows of no U.S. construction firm currently using the formalized or deliberate FRAM approach to risk assessment. No mention of FRAM in construction is made in either the popular or scholarly literature. This new idea has just recently (in 2012) been formalized in a dedicated text by Erik Hollnagel, the originator of the FRAM. Additionally, it has been presented at Lean Construction, American Society of Civil Engineers, and Construction Safety Council conferences only by the author. 2.8 The Resilience Analysis Grid (RAG) To assess resilience Hollnagel et al. (2011) proposes that four sets of “probing questions” be connected to the resilience capabilities can be used to develop a Resilience Analysis Grid (RAG). The answers to these questions can be used to construct a resilience profile by aggregating the ratings for each basic capability and coming up with a single rating for each capability. The probing questions presented in the tables above are domain independent and thus should not be used without confirming their relevance. Hence, the questions used could 90 either be a subset of these, reformulated ones, or completely new ones. As mentioned previously, Hollnagel et al. (2011) states that the relative weight or importance of the four capabilities may differ between domains, in other words, the mix of responding, learning, anticipating, and monitoring may vary from situation and/or organization. The procedure for filling out a RAG consists of the following steps: 1. Define and describe the system for which the RAG is to be constructed 2. Select a subset of relevant questions for each of the four capabilities 3. Rate the selected questions for each capability on a Likert type scale 4. Combine the ratings to a score for each capability, and for the four Capabilities combined 2.9 Other Understandings of Resilience Engineering Given that RE is an emerging field, researchers have concocted different understandings and/or explanations to describe the concept. Two prominent approaches are described below, that of Woods and Wreathall (2008) and Jackson (2009), and Madni and Jackson (2009). 2.9.1 Stress-Strain Analogy for Resilience Engineering Woods and Wreathall (2008) borrow the concept of stress-strain plots from material science to characterize and assess the resilience of a system. Figure 2.6 illustrates a typical stress-strain plot. Varying demands placed on a project (e.g. production and labor demands as described above) stand-in for stress (normally on the y-axis) on the plot. Strain is analogized to describe how the system adapts (or stretches), using available capacity (e.g. working overtime, renting additional excavation equipment) to the stress applied. Strain is plotted on the x-axis. The defining characteristic (i.e. parameters and regions) of the typical stress-strain plot, also known 91 as the state space, distinguishes the organization (or unit of analysis) as resilient in terms of adaptability. The state space also acts as a harbinger for management so that they can calibrate the organizations true status with perceived status. Typically, managers overestimate the state of safety, in other words firms believe that they are acting in a safer manner than reality indicates. 
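Returning briefly to the RAG procedure in Section 2.8, its aggregation step can be illustrated with a short sketch before the stress-strain state space of Figure 2.6 is examined in more detail. The ratings, equal weights, and capability labels below are invented placeholders rather than Hollnagel’s probing questions; the sketch only shows how selected Likert ratings might be combined into a capability profile and a single combined rating.

from statistics import mean

# Hypothetical Likert ratings (1 = poor ... 5 = excellent) for a subset of
# probing questions chosen for each capability; placeholders only.
ratings = {
    "respond":    [4, 3, 4],
    "monitor":    [2, 3, 2, 3],
    "anticipate": [3, 2],
    "learn":      [4, 4, 3],
}

# Relative weights may differ between domains; equal weights are assumed here.
weights = {"respond": 1.0, "monitor": 1.0, "anticipate": 1.0, "learn": 1.0}

capability_scores = {c: mean(r) for c, r in ratings.items()}
overall = (sum(capability_scores[c] * weights[c] for c in capability_scores)
           / sum(weights.values()))

print(capability_scores)   # resilience profile, one score per capability
print(round(overall, 2))   # single combined rating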
Figure 2.6: Stress-strain state-space (from Woods and Wreathall 2008) The uniform portion of the curve (the elastic region) corresponds to times and situations where the organization handles demands easily, stretching to accommodate them. Here the risks are anticipated by building in capacity to avert extraordinary failure or disruption. In other words, the company has adequately foreseen disruptions that may impact failures. Plans, procedures, and flexibility in operations are the bellwether of the uniform region. In general, the demands are well-known and accounted for, making stretching easy in this region. The yield height (the inflection point of the curve where elasticity ends and plasticity begins) of the 92 uniform response curve captures the first-order adaptive capacity of the firm. The linear region is also called the on-plan performance area. The yield height can be adjusted by adding capacity in the uniform region or by changing the range of demands the curve can accommodate. One example could be additional training such as high-rise rescue training, drills, and simulation on a multi-story building. Thinking in terms of adaptation and capacity can help the construction manager foresee risks and adapt plans accordingly. Beyond the uniform region lies what Woods and Wreathall call the extra region (x-region for short). In material science this is known as the plastic region. Here the demands encountered become more difficult to accommodate and the firm’s reaction to them is non-linear. Demands are imposed upon the firms that go beyond what was anticipated in the on-plan performance area. In this region safety and production efficiencies may be compromised as ‘gaps’ appear in the organizations that exceed the first-adaptive capacity. Resources, or second-order adaptive capacity, must be garnered to avoid reaching the failure point. Commonly, experienced groups or individuals at the workface can recognize when they are operating in the plastic region and take actions to cope with increasing demands, in other words they begin to fill in the ‘gaps’ caused by lack of capacity and foresight. These actions are indicators that the firm is in the xregion. This is reflected in the upswing portion of the x-region that corresponds to extra capacity added to meet demands. This action is fraught with potential problems. Other bottlenecks and constraints may appear and the tempo of the project increases. For instance, adding additional crews will incur more supervision. If additional superintendents or other supervisors are not added as the tempo increases in the x-region then safety might suffer as other areas on-site are neglected. 93 If the demands imposed in the x-region begin to exceed second-order adaptive capacity then the curve begins to acquire a negative slope and heads toward the failure point. To avoid failure, the firm may decide to re-structure. The re-structuring occurs at a point in the x-region, prior to failure but at a location where there is time to rally the requisite resources to rescue the project. This could entail comprehensively re-planning the project. Resilient firms will anticipate the restructuring phase and adjust capacity accordingly. Firms that are not resilient may erode safety margins and endanger personnel. This highlights the importance of calibration, or knowing where a firm is situated with respect to the state space system. Mis-calibrated firms don’t realize that they are in the x-region or perhaps heading toward failure. 
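One rough way to reason about position in the state space described above is sketched below: given a demand level and estimates of first-order and second-order adaptive capacity, a project is classified as operating in the uniform region, in the x-region, or beyond second-order capacity. Woods and Wreathall present the state space qualitatively; the thresholds, units, and function name here are invented purely for illustration.

def state_space_region(demand, first_order_capacity, second_order_capacity):
    """Classify where a project sits on the stress-strain state space.
    Assumes demand and capacities are expressed in the same (arbitrary) units."""
    if demand <= first_order_capacity:
        return "uniform region (on-plan performance)"
    if demand <= first_order_capacity + second_order_capacity:
        return "x-region (gaps filled by second-order adaptive capacity)"
    return "beyond second-order capacity (restructure or risk failure)"

# Hypothetical example: 100 units of planned capacity, 30 units of extra
# capacity that crews and managers can mobilise, and rising demand.
print(state_space_region(80, 100, 30))    # uniform region
print(state_space_region(120, 100, 30))   # x-region
print(state_space_region(150, 100, 30))   # beyond second-order capacity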
2.9.2 Madni and Jackson’s Resilience Engineering Framework Madni and Jackson (2009) build on early Resilience Engineering concepts to develop a conceptual framework for Resilience Engineering. They also draw upon some of their previous work in “architecting” system design. Jackson (2009) explains architecting in resilience terms thusly: “When we say that resilience can be architected, we mean that systems can be defined and the elements of the system can be arranged for which resilience will be an emergent property, that is to say, a property not possessed by the individual system elements.” The idea is that architecting “considers all aspects of defining the structure of a system” (Jackson 2009) as opposed to the perceived limiting scope of engineering. They view resilience as “a multi-faceted capability of a complex system that encompasses avoiding, absorbing, adapting to, and recovering from disruptions” (2009). Their disruption schema was discussed earlier in this work in the section on disruptions. The notions of avoidance, absorption, adaptation, and recovery are similar to Hollnagel’s four cornerstones (learning, responding, anticipating, and monitoring) of a resilient system although in a slightly 94 different manner in some respects and similar in others. Their conceptual framework for Resilience Engineering is based on four key “pillars” of disruptions, system attributes, methods, and metrics. The reader is referred to Madini and Jackson (2009) for details of their framework. 95 Chapter 3: Methods 3.1 Introduction Construction research has been defined by the Associated Schools of Construction (ASC) (Syal 1998) as “…any scholarly activity that expands the knowledge base in the field of construction. This may include: a) development of new knowledge, b) refinement of existing knowledge, and c) the transfer of knowledge from other fields to construction.” This research most closely resembles and focuses on the transfer of knowledge from other fields to construction given that RE is applied in various industries such as aviation, nuclear generation, and sea fishing. In a paper titled “Construction Research Agenda : Focus Area and Topics,” Syal (1998) categorizes construction research types as either “Based on the type of problem and the applicability of the solution” or “Based on the nature of the topic and the methodology.” The former category is further subdivided into basic, applied, or a combination of basic and applied research. He states that the latter category has the most relevance to the construction industry and can be delineated as survey-based, experimental, exploratory/developmental, and descriptive research. Syal does not dictate nor offer particular methods to be used within or across the categories with the possible exception of the survey-based research. Using Syal’s typology, this work cuts across three of the four subcategories of the latter category mentioned above. It is descriptive research, which is, according to Syal (1998), a “description of an unexplained or unknown problem or practice, including its causes, history, evolution, and possible solutions.” The description is found in the comprehensive literature review which outlines the history and evolution of the chronic safety problem in the United States from the industrial revolution to the present. The literature review also discusses the 96 unknown problem of disruptions in both general industry and construction. 
Finally, it introduces RE, as a possible solution to the managing of disruptions and increasing of safety on projects. The conceptual framework in this work can be thought of as exploratory and developmental. Syal (1998) describes this sub-category as “development of solutions for construction problems without experimentation but by utilizing existing expertise in other aspects of construction (highway to residential construction) or from other fields (manufacturing, business management, computer science, etc.).” RE borrows heavily from other fields as mentioned above. This work also explores the notion that RE is a way of understanding and explaining disruptions on construction projects that may lead to accidents. It develops a framework based on general RE principles applied to the construction project and the experience and 25-plus year expertise of the author. Finally, this work is experimental. It uses computer-based testing in the form of a hybrid simulation consisting of agent-based and discrete event modeling to test, gauge and verify a portion of the conceptual framework, namely the ETTO Principle. This part of the work is also developmental given that no tools, computer-based or real-world, exist to test RE principles. Given the unique nature of this topic and that it cuts across several different research categories, innovative combinations of methods to accomplish the research were used. Compounding the difficulty of choosing a method based on a safety-related topic is that it is fraught with moral and legal issues as described below. Before RE is implemented in actual field operations there needs to be much discussion among safety experts about how it would be implemented in the field. As a starting point, this work proposes a conceptual framework to guide that discussion. 97 3.1.1 Conceptual Frameworks When used as a noun, the word ‘concept’ refers to something, a thought or a notion, conceived in the mind. When used as an adjective it refers to the organization of an idea around a central theme or idea. The Conceptual Framework represented in this work is built around the central theme of RE as described by the RE scholarly community and how it can be used to better understand and manage disruptions that may contribute to accidents. The author conceptually translates these ideas to the construction context based on the literature and his experience and education in the construction industry. Thus this research can be considered empirical given that it is partially based on the observations and experience of the author. It provides a base upon which future researchers may accept or reject the notions of RE and how the author envisions it as applied to a construction project. Smythe (2004) defines a conceptual framework thusly: “A conceptual framework is described as a set of broad ideas and principles taken from relevant fields of enquiry and used to structure a subsequent presentation. When clearly articulated, a conceptual framework has potential usefulness as a tool to scaffold research and, therefore, to assist a researcher to make meaning of subsequent findings. Such a framework should be intended as a starting point for reflection about the research and its context. The framework is a research tool intended to assist a researcher to develop awareness and understanding of the situation under scrutiny and to communicate this. 
As with all investigation in the social world, the framework itself forms part of the agenda for negotiation to be scrutinised and tested, reviewed and reformed as a result of investigation.” Smythe (2004) feels that conceptual frameworks provide clear links from the literature to the research goals and questions, inform the research design, provide reference points for discussion of literature, methodology, and analysis of data, and contribute to the trustworthiness of the study. 98 How to go about creating conceptual frameworks is not well understood or documented in the literature. Many times the words ‘theory’ and ‘conceptual’ are used in conjunction with the word “framework” to describe the same thing or same approach to understanding how the world ‘works.’ However, Shields and Tajalli (2006) cite Abraham Kaplan and John Dewey to explain the difference between a theoretical framework and a conceptual one in empirical research. Shields and Tajalli posit that theory is used as a tool to structure inquiry. However, there is an intermediate step in forming theory that is often overlooked and this is the “behind the scenes” development of the procedures for forming concepts and hypotheses that should be exposed. In essence, the creation of the conceptual framework is an intermediate step to the formation of a theory to understand reality. Theory is ultimately used to organize the exploration of the problem under consideration. Shields and Tajalli (2006) state “A theory conforms to the facts and is a way of looking at the facts.” However, Kaplan notes that “conceptual frameworks are out in the open and are still conjectural or hypothetical. They are not truth; rather, a systematic way (still subject to reason) to organize inquiry.” Shields and Tajalli (2006) explain that Dewey (1938) likens conceptual frameworks to maps that help navigation through experience or the experiential world and represent and abstract from reality and that “When accurate, maps enable navigation within reality.” Shields and Tajalli (2006) summarize the connective functions of conceptual frameworks to theory, the purpose of the literature review, and the interplay of personal work experience by stating: “These (conceptual) frameworks help students connect forward into the problem and give direction on how to collect and analyze data. They also have a connective function backward to the literature and larger theoretical frameworks (i.e., neo-classical economics, organizational theory). Students are expected to justify their framework by connecting it to the scholarly literature (or an existing public affairs framework). 99 A literature review enables the student to get to know the topic, connect the larger literature to their work experience, and refine the research question or problem. The literature review may also reveal where previous inquiry has stopped. Conceptual frameworks are built upon the premise and practice of a careful, thoughtful, and reflective review of the literature. Students are thus expected to draw upon the wisdom and insights of the literature and their experiences to develop a plan or map to guide their inquiry. A good map helps one reach an unknown destination more quickly and with less anxiety.” Smythe (2004) echoes the notion that the researchers’ ‘life-world experience’ is a part of the development of the conceptual framework. 
However, Smythe cautions that the life-world experiences of the person developing the framework should not be attributed a power that they do not have and that the bounds of the researcher should be considered.

3.1.2 Method

The Method is illustrated in Figure 3.1. It is guided by the Objectives and, in general, follows the scientific method, using deductive reasoning to apply the general principles of RE to the specific problem of how to manage disruptions to construction processes that may compromise safety. Deductively, it links the premise that RE may be useful to explain disruptions that occur in project-based construction endeavors to the Conceptual Framework, and then to a Conceptual Model for simulation of how disruptions may set the stage for accidents, with the Efficiency-Thoroughness Trade-Off (ETTO) Principle as a backdrop. Objective one is met by a thorough literature review of construction safety and production, given that this relationship is a major premise of RE. The background of RE is then explored. Objective two is met by constructing a Framework of RE for project-based construction operations, based on the Literature Review and the author’s industry experience. The conceptual simulation model is presented, followed by a discussion of the results of the simulation and future work based on the Framework.

Figure 3.1: Method (flowchart linking the Objective 1 literature review, covering the evolution of industrial safety and the popular accident models (sequential, including the Domino Model; epidemiological, including Haddon’s Model; and systems, including normal accidents and HROs), to the current understanding of Resilience Engineering, the conceptual construction framework of Resilience Engineering based on the literature review (Objective 2), the conceptual simulation model (Objective 3), and the analysis and discussion)

3.2 Objective 1

Objective 1: Abstract the concept and underlying theories of RE and explore RE deployment in non-construction industries for use in formulating Objectives 2 and 3.

RE is “…a field in the midst of defining itself and its relationship to other fields, and this includes identifying and defining the phenomena which researchers in the field intend to investigate” (Mendonca, 2008). Because scant research has been conducted in the field, a critical first step is to establish the first principles of RE as they are currently understood. This task includes an investigation into the roots of the new field as well as the trajectory of safety and production thought, theory, and understanding. This objective was accomplished by a comprehensive literature review of RE. Because RE is concerned with safety and production, relevant areas of these disciplines were included in a general context. The main thrust of the literature review emphasizes understanding RE in the context of safety and production. The review fulfilled traditional literature review goals (i.e., gaps in the literature reviewed, consensus and debates revealed, future areas of study, etc.) and also comprehensively discussed RE concepts and principles in an effort to synthesize and abstract these concepts and enable Objective 2. The most important topic to address was the explanation of RE. Objective 1 provided the foundation for Objectives 2 and 3.

3.3 Objective 2

Objective 2: To present a RE conceptual framework for construction safety.
Informed by the work completed in Objective 1 and the author’s 25 years’ experience, Objective 2 addressed the question “How can RE be conceptually applied to reduce disruptions on construction projects?” The underlying assumption is that fewer disruptions may lead to a safer jobsite. Developing a conceptual framework for RE is a challenging intellectual undertaking. When something is described as resilient that description entails an entity or object that is dynamic and protean. There are many examples of using a conceptual approach in the earlier stages of ideas, such as RE, the earlier stages to describe an emerging phenomena in the literature. The conceptual framework sought to answer the Research Questions of “What elements of RE may help a construction project avoid, survive, and recover from disruptions?” and “How can we apply RE, in a formalized way, to construction operations?” This is accomplished by filtering out the essential elements of RE in the literature and applying them to the construction industry. The development first approached the challenge of applying RE to the construction industry by offering strategies to the executive or management level that may implement RE. Then specific guidance was given, based on the Resilience Analysis Grid (RAG) on how the four abilities might be implemented in the field. The framework also attempted to capture the complexity of the construction safety problem. As argued previously, construction projects are “complex systems.” A complex system is one which has, within itself, a capacity to respond to its environment in more than one way, and to select among the options in some way (Miller and Page 2007). Simon (1997) defined a complex 103 system as one made up of a large number of individual parts that have many interactions. A key concept of complex is that systems emerge to something that is greater than the sum of its parts. The framework, as well as the conceptual model and simulation, may best be described as a “map” to incorporate RE in construction environments and as a way to understand and analyze perturbations and disruptions in production processes. The “map” analogy is appropriate as described by Shields and Tajalli above. The elements of the framework consisted of a systematic and methodical process of addressing the distinct temporal phases in the occupational accident timeline that occurs in a context of production systems. In designing the production operations, a process of anticipation of threats is deployed, as well as provisions for avoidance of these threats. In recognition that anticipation will not be exhaustive given the random nature with which factors combine to create occupational accidents, here RE focuses on what should be done during an accident to reduce its impact. The final phase is post-accident and the recovery steps needed to stay on track with respect to progress towards system objectives (e.g., production). In this dissertation RE was primarily placed in the production context and the protection of workers during this phase. The conceptual framework was needed to further knowledge in the area of the construction safety/production mix. New approaches are needed that still respect and utilize existing methods. RE may prove to provide such an approach. However, the field is currently evolving and needs translators with domain specific knowledge to guide practitioners. 3.4 Objective 3 Objective 3: To explore RE implementation in construction production settings using hybrid computational methods. 
In an ideal world, the RE conceptual framework developed in Objective 2 would be tested on an active job site to determine if it is effective for improving construction safety. We live in an imperfect world, and this course of action is not possible. Introducing an untried safety program to workers, suspecting or unsuspecting, is, at the least, fraught with moral, legal, and economic hazards. Morally, one cannot knowingly put others in situations where they may be at risk. Legally, beginning a new safety plan without prior simulation or other means of “dry runs” may be viewed as disregarding the principle of due diligence. Finally, economically, if workers were injured or worse by the new approach, it could mean financial ruin for all parties involved. Evaluating this new paradigm to elevate safety becomes a “catch-22” for the construction researcher; new methods to alleviate the chronic problem of safety cannot be tried because they may present unacceptable hazards to the worker, while worker safety continues to stagnate because new methods are difficult to vet.

One way that is available to researchers is conceptual computer modeling and simulation. Simulation is a product of the model “...that imitates a real or imaginary dynamic system” (Martinez 1996). Done correctly, this approach carries no moral hazard, avoids the legal difficulties, and is economically attractive given that the cost resides in the software and programming. Epstein (2008) posits that the goal of building models should be to move beyond implicit mental models, where assumptions are hidden, and to create models that explicitly present assumptions so that others may recreate them and test the assumptions. Models are also useful in that they can act as focal points for others to discuss and understand the problem at hand. Epstein lists 16 ways, in addition to predictive value, that models are useful; these are to:

1. Explain
2. Guide data collection
3. Illuminate core dynamics
4. Suggest dynamical analogies
5. Discover new questions
6. Promote a scientific habit of mind
7. Bound (bracket) outcomes to plausible ranges
8. Illuminate core uncertainties
9. Offer crisis options in near-real time
10. Demonstrate tradeoffs / suggest efficiencies
11. Challenge the robustness of prevailing theory through perturbations
12. Expose prevailing wisdom as incompatible with available data
13. Train practitioners
14. Discipline the policy dialogue
15. Educate the general public
16. Reveal the apparently simple (complex) to be complex (simple)

No model will do all of these things. Of particular interest to this research is number 11: “Challenge the robustness of prevailing theory through perturbations.” An accident or incident is essentially a perturbation, or disruption, the magnitude of which corresponds to the ability of the system to absorb it. In essence, this is a measure of the resilience of the system. It is hypothesized that a resilient system will deal with perturbations handily and recover quickly. The simulation created is subject to several disruptions.

Modeling has been described as an act of artful approximation (North and Macal 2007). Modelers cannot (and should not) produce an exact recreation of the events to be modeled. Only the salient points of an actual event or situation should be abstracted and detailed; other items should be approximated, otherwise the model may become muddled and ineffective.
Various modeling techniques have been developed and refined over the years to determine the appropriate level of detail for models, to define the desired end state of the model so that progressive refinement may occur, and to develop criteria for determining the overall success and effectiveness of the modeling project. in general, models can be broadly classified as deterministic or stochastic (North and Macal 2007). Prominent approaches include discreteevent (DE) simulation, agent-based modeling (ABM), and blended (or hybrid) modeling. Each will be discussed briefly below as well as the hybrid software product Anylogic. 3.4.1 Discrete Event Modeling Discrete Event (DE) Modeling, which is sometimes called Process-Centric Modeling, is commonly used while simulating queuing, manufacturing and similar systems. Here the modeler models a process as a series of separate events, as opposed to the continuous view, as they unfold over time and at discrete points, or states in the process. “In discrete-event simulation, it is assumed that the state of the system changes instantaneously at specific times marked by events” (Martinez 1996). The occurrence of a discrete event triggers another event or chain of events as the simulation progresses though time. Discrete Event Modeling, which became popular in the 1960’s, is often used in conjunction with statistical techniques such as the Monte Carlo Method (North and Macal 2007) and other statistical techniques. Martinez (1996) states, “Most construction processes can be effectively modeled using discrete event simulation.” 107 3.4.2 Agent Based Modeling ABM is useful for describing systems that are open, complex, and with distributed control and resources. Economists, sociologists, anthropologists, political scientists, and others have applied ABM to specific and general problems (Epstein and Axtell 1997, Watkin et al, 2009). Most construction projects seem to fall under this umbrella. Agent based simulation reveals the global behavior of a system as structures and patterns emerging as a result of repetitive and competitive local interactions between agents and their environment (Axelrod & Tesfatsion, 2005). In general, ABM is considered a bottom-up modeling approach. Watkins et al (2009) describe ABM as: “... a computer simulation technique that allows the examination of how system rules and patterns emerge from the behaviors of individual agents. ABM creates artificial agents that represent individuals that have the ability to perceive and interact with each other and their environment. Based on their interactions, the agents can make autonomous decisions. The goal of the simulation is to track the interactions of the agents in their artificial environment and understand processes through which global patterns emerge, for contingencies. “ Axelrod and Tesfatsion (2005) posit that researchers who use ABM should pursue four main goals: empirical, normative, heuristic, and methodological. Empirical questions revolve around the phenomena of large scale regularities in the absence of central control. Normative goals revolve around the use of the ABM model as a “...laboratory for the discovery of good designs.” In other words, if an ABM works, or resembles the real world in a cogent way, can we then introduce other agents or alter the environment to create a “better world”? 
Heuristics involves asking, “How can greater insight be attained about the fundamental causal mechanisms in social systems?” The hope here is to envision causal relations beyond first-order effects. In The fifth discipline, Peter Senge warns that these secondary effects are oftentimes inadvertent and can 108 slow down the success of the system (Senge, 1999). “A fourth goal is methodological advancement” or to put it more generally, how will the next generation of researchers benefit, methodologically, from the models created in the present? 3.4.3 Multi-Scale Modeling Each of the singular approaches mentioned above has unique benefits. However, as North and Macal (2007) point out, “…no single modeling approach can be said to be the best approach for addressing all types of problems. There is no generic modeling technique.” However, they further note that “..it often is useful to combine one or more modeling approaches, employing each technique for that part of the model where it makes the most sense to do so, considering the unique capabilities and recognizing the limitations of each modeling approach.” In other words, different modeling methods are better suited to different levels of abstraction. North and Macal call this model blending, others use the term multi- method modeling (Sadsad and McDonnell 2007) and hybrid systems modeling, using hybrid to refer to the union of discrete and continuous systems (Borshchev and Filippov 2004). Until recently, no singular platform was available to combine ABM, SD, and DE modeling. Modelers who combined methods devised their own software or awkwardly combined proprietary or non-proprietary software packages to combine approaches. Perhaps the reason that hybrid methods have been underutilized resides in the reality that construction process and influences occur at differing temporal and spatial scales. For instance, high-level decisions that are made at executive levels, such as company policy and governmental regulation are resolved at a slower time scale than project-level decisions. Obversely, field-level decisions mostly occur on a quicker temporal scale. Spatial scales in both are obviously different. Sadsad (2007) describes this systems conundrum as follows: 109 “Multi-scale systems modelling and simulation represents a system in terms of different scales (spatial and temporal) of resolution (Bassingthwaighte, 2006) and suited to characterising the functions and desirability of organisational forms (BarYam, 2006). This modelling technique has the ability to adaptively switch between different levels of abstraction during real-time simulation (Bassingthwaighte, 2006).” 3.4.4 Anylogic Software The only known software program that can seamlessly integrate SD., ABM, and DE methods is the AnyLogic development environment (Borshchev and Filippov 2004, Anylogic 2013). Anylogic uses the Java Eclipse framework, which enables it to be used over a wide range of operating systems. Anylogic utilizes a language dependent (Java) application programming interface (API) that allows interoperability with office and corporate software, geographical information system (GIS) datasets, and custom modules written in Java, depending on the product option chosen, the cost of the software becomes increasingly steeper as the functionality increases. AnyLogic 6.4 features an optional built-in optimizer and enables animations that can be exported as java applets. Anylogic does not require an in-depth understanding of Java programming and has a user-friendly interface. 
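Before the Objective 3 model is described, the discrete-event view from Section 3.4.1 can be made concrete with a minimal event-scheduling sketch in Python: the state of a single-server work queue changes only at the instants marked by arrival and finish events. The arrival times and the fixed service time are arbitrary assumptions, and the sketch omits the statistical machinery (e.g., Monte Carlo sampling) that production-grade DES tools provide.

import heapq

events = []                      # priority queue of (time, kind)
for t in (0.0, 1.0, 1.5, 4.0):   # assumed arrival times of work packages
    heapq.heappush(events, (t, "arrival"))

queue_length, server_busy, service_time = 0, False, 2.0

while events:
    now, kind = heapq.heappop(events)   # state changes only at event times
    if kind == "arrival":
        if server_busy:
            queue_length += 1
        else:
            server_busy = True
            heapq.heappush(events, (now + service_time, "finish"))
    else:  # a "finish" event
        if queue_length:
            queue_length -= 1
            heapq.heappush(events, (now + service_time, "finish"))
        else:
            server_busy = False
    print(f"t={now:4.1f}  {kind:7s}  queue={queue_length}  busy={server_busy}")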
For Objective 3, a conceptual hybrid model consisting of agent-based and discrete event modeling was developed using the AnyLogic software. This software was chosen mainly because it provides a well-suited platform to illustrate the RE premise that safety and field operations management, or production, are inseparable and do not operate independently. Additionally, RE is characterized as having a sociotechnical perspective, providing another reason to merge DES and ABM. The DES feature of AnyLogic provides a way to graphically illustrate the ‘technical’, that is, the workflow and its associated queues and capacities. However, the DES feature does not easily allow the ‘socio’ portion of the sociotechnical system to be modeled. For this, the ABM feature of AnyLogic is utilized to mimic the behavior of the agents (or crews) in the system. Representing the technical (the process) alongside the socio (the behavior of the agents) seeks to provide a richer understanding of the emergence of the system and to illustrate to others how the two methods interact and complement one another.

Alternatives considered for illustrating the model and carrying out the simulation, in addition to the AnyLogic software, included coding the scenario in Python or Java entirely as an agent-based model that incorporates rudimentary elements of the discrete process, and using a less powerful ABM package such as NetLogo while including the production process as another agent. Other researchers have built elaborate computer architectures that combine the two methods using existing platforms, or have taken an existing platform such as a DES tool and augmented it with agent-based simulation by changing the coding, but this involves extensive intervention to coordinate the two programs. AnyLogic’s drag-and-drop architecture, along with its minimal need for Java coding, allows the researcher to focus on the problem at hand and not on computer programming. Finally, AnyLogic was chosen to meet future research needs, as system dynamics may be added to the simulation at another time. The author knows of no other work in the construction industry that uses AnyLogic software in a hybrid manner to discuss risk assessment.

Additionally, a crucial RE idea, the ETTO Principle, will be simulated. The principle of the efficiency-thoroughness trade-off (ETTO) is a way to describe human and organizational performance variability. The ETTO principle brings to light the fundamental human condition that, because resources are limited, people and organizations act in ways that favor efficiency. The ETTO principle was chosen to better understand how people and production systems interact under the influence of disturbances; in particular, as mentioned in the literature review, how performance variability in construction processes is affected by the adjustments that people must make to accomplish their work, and how these adjustments can lead to adverse outcomes. The “efficiency” will be simulated by agent behavior in the production scenario that maximizes production in each trade (i.e., locally optimizes) without regard to other parts of the system. The “thoroughness” is simulated by agents that look ahead to ensure that the trade downstream is not overwhelmed by work; to accomplish this, the agents monitor the queues surrounding their trade and adjust their work speed accordingly.
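The dissertation model itself was built in AnyLogic; the stand-alone Python sketch below is only a simplified illustration of the efficiency/thoroughness behavior just described. An upstream trade pushes work into a downstream queue each period; an “efficient” agent maximizes its own output, while a “thorough” agent slows down when the downstream queue exceeds an assumed look-ahead limit. The rates, the queue limit, the single disruption at period 20, and the two-trade structure are invented for illustration and do not reproduce the Chapter 5 experiments.

import random

random.seed(1)

class TradeAgent:
    """A crew that pushes completed units into the next trade's queue."""
    def __init__(self, base_rate, thorough, lookahead_limit=6):
        self.base_rate = base_rate        # units of work per period
        self.thorough = thorough          # ETTO: check the downstream queue or not
        self.lookahead_limit = lookahead_limit

    def output(self, downstream_queue_len):
        rate = self.base_rate * random.uniform(0.8, 1.2)   # performance variability
        if self.thorough and downstream_queue_len > self.lookahead_limit:
            rate *= 0.5    # thoroughness: slow down rather than overwhelm the next trade
        return int(round(rate))

def simulate(thorough, periods=40, downstream_capacity=4):
    upstream = TradeAgent(base_rate=5, thorough=thorough)
    queue = 0                  # work waiting for the downstream trade
    throughput = 0
    for t in range(periods):
        if t == 20:            # a simplified external disruption set by the modeller
            queue += 10
        queue += upstream.output(queue)
        done = min(downstream_capacity, queue)   # downstream trade clears what it can
        queue -= done
        throughput += done
    return throughput, queue   # total units finished, backlog left over

print("efficient agent:", simulate(thorough=False))
print("thorough agent :", simulate(thorough=True))

With these assumed numbers, both settings show similar throughput because the downstream trade is the constraint, but the efficient agent accumulates a much larger backlog of work-in-progress after the disruption; this is the kind of local optimization that the ETTO simulation in Chapter 5 is designed to expose.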
The production scenario is stochastic and is populated by the agents and subject to disruptions. The disruptions are internal and external. The internal disruptions will be stochastic and the external disruptions will be randomly set by the modeler. The conceptual approach as championed by Robinson (2007b) was used to first develop a conceptual model. This was followed by the actual coding of the model. Details of this approach are in Chapter 5. Four simulation experiments were created subject to increasing production pressure that measured the throughput of each simulation. First, a production scenario consisting of five trades was created without disruptions or agents. In effect, this is akin to a critical path method schedule where the project is under ideal conditions. The production line is subject to increasing production pressure induced by the modeler. The second simulation introduces internal and external disruptions to the system and measures throughput. The third simulation introduced agents that acted in a thorough manner, as described above, to the second experiment. Finally, Experiment 4 recreated Experiment 3 but here the agents only sought to maximize their production –they are “efficient” in ETTO terms. 112 Chapter 4: Resilience Engineering Conceptual Framework for Construction Safety 4.1 Purpose and Features of the Framework This Chapter addresses Objective Two of this work which is “To develop a Resilience Engineering conceptual framework for construction safety.” This Objective is in the shadow of the Goal of this work, which is “To explore schemes and methods to understand, harness, and foresee disturbances that arise from demands placed on the construction operations of projectbased organizations that deliver the built environment.” The conceptual framework seeks to answer the Research Questions of “What elements of RE may help a construction project avoid, survive, and recover from disruptions?” and “How can we apply RE, in a formalized way, to construction operations?” In short, RE is seen as a way to better manage disruptions that occur in a construction project setting. 4.1.2 Linking Disruptions, Resilience Engineering, Safety, and Production As a clarification, the relationship among disruptions, RE, safety, and production is outlined and briefly explained here. The literature review contains detailed explanations of each area described. The argument for linking these areas is outlined below and then briefly described and defended: • Disruptions can/may cause accidents and affect production • By definition and design, RE is proposed as a formalized approach to understand disruptions • Two of the four premises of RE deal directly with the relationship between production and safety 113 4.1.2.1 Disruptions can/may cause accidents As stated in the literature review, the literature on disturbances in construction operations as it affects safety is non-existent. Some work has been done on specific disturbances as change orders (Ibbs et al 2007) but, as mentioned previously, these are concerned mostly with post ex facto financial and legal claims on productivity alone. For the most part, change orders would be considered to fall within the base mechanism/model for being adaptive but should be considered on an individual basis for impact on safety and production. 
In other fields, for instance aviation (Madini and Jackson 2009, Jackson 2010) and manufacturing (Barasso and Wilson 1999, Toulouse 2002), researchers agree that there is a direct link between disruptions that impact production systems and safety. Jackson (2010) boldly asserts that “Accidents are the result of disruptions….” Several researchers (Barasso and Wilson 1999) summarize the categories for “Consequences of Disturbances” as: (having) no effect, presence of risk factors, hazardous situation, minor and serious accident, catastrophe, fatal, nonfatal lost time, and nonfatal non-lost time. Although no research exists that directly ties disruptions to accidents in the area of the built environment, it is not a large leap in logic, given the research done in other fields and the opinions of safety experts, to suggest that various disruptions may influence safety on the construction site. Mitropoulos et al (2005) offer a view that also proffers that the link between production and safety should be strengthened. They proposed strategies to deal with exposures and errors that are inherent in the production process. In their view construction is a system and safety should focus on the work factors of production and how these interact with other causal factors that trigger and ultimately release hazards on the construction site. Theirs is a decidedly prescriptive approach based on actual job conditions and worker behaviors. They recommend 114 that practitioners reduce task unpredictability and that improve error management capabilities to stay within safe operating boundaries. 4.1.2.2 RE is proposed as a Formalized Approach to Understand Disturbances Accepting that disruptions may be either the direct cause of accidents and/or set the stage for hazardous conditions in construction, then a formalized method to deal with these disruptions’ is needed. This work proposed that the emerging field of RE as a worthy candidate to do so in the complex world of construction. The first clue that RE may be a guide to understanding disruptions is found in the current working definition of RE: “RE is the intrinsic ability of a system to adjust its functioning prior to, during, or following changes and disturbances, so that it can sustain required operations under both expected and unexpected conditions” (Hollnagel et al. 2011). Furthermore, each of the four basic abilities of the RE paradigm (responding, monitoring, anticipating, and learning) are concerned with dealing with disturbances. Finally, the basic premises of RE are also concerned with how to think about disturbances. For instance, a disturbance can be thought of as an ‘underspecification of performance conditions’ (Premise One). Also, multiple disturbances may lead to ‘unexpected combinations of performance variability’ (Premise two). 4.1.2.3 Two of the four premises of RE deal directly with the relationship between production and safety Premise three includes the statement that safety management must be proactive as well as reactive. This means that the production process should be proactively changed or modified, or, in other words, sufficiently flexible to adjust to avoid disruptions that might endanger safety. 115 Premise four, that “Safety and field operations management are inseparable and do not operate independently” means just what it states. Decisions about the production process should not be made without concern about safety, and vice versa. 
Traditionally, the emphasis in industry has been on meeting production goals with safety as a secondary, or after the fact concern. RE seeks to put safety and field operations on equal footing. The goal is reliability and resilience of operations to meet production goals. The Framework is presented in the remainder of this chapter. The unique perspectives or mindset that an organization would have to take to adopt are first presented. These consist of the need to consider and strive to make RE a quality of the system, making RE a strategy for the firm, creating a culture that is amendable to RE, and the need to view systems functionally. Then, the four main abilities of RE are discussed in terms of the probing questions from the RAG. Guidance in meeting the questions and implementing RE is given based on the probing questions. Finally, a graphical model of RE is developed using Yilmazs’ Sociocognitive Framework for Engineering Work Systems. 4.2 The Elements of a Project-Based RE Construction Project 4.2.1 Perspectives Practically any new outlook, or paradigm, involves a perspective shift on the part of the potential adopters and eventual users of the new approach. These perspective shifts lay the foundation for thinking about how the new initiatives fit into or replace current operations. Additionally, they challenge the stakeholder’s mental model(s) of how the system traditionally operates. For instance, those pursuing a sustainability agenda agree that design is best performed in an integrative manner rather than separately. Lean Construction proponents favor a collaborative 116 approach among the trades for onsite planning and execution. In general, new approaches, including RE, require that stakeholders revise their mental models about how they plan and organize work. Some of the different perspective involved in applying RE include thinking about resilience as a quality of the system, approaching RE with intent (i.e. a strategy), pursuing a certain cultural outlook, and describing systems as a collection of functions rather than structurally. Other perspective shifts involved with RE are contained in the four premises of RE, such as embracing performance variability in the normal course of work and understanding failure as a temporary inability to deal with complexity, as explained in the literature review. 4.2.1.1 Resilience is a Quality of the System In a perfect world RE would not be needed. The need for resilience implies some threat to the system operation, namely, in the form of some degree of disruption or perturbation. It is posited that resilience is a desired quality, or an inherent feature, of the system that is useful to combat disruptions and that it can be engineered into construction project operations. Understanding that RE is a desired “quality” of the system is perhaps the starting point to discuss the mindset needed to implement resilience into construction operations. This means that the goal of a resilient system to instinctively act in a resilient manner when an untoward event occurs. However, in construction some planners and managers strive for the desired quality of only being reliable, that is, accurately estimating the outcome, for instance within a certain time frame or budget, when certain activities are performed on a project. This is evident by the preparation of construction schedules, estimates, and other planning efforts. In fact, certain approaches to construction production, notably Lean Construction, primarily strive for reliability. 
Having reliable operations is thus a desirable quality. It makes the engineering of 117 resilience less difficult given that it reduces function variation. Reliability is an elusive quality on construction projects. For instance, the Percent Plan Complete (PPC) is a measure of reliability in Lean Construction projects. A PPC of eighty percent or higher is considered excellent but does not occur except in firms that are well-versed in the Lean approach. It seems then, in practice, that reliability is not enough to deal with the inevitable disruptions that the project will encounter. A project must also be resilient to sufficiently handle disruptive events. At this point the reader may pause and ask “How does a firm or project develop the quality of being resilient?” That is, how does a person or organization make RE an integral part of everyday operations? Even more abstractly, how does one go about making the fundamental changes to develop any desired quality? The abstract answer to this question is beyond the scope of this framework as the task here is to present the idea of RE and identify the elements that seem useful to implement RE in a firm. However, many texts exist in popular literature that deals with organizational change that may guide the reader. The steps to developing organizational and project qualities may mirror those that one uses to develop personal qualities and include things such as discipline, practice, and commitment to achieving that quality. 4.2.1.2 Resilience is a Strategy Much like the decision to pursue a green building or a Lean Construction approach to building, the choice to implement RE in construction operations is intentional and may even be termed a strategic move. It will require commitment by top executives and others who control and allocate project resources. There is an investment of time (e.g., more involved planning) and money (e.g., additional resources) needed to engineer resilience just as there is for engineering sustainability and reliability. As mentioned in the Literature Review, Madni and Jackson (2009) take the view that resilience comes at a cost and that resilience should be only infused in “crucial 118 leverage points” in the corporate structure. There is nothing in the literature that precludes the techniques of RE from being vetted in a pilot project or even a sub-project. This may be one way of capturing the costs associated with a program to implement RE on a larger scale. Identifying the ‘crucial leverage points’ of a project might be accomplished by an introspective analysis of the firm’s ability to deal with disruptions. One way to do this is via the RAG. After creating a baseline, Hollnagel et al. (2011) recommends using the RAG as a management tool for ongoing improvement of resilience. 4.2.1.3 Culture for Resilience Engineering The existing or potential climate and culture of a construction project are important aspects to consider in implementing RE. However, climate and culture are elusive concepts and are not easily open to formulaic applications to any industry. Additionally, there is no definitive RE climate or culture perspective. In depth coverage of these concepts are beyond the scope of this work. However, given that learning is one of the four basic abilities of RE organization, it may be worthwhile to the firm considering a RE approach to risk management to explore a cultural approach that promotes learning. 
Dekker (2007) builds on the notion of Reason (2008) in suggesting a 'Just Culture' that can "balance learning from incidents with accountability for their consequences," as described in the Literature Review. Here, incidents are defined as disruptions in the work flow of a lesser degree than an accident. A Just Culture also promotes the RE position that both 'root-cause' thinking and 'human error' are obsolete notions in complex sociotechnical systems. Dekker (2007) suggests some ways that an organization might foster a Just Culture with respect to safety. The first is determining just what constitutes an incident and who, with respect to expertise, gets involved in the aftermath (i.e., the front-line supervisor or other personnel). Additionally, Dekker suggests normalizing and legitimizing incidents by: o Viewing them as opportunities to learn o Abolishing financial and professional (e.g., license suspension) penalties for those who are involved in the incident(s) o Monitoring and attempting to prevent stigmatization of those involved in an incident o Implementing, or reviewing the effectiveness of, any debriefing programs or critical incident/stress management programs the organization may have in place to ensure that workers can view the incidents as 'normal' operational events o Creating a staff position that deals with incidents apart from the front-line supervisor, and eliminating any punitive actions that might impact the performance review of the worker o Beginning the Just Culture perspective as part of the indoctrination of employees so that they understand that reporting incidents is part of the organization's learning o Informing workers of their rights and duties in the event of an accident as well as the organization's standard operating procedures with respect to accident investigation; this eases the anxiety and tension of the worker. Additionally, if reliability is considered to work in conjunction with resilience to maintain system control, and is a desired quality as stated above, then the culture of a typical HRO might well be part of the RE culture. As described in the Literature Review, a "culture of reliability" that "distributes and instills the values of care and caution, respect for procedures, attentiveness, and individual responsibility for the promotion of safety among members throughout the organization" (Boin and Schulman 2008) might be adopted by the resilience-seeking organization. Borys et al. (2009) sum up the RE outlook by saying: "The adaptive age challenges the view of an organisational safety culture and instead recognises the existence of socially constructed sub-cultures. The adaptive age embraces adaptive cultures and resilience engineering and requires a change in perspective from human variability as a liability and in need of control, to human variability as an asset and important for safety. In the adaptive age learning from successful performance variability is as important as learning from failure." This view of culture seems particularly applicable to construction given the polyglot of trades and associated sub-cultures. 4.2.1.4 View Systems Functionally Viewing systems as a set of coupled and mutually dependent functions, or the functional view, may be, conceptually, the most difficult concept for the adopter of the RE approach to grasp and understand. However, it is an essential way of thinking in an RE perspective.
The notion put forth here is that thinking in functional terms gives analysts the opportunity to speak about and discuss production and safety in the same conversation. This follows from the basic premise of RE that production and safety are inseparable topics. Thinking in a functional manner necessitates that the broad nature of the work being undertaken is understood as it occurs at the workface. It also requires that the FRAM premises, as described in the Literature Review, are adopted. The FRAM method suggests that those most intimate with the work discuss and map how the functions are related. Then they can discuss the variability of the functions and how they may go out of control. A detailed discussion of the FRAM is beyond the scope of this work and the reader is directed to the Literature Review; for the purposes of this work a general knowledge of the FRAM and of the importance of the functional approach is sufficient. The structural perspective is well ingrained in the way construction operations are planned and managed. Speaking of the structural approach, Hollnagel et al. (2011) state: "Systems are usually defined with reference to their structure, that is, in terms of their parts and how they are connected or put together. Common definitions emphasize both that the system is a whole, and that it is composed of independent parts or objects that are interrelated in one way or another. Definitions of this type make it natural to rely on the principle of decomposition to understand how a system functions, and to explain the overall functioning in terms of the functioning of the components or parts – keeping in mind, of course, that the whole is larger than the sum of the parts." In the RE approach the functional perspective is the preferred approach, as Hollnagel et al. (2011) posit about its use: "It is, however, entirely possible to define a system in a different way, namely in term of how it functions rather in terms of what the components are and how they are put together. From this perspective, a system is a set of coupled or mutually dependent functions. This means that the characteristic performance of the system - of the set of functions - cannot be understood unless it includes a description of all the functions, that is, the set as a whole. This delimitation of the system is thus not based on its structure or on relations among components (the system architecture). An organization, for instance, should not be characterized by what it is but by what it does. Neither should it be characterized by the people who are in a given place (on the organizational chart or in reality) but by the functions they perform." In construction, the traditional model of how work is accomplished is conveyed via the CPM schedule (n.b., considered structural given the linear term "path"). Apart from a general safety plan prepared prior to the commencement of project work, safety is traditionally discussed as CPM activities become imminent in the field. Then an 'Activity Hazard Analysis' (AHA) (or a similarly named document) is prepared as a risk assessment tool that focuses on the work at hand. Safety is rarely examined in the context of the whole project, or from a functional perspective. The CPM schedule reflects 'work-as-imagined' by those distal to the work face as opposed to 'work-as-performed' in the field and ignores the complexity of the modern construction project.
It is of little use for disruption and perturbation analysis given its narrow focus, although it may be useful in that it identifies key activities and could serve as a springboard for identifying functional areas. The thoughtful reader will note that, perhaps, not all phases or parts of a construction project are complex. For instance, and depending on the project, the beginning and near-completion stages of a construction project are devoid of the complex interactions that occur in other stages of the project. Likewise, certain projects may simply not be as complex as others. RE approaches are still well suited to these endeavors, albeit on a smaller scale. The functional approach to systems analysis suits the fourth premise of RE, which is to consider that production and safety are inseparable. When used in the context of the FRAM, functional thinking can be used as a tool both to assess risk and to control production. Thinking from a functional perspective, instead of strictly thinking in structural terms, enriches the understanding of disruptions. It requires thinking in terms of not only how the individual functions under consideration may vary but also how other functions may vary because of the instability of the chosen function. Thus, by necessity, the analyst must have an understanding of the entire system and how work is actually accomplished at the 'sharp end.' This makes the distinction between a system and its environment, or the system boundaries, less important (Hollnagel et al. 2011). It is a more flexible approach than strictly relying on CPM. Understanding how work is accomplished at the workface by implementing functional thinking and the FRAM will require the planners to ask critically important questions such as "How tightly are the functions coupled?" and "What are our ETTOs?" in addition to considering the aspects of the function. Hollnagel et al. (2011) summarize the differences between the two approaches in the following quote. In doing so they allude to the notion that a system is reliable when its variability of functioning is acceptable, as long as it is in control, while at the same time acknowledging that this is not always the case and that a system also needs to be resilient. "The differences in perspective become clear when a system is defined in terms of how it functions rather than in terms of its architecture and components. In this case the question is whether the functioning achieves its purposes. But this cannot be simplified to a question of whether the system is in a 'normal' state or a 'failed' state. It is instead a question of the variability of functioning and whether the outcome is acceptable under the existing conditions. But as soon as we say variability, we also acknowledged that any 'failure' will be temporary, hence reversible. We should consequentially try to understand how likely the variability of multiple functions may interact to produce an unintended - and in most cases unwanted - outcome." The use of the FRAM requires managers to 'get their hands dirty' and work collaboratively with the trades in production planning while simultaneously considering safety. A function that is in control is both reliable and safe by definition. 4.2.1.5 The Four Abilities To be considered resilient, a project must have the four main abilities of RE as discussed previously. The mix of the four varies and depends on each individual firm's unique make-up.
They can be assessed and managed using the RAG's probing questions adapted to construction. The following sections look at the basic elements of the abilities of responding, anticipating, monitoring, and learning, and at some techniques for how they might be applied to a construction project. The abilities work synergistically and overlap. 4.2.1.5.1 Responding In RE parlance, responding is dealing with the 'actual' disrupting situation where it affects the core business process (Paries 2010). In construction work this is at the workface. Paries (2010) presents the anatomy of responding as consisting of assessing the situation and asking whether it is in or out of the realm of variability control. If the disruption is potentially out of control, what adaptation, if any, is required to meet the situation? Knowing what to respond to and when to respond, as well as what defenses and resources are available to meet the disruption, is critical to an effective response. Responses should be both proactive and reactive. Proactive responses anticipate the disruption and have pre-defined responses; reactive responses create, invent, or derive ad hoc solutions. The 'probing questions' that Hollnagel (2010) has developed as part of the RAG offer an opportunity to 'reverse engineer' the goals of RE. The probing questions for the 'Ability to Respond' are shown in Table 4.1 with 'guidance' suggestions to aid the practitioner. Table 4.1: Construction project guidance analysis item - Ability to Respond. Event list: Is there a list of events for which the system has prepared responses? Do the events on the list make sense and is the list complete? Guidance: An event list is a proactive approach to responding. Many events are anticipated by OSHA 29 CFR 1926 (e.g., hazard communication). Other examples of events that may go beyond the guidance provided by compliance with OSHA regulations, such as multiple simultaneous events, include high-rise rescue of workers, workers trapped in excavations, and an auto accident in a highway work zone. 'Making sense' is based on the context of the project and the experience of the analyst(s). A complete list may be a utopian state, given that disruptive events may combine in unique ways. The analyst might consider the simultaneous occurrence of two or more events; for instance, a worker dangling from a high-rise structure and a trench cave-in are not unthinkable in the event of an earthquake. Selection of the events should be dynamic, as the situation on a construction site can vary dramatically as work evolves. Background: Is there a clear basis for selecting the events? Is the list based on tradition, regulatory requirements, design basis, experience, expertise, risk assessment, industry standard, etc.? Guidance: Most likely the event list is based on a mix of all of the above, with 'meeting minimum regulatory requirements' taking precedence. What may be more important here is the process of selecting the events. In the spirit of the functional outlook they should be based on 'work-as-performed' rather than 'work-as-imagined.' This implies that front-line supervisors be involved in event selection along with higher-level resource allocators. Relevance: Is the list kept up-to-date? Are there rules/guidelines for when it should be revised (e.g., regularly or when necessary)? On which basis is it revised (e.g., event statistics, accidents)?
Guidance: An initial event list might be generated at the beginning of the project and updated periodically as work progresses. For instance, it could be reviewed and updated at pre-work meetings in lieu of or in addition to an AHA. As suggested, revision could be triggered by an accident; however, it may be more useful if triggered by an incident review prior to an accident. Threshold: Are there clear criteria for activating a response? Do the criteria refer to a threshold value or a rate of change? Are the criteria absolute or do they depend on internal/external factors? Is there a trade-off between safety and productivity? Guidance: This is closely allied with the monitoring ability and the use of passive and active indicators of performance. In most cases the traditional criterion for responding in construction is the occurrence of an accident or injury; resilience aims to avoid this situation. Some typical threshold values, or indicators, might be excessive hours worked or a high turnover of employees. Both external and internal factors will most likely be involved as working conditions change (e.g., weather) and depending on the skill of management in dealing with site workers. There are no studies related to safety/production trade-offs; however, events such as schedule slippage may trigger increased monitoring of workers and crews as they try to 'make up' schedule deficits. Response list: How is it determined that the responses are adequate for the situations they refer to (empirically, or based on analyses or models)? Is it clear how the responses have been chosen? Guidance: Conceivably, all three approaches are possible in a construction context. Empirically, small pilot programs could help to assess the effectiveness of responses. Analyses are part and parcel of the FRAM and should be done in a group setting. Finally, simulation models, such as the demonstration included in this work, could be used, as could BIM, to assess response scenarios. Resource allocators from upper-level management should be involved in this activity as they can speak to the availability of needed financial resources; upper-level management is also important for the speed, duration, and resources items discussed below. Stakeholders outside of the immediate project planning team should also be engaged for the response list. Possible groups include fire and rescue departments, vendors (e.g., high-rise crane service providers), and hazardous material response teams. Speed: How soon can an effective response begin? How fast can full response capability be established? Guidance: This aspect underscores the importance of updating the event list and resilience planning. Resources, such as excavators and cranes, and personnel are added and deleted on a construction site daily. The analyst should also consider resources such as nearby construction projects that have resources that could be utilized in an emergency condition. Importing resources can add time to the response. Duration: For how long can an effective response be sustained? How quickly can resources be replenished? What is the 'refractory' period? Guidance: This should be part of the above discussions. The 'refractory' period refers to the time needed for recovery. Resources: Are there adequate resources available to respond (people, materials, competence, expertise, time, etc.)? How many are kept exclusively for the prepared responses?
Guidance: Resource allocation has been discussed above. Additionally, the analyst needs to consider resources such as psychiatric counseling after a traumatic event as well as debriefing time after an unwanted or disastrous event. Stop rule: Is there a clear criterion for returning to a 'normal' state? Guidance: Essentially, this is the signal or directive to return to 'business' after a disrupting event. Verification: Is the readiness to respond maintained? How and when is the readiness to respond verified? Guidance: This aspect speaks to the discipline involved in keeping items such as the event list and available resource list current in a dynamic environment. In a truly resilient organization this is carried out by any associate or stakeholder under the principle of 'cross-checking,' without regard to authority or seniority level. 4.2.1.5.2 Anticipating The guidance to construction analysts in this section consists of some strategies for how to anticipate potential disruptions via Woods' 'patterns,' time-frames of anticipation, and, briefly, the who and what of which disruptions to anticipate. Additionally, the 'probing questions' of anticipation are reviewed and commentary is offered as it relates to construction. 4.2.1.5.2.1 Patterns in Anticipation The patterns of Woods et al. (2010), as discussed in the Literature Review, offer a formulaic way to anticipate disruptions. A few of the particularly relevant patterns are discussed in the context of construction as follows, with the patterns underlined: Resilient systems are able to recognize that adaptive capacity is falling or inadequate to the contingencies and squeezes or bottlenecks ahead: Here Woods is speaking in production terms (e.g., bottlenecks) about resilience and safety. This reiterates the suitability of the FRAM as a vehicle to discuss safety and production in the same conversation. Additionally, the FRAM can aid the analysts in examining how disruptions can combine in unanticipated ways. Adaptive capacity, in the context of a construction project, refers to the capacity to handle disruptions. For instance, to make up schedule time lost to disruption, additional shifts may be added. An initial production rise may be expected and observed; however, subsequent drops in production due to fatigue may signal a loss in adaptive capacity. Resilient systems are able to recognize the threat of exhausting buffers or reserves: Again, Woods is speaking in production terms about resilience and safety. In construction production planning it is deemed prudent to keep a backlog of work to turn to in the event of a disruption or other unplanned event. The link to safety may be when backlogs are exhausted and work is 'pushed.' This might result in overcrowded work areas (creating unsafe conditions) and even further production loss. Thinking in terms of keeping buffers and reserves is not limited to the workface. Organizationally, a construction company may want to limit taking on additional projects to be able to respond to or avoid disruptions. Resilient systems are able to recognize when to shift priorities across goal trade-offs: The example of the ETTO is most prominent in the RE literature. In construction, evidence or indicators of this may consist of an increased incidence of safety regulation violations on site.
For instance, repeated instances of neglecting to 'tie off' or, in the instance of adding additional shifts above, consistently working more than ten-hour shifts or other excessive overtime, especially in harsh (e.g., cold weather) conditions. Resilient systems are able to make perspective shifts and contrast diverse perspectives that go beyond their nominal system position: This pattern is closely tied to monitoring. Woods (2006) terms this sub-ability 'cross-scale' interaction and characterizes it as 'upward' and 'downward' resilience. Simply put, decisions at one level of the organization, strategic or operational, may affect behavior at the other. The goal is to be sensitive to both ends and make adjustments based on 'work-as-performed' rather than 'work-as-imagined.' Downward resilience refers to strategists' directives to the front line with regard to conveying clear goal structures, communicating the intent behind goals, and allocating adequate technology to reach the goals (Tjorhom and Aase 2011). Failure to do so may result in sacrifice decisions at the workface. Upward resilience refers to front-line workers using their experience and flexibility to deal with situations not specified in the downward directives. In the context of the work situation it may be necessary to make a sacrifice decision or otherwise deviate from 'downward' directives. Repeated deviations may result in changes to the operating directives or resource allocation. The key is an open channel of communication between the two and a willingness to learn from one another. Also implicit in the pattern is that the executive level needs to monitor the workface for needed changes to directives and for the need for adequate resources; the ability to learn from each end is implicit as well. 4.2.1.5.2.2 When to Anticipate When to begin thinking about disruptions in the life-span of a project is a fundamental question. Sheard and Mostashari (2008) gather input from several sources to suggest the time horizon shown in Figure 4.1. They note that Westrum's (2006) typology is most commonly cited and suggests that anticipation include response preparation before an event, survival during the event, and recovery after an event. It should be noted that the literature points out that anticipation also includes training to handle unforeseen situations, or 'being prepared to be unprepared,' and handling disruptions on an ad hoc basis. However, it seems that anticipation of disruptions in construction could benefit from the expanded time scale that Sheard and Mostashari present below. Figure 4.1: Time Periods (Sheard and Mostashari 2008) Long-term prevention could begin in the planning and early design stages. For instance, designing steel structures with lanyard attachment points is an example of foresight. Short-term avoidance is the period preceding work and could be discussed as part of the FRAM or at preconstruction meetings. Intermediate-term coping means anticipating responses to disruption events when they occur; for instance, what is the response to an excavation failure in real time? To the greatest extent possible, maintaining project goals, or system functionality, should be the aim here. Coping with ongoing trouble refers to dealing with the aftermath of an event. For instance, after a serious mishap involving fatalities, workers may need grief counseling to regain confidence in the project safety controls. Finally, long-term recovery speaks to the need to return to full functionality after an untoward event.
In some instances this may not be possible and reorganization is required. This might involve the replacement of a violating or willful contractor or redesign of a project. 4.2.1.5.2.3 General Comments on Anticipation The 'probing questions' related to anticipation in the RAG present an opportunity to offer guidance and general discussion related to RE and anticipation. This information is shown in Table 4.2 below. Table 4.2: Construction project guidance analysis item - Ability to Anticipate. Expertise: Is there expertise available to look into the future? Is it in-house or outsourced? Guidance: There are good arguments for hiring an outside organization (i.e., a consultant) to look for threats (i.e., disruptions) and opportunities (i.e., ways to benefit from and thrive on disruptions) to the project. A fresh outlook can 'see' things that insiders miss or have become accustomed to. However, no one knows the business as well as insiders. Growing and nurturing an awareness within the project of what might go right or wrong should be a strategic goal of upper management. Frequency: How often are future threats and opportunities assessed? Are assessments (and re-assessments) regular or irregular? Guidance: This may depend on many factors such as the availability of resources and the complexity and time-frame of the project. Changing environmental conditions (e.g., financial markets, political factors) may necessitate more frequent updates. Communication: How well are the expectations about future events communicated or shared within the organization? Guidance: Weekly on-site safety meetings are one way to disseminate news of possible future events along with ways to respond. Assumptions about the future (model of the future): Does the organization have a recognizable 'model of the future'? Is this model clearly formulated? Are the models or assumptions about the future explicit or implicit? Is the model articulated or a 'folk' model (e.g., general common sense)? Guidance: The 'model of the future' is first a strategic activity. The goal should be to develop a clear, coherent, and easily understandable direction for resilience improvements and sustainability. Then it should be communicated to all levels of the project/firm. Time horizon: How far does the organization look ahead? Is there a common time horizon for different parts of the organization (e.g., for business and safety)? Does the time horizon match the nature of the core business process? Guidance: Production concerns on a construction project are typically discussed in a six-week look-ahead time frame. For field matters this may be a good starting point. Other things, such as less imminent threats, may need a longer time horizon set by upper management. The use of the FRAM, where safety and production are discussed simultaneously, is recommended to synchronize time horizons. Acceptability of risks: Is there an explicit recognition of risks as acceptable and unacceptable? Is the basis for this distinction clearly expressed? Guidance: In construction, OSHA safety regulations are the guiding factors for acceptable risk behavior. RE aims to include that outlook and asks that such things as ETTOs be included in the analysis. Aetiology: What is the assumed nature of future threats? (What are they and how do they develop?) What is the assumed nature of future opportunities? (What are they and how do they develop?)
Guidance: Aetiology means studying the cause of things. This activity ties in closely with the learning ability and may be useful in retrospective analyses of anticipations and of patterns of failure and success. Culture: To what extent is risk awareness part of the organizational culture? Guidance: Please refer to the discussion of the 'Fair and Just Culture.' 4.2.1.5.3 Monitoring In RE, monitoring refers to knowing where the project is operating in terms of production and safety. Sometimes this is referred to as 'drift,' as in drifting close to or out of safe operating ranges or boundaries; this was mentioned in Chapter 1 in the discussion of 'plateaus.' Woods and Wreathall (2008) characterized the boundaries of the operating ranges as a stress-strain curve. Whatever metaphor is used, the emphasis is on realistically determining whether the project is operating within 'safe' production and safety ranges. The ideas of 'mindfulness' and 'sensemaking,' borrowed from HRO theory, are useful in monitoring. By being aware of the situation and seeing the 'big picture' (sensemaking), the monitor(s) are more likely to recognize when the project is going off-plan and make corrections. Lay (2011), writing in the area of turbine maintenance work, offers some strategies for monitoring conditions in the field. First, the project stakeholders need to be aware of risk profile changes. This is termed 'pinging.' Some of the indicators of profile changes that are construction related include: • Multiple issues taking the crew's attention • Progress stalling, schedule impacts, multiple delays • Changes in the mood of project leadership • Multiple quality and safety incidents, even if minor; an increase in errors • Common tasks not performed or performed late (such as getting permits) • Special situations with the potential to change workers' moods (such as working over holidays) or the risk level on site (severe weather) • Decline in communication • High fatigue • Poor site housekeeping. Lay notes that other practices in monitoring include training the entire organization in RE techniques, pinging, likely error situations, and error-likely 'climates' on site. Some error-likely climates she recognizes that apply to construction work are: • Intimidating field leadership style • Poor communications with field leadership • Field leaders who are distal from the work (e.g., stay in the trailer while concrete is placed) • Shift competitions and crews not working as a team • Customers directing or being overly involved in the field service scope of work • Leaders not familiar with current practices and cultures (contract employees) • Leaders who are not open to help. Another dimension of monitoring includes being aware of the trade-offs being made, especially the production-safety trade-off mentioned previously. The 'probing questions' related to monitoring are heavily invested in identifying and analyzing lagging, current, and leading indicators for the ability to monitor. Unfortunately, the construction industry currently relies heavily on lagging indicators alone. Common industry indicators have been defined by regulators; they include fatalities, accidents, and DFW – all lagging. It is beyond the scope of this work to develop current and leading indicators for construction. 4.2.1.5.4 Learning Hollnagel (2011) provides guidance in the RAG probing questions for organizational learning that may apply to the construction industry.
However, the list must be 'reverse engineered' to obtain a checklist that may aid learning. Looking at the 'probing questions' for learning, the following items are abstracted: • The organization should have a systematically clear principle for which events are investigated and which are not. This concerns which disruptions to learn from and which to ignore. • Learn from success as well as failure. In other words, study 'normal' work and performance variability as well as failures. The underlying assumption is that accidents are rare events that do not provide many opportunities for learning, while the study of what 'goes right' provides many opportunities. This implies 'continuous' learning and not just learning when an unwanted event occurs. o Learning from actual work practices and conditions dovetails with 'upward' and 'downward' resilience as discussed previously. Both ends of the organizational spectrum must attune themselves to how work is actually being performed in the field so that effective responses may be crafted. • Provide formal support (i.e., resources, personnel, time) for organizational learning activities such as data collection and analysis. • Communicate what has been learned in a timely manner to all affected stakeholders. • Develop a means to verify (or confirm) that the intended learning has taken place, and ensure that the learning is sustained as long as it is still relevant. GameDay exercises are another way for individuals and organizations to learn how to better adapt and respond effectively in the face of disruptions. These have been used by organizations that have large-scale Web operations, such as Amazon.com. A GameDay simulation purposely exposes critical systems to disruptions to uncover flaws and dependencies. Participants are alerted that the system will be stressed but are not told the nature of the disturbance(s). It could be a major power failure, data corruption, a fire in a data center, or a combination of unwanted events. To graphically encapsulate the ideas presented in Section 4.2, Yilmaz's Sociocognitive Framework for computer software development was adapted to the RE case. The justification for the use of this model is presented below, as is the Sociocognitive Framework for Engineering Work Systems. Then the graphic representing RE at the workface is presented and explained. 4.3 Developing a Graphical Model for RE in Construction: The Software Development Process and The Sociocognitive Framework for Engineering Work Systems 4.3.1 The Software Development Process The software development process and the processes used to construct the built environment have much in common. Software development is "a knowledge acquisition activity" (Armour 2003) that involves the transformation of user needs into a software product that realizes the requirements elicited from these needs (Yilmaz 2007). Yilmaz (2007) states that software processes include a set of policies, procedures, and technologies within an organizational structure to produce and maintain software products. "The process involves knowledge acquisition activity phases, during which teams of engineers collaborate and coordinate within the constraints imposed by the management, as well as organizational norms, technology, culture, and policies" (Yilmaz 2007). Special characteristics of software development that are similar to construction (or should be) include being goal-directed and adaptive, improving over time, and being human-centered.
Human activity is at the core of software development, just as it is for the built environment. Humans are the decision-makers and have control at multiple levels and locations of the work systems. Models of software development must incorporate strategic change along with adaptive human, team, organizational, and cultural factors. To this end Yilmaz (2009) proposes the sociocognitive framework for engineering work systems. The reader is directed to the Yilmaz (2009) paper for details of the model. This Yilmaz sociocognitive framework, along with the FRAM and RE principles in general, serves as the inspiration for the RE Model for projects as explained below. 4.3.3 The Resilience Engineering Model for Projects Figure 4.2 summarizes the information contained in Section 4.2 and serves to illustrate one vision of how RE and the FRAM might be implemented at the workface as well as how the executive, operational, and production functions might interact. It is inspired by Yilmaz's Sociocognitive Model as described above. Paraphrasing Yilmaz's description of his model, the RE adaptation seeks to present the conceptualization of the critical elements of each level (e.g., operational, executive), as well as the different aspects that simultaneously co-exist (e.g., social dimension, human behavior dimension, organizational dimension) in the context of construction project processes. The executive level, or 'blunt end' (named the strategic level in the Yilmaz Model), works synergistically (as shown by the double arrow) with the operational level. On a construction project it may be populated with high-level managers and owners of the general contractor and subcontractors. As in the Yilmaz Model, it is responsible for monitoring, controlling, and adapting the operational level via dynamic model updating. This necessarily entails monitoring the boundaries of the system to determine if it is operating near levels that endanger safety; for instance, is the firm operating in the uniform, extra, or plastic regions of the stress-strain state space described in Figure 2-6? Figure 4.2: Resilience Engineering at the Project Workface (inspired by Yilmaz 2007). The executive level is also responsible for (re)organizing the social and physical structure of the organization. Here high-level strategic functions are initiated that affect production-level activities and outcomes. The traditional functions of planning, organizing, controlling, monitoring, and setting up reliable processes are initiated in broad strokes at the executive level and the details are carried out at the operational level. Additionally, the RE perspectives described in Section 4.2 are added at the executive level. Executive management buy-in of the four perspectives is crucial. As stated previously, engineering resilience into the system comes with a cost and may need to be strategically implemented at only key junctures and activities if not project-wide. Upper-level management holds the resources needed to add resilience to a project. Not shown in the model are the organizational inputs as described by Yilmaz above, such as resources (budgets), culture, norms, and objectives. Also not shown are the environmental stressors (as inputs) placed on the project, as described by Rasmussen in Chapter 1. The Operational level works synergistically with both the Executive and Production levels to carry out the details of the strategy as defined by the Executive level in order to facilitate production.
On a construction project the Operational level may be populated with the project managers of the various trades and the general contractor or construction manager, who are tasked with carrying out the strategy of the executive level. As in the Yilmaz Model, it consists of the organizational, social, and informational (communication) subsystems as described above. A detailed discussion of the subsystems is beyond the scope of this work, but they closely mirror Yilmaz's description above. The communication subsystem was renamed 'information' (which includes 'communication') as described above because it encompasses a broader perspective than simply 'communication' and includes emerging trends such as Building Information Modeling (BIM). The Production level, or those activities at the work face, carries out the strategy of the Executive level and the detailed planning of the Operational level. It is populated by the various trades that are needed to complete the work. It is directly impacted by disruptions, broadly represented as 'Type A' and 'Type B' disruptions as explained in the literature review. The trades may interact in such a way as to reveal 'emergence,' a basic principle of the FRAM described in the literature review. This principle explains how the performance of the trades may combine in unexpected ways to affect safety and productivity. The principle of approximate adjustments is also in play at the workface. This is seen in the behavior of the interacting trades as they react to disruptions in the face of ever-increasing demands. It is manifest in the ETTO, a behavioral output of the system, along with productivity levels and ultimately cost. Based on the output, adjustments are made in the executive and operational areas, as shown by the arrows leading out of the output box. The FRAM 'snowflakes' are overlaid onto Figure 4.2 to illustrate how the four basic capabilities are coupled within the Framework. The functions have been placed on the diagram where they impact the Framework most. For instance, the basic ability of 'response' is embedded in the Production level given that responses are triggered by external and/or internal events and are facilitated by the monitoring function (Hollnagel 2011). Constant monitoring is placed between the operational and production levels given that monitoring primarily requires attention by the Production and Operational levels, with assistance from the Executive level as described above in the state-space discussion. Both anticipation and learning are embedded in the Operational/Executive area given that they are both heavily influenced by past events and can benefit from the experience of Executive and Operational personnel. Finally, a rough instantiation (or coupling of the four basic abilities) is presented on the diagram. For instance, the information subsystem (consisting of plans and procedures) in the Operational level is an input to the aspect 'condition' of the basic ability of 'response.' Other instantiations are as shown.
Chapter 5: Resilience Engineering Conceptual Model Simulation 5.1 Introduction This chapter addresses the Research Question "How can we begin to model and simulate disruptions in construction operations using RE principles?" and Objective 3 of this work, which is "To explore RE implementation in construction production settings that experience disruptions in a formalized way using hybrid computational methods." To do so, the process of developing the model was conceptualized using a formalized method developed by Robinson (2007b) that was adapted for this purpose. Then the hybrid simulation model was coded using the AnyLogic software. Finally, the basic understanding obtained from the simulation is briefly discussed. The model and corresponding simulation do not attempt to capture and encompass the entire RE spectrum represented in Figure 4.2. Here the focus is only on the ETTO in a simple production setting populated by workers and managers. The ETTO is an important part of RE and is used to understand performance variability as described in the literature review. The approach is outlined more fully below. Computer simulation is used as a means to further explore RE. The hybrid method of discrete-event and agent-based modeling was used to illustrate some RE characteristics. Computer simulation was chosen because RE is untested in the construction industry. Physically implementing RE, an emerging and untested approach to safety in construction, in a field setting would introduce a moral hazard with regard to the safety of the workers. Computer simulation offers a first step towards possible field implementation without jeopardizing worker safety. The Research Question and Objective 3 are approached abstractly and conceptually to produce the computer simulation. The approach is abstract for two reasons. First, to simplify the process that is investigated so that it may be viewed as applying across a broad range of construction activities that experience disruptions. Second, it is abstracted in order to focus on the nature of the disruptions and RE concepts rather than a particular construction activity or sequence of activities, thus simplifying the model and simulation. Simplification is desired to keep the model from becoming overly complex. In general, simple models run faster, are more flexible, can be developed faster, require less data, and are easier to interpret (Robinson 2007a). The Question and Objective 3 are approached conceptually in a similar vein to the Framework developed in this work. As presented earlier by Smythe (2004), speaking generally, a "…conceptual framework should be intended as a starting point for reflection about the research and its context. The framework is a research tool intended to assist a researcher to develop awareness and understanding of the situation under scrutiny and to communicate this. As with all investigation in the social world, the framework itself forms part of the agenda for negotiation to be scrutinised and tested, reviewed and reformed as a result of investigation." Robinson (2007a, 2007b), speaking specifically about computer simulation, contends that developing a conceptual computer model should be a creative endeavor used to communicate, debate, and agree upon the final coded simulation model. Robinson is essentially advocating writing the detailed specifications for a simulation prior to any coding.
Addressing simulation modeling in particular, Robinson (2007a, 2007b) published a series of two articles that develop the background for a conceptual modeling framework. He (2007a) contends that conceptual modeling is "…probably the most important aspect of a simulation study." However, it is also the least understood aspect of simulation modeling. In general, the literature gives little detailed guidance on model creation and initiation, or on a formalized process of how to begin to model. Robinson (2007b) quotes Pidd in saying that modeling is a process of "muddling through." Robinson's framework provides a disciplined guide to making decisions about a simulation model for a specific project. Although it was developed with the discrete-event method of simulation in mind, he posits that the framework has wider applicability to other modeling methods. The remainder of this chapter consists of two parts: first, the background of conceptual modeling as presented by Robinson (2007a, b); then the simulation model that addresses the Research Question under consideration as well as Objective 3. 5.1 Background of Conceptual Modeling In his papers Robinson discusses the lack of a formalized approach to conceptualizing simulation models. In the first paper (2007a) he defines the meaning of conceptual modeling and the requirements of a conceptual model. In the second paper (2007b) a framework for conceptual modeling is presented. The basic ideas and important definitions of the papers are presented here, and Robinson's framework is used to explain how the simulation was developed for this work. The reader is directed to these papers for the details and justifications of Robinson's approach. Robinson (2007a) states that "Conceptual modeling is the process of abstracting a model from a real or proposed system. It is almost certainly the most important aspect of a simulation project." He further states that it is more of an art than a science. The developers involved in a conceptual modeling project consist of the client, the modeler, and the owners. Depending upon the situation, these roles may be taken up by one person or a team of individuals working together to develop the simulation model. A conceptual model is "A non-software specific description of the computer simulation model (that will be, is or has been developed), describing the objectives, inputs, outputs, content, assumptions and simplifications of the model." In short, it is the specification for the model to be developed without regard to how it will be coded. Conceptual modeling "…is about determining the right model, not how the software will be implemented" and is not software specific. Conceptual modeling is simply the process of creating the conceptual model and requires the following activities: 1. Understanding the problem situation (a precursor to conceptual modeling), 2. Determining the modeling and general project objectives, 3. Identifying the model outputs (responses), 4. Identifying the model inputs (experimental factors), and 5. Determining the model content (scope and level of detail), identifying any assumptions and simplifications. Taken together, the preceding activities make up the framework of conceptual modeling and are shown within the ellipse in Figure 5.1 (with the exception of item 1, understanding the problem situation). The details of the framework will be explained below.
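As an illustrative aside, and not something Robinson's papers prescribe, the elements that such a non-software-specific description gathers can be held in a simple structured record so that they are stated explicitly before any coding begins. The minimal Python sketch below does only that; every field name and example value is an assumption made here for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ConceptualModelSpec:
    """Illustrative container for the elements a conceptual model describes.

    The fields follow the activities listed above; the example values are
    invented for illustration and are not prescribed by Robinson (2007a, b).
    """
    objectives: List[str] = field(default_factory=list)       # modeling and general project objectives
    outputs: List[str] = field(default_factory=list)          # responses used to judge the objectives
    inputs: List[str] = field(default_factory=list)           # experimental factors that can be varied
    scope: List[str] = field(default_factory=list)            # components inside the model boundary
    level_of_detail: Dict[str, str] = field(default_factory=dict)  # detail decision per component
    assumptions: List[str] = field(default_factory=list)
    simplifications: List[str] = field(default_factory=list)

# A hypothetical, partially filled specification for a small production-line study.
spec = ConceptualModelSpec(
    objectives=["Understand throughput variability under disruptions"],
    outputs=["Daily throughput (mean, standard deviation, minimum, maximum)"],
    inputs=["Disruption frequency", "Crew size"],
    scope=["Entities (work units)", "Activities (trade tasks)", "Queues", "Resources (crews)"],
    level_of_detail={"Queues": "infinite capacity, first-in-first-out"},
    assumptions=["All trades have the same task durations"],
    simplifications=["Supply-chain disruptions are neglected"],
)
print(spec)
```

Writing the specification down in this explicit, software-agnostic form is simply one way of honoring Robinson's point that the "right model" is agreed upon before any implementation decisions are made.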
Robinson (2007a, b) also discusses requirements for a conceptual model; however, these are mostly client related and not central to the discussion in this work. They would be applicable if a concrete real-world problem were encountered by the researcher. The requirements consist of validity, credibility, utility, and feasibility. The reader is directed to Robinson's papers for further details of the requirements. Robinson (2007a) also emphasizes the need to keep the model simple, as described above. 5.1.1 Developing the Conceptual Model The research question and objective associated with the model and simulation are explored using Robinson's framework (2007a, b) as outlined above and illustrated in Figure 5.1. Figure 5.1: Robinson's conceptual model in the simulation project life-cycle. The conceptual model framework is contained within the ellipse. The double arrows indicate the interaction and iterative nature of items outside of the ellipse with each other and with the framework (Robinson 2007a). 5.1.2 Understanding the Problem Situation The first step in conceptual modeling is to develop an understanding of the problem situation. This is driven by a need to improve a problem situation. There are generally three scenarios that exist when trying to model a real-world problem: • The problem situation is clearly understood and expressed. This scenario is the easiest for the modeler but is rarely encountered. If the problem situation is clear, there most likely is no need for the simulation. • The problem situation is apparently well understood and expressed, although it is not. This speaks to the client's perception of the problem as compared to reality. Through the process of modeling the modeler may unearth or uncover different deficient areas than those identified by the client. The experience of the modeler and their ability to ask probing questions help rectify this situation. • The problem situation is neither well understood nor expressed. Here the modeler must be very adroit and engage the simulation team in framing the issue at hand. Formal problem structuring methods such as cognitive mapping and causal loop diagrams (as used in system dynamics models) may be used. However, several researchers suggest that the simulation itself be used as a problem structuring method. As Robinson (2007b) states, "The idea is not so much to develop an accurate model of the system under investigation, but to use the model as a means for debating and developing a shared understanding of the problem situation. Validity is measured in terms of the usefulness of the model in promoting this debate, rather than its accuracy." 5.1.3 Determining the Modeling and General Project Objectives Determining the model objectives is the key to model development. This step is broken down into three parts. First, the overall aims, or what the organization wants to achieve with the model, are explored. This is an important step and is continually reevaluated as it is the ultimate aim of the simulation modeling. If the model does not contribute to the overall objectives of the firm or study listed in this step then it may be of little value to the user. Second, the modeling objectives are listed. These should be conceived as those that can be achieved from the development and use of the model. The modeling objectives should strive to answer the question 'By the end of the study, what do you hope to achieve?' They are expressed in terms of achievement, performance, and constraints.
Achievement is what we hope to accomplish by developing the model (e.g., increase throughput). Every endeavor is bounded by some kind of constraint and this is no exception. An example of a constraint is the budget; another is the available space to perform an activity. Finally, the general project objectives are listed. Here the modeler further clarifies the nature of the model and its use, given that this impacts the conceptual design. Consideration should be given to the time scale, run-speed, visual display, ease-of-use, and model/component reuse. 5.1.4 Identifying the Model Outputs The third step in developing the conceptual model is to identify the model outputs or responses. Robinson (2007b) notes that it does not matter whether the outputs (Step 3) are considered before the inputs (Step 4); however, he states that at the conceptual stage it is easier to consider what the model intends to achieve than to consider the inputs. The responses serve the purpose of identifying whether the modeling objectives have been achieved or not. The responses follow directly from the statement of modeling objectives. If the objectives are not being met then the model may need to be analyzed at a deeper level by the modeler and subject matter experts. Along with response identification, the modeler should also consider data reporting (i.e., graphical and/or numerical reports) and use of the model (e.g., for learning or other purposes). 5.1.5 Identifying the Model Inputs The model data that can be changed in order to achieve the modeling objectives stated in Step 2 are referred to as the model inputs or experimental factors. They are the means by which the stated objectives will be achieved. They may be quantitative, such as a change in the capacity of a resource, or qualitative, such as a change to the model structure. An advantage of using a modeling method such as the discrete-event (or process-centric) method is that experimentation with situations that are difficult to model or predict otherwise, such as customer arrival rates, can be completed fairly easily. When understanding of the system or process is an objective, the list of experimental factors needs to be more subtle. Identification of the most important factors by the domain experts is crucial in this effort. Finally, the methods of data entry should be considered in this step. 5.1.6 Determining the Model Content The model's content consists of two main areas: the identification of the scope and the model's level of detail. The scope identifies the boundaries of the model and the level of detail identifies the depth of the model. In this step the assumptions and simplifications are identified and the data requirements are clarified. Robinson (2007a) notes that at this juncture, that is, prior to beginning Step 5, the use of simulation as the appropriate vehicle for modeling should be questioned. The modeler should ask, "Is simulation the right approach for the problem situation at hand?" He also notes that Steps 1 through 4 are applicable to any modeling approach. From Step 5 forward, the conceptual modeling framework presented here is specific to simulation. Robinson's framework discusses simulation specifically in terms of discrete-event models but allows for the expansion of the framework to include other methods. The parts of the discrete-event model are referred to as components; the most widely used components are entities, activities, queues, and resources.
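To make this component vocabulary concrete, the sketch below shows one minimal way that entities, an activity, a queue, and a resource pool might appear in a simple discrete-event loop. It is an illustration only, written in plain Python rather than AnyLogic, and is not the model developed later in this chapter; the class name, parameter values, and arrival and processing rules are all assumptions made for the example.

```python
import heapq
import itertools
import random

random.seed(1)

class MiniDES:
    """Minimal discrete-event skeleton: entities arrive, wait in a queue,
    and are processed by an activity that seizes one unit of a resource."""

    def __init__(self, num_resources=2, cycle_time=4.0, mean_arrival_gap=3.0, horizon=480.0):
        self.now = 0.0
        self.events = []                     # event list: (time, sequence, action)
        self.seq = itertools.count()         # tie-breaker so the heap never compares actions
        self.queue = []                      # entities (work units) waiting for the activity
        self.free_resources = num_resources  # e.g., crews available to the activity
        self.cycle_time = cycle_time
        self.mean_arrival_gap = mean_arrival_gap
        self.horizon = horizon
        self.completed = 0                   # throughput counter (a model response)

    def schedule(self, delay, action):
        heapq.heappush(self.events, (self.now + delay, next(self.seq), action))

    def arrive(self):
        self.queue.append("work unit")       # a new entity enters the model boundary
        self.try_start()
        self.schedule(random.expovariate(1.0 / self.mean_arrival_gap), self.arrive)

    def try_start(self):
        # The activity starts only if an entity is queued and a resource is free.
        if self.queue and self.free_resources > 0:
            self.queue.pop(0)
            self.free_resources -= 1
            self.schedule(self.cycle_time, self.finish)

    def finish(self):
        self.free_resources += 1
        self.completed += 1
        self.try_start()

    def run(self):
        self.schedule(0.0, self.arrive)
        while self.events:
            time, _, action = heapq.heappop(self.events)
            if time > self.horizon:
                break
            self.now = time
            action()
        return self.completed

print("Entities completed in one run:", MiniDES().run())
```

Even this toy version shows why the scope and level-of-detail decisions discussed next matter: every component included (arrival patterns, breakdowns, shifts, routing) adds state and events that the code must carry.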
To determine the scope, the entities, activities, queues, and resources are first identified. Robinson (2007a) suggests a three-step approach. Step one identifies the model boundary by referring to the experimental factors and responses identified in Steps 3 and 4 of the framework. Step two consists of identifying the entities, activities, queues, and resources in the real system that lie within the model boundary, as well as their connections. Step three is to assess the components included in the model in terms of validity, credibility, utility, and feasibility; this assessment is not included in this dissertation and the reader is directed to Robinson's paper (2007b) for a full discussion of these criteria. 5.1.7 Determining the Model Level of Detail This sub-step delineates the decisions involved in determining the level of detail for each entity, activity, queue, and resource in the conceptual model. Again, for a real-world problem the modeler should assess these decisions against validity, credibility, utility, and feasibility with the modeling stakeholders. Table 5.1 provides details for consideration of each component. Table 5.1: Template for consideration of level of detail by component type (from Robinson (2007b)). The assumptions and simplifications should also be explicitly stated in this step (Step 5). Assumptions are made when there are uncertainties or beliefs about the system being modeled. Simplifications are used to allow more rapid model development and ease of use. The data requirements are also stated in this step. The level of detail table provides a list of data requirements. There are three types of data required: contextual data, data for model realization, and validation data. Only the first two are discussed in this work. Contextual data consists of things such as the layout of the process. Model realization data, such as cycle times and mean time between failures (MTBF), are obtained from the level of detail table. Finally, validation data are culled from historical records of existing systems. Note that validation is beyond the scope of this dissertation. 5.2 A Conceptual Framework to Describe Disruptions in the Construction Process This dissertation is driven by the objective to gain a better understanding of disturbances in the construction process. Jackson and Madni (2009) claim that disturbances lay the groundwork for accidents and poor safety in general production work. The claim made in this work is that RE is a means to better understand and deal with single or multiple disturbances in a construction setting. If disturbances are handled in a better way then perhaps accidents and fatalities may decrease. In this section the conceptual framework described above is applied to an abstract construction process to gain a better understanding of disruptions and RE, with the goal of stimulating communication and debate regarding the use of RE to understand disruptions. Robinson's (2007a, b) framework as described in Figure 5.1 is followed to develop the model and is amended to include agent-based simulation for the purpose of this dissertation. It is also amended to suit the research activity at hand, which is conceptual and abstract. The five steps of conceptual modeling are now presented, followed by the model coding and a discussion of the experiments.
5.2.1 Step 1: Understand the Problem Situation
The problem situation under consideration is that disruptions are not well understood in the context of construction production processes. Furthermore, multiple disruptions, whether concurrent or sequential, are even less well understood. Understanding disruptions is important given that some researchers have correlated their occurrence with unwanted outcomes on construction projects, namely accidents. The emerging paradigm of RE may help researchers and practitioners better understand disruptions. The model developed uses some RE concepts in the simulation to aid communication and negotiation among practitioners to gain a better understanding of disruptions and to ultimately handle them better. In terms of the scenarios discussed above, the disruption problem is neither well understood nor well expressed. Fatalities and accidents continue to occur on construction sites in disproportionate numbers.

5.2.2 Step 2: Determine the Objectives

Project Aim
The overall aim of the simulation is to learn from the development and implementation of this model. It will experimentally simulate disruptions in an abstract construction operation setting, using the ETTO Principle in a formalized way and employing the hybrid computational methods of agent-based and discrete event modeling. System throughput will be used to compare the experiments.

Modeling Objectives
• To address the Research Question and Objective 3.
• To understand the performance variability of the production process subject to disruptions of varying intensity, occurring at the same or dissimilar times.
  • This will be assessed generally by system throughput.
  • The disruptions will vary in intensity and occurrence. They are either internal to the system or originate externally. The impact of the disturbance will be reflected in a loss of resources for the particular activity.
• To represent and better understand the ETTO.
• To use some RE concepts in the simulation to aid communication, negotiation, and scrutiny among researchers and practitioners to gain a better understanding of disruptions and to ultimately handle them better.

General Project Objectives
• Time-scale: six month project, 8 hour working day, holidays are ignored.
• Flexibility: limited level, given that the model is abstracted and extensive model changes are not expected (i.e., the process remains the same over each experiment).
• Run-speed: real-time for illustrative purposes and virtual for debugging.
• Visual Data: simple 2D animation – this model is mainly for performing experiments and reporting the corresponding results. Graphics are only needed for model testing and diagnosis of problems during experimentation.
• Ease-of-use: simple interactive features, given that the model is for use by the modeler.
Figure 5.2: Listing the Project Aim and Objectives

5.2.3 Steps 3 and 4: Identifying the Model Outputs and Inputs
The proposed model outputs (responses) are identified as follows:

Outputs (to determine achievement of objectives)
• Throughput over the time-scale
Use:
• Learning and understanding
Data reporting:
• Raw data (throughput)
• Bar chart of daily throughput
• Mean, standard deviation, minimum and maximum daily throughput

Figure 5.3: The model outputs

The proposed model inputs (experimental factors) are identified as follows:

Experimental Factors
• Baseline model of the production train with no disruptions or agents
• Add disruptions to the baseline model
• Add agents with "thorough" behaviors that perceive production pressure to the baseline model
• Add agents with "efficient" behaviors that perceive production pressure to the baseline model

Figure 5.4: Identifying the model inputs

5.2.4 Step 5: Determining the Model Content
The model content (scope and level of detail), along with any assumptions and simplifications, is determined as follows:

Table 5.2: Model Content

Entities
• Quantity: Include. This is a linear production scenario, therefore one unit "chunk" of work at a time is considered.
• Arrival Pattern: Exclude. Adds unneeded complexity to the simulation.
• Attribute: Exclude. Adds unneeded complexity to the simulation.
• Routing: Exclude. Adds unneeded complexity to the simulation.
• Other: Include. Display style (bag in animation).

Activities
• Quantity: Include. Only one activity at a time is considered.
• Nature (X in Y out): Exclude. Adds unneeded complexity to the simulation.
• Cycle Time: Include.
• Breakdown/repair: Include. MTBF is considered as attached to a resource and thus an activity.
• Set-up/changeover: Exclude. Adds unneeded complexity to the simulation.
• Resources: Include. Resources are removed as a result of the disruption.
• Shifts: Exclude. Adds unneeded complexity to the simulation.
• Routing: Exclude. Adds unneeded complexity to the simulation.

Queues
• Quantity: Include. Schema calls for deletion/recovery of resources as a result of the disruption/recovery.
• Capacity: Exclude. Not needed in this model.
• Dwell Time: Exclude. This is determined by process times.
• Queue Discipline: Exclude. Default to "first in, first out."
• Breakdown/repair: Exclude. This impact is accounted for in the resource.
• Routing: Exclude. Adds unneeded complexity to the simulation.
• Other: Exclude. Adds unneeded complexity to the simulation.

Resources
• Quantity: Include. Resources are aggregated.
• Where required: Include. Resources are attached to each activity to simulate performance variability.
• Shifts: Exclude. Adds unneeded complexity to the simulation.

Model Assumptions and Simplifications
Assumptions
• All trades have the same delay times
• The queues are infinite
• Each trade has the same number of resources (5)
• No holidays
Simplifications
• Entities are readily available and not disrupted (i.e., supply chain disruptions are neglected)

Table 5.3: Model Assumptions and Simplifications

Data Requirements
• Delay times for trades
• Resource capacities
• Work schedule
• Resource schedule
• MTTF, MTTR
• Number of crews
• Project length

Table 5.4: Data Requirements
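To make the data requirements concrete, the assumptions above can be collected into a small parameter sketch. This is a minimal illustration only, written in plain Java rather than in the AnyLogic environment used for the actual model; the class and field names (SimulationParameters, RESOURCES_PER_TRADE, and so on) are hypothetical, and the values simply restate figures given in this chapter (five resources per trade, triangular delay arguments in days, an eight-hour workday, a roughly six-month project, and deliveries of one to five units). The number of trades is assumed.

    // Hypothetical parameter holder restating the stated model assumptions.
    public final class SimulationParameters {

        public static final int TRADES = 5;               // assumed number of trades in the production train
        public static final int RESOURCES_PER_TRADE = 5;  // "Each trade is assigned 5 units of resources"

        // Normal-speed delay, in days, as triangular (min, mode, max) arguments
        public static final double DELAY_MIN = 2.5;
        public static final double DELAY_MODE = 3.0;
        public static final double DELAY_MAX = 3.5;

        public static final double HOURS_PER_WORKDAY = 8.0;  // with two 15-minute breaks and a one-hour lunch
        public static final int PROJECT_LENGTH_WEEKS = 26;   // approximately six months

        public static final int MIN_UNITS_PER_DELIVERY = 1;  // deliveries grow from 1 to 5 units per day
        public static final int MAX_UNITS_PER_DELIVERY = 5;

        private SimulationParameters() { }                   // constants only; no instances
    }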
5.2.5 Coding the Computer Model

5.2.5.1 Experiment 1 Simulation
Figure 5.5 illustrates Experiment 1. This experiment was created as a baseline against which to compare the results of the other experiments. Experiments 2 and 3 expand this model to include disruptions and then agents, respectively. The elements of the model are explained below.

Figure 5.5: Experiment 1

The model begins with a source that generates the entities. In this model the entities are considered a unit of work that is delivered to the jobsite daily. The reader might assume that they are something like modular housing units or precast panels that are delivered to the field, unloaded from the delivery truck, and installed. A delivery may include anywhere from one to five units per day; that is, up to five modular living units to install or five panels of precast concrete fascia. Each experiment begins with one unit delivered to the jobsite at 7 am each Monday morning and progresses to five units per delivery per simulation run. Production pressure is then increased to include Monday and Tuesday deliveries, starting again with one delivery Monday morning at 7 am and one delivery Tuesday morning at 7 am. This sequence is repeated in each simulation run for every day of the week until five units are delivered each morning at 7 am. The schema is realized in AnyLogic as shown in Figure 5.6.

Figure 5.6: Sample arrival schedule for work entities. Each workday begins with a delivery of 1 to 5 units. This figure indicates a delivery schedule of five units delivered at 7 am each day of the work week.

The entities are processed by a series of trades, or activities, that complete the installation. This is modeled as a "Service" object in AnyLogic. Here the entity enters, seizes a resource from the resource pool, is delayed (or processed), and is then released after the process time. Also embedded in the service object is a queue where the entity waits for a resource. In this model the queues are assumed infinite for simplification. Each trade has a corresponding amount of aggregate resources (e.g., "resource_A"); these could be thought of as machinery, people, or anything else needed to complete the work. The number of available resources effectively determines the capacity of each trade. For this model the resources are abstracted and aggregated for simplification. Each trade is assigned 5 units of resources. In this model the capacity of each trade is set to the maximum possible allowed by the software but is limited by the resources available. Each incoming entity seizes a resource unit, is delayed for the process time, and then releases the resource and moves to the next trade; when an entity has completed the final trade it leaves the simulation via the "sink" object. The delay times and resource capacities are parameterized (e.g., "a_delay") for ease of data input and flexibility. The delay times are set to a triangular distribution and are stochastic.

A work schedule for the resources was set for this simulation. An eight-hour workday that includes two fifteen-minute breaks and a one-hour lunch was chosen. The resource schedule is controlled by an on/off scheme; that is, the resources are either working (or available to work) or off. The resource schedule is shown in Figure 5.7.

Figure 5.7: Resource schedule showing an eight-hour workday with two fifteen-minute breaks and a one-hour lunch.

The arrival and resource schedules have been designed so that they are in sync; that is, a delivery only occurs during working hours (i.e., every day at 7 am when the crews start work). A simple two-dimensional graphic was created that shows how the entities move through and are handled by the system. It is shown in Figure 5.8.

Figure 5.8: Animation for each Experiment. The number "5" indicates the capacity of each trade.
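The baseline logic just described can also be sketched outside of AnyLogic as a tiny stand-alone loop. This is a hedged, simplified illustration, not the author's model: the class and method names (BaselineSketch, triangular) are hypothetical, and the AnyLogic Service/ResourcePool machinery is reduced to a triangular draw of the processing time for each delivered unit at each trade.

    import java.util.Random;

    // Minimal sketch of the baseline "production train": each delivered unit passes
    // through a sequence of trades, and each trade holds the unit for a stochastic
    // processing time drawn from a triangular distribution (in days).
    public class BaselineSketch {

        static final Random RNG = new Random();

        // Sample a triangular(min, mode, max) value by inverse-transform sampling.
        static double triangular(double min, double mode, double max) {
            double u = RNG.nextDouble();
            double cut = (mode - min) / (max - min);
            if (u < cut) {
                return min + Math.sqrt(u * (max - min) * (mode - min));
            }
            return max - Math.sqrt((1 - u) * (max - min) * (max - mode));
        }

        public static void main(String[] args) {
            int trades = 5;            // assumed number of trades in the train
            int unitsDelivered = 5;    // one day's delivery at maximum production pressure
            double totalDays = 0;

            // Process each unit through every trade at the "normal" speed (2.5, 3, 3.5 days).
            for (int unit = 1; unit <= unitsDelivered; unit++) {
                for (int trade = 0; trade < trades; trade++) {
                    totalDays += triangular(2.5, 3.0, 3.5);
                }
            }
            System.out.printf("Total processing effort: %.1f trade-days%n", totalDays);
        }
    }

The sketch deliberately omits queues, resource seizure, and the work schedule; it is only meant to show how the stochastic delay times enter the baseline model.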
5.2.5.2 Experiment 2 Simulation
In Experiment 2 (partially shown in Figure 5.9), Experiment 1 is expanded to include disruptions to the model. The disruptions added are either Type A (external to the system) or Type B (systematic in nature, a disruption of function, capability, or capacity) as described in the literature review.

Figure 5.9: Experiment 2. Type A disruptions (those external to the system) are modeled as events (i.e., event, event1, event2, event3, and event4).

Each event (i.e., event, event1, event2, event3, and event4) is scheduled by the modeler at a certain time and date during working hours and occurs only once in each simulation run. In effect, the modeler may schedule disruptions here that could be considered "acts of God," such as a lightning strike or a major supply chain disruption. The events disrupt by reducing the capacity of each trade by 3 units, effectively slowing throughput. The capacity reduced is not restored fully until reset by the Type B disruption discussed below.

Type B disruptions are modeled by statecharts as shown in Figure 5.10. These can be thought of as disruptions caused by an equipment breakdown, or perhaps routine maintenance of equipment that shuts down work for a day, effectively halting production. The state of this disruption is either "Working" or "Out of Order." The states transition from one to the other via the "Maintenance" or "Repair" transition link. The parameters "MTTF" and "MTTR" refer to "Mean Time To Fail" and "Mean Time To Repair," respectively. Each resource is given an estimate of when it may be knocked out of commission, or fail. Each resource is assigned a different MTTF to add to the stochastic nature of the program and is not synced to the resource schedule. That is, a disruption could occur in non-working hours and not affect production; however, these disruptions could also occur during working hours and be disruptive, adding to the randomness of the program. Each time the Maintenance transition fires, it sets the resource capacity to zero for that particular trade for one working day. After that, the Repair transition fires and sets the resource capacity back to full capacity (5 resource units). The statecharts modeled for this simulation represent specifically a "Disruption of Unreliability" as described in the literature review.

Figure 5.10: Experiment 2. Type B disruptions (those internal and systematic) are modeled as statecharts (statechart, statechart1, statechart2, statechart3, statechart4). The parameters MTTF and MTTR contain the time between failures and the time to recovery, respectively.
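The two disruption types can be illustrated with a short, hedged sketch of how a trade's capacity might be reduced and restored. This is not the AnyLogic implementation; the class and method names (DisruptionSketch, applyTypeA, and so on) are hypothetical, and the exponential draw around MTTF stands in for the statechart timing, whose distribution the text does not specify.

    import java.util.Random;

    // Hedged sketch of the two disruption types described above, reduced to
    // capacity bookkeeping for a single trade (full capacity = 5 resource units).
    public class DisruptionSketch {

        static final int FULL_CAPACITY = 5;
        int capacity = FULL_CAPACITY;
        final Random rng = new Random();

        // Type A ("act of God") event: reduces the trade's capacity by 3 units.
        // In the model this loss persists until the next Type B repair resets capacity.
        void applyTypeA() {
            capacity = Math.max(0, capacity - 3);
        }

        // Type B ("Disruption of Unreliability"), two statechart transitions:
        void maintenanceFires() {
            capacity = 0;              // Maintenance fires: out of order for one working day
        }

        void repairFires() {
            capacity = FULL_CAPACITY;  // Repair fires: capacity returns to 5 units
        }

        // Assumed stand-in for the statechart timing: an exponential draw around a
        // mean time to fail (MTTF, in days). The actual distribution used in the
        // model is not stated, so this choice is illustrative only.
        double nextFailureInDays(double mttfDays) {
            return -mttfDays * Math.log(1 - rng.nextDouble());
        }
    }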
5.2.5.3 Experiment 3 Simulation
Experiment 3 expands Experiment 2 by adding agents to the simulation model. These agents can sense their environment, specifically the amount of work entities moving through the system in the vicinity of their trade, and adjust the speed with which they do work accordingly. Specifically, they sense whether entities are residing in the queues waiting to be processed. If the queue preceding their trade becomes too long, they work faster; however, if the queue succeeding their work becomes too long, they slow their pace. The acceptable queue lengths are described as ranges and are illustrated below. The agents' behavior seeks to counteract the increased production pressure and to provide the thoroughness side of the ETTO. The time to work on each entity is the "delay" in the service object mentioned above. Each trade has one agent, which represents a crew. The crew has three working speeds: normal, faster, and slower. Every crew has the same speed distribution. The normal speed is, as described in Experiment 1, a triangular distribution with the arguments (2.5, 3, 3.5), which correspond to the days needed to work on an entity. The faster and slower speeds are distributed as (1.5, 2, 2.5) and (4.5, 5, 5.5), respectively. As an example, the workers in Trade B will work at normal speed as long as the queue preceding it holds between zero and two entities (including the endpoints) and the Trade C queue is less than or equal to 2. They will work at an increased speed if the Trade B queue is less than two and the Trade C queue is less than or equal to 5. Finally, they will work at a slower pace, so that work is not "piled on" the succeeding crew, when the Trade C queue is greater than or equal to 15 units. These conditions were coded into the AnyLogic program and a sample is shown below in Figure 5.11.

Figure 5.11: Experiment 3 Java coding for Trade B.

5.2.5.4 Experiment 4 Simulation
In Experiment 4 the agents' behavior was changed. Instead of looking to the queue immediately succeeding it and adjusting the installation time, each crew only sought to maximize its own effort. If the queue preceding it, its own queue, reached a threshold level, it sped up; otherwise it worked at normal speed. This behavior mirrors a more efficient (as compared to thorough) attitude in the ETTO Principle outlook. The Java coding for this behavior is shown in Figure 5.12.

Figure 5.12: Experiment 4 Java coding for Trade B.
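As a hedged companion to Figures 5.11 and 5.12, the two crew behaviors can be summarized as plain Java decision rules. The class, method, and parameter names are hypothetical; the slow-down trigger (a succeeding queue of 15 or more units) comes from the text, while the remaining boundaries are illustrative assumptions, and the triangular arguments are the (min, mode, max) day values stated above.

    // Illustrative decision rules for one crew (e.g., Trade B), contrasting the
    // "thorough" behavior of Experiment 3 with the "efficient" behavior of Experiment 4.
    public class CrewBehaviorSketch {

        // Working speeds expressed as triangular (min, mode, max) delay arguments, in days.
        static final double[] NORMAL = {2.5, 3.0, 3.5};
        static final double[] FASTER = {1.5, 2.0, 2.5};
        static final double[] SLOWER = {4.5, 5.0, 5.5};

        // Experiment 3 ("thorough"): watch both the preceding queue and the succeeding
        // trade's queue, slowing down rather than piling work on the next crew.
        static double[] thoroughSpeed(int precedingQueue, int succeedingQueue) {
            if (succeedingQueue >= 15) {                         // downstream backlog: slow down (value from the text)
                return SLOWER;
            }
            if (precedingQueue > 2 && succeedingQueue <= 5) {    // assumed speed-up boundary
                return FASTER;
            }
            return NORMAL;                                       // otherwise work at the normal pace
        }

        // Experiment 4 ("efficient"): look only at the crew's own queue and maximize
        // its own effort once a threshold is reached; the downstream queue is ignored.
        static double[] efficientSpeed(int ownQueue, int ownQueueThreshold) {
            return (ownQueue >= ownQueueThreshold) ? FASTER : NORMAL;
        }
    }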
5.2.6 Understanding of the Experiments
The purpose of developing the conceptual model for this work is to show an exemplar of how hybrid computer simulation could be used to illustrate RE principles under simulated field conditions. It would be difficult and overly complex to represent all of the RE principles in one model. Specifically, the ETTO Principle was illustrated with conditions of a typical construction production process subject to internal and external disruptions and increasing production pressure. The agents contained the ETTO behaviors of either efficiency or thoroughness. The elements of a production line and disruptions were used to correspond to the RE premise that production and safety are inseparable and that disruptions set the stage for accidents. Stochasticity was added to the system in the form of triangular probability distributions for the delay times and internal disruptions. Simulations were run over an approximately six-month period, which corresponded to the project length. With the exception of the increasing production pressure as entered by the modeler, all variables were kept constant for each Experiment and simulation run. Twenty simulation runs were made for each delivery scheme. The average throughput of each delivery scheme was compiled and compared; an example is given in Table 5.5. Additionally, the results for each delivery scheme were graphed; an example is shown in Figure 5.13.

Table 5.5: Compiled Average Throughput of the system for each experiment.

Figure 5.13: Graph showing the behavior of the production system subject to disruptions and increasing production pressure. The production pressure is manifest as a delivery scheme that progressively increases deliveries each day of the week. For instance, one unit is delivered on Monday only, and then 1 unit is delivered on Monday and Tuesday, and so on. The data for this graph is shown in Table 5.8.

Throughput was used as a proxy for performance and evaluation of the system. In general, most industries, including construction, aim for high throughput to maximize profits. The highest average throughput, 240.5 units completed over the length of the project, occurred during Experiment 3 on the Monday, Tuesday, Wednesday only delivery schedule with five deliveries per day. This scheme had a high standard deviation of 26.8 with a range of 201 to 292 units over the length of the project. Standard deviation may be a proxy for variation in the system.

Experiment 1: highest throughput 222.25; occurs at delivery scheme MTWTh only, 3 deliveries per day; standard deviation 12.3
Experiment 2: highest throughput 189.5; occurs at delivery scheme MTWTh only, 5 deliveries per day; standard deviation 21.5
Experiment 3: highest throughput 240.5; occurs at delivery scheme MTW only, 5 deliveries per day; standard deviation 26.8
Experiment 4: highest throughput 234; occurs at delivery scheme MTWTh only, 5 deliveries per day; standard deviation 22.1

Table 5.6: Compiled performance statistics of the model.

Based on trends in the raw data and the compiled data in Table 5.6, the project would benefit from crews that take a more thorough approach (in terms of the ETTO Principle) and from choosing a Monday, Tuesday, Wednesday delivery schedule that delivers five units per day. One reason that the performance is superior in this Experiment may involve the use of agents that seek to be thorough rather than strictly efficient. These crews may be able to mitigate the disruptions in a manner that benefits the system.

The data appear to mimic Woods and Wreathall's (2008) Stress-Strain Analogy for RE shown in Figure 2-6. This is most clearly shown in Figure 5.13 above. Increasing demands placed on the system, with the result of increased throughput, are plotted on the y-axis (i.e., "Average Throughput"). This is what Woods posits corresponds to the stress. The strain, or how the system stretches to accommodate increasing production pressure, is plotted on the x-axis (i.e., "Number of Deliveries per Day of the Week"). The linear portion of the curves represents the "on-plan" or "uniform" area. Here the project can easily handle the production schedule (or deliveries) even in the face of disruption. At the inflection point of about 213 units, using Woods and Wreathall's analogy, the project begins to adapt to handle the increased production and disruptions. In material science this is called the plastic region; Woods and Wreathall call it the "extra region," and it represents first-order adaptive capacity to deal with the increased demand. In this region the data indicate that the variability of the system increases, as the standard deviation jumps sharply. Comparing Woods and Wreathall's idealized graph with Figure 5.13, a noticeable difference is that the former is decidedly parabolic in the extra region, indicating continually changing conditions, whereas Figure 5.13 is linear in the extra region. This may be accounted for by the fact that the variables for all of the trades are equal across the experiments. At around 230 units, Experiments 3 and 4 show a slight downward trajectory as the demands increase. This may correspond to Woods and Wreathall's area of second-order adaptive capacity. If resources are not increased in this region, it is predicted that the system will fail.

5.3 Verification and Validation
Verification is the process of model debugging (Martinez 2010). North and Macal (2007) define a verified model as one that works as designed. This model was verified by code debugging, logic examination, and comparing several runs of the simulation with hand calculations. Code debugging is embedded in the Java/Eclipse platform of AnyLogic.
Code errors were corrected as they were found. The logic examination and a hypothetical test case were completed hand-in-hand to ensure the model was running as envisioned and designed by the author. A simple test case of the delivery schedules was conducted that tested the agents' intervals and tolerances for anticipated and completed work. All of the data were examined for reasonableness as the experiments were conducted.

In the modeling world there is great debate over the need for, and the process of, validation. This work was verified but not validated. Opinions run the gamut from the need for absolute validation (Heath 2010) to the impossibility of validation (Sterman 2000). Martinez (2010) posits that "A model is considered valid only for the purpose for which it is built, and not in absolute terms." In terms of Epstein's 16 ways models are useful (as presented in Chapter 3), this model was built to explain RE principles and the ETTO, especially the interplay among production, safety, and disruptions; to illuminate the core dynamics of RE; to demonstrate tradeoffs; and to challenge the robustness of prevailing theory through perturbations. The computer simulation was designed to be as abstract as possible while still representing a typical construction process. The simulation was inspired by the "Parade of Trades" dice game used by the Lean Construction Institute (LCI) to illustrate variability in construction operations.

The author did not find any validation procedures for hybrid modeling. Validation schemes do exist for agent-based and discrete event systems individually, but there does not appear to be a consensus in the literature with regard to best practices for validating either approach. Speaking of DES and the relationship among the purpose of a study, the rigor it requires, and its validation, Martinez (2010) states:

"When DES is used to demonstrate some principle or the effectiveness of some technique, the emphasis should be on capturing the principle faithfully. Input distributions need to make sense and be reasonable to individuals knowledgeable in the area of application, but they do not need to be based on collected data."

and

"Many DES models exist solely to demonstrate a research product that in some ways claims to advance the state-of-the-art in simulation modeling. This is another case where the specific inputs and details that are used are of no significant consequence unless the research is related to data synthesis—they must simply be reasonable in order to give credibility to the work."

No data were collected for this work; the inputs were based on the author's experience in the construction industry and are reasonable for this work. In his dissertation, Son (2011) presents the argument that validating ABMs is more difficult than validating other models due to path dependencies, multiple equilibria, and even the absence of equilibrium altogether. He cites Miller and Page (2007) and Banks et al. (2004) in observing that "The validity of ABM should be evaluated not only by predictive capability but also explanation accuracy of formal models unlike traditional simulation approaches where validation is the overall process of comparing the model and its behavior to the real systems and its behavior." Son posits that in ABMs, "If we examined actual test data to test simulation results, it would be like comparing points to clouds of points.
Any given result of an ABM would differ from others depending upon the random features inherent in the rules." Son also cites Yilmaz (2006) in taking the position that an ABM should be considered valid on the basis of qualitative and subjective evaluations of its contextual adequacy rather than as an objective representation of the system under study. Therefore, Son suggests using North and Macal's (2007) guidance for ABM validation, consisting of agent behavior validation, interaction validation, and emergent structure validation in addition to the validation of model inputs, model outputs, and processes in the model:
• Agent behavior validation: Do agent behaviors correspond to agents in the real world? What theory is included in the model about agents?
• Interaction validation: Do interaction mechanisms correspond to agents in the real world? What theory is included in the model about processes?
• Emergent structure validation: Does the model look right?

The agent behavior and interaction in this work closely match the real world. They are grounded in RE premises and theory, especially the notion that "Safety and field operations management are inseparable and do not operate independently." Additionally, the model explores the premise that performance conditions are always underspecified in the face of disruptions and that adjustments must be made. Finally, it examines performance variability in the form of the ETTO. With respect to interaction validation, I argue that the mechanisms are grounded in real-world observations and that the inspiration for the process, the "Parade of Trades" dice game, is a well-tested physical simulation that has been used by industry and academia for many years to illustrate performance variability. Finally, the emergent structure validation is found in the similarity between Woods and Wreathall's (2008) Stress-Strain Analogy for RE shown in Figure 2-13 and the output graph shown in Figure 5.13; this is further explained in section 5.2.6. The model "looks right" and acts as expected over a broad range of inputs.

In summary, the rigor of both DES and ABM validation depends upon the question being addressed and the purpose of the model. Son (2011) notes that:

"When the purpose of a computational [model] is explanations of behavior and hypothesis testing, it is important to abstract the complexity of the real world to obtain insight and a parsimonious explanation. When the purpose is exploration and theory generation, validity requires a confirmation that the model makes sense in larger world of possibility. Then, when the purpose is prediction and advice, the validity should take more practical dimensions into consideration."

The purpose of this research is to understand disruptions and how they might affect safety, using the emerging paradigm of RE, by developing a conceptual framework. That is, to use Son's description, the aim is to explore whether RE might be useful in a larger sense in the construction industry; for that purpose, the level of validation is sufficient in the author's opinion, and no outside data collection is necessary at this stage of the research.

Chapter 6: Conclusions
This chapter summarizes how the Goals, Research Questions, and Objectives of this research were met. It also states the expected contributions of this research, addresses the limitations of the research, and identifies possible future research agendas.
6.1 Review of the Research Goals, Questions, and Objectives
This research was motivated by a desire to gain a better understanding of disruptions on construction projects. It is suggested that the emerging paradigm of RE provides a lens through which disruptions may be better understood, harnessed, and foreseen on construction projects. One reason this is of interest is the disproportionately high accident and fatality rate in the construction industry as compared to other industries. In general, the phenomena of disruptions are not well understood, categorized, or researched in the construction industry.

The goal of this study, to explore schemes and methods to understand, harness, and foresee disturbances that arise from demands placed on the construction operations of project-based organizations that deliver the built environment, was partially met by an examination and analysis of the literature, the creation of a conceptual framework for RE implementation on construction projects, and a computer simulation that mimicked a production process subject to disruptions and populated by agents. The literature review chronicled previous schemes and methods over the history of industrialized society to deal with safety issues in all industries. The focus of this work was on exploring the concept of RE as one way to meet this goal. Hybrid modeling was used to explore its utility in simulating internal and external disruptions with an RE principle as a backdrop. Specifically, the notion of the ETTO was used as inspiration for the hybrid computer model.

The Research Questions were approached and answered in various parts of the dissertation. The first question, "How does RE differ from traditional ways of thinking about how to deal with disruptions?" was answered in the literature review. A distinct trajectory was traced that led to the current thinking about RE. Many traditional approaches to safety and risk management are encapsulated in the concept of RE, but are viewed anew through a "lens" of resilience. Some traditional safety approaches, such as the idea of placing blame due to "human error," are rejected because many systems have become so complex that the blame is more likely systemic than due to a singular person or event. RE also appears to be closely aligned with reliability concepts. Indeed, RE researchers point out that reliability may be necessary to handle disruptions that may endanger safety, but that reliability is not enough; a system also needs to be resilient to effectively deal with disruptions. The second question, "What are the current principles and practices of RE?" was addressed in the literature review. The third question, "What elements of RE may help a construction project avoid, survive, and recover from disruptions?" was answered in the Framework. Finally, the fourth question, "How can we begin to simulate disruptions in construction operations and use RE principles?" was explored in the computer simulation.

Objective 1, "Abstract the concept and underlying theories of RE and explore RE deployment in non-construction industries for use in formulating Objectives 2 and 3," was met via the literature review. The conceptual framework mentioned in Objective 2 is presented in Chapter 4. Finally, Objective 3 is explored in the simulation presented in Chapter 5.

6.2 Contributions to Knowledge of this Research
This research provides several contributions to the advancement of the body of knowledge in the construction industry.
In general, it introduces the idea that the emerging paradigm of RE may be a way to deal conceptually and practically with disruptions to the construction process. Specifically, it posits that RE is worthy of further investigation to alleviate the chronic safety problem in the construction industry. Other contributions include the extensive literature review, a conceptual framework that provides a scaffold that other researchers and practitioners may build upon, and a process and example of hybrid simulation modeling that incorporates disruptions and worker behavior in the context of a production process. These are discussed below.

The literature review is a contribution in itself for two reasons. First, RE is an emerging paradigm, and the entire "story" of how it came into being and how it relates to the approaches to risk management and safety that came before it is neither clear nor chronicled in the literature. The literature review in this work expanded upon the work of Hale and Hovden (1998) and Borys et al. (2009) to provide a complete picture of the trajectory of RE by elucidating the "Five Ages of Safety." This clarity is not found elsewhere in the literature. The second contribution of the literature review was to expose the gap that exists in disruptions research in construction; in the literature, disruption analysis appears to be primarily confined to delay claims for litigation.

A second contribution of this work is the development of the conceptual framework to provide a starting point for the discussion of the use of RE in construction. To the author's knowledge, no other research has attempted to abstract and formalize RE principles and translate them to the construction industry. The conceptual framework in this work follows Smyth's (2004) definition and is intended to help form the "…agenda for negotiation to be scrutinised and tested, reviewed and reformed as a result of investigation." The Framework provides scaffolding for other researchers to test, build upon, and debate.

Finally, this work provides a conceptual simulation model that may be adapted to explore other areas and facets of production systems subject to internal and external disruptions of varying magnitude. The RE-related ETTO Principle was used as an exemplar to test the production and disruption scheme. The model appears to offer greater insight into and understanding of RE and can be adapted to include richer agent behaviors and to test other RE scenarios.

6.3 Limitations of this Research
The main limitation of this study is that it is abstract and conceptual and may or may not represent actual conditions. In other words, it needs to be tested against a concrete construction project scenario. This applies to both the Framework and the Simulation. Another limitation lies in the fact that not all aspects of RE have been explored; including all of the various aspects of RE in a single simulation may prove unwieldy and overly complicated. Finally, the literature review revealed a gap in research on and understanding of the phenomena of disruptions and their relation to the production process. More research needs to be conducted in this area.

6.4 Future Agenda
The area of RE is ripe for future research. The limitations of the study point the way to future research in the areas of disruptions, RE, production, and the interrelations among the three. Further studies need to be completed on the nature and frequency of occurrence of disruptions on construction projects.
Some future research ideas include measuring the baseline resilience of construction companies and identifying their strengths and weaknesses in terms of response, anticipation, monitoring, and learning. This work could be used to calibrate the Framework and Simulation. Other work could include simulating further areas of RE, such as a Just Culture and organizational behaviors, the so-called "soft" factors, using the computational technique of system dynamics. This would enhance and triangulate the analysis by providing a richer and fuller understanding of RE. System Dynamics (SD) was developed at MIT in the 1950s by Jay Forrester. The SD method helps users better understand how complex systems function over time. SD computes interactions among abstracted system elements and allows the programmer to model non-linear feedback loops. In short, SD focuses on cause and effect in systems and the related feedbacks (Sterman, 2000). System Dynamics uses a high level of aggregation and continuous flows, and focuses on the pattern of behavior produced by a system, such as increasing costs, decreasing quality, and stagnating waiting times (Sadsad and McDonnell 2007). SD has been used in construction to examine how construction managers learn and to develop curriculum based on the SD model (Mukherjee et al. 2005). SD is considered a "top-down" approach, as previously discussed.

BIBLIOGRAPHY

Adamski, A. J. & Westrum, R. (2003). Requisite Imagination: The Fine Art of Anticipating What Might Go Wrong. In E. Hollnagel (Ed.), Handbook of cognitive task design. Mahwah, NJ: Lawrence Erlbaum Associates.

Aldrich, M. (1997). Safety first: technology, labor, and business in the building of American work safety, 1870-1939. Baltimore, MD: Johns Hopkins University Press.

Association for the Advancement of Cost Engineering (AACE) International, Inc. (2004). "Estimating lost labor productivity in construction claims," AACE International Recommended Practice No. 25R-03.

Axelrod, R. and Tesfatsion, L. (2005). "On-Line Guide for Newcomers to Agent-Based Modeling in the Social Sciences." (April 21, 2009).

Barroso, M. P., and Wilson, J. R. (1999). "HEDOMS - Human Error and Disturbance Occurrence in Manufacturing Systems: Toward the Development of an Analytical Framework". Human Factors and Ergonomics in Manufacturing, 9(1), 87-104.

Bertelsen, S. (2003). "Complexity – Construction in a New Perspective." Proc., 11th Conf. of the Int. Group for Lean Construction, Blacksburg, VA, USA.

Boin, A., and Schulman, P. (2008). "Assessing NASA's Safety Culture: The Limits and Possibilities of High-Reliability Theory". Public Administration Review, 68(6), 1050-1062.

Borshchev, A. and Filippov, A. (2004). "From System Dynamics and Discrete Event to Practical Agent Based Modeling: Reasons, Techniques, Tools." The 22nd International Conference of the System Dynamics Society, July 25-29, 2004, Oxford, England.

Borys, D., Else, D., and Leggett, S. (2009). "The fifth age of safety: the adaptive age." J. Health Saf. Res. Pract., 1(1), 19-27.

CPWR – The Center for Construction Research and Training. (2008). The construction chart book: the U.S. construction industry and its workers. Silver Spring, MD.

Chenhall, Everon C. (2010). "Assessing safety culture, values, practices, and outcomes." PhD thesis, Colorado State University.

Choudhry, R. M., Fang, D., & Mohamed, S. (2007). "The nature of safety culture: A survey of the state-of-the-art." Safety Science, 45(10), 993-1012.
The Construction Industry Institute (CII), University of Texas at Austin. (1989). Management of project risks and uncertainties. Austin, TX: The Construction Industry Institute, University of Texas.

Díaz-Cabrera, D., Hernández-Fernaud, E., & Isla-Díaz, R. (2007). "An evaluation of a new instrument to measure organisational safety culture values and practices." Accident Analysis & Prevention, 39(6), 1202-1211.

Dekker, S. (2005). Ten questions about human error: a new view of human factors and system safety. New York: Lawrence Erlbaum.

Dekker, S. (2007). Just culture: balancing safety and accountability. Aldershot, England: Ashgate.

Epstein, J. M., & Axtell, R. (1996). Growing artificial societies: social science from the bottom up. Complex adaptive systems. Washington, D.C.: Brookings Institution Press.

Epstein, J. M. (2008). "Why model?" JASSS, 11(4), 28-47.

Gehbauer, F., Zülch, G., Ott, M., and Börkircher, M. (2007). "Simulation-based analysis of disturbances in construction operations." Proc., 6th Conf. of the Int. Group for Lean Construction, East Lansing, Mich., 571-579.

Glendon, I. A., Clarke, S., and McKenna, E. F. (2006). Human safety and risk management. Boca Raton: CRC/Taylor & Francis.

Groeneweg, J. (1998). Controlling the controllable: the management of safety. Leiden, Netherlands: DSWO Press.

Grote, G., and Kunzler, C. (2000). "Diagnosis of safety culture in safety management audits". Safety Science, 34(1), 131-150.

Guldenmund, F. W. (2000). "The nature of safety culture: a review of theory and research". Safety Science, 34(1-3), 215-257.

Hale, A. R. & Hovden, J. (1998). "Management and culture: the third age of safety. A review of approaches to organizational aspects of safety, health and environment." In A.-M. Feyer & A. Williamson (Eds.), Occupational Injury: Risk, Prevention and Intervention. London: Taylor & Francis Ltd., 129-165.

Halpin, D. W., & Woodhead, R. W. (1976). Design of construction and process operations. New York: Wiley.

Heath, B. L. (2010). "The history, philosophy, and practice of agent-based modeling and the development of the conceptual model for simulation diagram." PhD dissertation, Wright State University, Dayton, OH.

Heinrich, H. W., Petersen, D., Roos, N. R., Brown, J., and Hazlett, S. (1980). Industrial accident prevention: a safety management approach. New York: McGraw-Hill.

Herrera, I. A., Hollnagel, E., Macchi, L., & Woltjer, R. (2010). "Exploring Resilience Engineering Contribution to Risk Analysis in Air Traffic Management." October. EUROCONTROL.

Hollnagel, E. (2004). Barriers and accident prevention. Aldershot, Hampshire, England: Ashgate.

Hollnagel, E., and Woods, D. D. (2005). Joint cognitive systems: foundations of cognitive systems engineering. Boca Raton, FL: Taylor & Francis.

Hollnagel, E., Woods, D. D., & Leveson, N. (2006). Resilience engineering: concepts and precepts. Aldershot, England: Ashgate.

Hollnagel, E. (2007). "Resilience Engineering Demystified". Chair's Newsletter. <http://www.crc.mines-paristech.fr/isc/newsletter/news1_1.pdf>. (May 23, 2008).

Hollnagel, E., Nemeth, C. P., & Dekker, S. (2008). Resilience engineering perspectives. Volume 1, Remaining sensitive to the possibility of failure. Aldershot, England: Ashgate.

Hollnagel, E. (2009). The ETTO principle: efficiency-thoroughness trade-off: why things that go right sometimes go wrong. Farnham, England: Ashgate.

Hollnagel, E. (2011). "Epilogue: RAG – the resilience analysis grid." In Hollnagel, E., Pariès, J., Woods, D. D., & Wreathall, J. (Eds.), Resilience engineering in practice: a guidebook. Farnham, England: Ashgate.
Hollnagel, E., Pariès, J., Woods, D., & Wreathall, J. (2011). Resilience engineering in practice: a guidebook. Farnham, England: Ashgate.

Hovden, J., Albrechtsen, E., and Herrera, I. A. (2010). "Is there a need for new theories, models and approaches to occupational accident prevention?" Safety Science, 48(8), 950-956.

Ibbs, W., Nguyen, L. D., and Lee, S. (2007). "Quantified Impacts of Project Change," J. Prof. Issues Eng. Educ. Pract., 133(1), 45-52.

International Ergonomics Association (IEA) (2011). "Definition of Ergonomics". (Sep. 3, 2011).

Jackson, S. (2010). Architecting resilient systems: Accident avoidance and survival and recovery from disruptions. Hoboken, NJ: Wiley.

Kuenzi, M., and Schminke, M. (2009). "Assembling Fragments Into a Lens: A Review, Critique, and Proposed Research Agenda for the Organizational Work Climate Literature". Journal of Management, 35(3), 634-717.

Kuhn, Thomas S. (1996). The structure of scientific revolutions. Chicago, IL: University of Chicago Press.

Kuivanen, R. (1996). Disturbance control in flexible manufacturing. Int. J. Hum. Factors Manuf., 6(1), 41-56.

Lay, E. (2011). "Practices for noticing and dealing with the critical. A case study from maintenance of power plants."

Leveson, N. (2004). "A new accident model for engineering safer systems." Safety Science, 42(4), 237-270.

Lindau, R. A., & Lumsden, K. R. (1995). "Actions taken to prevent the propagation of disturbances in manufacturing systems." Int. J. Prod. Econ., 41(1), 241-248.

Lingard, H., and Rowlinson, S. (2005). Occupational health and safety in construction project management. London: Spon Press.

Macchi, L., and Hollnagel, E. (2011). "A Resilience Engineering approach for the evaluation of performance variability: development and application of the Functional Resonance Analysis Method for air traffic management safety assessment." Ph.D. Thesis. Paris: MINES ParisTech.

Madni, A. M., and Jackson, S. (2009). "Towards a conceptual framework for resilience engineering". IEEE Systems Journal, 3(2), 181-191.

Manuele, F. A. (2008). Advanced safety management focusing on Z10 and serious injury prevention. Hoboken, NJ: Wiley-Interscience.

Martínez, J. C. (1996). "Stroboscope: state and resource based simulation of construction processes". Ph.D. Thesis, University of Michigan.

Martinez, J. C. (2010). "Methodology for Conducting Discrete-Event Simulation Studies in Construction Engineering and Management." Journal of Construction Engineering and Management, 136(1), 3-16.

McDonald, N. (2006). "Organizational resilience and industrial risk." In E. Hollnagel, D. D. Woods & N. Leveson (Eds.), Resilience engineering: concepts and precepts. Aldershot, England: Ashgate, 155-180.

Mendonca, D. (2008). Measures of Resilient Performance. In: Hollnagel, E., Nemeth, C. P., & Dekker, S. (Eds.), Resilience engineering perspectives. Volume 1, Remaining sensitive to the possibility of failure. Ashgate studies in resilience engineering. Aldershot: Ashgate.

Miller, J. H., & Page, S. E. (2007). Complex adaptive systems: an introduction to computational models of social life. Princeton, NJ: Princeton University Press.

Mitropoulos, P., Abdelhamid, T. S., & Howell, G. A. (2005). Systems Model of Construction Accident Causation. Journal of Construction Engineering and Management, 131, 816-825.

Mukherjee, A., Rojas, E., & Winn, W. (2005). "Exploring Mental Models of Construction Managers." ASCE Construction Congress 2005, San Diego.

Nemeth, C. P., Hollnagel, E., & Dekker, S. (2009). Resilience engineering perspectives. Vol. 2, Preparation and restoration. Farnham, England: Ashgate.
North, M. J., & Macal, C. M. (2007). Managing business complexity: discovering strategic solutions with agent-based modeling and simulation. Oxford: Oxford University Press.

Oglesby, C. H., Parker, H. W., & Howell, G. A. (1989). Productivity improvement in construction. New York: McGraw-Hill.

Paries, J. (2011). "Lessons from the Hudson." In Hollnagel, E., Pariès, J., Woods, D. D., & Wreathall, J. (Eds.), Resilience engineering in practice: a guidebook. Farnham, England: Ashgate.

Pariès, J. (2012). "Resilience and the ability to respond." In Pariès, J., Hollnagel, E., Wreathall, J., & Woods, D. D. (Eds.), Resilience engineering in practice: A guidebook. Ashgate Publishing, Ltd., 3-8.

Perrow, Charles. (1984). Normal accidents: living with high-risk technologies. New York: Basic Books.

Qureshi, Z. H., Ashraf, M. A., and Amer, Y. (2007). "Modeling industrial safety: A sociotechnical systems perspective," Industrial Engineering and Engineering Management, 2007 IEEE International Conference on, pp. 1883-1887, 2-4 Dec. 2007.

Rasmussen, Jens. (1997). "Risk management in a dynamic society: a modelling problem". Safety Science, 27(2), 183-213.

Rasmussen, J., & Svedung, I. (2000). Proactive risk management in a dynamic society. [S.l.]: Swedish Rescue Services Agency.

Re, A., and Macchi, L. (2010). "From cognitive reliability to competence? An evolving approach to human factors and safety". Cognition, Technology & Work, 12(2), 79-85.

Reason, J. T. (1990). Human error. Cambridge, England: Cambridge University Press.

Reason, J. T. (2000). "Human error: models and management." NCBI Resources, (Mar. 22, 2009).

Reason, J. T. (2008). Managing the risks of organizational accidents. Aldershot, Hants: Ashgate.

Roberts, K. H. (1993). New challenges to understanding organizations. New York: Macmillan.

Roberts, K. H. (2003). "HRO Has Prominent History." (Nov. 5, 2010).

Robinson, S. (2007a). "Conceptual modelling for simulation Part I: definition and requirements." Journal of the Operational Research Society, 59(3), 278-290.

Robinson, S. (2007b). "Conceptual modelling for simulation Part II: a framework for conceptual modelling." Journal of the Operational Research Society, 59(3), 291-304.

Sadsad, R. and McDonnell, G. (2007). "Using multi-scale systems simulation to evaluate health records solutions to improve medication use by the elderly in the community". 8th PhD Colloquium of the Student Chapter of the System Dynamics Society.

Sagan, S. D. (1993). The limits of safety: organizations, accidents, and nuclear weapons. Princeton, NJ: Princeton University Press.

Sarter, N. B., Woods, D. D., & Billings, C. E. (1997). "Automation surprises." Handbook of human factors and ergonomics, 2nd Ed., G. Salvendy (Ed.), Wiley, 1926-1943.

Saurin, T. A., Formoso, C. T., and Cambraia, F. B. (2004). "A Human Error Perspective of Safety Planning and Control." Proc. of 12th Annual Conf. of the International Group for Lean Construction (IGLC-12), Elsinore, Denmark.

Schein, E. H. (1999). The corporate culture survival guide: sense and nonsense about culture change. San Francisco, CA: Jossey-Bass.

Schein, E. H. (2010). Organizational culture and leadership. San Francisco, CA: Jossey-Bass.

Senge, P. M. (2006). The fifth discipline. London: Random House Business.

Sheard, S., & Mostashari, A. (2008). A Framework for System Resilience Discussions. In 18th Annual International Symposium of INCOSE, Utrecht, Netherlands.
Shields, P. M. and Tajalli, H. (2006). "Intermediate Theory: The Missing Link to Successful Student Scholarship." Journal of Public Affairs Education, 12(3), 313-334.

Simon, H. A. (1997). Models of bounded rationality. Cambridge, MA: MIT Press.

Smyth, R. (2004). "Exploring the Usefulness of a Conceptual Framework as a Research Tool: A Researcher's Reflections". Issues in Educational Research, 14(2), 167-180.

Son, JeongWook. (2011). "An integrated model of evolution of project teams in large-scale construction projects." PhD Dissertation, University of Washington, Seattle, WA.

Sterman, John. (2000). Business dynamics: systems thinking and modeling for a complex world. Boston: McGraw-Hill.

Syal, M. (1998). "Construction Research Agenda: Focus Areas and Topics." American Professional Constructor, 22(2), 8-12.

Tjorhom, B. and Aase, K. (2011). "The art of balance: using upward resilience traits to deal with conflicting goals." In Hollnagel, E., Pariès, J., Woods, D. D., & Wreathall, J. (Eds.), Resilience engineering in practice: a guidebook. Farnham, England: Ashgate.

Toulouse, G. (2002). "Accident risks in disturbance recovery in an automated batch-production system". Human Factors and Ergonomics in Manufacturing & Service Industries, 12(4), 383-406.

Trist, E., and Bamforth, K. (1951). "Some Social and Psychological Consequences of the Longwall Method of Coal-Getting". Human Relations, 4(1), 3-38.

Watkins, M., Mukherjee, A., Onder, N., and Matilla, K. (2009). "Using Agent-Based Modeling to Study Construction Labor Productivity as an Emergent Property of Individual and Crew Interactions." Journal of Construction Engineering and Management, 135(7), 657-667.

Weick, K. E. (1996). "The collapse of sensemaking in organizations: The Mann Gulch disaster". Wildfire.

Weick, K. E., Sutcliffe, K. M., and Obstfeld, D. (1999). Organizing for high reliability: processes of collective mindfulness. Stamford: JAI Press.

Weick, K. E., and Sutcliffe, K. M. (2001). Managing the unexpected: assuring high performance in an age of complexity. San Francisco: Jossey-Bass.

Weick, K. E., Sutcliffe, K. M., and Obstfeld, D. (2005). "Organizing and the Process of Sensemaking". Organization Science, 16(4), 409-421.

Westrum, R. & Adamski, A. J. (1999). "Organizational Factors Associated with Safety and Mission Success in Aviation Environments." In D. J. Garland, J. A. Wise & V. D. Hopkin (Eds.), Handbook of Aviation Human Factors. Lawrence Erlbaum, Mahwah, NJ.

Westrum, R. (2006). "All coherence gone, New Orleans as a resilience failure." In E. Hollnagel & E. Rigaud (Eds.), Proceedings of the 2nd Resilience Engineering Symposium. Paris: Mines Paris Les Presses.

Wild, A. (2005). "Uncertainty and Information in Construction: From the Socio-Technical Perspective 1962-1966 to Knowledge Management - What Have We Learned?" In Knowledge Management in the Construction Industry: A Socio-Technical Perspective, ed. Abdul Samad Kazi, 203-224.

Woltjer, R., & Hollnagel, E. (2007). "The Alaska Airlines Flight 261 accident: A systemic analysis of functional resonance." In International Symposium on Aviation Psychology (ISAP), Wright State University, 763-768.

Woods, D. D., & Cook, R. I. (2002). Nine Steps to Move Forward from Error. Cognition, Technology and Work, 4, 137-144.

Woods, D. D. and Hollnagel, E. (2006). "Prologue: Resilience Engineering Precepts." In: Hollnagel, E., Woods, D. D., and Leveson, N. (Eds.), Resilience engineering: concepts and precepts. Aldershot, England: Ashgate.
Woods, D. D., Patterson, E. S., & Cook, R. I. (2007). "Behind Human Error: Taming Complexity to Improve Patient Safety." In P. Carayon (Ed.), Handbook of Human Factors and Ergonomics in Health Care and Patient Safety. Lawrence Erlbaum Associates, Mahwah, New Jersey.

Woods, D. D. and Wreathall, J. (2008). "Stress-strain plots as a basis for assessing system resilience." In: Hollnagel, E., Nemeth, C., and Dekker, S. (Eds.), Remaining sensitive to the possibility of failure. Ashgate Publishing Company, Aldershot, pp. 143-158.

Woods, D. D., Dekker, S., & Cook, R. (2010). Behind human error. Farnham: Ashgate.

Wreathall, J. (2009). "Leading? Lagging? Whatever!" Safety Science, 47(4), 493-494.

Yilmaz, L. (2007). Modelling Software Processes as Human-Centered Adaptive Work Systems. Lecture Notes in Computer Science, 4764, 148-159.

Yilmaz, L. (2009). "Toward Systems Engineering for Agent-directed Simulation," Agent-Directed Simulation and Systems Engineering, L. Yilmaz and T. Oren, eds., Wiley Series in Systems Engineering and Management, Wiley, 219-236.