STUDIES ON COMPLEX TASK NETWORKS BASED ON CONTEXTUAL SPECIFICS
                     IN ELECTRONIC MEDICAL RECORDS
                                             By
                                         Inkyu Kim
                                   A DISSERTATION
                                        Submitted to
                                Michigan State University
                        in partial fulfillment of the requirements
                                     for the degree of
     Business Administration – Business Information Systems – Doctor of Philosophy
                                            2022


                                            ABSTRACT
 STUDIES ON COMPLEX TASK NETWORKS BASED ON CONTEXTUAL SPECIFICS
                            IN ELECTRONIC MEDICAL RECORDS
                                                  By
                                              Inkyu Kim
As organizational processes have become more interconnected and interdependent, contextual
factors have become central to both information systems and process management. Despite the
importance of context, few studies investigate the influence of contextual factors on the structure
of business processes. Thus, in this dissertation, I examine the role of contextual specifics in the
structure of the clinical documentation process using data from electronic health records in
outpatient clinics. The dissertation includes three essays. In the first essay, I address the influence
of internal contextual factors on enacted complexity. The findings of the first essay provide a
unique opportunity to theorize about the role of specialization in the enacted complexity of a
process by examining the effects of 1) the number of roles and 2) the degree of specialization. Contrary to expectations,
I find that complexity decreases when a greater number of roles are involved in the clinical process
and the roles are highly specialized. In the second essay, I turn my attention to the effects of
exogenous shocks on the clinical process: When routines are disrupted, are some patterns of action
more likely to be affected than others? I show that cohesion (defined as the consistency of context
between pairs of actions) has a particularly strong influence on the persistence of action patterns.
Lastly, in essay three, I suggest a path prediction model in a process based on action sequence and
its contextual specifics. The model uses a recurrent neural network that models both the observed
sequence of actions and the contextual factors in the process. As expected, the results show that
context can improve the accuracy of predictive models. In the case of outpatient medical
clinics, the strongest improvement in accuracy comes from two attributes: 1) the workstation
(location) where work is performed and 2) whether or not the system has been upgraded. Together,
these essays represent a rigorous framework for analyzing the role of context in organizational
processes and routines.


 This dissertation is dedicated to my wife, Jieun.
Thank you for making my days happily ever after.


                                     ACKNOWLEDGEMENTS
I am deeply grateful to my dissertation chair, Brian T. Pentland, for the warm-hearted support
and guidance throughout the Ph.D. program. He has shown his belief in me ever since the day I
joined the program. He was not just my advisor, but also a mentor, role model, and father of my
life in the U.S. I would never have made it this far without him.
         I also would like to thank my committee members, Anjana Susarla, Kenneth A. Frank,
and Quan Zhang for their valuable feedback and comments. I am truly fortunate to have them on
my dissertation committee. I extend my appreciation to Julie Ryan Wolf and Alice Pentland for
their help in making these studies possible. I am thankful for the financial support provided by
the Department of Accounting and Information Systems at Broad College of Business.
         I appreciate the help and support from my office mate, Aaron Fritz, and the faculty, staff,
and my fellow Ph.D. students from the AIS department. Their help and support were crucial to
my journey in the Ph.D. program. I am also thankful to Jason Shin, Junghyun Mah, Sangmok
Lee, and Seokjoo Lee for being my collegial colleagues and valued friends for this arduous, but
worthwhile journey. Of course, I want to thank all my other friends who have supported me from
both inside and outside academia.
         I am also thankful to my parents, Doo Tae Kim and Hee Won Kim, and my brother,
Hyungkyu Kim, for their unconditional love and support. They have encouraged me with love
throughout my life.
         Last, but certainly not least, the deepest gratitude and love to my family, Jieun and Kyuri.
Jieun Kim, you are the most dedicated wife and loving mother that I could ever ask to have. This
academic journey would not have even started without your immense love and support. I am also
grateful for Kyuri Wynne Kim, who has become the most precious treasure in our life.


       This dissertation is supported by the National Science Foundation under Grants No. SES-
1734237 and BCS-2120530. Any opinions, findings, and conclusions, or recommendations
expressed in this material are those of the author(s) and do not necessarily reflect the views of
the National Science Foundation. This research was also supported in part by University of
Rochester CTSA (UL1 TR002001) from the National Center for Advancing Translational
Sciences (NCATS) of the National Institutes of Health (NIH). The content is solely the
responsibility of the author(s) and does not necessarily represent the official views of the
National Institutes of Health.


                                            TABLE OF CONTENTS
LIST OF TABLES .......................................................................................................................... x
LIST OF FIGURES ....................................................................................................................... xi
INTRODUCTION .......................................................................................................................... 1
      0.1. Motivation for the Dissertation .................................................................................... 2
      0.2. Context Shapes Process ............................................................................................... 2
      0.3. Research Setting........................................................................................................... 3
      0.4. Representing Processes as Narrative Networks ........................................................... 4
      0.5. Overview of the Three Essays ..................................................................................... 5
        0.5.1. Enacted Complexity in Healthcare Routines: Evidence from Electronic Medical
      Records ............................................................................................................................... 5
        0.5.2. Dynamics of digitalization: Mechanisms of stability and change in digitalized
      work processes .................................................................................................................... 6
        0.5.3. Predicting Next Action based on Contextual Specifics: Evidence from Electronic
      Medical Records ................................................................................................................ 7
BIBLIOGRAPHY ........................................................................................................................... 9
CHAPTER ONE: ENACTED COMPLEXITY IN HEALTHCARE ROUTINES: EVIDENCE
FROM ELECTRONIC MEDICAL RECORDS........................................................................... 13
      1.1. Introduction ................................................................................................................ 13
      1.2. Theoretical Background ............................................................................................. 16
        1.2.1. Enacted Complexity ............................................................................................. 16
            1.2.1.1. Complexity as a network phenomenon ....................................................... 18
        1.2.2. Complexity in Healthcare .................................................................................... 21
        1.2.3. Number of Roles .................................................................................................. 22
        1.2.4. Role Specialization .............................................................................................. 23
      1.3. Research Context ....................................................................................................... 24
        1.3.1. Three Kinds of Outpatient Clinics ....................................................................... 25
        1.3.2. Clinical Roles are Specialized ............................................................................. 26
      1.4. Hypothesis Development ........................................................................................... 28
        1.4.1. Effect of Roles on Enacted Complexity............................................................... 28
        1.4.2. Effect of Role Specialization on Enacted Complexity ........................................ 29
      1.5. Methodology .............................................................................................................. 30
        1.5.1. Computing Enacted Complexity .......................................................................... 30
        1.5.2. Computing the Specialization Index .................................................................... 31
        1.5.3. Generalized Propensity Score Matching Method ................................................ 32
      1.6. Data Description ........................................................................................................ 33
      1.7. Model Estimation and Results ................................................................................... 36
        1.7.1. OLS Estimation.................................................................................................... 37
        1.7.2. Sensitivity Analysis ............................................................................................. 38
            1.7.2.1. Robustness of inference to case replacement (RIR) ..................................... 38


             1.7.2.2. Impact threshold for omitted variable ......................................................... 39
        1.7.3. Causal Effect Estimation...................................................................................... 39
      1.8. Discussion .................................................................................................................. 43
        1.8.1. Specialization Makes Workflows Simpler........................................................... 43
        1.8.2. Enacted Complexity as a Network Phenomenon ................................................. 45
        1.8.3. Limitations ........................................................................................................... 45
      1.9. Conclusion ................................................................................................................. 46
BIBLIOGRAPHY ......................................................................................................................... 49
CHAPTER TWO: DYNAMICS OF DIGITALIZATION: MECHANISMS OF STABILITY
AND CHANGE IN DIGITALIZED WORK PROCESSES ........................................................ 56
      2.1. Introduction ................................................................................................................ 56
      2.2. Background ................................................................................................................ 59
        2.2.1. Information Systems and Organizational Routines ............................................. 59
             2.2.1.1. Co-evolution of routines and technology .................................................... 59
             2.2.1.2. Imbrication of routines and technology ...................................................... 60
             2.2.1.3. Routines as “shock-absorbers” ................................................................... 61
        2.2.2. The Importance of Persistence ............................................................................. 62
        2.2.3. Routine Dynamics as Network Dynamics ........................................................... 63
      2.3. Hypothesis Development ........................................................................................... 64
        2.3.1. Frequency of Edges.............................................................................................. 64
        2.3.2. Speed of Edges..................................................................................................... 65
        2.3.3. Coherence of Edges ............................................................................................. 66
      2.4. Illustration: Upgrading an EHR System .................................................................... 66
        2.4.1. Upgrading the EHR User Interface ...................................................................... 67
        2.4.2. Data Source .......................................................................................................... 67
             2.4.2.1. Selection of clinics ...................................................................................... 69
      2.5. Descriptive Findings .................................................................................................. 69
        2.5.1. Changes in the Narrative Networks ..................................................................... 69
        2.5.2. Visualizing Diachronic Changes.......................................................................... 70
      2.6. Analysis...................................................................................................................... 73
        2.6.1. Logit Models ........................................................................................................ 74
        2.6.2. Logistic Regression Results ................................................................................. 74
        2.6.3. Dyadic Prediction Model for Network Dynamics ............................................... 76
        2.6.4. Application of the Latent Space Model ............................................................... 77
        2.6.5. Results of Dyadic Prediction Models .................................................................. 78
        2.6.6. Summary of Results ............................................................................................. 79
             2.6.6.1. Frequency (H1) ............................................................................................ 79
             2.6.6.2. Speed (H2) ................................................................................................... 80
             2.6.6.3. Coherence (H3) ............................................................................................ 80
        2.6.7. Which Edges are Most Persistent? ....................................................................... 80
      2.7. Discussion .................................................................................................................. 83
        2.7.1. Putting Action into Context ................................................................................. 83
        2.7.2. Imbrication and Evolution ................................................................................... 84
        2.7.3. Routine Dynamics as Network Dynamics ........................................................... 85
      2.8. Limitations ................................................................................................................. 86


      2.9. Conclusion ................................................................................................................. 87
BIBLIOGRAPHY ......................................................................................................................... 88
CHAPTER THREE: PREDICTING NEXT ACTION BASED ON CONTEXTUAL
SPECIFICS: EVIDENCE FROM ELECTRONIC MEDICAL RECORDS ................................ 95
      3.1. Introduction ................................................................................................................ 95
      3.2. Theoretical Background ............................................................................................. 97
        3.2.1. Process and Contextual Factors ........................................................................... 98
            3.2.1.1. Prediction models in process management ................................................. 99
      3.3. Data Description ...................................................................................................... 102
      3.4. Model ....................................................................................................................... 106
        3.4.1. Long Short-Term Memory Network .................................................................. 106
      3.5. Results ...................................................................................................................... 109
      3.6. Discussion ................................................................................................................ 111
      3.7. Conclusion ............................................................................................................... 113
BIBLIOGRAPHY ....................................................................................................................... 115


                                                        LIST OF TABLES
TABLE 1.1. ACTION NETWORK COMPARISON FOR TWO DIFFERENT CLINICAL
VISITS .......................................................................................................................................... 20
TABLE 1.2. NUMBER OF CLINICS, VISITS, AND ROLES FOR EACH SPECIALTY ........ 25
TABLE 1.3. EXAMPLE DATA................................................................................................... 34
TABLE 1.4. DESCRIPTIVE STATISTICS ................................................................................. 35
TABLE 1.5. RESULTS OF REGRESSIONS ON ENACTED COMPLEXITY ......................... 37
TABLE 2.1. SIZE AND DENSITY OF THE NETWORK IN EACH CLINIC .......................... 69
TABLE 2.2. LOGISTIC REGRESSION RESULT ON EDGE PERSISTENCE ........................ 75
TABLE 2.3. RESULTS OF ANALYSIS FOR EDGE DISSOLUTION ..................................... 79
TABLE 2.4. SUMMARY OF RESULTS .................................................................................... 79
TABLE 3.1. REPRESENTATIVE PROCESS PREDICTIVE MODELS ................................. 101
TABLE 3.2. SAMPLE OF RAW DATA ................................................................................... 103
TABLE 3.3. EXAMPLE OF TOUCHPOINTS .......................................................................... 104
TABLE 3.4. VARIABLE DESCRIPTION ................................................................................ 105
TABLE 3.5. CONFIGURATION PARAMETERS OF THE LSTM NETWORK .................... 108
TABLE 3.6. RESULTS FROM PROPOSED APPROACH ...................................................... 109


                                                   LIST OF FIGURES
FIGURE 0.1. NETWORK GRAPHS OF PATTERNS OF ACTIONS WITH CONTEXTUAL
SPECIFICS ..................................................................................................................................... 4
FIGURE 1.1. COMPLEXITY AS A FUNCTION OF COMPONENTS AND RELATIONS .... 19
FIGURE 1.2. ONE ROLE VS. THREE ROLES IN A PROCESS .............................................. 23
FIGURE 1.3. SPECIALIST VS. GENERALIST ROLES IN PROCESS .................................... 24
FIGURE 1.4. OUTPATIENT CLINIC LAYOUT ....................................................................... 25
FIGURE 1.5. ROLES ARE SPECIALIZED ................................................................................ 27
FIGURE 1.6. THE SAME ROLE SPECIALIZATION COULD RESULT IN DIFFERENT
NUMBERS OF PATHS .............................................................................................................. 30
FIGURE 1.7. NARRATIVE NETWORK WITH ROLE AND LOCATION .............................. 31
FIGURE 1.8. CAUSAL RELATIONSHIP BETWEEN NUMBER OF ROLES-ENACTED
COMPLEXITY ............................................................................................................................. 42
FIGURE 1.9. CAUSAL RELATIONSHIP BETWEEN SPECIALIZATION INDEX-ENACTED
COMPLEXITY ............................................................................................................................. 42
FIGURE 1.10. THE VISUALIZED EFFECT OF SPECIALISTS ON ENACTMENT OF
PROCESS ..................................................................................................................................... 44
FIGURE 2.1. CONVERTING EHR AUDIT TRAIL INTO NETWORKS ................................. 68
FIGURE 2.2. DIACHRONIC VIEW OF ROUTINES................................................................. 72
FIGURE 2.3. WHICH EDGES ARE MOST LIKELY TO PERSIST? ....................................... 81


                                        INTRODUCTION
Context changes our understanding of how a process works. In a recent review, Avgerou
(2019) argues that the role of context has been a major concern in research on information
systems in both theoretical and methodological ways for many years. For example, building a
generalizable IS theory confronts the issue of limited contextual insight due to the simplification
of contextual influence (Bamberger, 2008; Hong et al., 2004; Johns, 2006; Rousseau & Fried,
2001; Whetten, 2009), whereas context-specific research is limited in its generalizability
(Cheng et al., 2016). In process management, research on context-aware processes acknowledges
the influence of contextual factors on the behaviors of the participants and technologies and
suggests the need for a context-integrated process design (Recker et al., 2009; Rosemann et al.,
2008; vom Brocke et al., 2016). As organizational processes have become more interconnected
and interdependent, contextual factors have become central to both information systems and
process management.
         Context can also affect how we describe and model business processes. In particular,
depending on how much context you consider in the process description, the process appears to
change (Rosemann et al., 2008). It may seem to be the same process, but it can look very
different. For example, a process looks simple when we recognize it as just a sequence of events,
but it looks more complex when we consider that each event in the process has its own distinct
contextual background (e.g., a distinct actor, a distinct location). As business process models get
more complex with more stakeholders and technologies involved, the notion of the context-
aware business process gets more important (Rosemann et al., 2006).
         Despite the importance of context, few studies investigate the influence of contextual
factors on the structure of business processes. Thus, in this dissertation, I examine the role of
contextual specifics in the structure of the clinical documentation process.
0.1. Motivation for the Dissertation
There are three motivations for this dissertation. First, there is a theoretical motivation: how does
context affect the structure and performance of a process? As previously mentioned, many
studies argue for the importance of contextual specifics, but how context affects the structure of
a process has not yet been studied. In this dissertation, I examine how the internal context of
the clinical documentation process is associated with the enacted complexity of the process and
how the process responds to changes in external contextual factors.
        Second, there is a methodological motivation: how can I detect which factors are likely to
influence the structure and execution of a process? Many factors could be considered part of the
contextual environment of a process, but their impacts on the structure of the process vary. By
estimating standardized coefficients of internal factors and modeling the effects of disruption on
the structure of stochastic transitions between events in a process, I can compare the impact of
each contextual factor and see their influence on process dynamics.
        Third, there is a practical motivation: if I can better predict the sequences of action in the
execution of a process, I may be able to do a better job of supporting and perhaps automating
parts of that process. Based on the factors whose impacts are demonstrated in the first two
essays, I suggest a prediction model for the sequences of action for the clinical documentation
process in my third essay. The prediction model improves on the current state of the art and
could contribute to the automation effort (Aysolmaz et al., 2013).
0.2. Context Shapes Process
Rosemann et al. (2008) suggest the “onion model” to describe how contextual factors are layered
and how these layers can shape how a process works. According to the onion model, the context
of a process consists of four levels (immediate, internal, external, and environmental),
which correspond to layers of context from inside to outside. Based on this metaphor, I distinguish
between different layers of context. External and environmental contexts (those that are truly “outside”)
include factors such as the season or the country. These outside factors do not change during the
execution of a process. Internal and immediate contextual factors, such as the person performing
each action, can change during the execution of a process.
        In research on process management, there is increased interest in the role of context, but
the term usually refers to (a) sequential context (Becker & Intoyoad, 2017; Bose & van der Aalst,
2009; Gunther et al., 2008) or (b) external/environmental context, similar to typical
exogenous variables (Avgerou, 2019). Some studies also consider and emphasize
internal context (Li et al., 2010; Rosemann et al., 2008; van der Aalst & Dustdar, 2012), but it is
hard to find studies that examine its impact on process. Thus, in this dissertation, I examine the
role of internal and external contextual factors in process structure and how contextual
information could be used for prediction.
0.3. Research Setting
All three essays use data from outpatient clinics at the University of Rochester Medical Center
(URMC). Our research partners at URMC extracted audit trail data from the EPIC Electronic
Health Record (EHR) system in several different medical specialties (including dermatology,
orthopedic surgery, and pediatric oncology) during different periods between 2016 and 2019.
Each essay uses a different specific set of data, as explained below. These records include
detailed, time-stamped records of EHR utilization in tens of thousands of patient visits.
0.4. Representing Processes as Narrative Networks
In this dissertation, I represent processes as narrative networks (Pentland & Feldman, 2007).
Narrative networks provide a useful way of summarizing patterns of actions (Pentland et al.,
2010). A narrative network is defined as a directed graph consisting of actions (events) as the
nodes and sequential relationships between the actions as edges (Pentland et al., 2017). A
narrative network is useful for the study because the nodes can be defined by multiple contextual
factors (e.g., action, actor, location) (Pentland et al., 2020). Depending on how much context you
include in the process description, the structure of the process changes. It’s the “same process”,
but it’s not the same process.
FIGURE 0.1 NETWORK GRAPHS OF PATTERNS OF ACTIONS WITH
CONTEXTUAL SPECIFICS
         Figure 0.1 shows an example of how considering contextual specifics can change how we
see patterns of actions in a process. Using ThreadNet 3 (Pentland et al., 2020), I convert the
clinical documentation process from one patient visit into a network. When the network consists
of actions only (as on the left side of Figure 0.1), it is hard to grasp patterns and directions of
actions because the actions are very densely connected. However, when I construct the network
so that nodes are described by actions and the actors who performed the actions (as in the middle
of Figure 0.1), it increases the number of nodes and begins to reveal structure that was not visible
with actions only. When I add another contextual factor, location (as on the right side of Figure
0.1), the additional structure becomes apparent. The clustered sections of the network reflect
different locations in the clinic. This example shows how adding context can change the apparent
structure of a process.
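         The effect described above can be sketched in a few lines of Python. This is only a
minimal illustration under stated assumptions, not the ThreadNet implementation: the event log,
its values, and the helper narrative_network are hypothetical and are not drawn from the
URMC data. Each event is a tuple of (action, actor, location), and a node is defined by the first
one, two, or three of those factors.

```python
from collections import Counter

# Hypothetical event log for one visit, in time order: (action, actor, location).
# All values are invented for illustration only.
visit = [
    ("open_chart", "nurse", "exam_room"),
    ("record_vitals", "nurse", "exam_room"),
    ("open_chart", "physician", "exam_room"),
    ("write_note", "physician", "office"),
    ("open_chart", "physician", "office"),
    ("write_note", "physician", "office"),
]


def narrative_network(events, n_factors):
    """Build a directed graph whose nodes are the first n_factors fields of each event.

    Returns (nodes, edges). edges is a Counter mapping each observed transition
    between consecutive events to the number of times it occurs.
    """
    labels = [event[:n_factors] for event in events]
    nodes = set(labels)
    edges = Counter(zip(labels, labels[1:]))
    return nodes, edges


for n_factors, name in [(1, "action"), (2, "action + actor"), (3, "action + actor + location")]:
    nodes, edges = narrative_network(visit, n_factors)
    print(f"{name}: {len(nodes)} nodes, {len(edges)} distinct edges")
```

In this toy log, the number of nodes grows from three to four to five as actor and then location
are added to the node definition, even though the underlying sequence of events is unchanged.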
0.5. Overview of the Three Essays
This dissertation explores three different ways in which context influences process. The three
essays are described in the following sections.
0.5.1. Enacted Complexity in Healthcare Routines: Evidence from Electronic Medical
        Records
In the first essay, I address the influence of contextual factors on enacted complexity.
Complexity has been a central problem in many disciplines including organizational studies,
process management, and information systems (Anderson, 1999; Pich et al., 2002; Rahmati et
al., 2020; Rettig, 2007), but context has not been considered as a factor that influences
complexity. By understanding and combining patterns of actions with their contextual specifics,
this essay extends our understanding of the antecedents of enacted complexity. I focus on the
impact of specialization on enacted complexity. Specialization is essential in organizational
processes, where most of the tasks require specialized knowledge (Batista et al., 2005; Stitzenberg
& Sheldon, 2005). However, there has been no agreed-upon model and no empirical research
that analyzes the relationship between specialization and enacted complexity. Thus, in this essay,
I investigate the research question: how does specialization affect the enacted complexity of a
process?
         To answer this question, I consider the implications of specialization for process
enactment. I investigate the effects of specialization in two distinct ways: 1) the number of
specialized roles in a process and 2) the degree of specialization in each role. First, the
involvement of specialized roles is an important determinant of specialization. The more
specialized roles are involved, the more specialized a process is. However, adding roles may
make the process more complex because it adds more tasks. Second, the degree of specialization
of each role provides another way to address the same basic question. Although a process is
enacted by many roles, the extent to which each role is specialized may differ, so the degree of
specialization of the process depends on who is involved.
0.5.2. Dynamics of digitalization: Mechanisms of stability and change in digitalized work
         processes
In the second essay, I turn my attention to the effects of exogenous shocks on routines: What
mechanisms shape the dynamics of digitalization? Does the structure of the routine itself
influence the dynamics of digitalization and vice versa? More broadly, I investigate the
mechanisms through which organizational routines react to external disruptions.
To address these questions, I model routines as directed graphs (Pentland et al., 2017; van der
Aalst, 2019). Using latent factor selection models (Hoff, 2005), I study the hypothesis that a
technological change, a major upgrade of an EHR system, may influence the structure
and patterns of action, by discovering and comparing patterns of action before and after the disruption
(Pentland & Kim, 2021). In social networks, mechanisms like reciprocity, homophily, and
preferential attachment contribute to the formation and dissolution of network ties (Snijders,
2001), but analogous network-based mechanisms have never been defined or investigated in the
context of organizational routines. This essay contributes to current research on routine dynamics
as network dynamics (Feldman et al., 2016; Goh & Pentland, 2019) by providing a novel


application of dynamic network models (Hoff, 2005; Minhas et al., 2019) to theorize about the
dynamics of digitalization. The theory and method employed in this essay provide a way to
reinvigorate the sociotechnical foundations of the information systems field by explicitly
examining the systemic connections between technology and patterns of action.
0.5.3. Predicting Next Action based on Contextual Specifics: Evidence from Electronic
        Medical Records
Lastly, in essay three, I investigate how a predictive process model can be qualified based on
contextual specifics. In my first two essays, I focus on the influence of contextual factors on
complex networks and on their stability under an exogenous disruption, a system upgrade. In this
essay, I use contextual specifics as inputs to improve the prediction of the flow of the
clinical documentation process.
        While the use of EMR systems was expected to make the documentation process
convenient and concise, the process is still complex because clinicians must record every step in
the system. As a result, complexity in the documentation process contributes to administrative
costs in healthcare systems (Shrank et al., 2019). However, if there is a way
to find recognizable patterns and predict paths at an early stage, it may be possible to simplify
the process and reduce wasted cost and time (Lee & Dale, 1998).
        For an accurate prediction of the process, in this essay, I use different types of contextual
specifics as attributes for the prediction of actions in the process. Because the clinical documentation
process is composed of careful collaborations among various specialists and occurs in real time when
patients visit, the immediate contexts (actor and location) studied in essay 1 need to be used. In
addition, external and environmental factors may also be useful for the prediction
because, as shown in essay 2, the shape of the process is influenced by external factors.


         Toward this end, I use a recurrent neural network with long short-term memory (LSTM)
to find recognizable patterns. An LSTM accesses and modifies its memory of the sequence through three types of
gates (input, output, and forget) (Hochreiter & Schmidhuber, 1997). I train the prediction models
to see whether the prediction level changes when contextual factors are considered as additional attributes,
which contextual factors are most impactful, and how much the contextual specifics can improve
the prediction level.
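Before any sequence model can be trained, the audit trail has to be turned into training examples. The following is only a minimal sketch of that preparation step, not the pipeline used in the essay: the function name `next_action_pairs`, the field names `action`, `role`, and `location`, and the sample rows are all hypothetical. It shows how contextual attributes can be attached to each step so that the same action performed by a different role becomes a different input token.

```python
def next_action_pairs(events, context_keys=()):
    """Turn one visit's time-ordered events into (prefix, next action) pairs.

    Each prefix element is a tuple: the action followed by any requested
    contextual attributes. The target is always the bare next action.
    """
    tokens = [(e["action"],) + tuple(e[k] for k in context_keys) for e in events]
    return [(tokens[:i], events[i]["action"]) for i in range(1, len(events))]


# Hypothetical rows for a single visit (illustrative only, not from the data).
visit = [
    {"action": "check_in", "role": "admin", "location": "front_desk"},
    {"action": "enter_vitals", "role": "nurse", "location": "exam_room"},
    {"action": "enter_note", "role": "physician", "location": "exam_room"},
]

without_context = next_action_pairs(visit)
with_context = next_action_pairs(visit, context_keys=("role", "location"))

for prefix, target in with_context:
    print(prefix, "->", target)
```

Comparing a model trained on `without_context` pairs with one trained on `with_context` pairs is one way to ask whether contextual attributes change the prediction level.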


BIBLIOGRAPHY
Anderson, P. (1999). Perspective: Complexity theory and organization science. Organization
        Science, 10(3), 216-232.
Avgerou, C. (2019). Contextual explanation: Alternative approaches and persistent challenges.
        MIS Quarterly, 43(3), 977-1006.
Aysolmaz, B., İren, D., & Demirörs, O. (2013). An effort prediction model based on BPM
        measures for process automation. In Enterprise, Business-Process and Information
        Systems Modeling (pp. 154-167). Springer.
Bamberger, P. (2008). From the editors: Beyond contextualization: Using context theories to
        narrow the micro-macro gap in management research. Academy of Management Journal,
        51(5), 839-846.
Batista, N., Batista, S. H., Goldenberg, P., Seiffert, O., & Sonzogno, M. C. (2005). Problem-
        solving approach in the training of healthcare professionals. Revista de Saúde Pública,
        39, 231-237.
Becker, T., & Intoyoad, W. (2017). Context aware process mining in logistics. Procedia CIRP,
        63, 557-562.
Bose, R. J. C., & van der Aalst, W. M. (2009). Context aware trace clustering: Towards
        improving process mining results. Proceedings of the 2009 SIAM International
        Conference on Data Mining.
Cheng, Z., Dimoka, A., & Pavlou, P. A. (2016). Context may be King, but generalizability is the
        Emperor! Journal of Information Technology, 31(3), 257-264.
Feldman, M. S., Pentland, B. T., D’Adderio, L., & Lazaric, N. (2016). Beyond routines as things:
        Introduction to the special issue on routine dynamics. Organization Science, 27(3), 505-
        513.
Goh, K. T., & Pentland, B. T. (2019). From Actions to Paths to Patterning: Toward a Dynamic
        Theory of Patterning in Routines. Academy of Management Journal, 62(6), 1901-1929.
Gunther, C. W., Rinderle-Ma, S., Reichert, M., van der Aalst, W. M., & Recker, J. (2008). Using
        process mining to learn from process changes in evolutionary systems. International
        Journal of Business Process Integration and Management, 3(1), 61-78.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8),
        1735-1780.


Hoff, P. D. (2005). Bilinear mixed-effects models for dyadic data. Journal of the American
         Statistical Association, 100(469), 286-295.
Hong, W., Thong, J. Y., & Tam, K. Y. (2004). Does animation attract online users’ attention?
         The effects of flash on information search performance and perceptions. Information
         Systems Research, 15(1), 60-86.
Johns, G. (2006). The essential impact of context on organizational behavior. Academy of
         Management Review, 31(2), 386-408.
Lee, R. G., & Dale, B. G. (1998). Business process management: A review and evaluation.
         Business Process Management Journal.
Li, J., Bose, R. J. C., & van der Aalst, W. M. (2010). Mining context-dependent and interactive
         business process maps using execution patterns. International Conference on Business
         Process Management.
Minhas, S., Hoff, P. D., & Ward, M. D. (2019). Inferential approaches for network analysis:
         Amen for latent factor models. Political Analysis, 27(2), 208-222.
Pentland, B. T., & Feldman, M. S. (2007). Narrative networks: Patterns of technology and
         organization. Organization Science, 18(5), 781-795.
Pentland, B. T., Hærem, T., & Hillison, D. (2010). Comparing organizational routines as
         recurrent patterns of action. Organization Studies, 31(7), 917-940.
Pentland, B. T., & Kim, I. (2021). Narrative Networks in Routine Dynamics. In M. S. Feldman,
         B. T. Pentland, L. D'Adderio, D. Dittrich, C. Rerup, & D. Seidl (Eds.), Cambridge
         Handbook of Routine Dynamics. Cambridge University Press.
Pentland, B. T., Recker, J., Wolf, J. R., & Wyner, G. (2020). Bringing Context inside Process
         Research with Digital Trace Data. Journal of the Association for Information Systems,
         21(5), 5.
Pentland, B. T., Recker, J., & Wyner, G. (2017). Rediscovering handoffs. Academy of
         Management Discoveries, 3(3), 284-301.
Pich, M. T., Loch, C. H., & Meyer, A. D. (2002). On uncertainty, ambiguity, and complexity in
         project management. Management Science, 48(8), 1008-1023.
Rahmati, P., Tafti, A. R., Westland, J. C., & Hidalgo, C. (2020). When All Products Are Digital:
         Complexity and Intangible Value in the Ecosystem of Digitizing Firms. MIS Quarterly,
         45(3), 1025-1058.
Recker, J., Rosemann, M., Indulska, M., & Green, P. (2009). Business process modeling: A
         comparative analysis. Journal of the Association for Information Systems, 10(4), 1.


Rettig, C. (2007). The trouble with enterprise software. MIT Sloan Management Review, 49(1),
        21.
Rosemann, M., Recker, J., & Flender, C. (2008). Contextualisation of business processes.
        International Journal of Business Process Integration and Management, 3(1), 47-60.
Rosemann, M., Recker, J., Flender, C., & Ansell, P.-D. (2006). Understanding context-awareness
        in business process design. Proceedings of the 17th Australasian Conference on
        Information Systems.
Rousseau, D. M., & Fried, Y. (2001). Location, location, location: Contextualizing
        organizational research. Journal of Organizational Behavior, 1-13.
Shrank, W. H., Rogstad, T. L., & Parekh, N. (2019). Waste in the US health care system:
        estimated costs and potential for savings. JAMA, 322(15), 1501-1509.
Snijders, T. A. (2001). The statistical evaluation of social network dynamics. Sociological
        Methodology, 31(1), 361-395.
Stitzenberg, K. B., & Sheldon, G. F. (2005). Progressive specialization within general surgery:
        adding to the complexity of workforce planning. Journal of the American College of
        Surgeons, 201(6), 925-932.
van der Aalst, W. M. (2019). A practitioner’s guide to process mining: Limitations of the
        directly-follows graph. Procedia Computer Science, 164, 321-328.
van der Aalst, W. M., & Dustdar, S. (2012). Process mining put into context. IEEE Internet
        Computing, 16(1), 82-86.
vom Brocke, J., Zelt, S., & Schmiedel, T. (2016). On the role of context in business process
        management. International Journal of Information Management, 36(3), 486-495.
Whetten, D. A. (2009). An examination of the interface between context and theory applied to
        the study of Chinese organizations. Management and Organization Review, 5(1), 29-55.


                                          CHAPTER ONE:
     ENACTED COMPLEXITY IN HEALTHCARE ROUTINES: EVIDENCE FROM
                              ELECTRONIC MEDICAL RECORDS
1.1. Introduction
Specialization of tasks in organizations contributes to enhanced performance and higher
productivity. By specialization, I mean the concentration on particular components of an
organization's task (Fahrenkopf et al., 2020). The benefits of specialization have long been
studied across diverse organizational settings (Fahrenkopf et al., 2020; Flueckiger, 1976;
Narayanan et al., 2009; Staats & Gino, 2012). Specialization allows organizations to reduce costs
and manage complexity (Crowston, 1997; Staats & Gino, 2012). Specialization sets the context
in which a process is performed (Rosemann et al., 2008).
        Complexity is a tremendous problem in organizations as processes have become more
interconnected and interdependent (Rahmati et al., 2020; Rettig, 2007; Sturmberg & Martin,
2013). This is especially true in healthcare, where there is growing concern about the
consequences of complexity (Shrank et al., 2019). Specialization is essential in healthcare, where
most of the tasks require specialized knowledge (Batista et al., 2005; Stitzenberg & Sheldon,
2005), but there are no agreed-upon models for analyzing the relationship between specialization
and the complexity of healthcare work.
        In this study, I consider the implications of specialization for process enactment.
Healthcare services are embedded in a web of intersecting specialties, roles, and other contextual
factors. In the clinical process, each role participates in the process with a specialized set of skills
and patterned social behaviors (Turner, 2001). For example, a patient who arrives at the
orthopedic surgery clinic with a broken leg might engage with several provider roles, including


office staff, insurance pre-authorization, nurse, physician, and radiology technician. Later, the
same patient may have a follow-up visit for physical therapy and would not need as many
clinicians as in the first visit. These two cases differ in the number
of participants and the types of roles involved. Given such variation, how can we assess the
effects of specialization on this diverse set of possible workflows?
         To address this issue, I examine the effects of specialization in two distinct ways: 1) the
number of specialized roles in a process and 2) the degree of role specialization in a process.
First, the involvement of specialized roles is an important determinant of specialization. The
more specialized roles are involved, the more specialized a process is. However, adding roles
may make the process more complex as it adds more tasks. Second, the degree of role specialization is
another important factor to consider. The extent to which each role in the process
is specialized may vary, so the degree of specialization differs depending on which roles are
involved. For example, when a patient visits the clinic, the degree of specialization of a nurse is
lower than that of either clinical or administrative technicians because the nurse can cover a larger
variety of tasks.
         Based on these two aspects of specialization, I investigate the effects of specialization on
the enacted complexity of digitalized work processes in healthcare organizations. The
relationship between specialization and the enacted complexity of work processes is
especially important in healthcare organizations because the healthcare process consists of
intersecting specialties and other contextual factors, and administrative procedures, such as
billing and insurance, are also very complex (Gottlieb et al., 2018; Sakowski et al., 2009).
Complexity has been considered one of the main practical problems in healthcare service
(Kannampallil et al., 2011; Sturmberg & Martin, 2013; Thompson et al., 2016). The complexity


of organizational processes in clinical settings has been studied and characterized within the
process and its tasks.
         I focus on the relationship between specialists who concentrate on specific components of
tasks and the enacted complexity of the process. I address the following specific research question:
Does specialization increase or decrease the enacted complexity of a process?
         To answer this question, I convert the work process into a narrative network (Pentland &
Feldman, 2007) and examine the influence of specialization on the number of paths in the network
(Goh & Pentland, 2019). A narrative network is a special kind of “directly follows graph” (van
der Aalst, 2019) where the nodes are defined using additional contextual features, such as actors,
artifacts, locations, and so forth (Pentland et al., 2017). The intuition behind this measure of
enacted complexity is simple: a process with more alternative paths is more complex. This
measure embodies the idea that task complexity is indexed by the number of paths in the network
of events that lead to the attainment of task outcomes (Hærem et al., 2015).
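To make the construction concrete, the following minimal sketch builds a directly-follows graph from a single visit and shows how adding a contextual attribute changes the node set. It is an illustration only: the function name `narrative_network`, the field names `action` and `role`, and the sample rows are hypothetical and are not taken from the audit trail data.

```python
from collections import Counter


def narrative_network(events, context_keys=("action",)):
    """Build a directly-follows graph from one visit's time-ordered events.

    Each node is a tuple of the chosen attributes, so adding context
    (for example "role") can split one action into several distinct nodes.
    Edges are consecutive node pairs, stored with their frequencies.
    """
    nodes = [tuple(e[k] for k in context_keys) for e in events]
    edges = Counter(zip(nodes, nodes[1:]))
    return set(nodes), edges


# Hypothetical audit-trail rows for one visit (illustrative only).
visit = [
    {"action": "check_in", "role": "admin"},
    {"action": "open_chart", "role": "nurse"},
    {"action": "enter_vitals", "role": "nurse"},
    {"action": "open_chart", "role": "physician"},
    {"action": "enter_note", "role": "physician"},
]

plain_nodes, plain_edges = narrative_network(visit)
ctx_nodes, ctx_edges = narrative_network(visit, ("action", "role"))

print(len(plain_nodes), len(plain_edges))  # action-only network
print(len(ctx_nodes), len(ctx_edges))      # action-plus-role network
```

In this toy visit, "open_chart" is a single node in the action-only network but becomes two nodes once the role is part of the node definition, which is the sense in which contextual features change the apparent structure of the process.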
         Using EPIC EMR¹ audit trail data from three different types of clinics (dermatology,
orthopedic surgery, and pediatric oncology), I first examine whether the involvement of more specialized
roles in a process has causal effects on the enacted complexity of patient visits. Intuitively, the
involvement of roles should increase enacted complexity because each role provides a
differentiated service from others. Adding more roles tends to add steps in the clinical process,
and more steps are associated with greater complexity (Wood, 1986). I also examine the effects
of role specialization on the complexity of patient visits. As I describe below, there are reasons to
expect that the effect could either increase or decrease enacted complexity.
¹ EPIC is the largest vendor of electronic medical record systems (Adsit et al., 2014; Holmgren et al., 2022).


        My main results are quite surprising. While the individual-level theory of task complexity
(Campbell, 1988; Wood, 1986) suggests that more specialization should increase complexity, my
OLS regression results show that both indicators of specialization have significant, negative
effects on enacted complexity. Because this result may be confounded by other important factors,
I also investigate the causal effects of specialization using a causal effect estimation method, generalized
propensity score matching (Hirano & Imbens, 2004; Wu et al., 2018).
        I organize this essay as follows. In the next section, I provide theoretical background for
the development of models for the relationship between enacted complexity and contextual
factors and develop hypotheses. I then describe the research context and the dataset for the
empirical test and introduce the research model. Next, I interpret the results to explain how and
why specialization reduces enacted complexity. In the last section, I discuss the implications and
generalizability of this study.
1.2. Theoretical Background
1.2.1. Enacted Complexity
For this study, I first need to understand the concept of enacted complexity in a process.
Complexity has been studied as a key concept in diverse fields including business process, IS
and organization theory (Byström & Järvelin, 1995; Merali, 2006; Moldoveanu & Bauer, 2004;
Rivkin & Siggelkow, 2007; Simon, 1969; Zhou, 2013), but the traditional standard framework of
task complexity has been developed based on the concept brought from organizational
psychology (Campbell, 1988; Weick, 1965; Wood, 1986). Traditionally, task complexity is
described in terms of the relationships between task inputs, namely the required acts and information cues
needed to complete tasks (Wood, 1986), and it generally focuses on the individual level. The traditional
model of task complexity (Campbell, 1988; Wood, 1986) is based mainly on the number of


“required acts” (Liu & Li, 2012; Wood, 1986), independent of who performs the acts or where
they are performed. This point of view on complexity is based on decontextualized actions, so it
overlooks potential contextual factors, such as the role of the person performing the work
(Hackman, 1969).
         However, most organizational processes (such as outpatient clinical visits) are not
enacted by single individuals (Hærem et al., 2021; March & Simon, 1958; Nelson & Winter,
1982) and they are deeply enmeshed in organizational context (Avgerou, 2019; Rosemann et al.,
2008). Thus, I need a concept of complexity for organizational processes that is distinct from the
individual level task complexity.
         To address this problem, Hærem et al. (2021) introduce the idea of enacted complexity to
describe processes that are enacted by multiple actors within organizational routines. Hærem
et al. (2015) extend the concept of task complexity to tasks that multiple actors perform and
integrate the concept with material context. The extended concept assumes that tasks are
embedded in a socio-material context (D'Adderio, 2011; Leonardi, 2011). The concept of
enacted complexity has started to appear in empirical research (Danner-Schröder & Ostermann,
2022; Goh & Pentland, 2019; Hansson et al., 2021).
         It is important to note that enacted complexity refers to EMR utilization (the record-
keeping process), not the complexity of the underlying EMR system. Complexity is not an
absolute property of an object or a system but depends on how the system is represented. Any
measure of complexity starts from a description of the identifiable regularities within the
particular empirical domain (Flood, 1987; Gell‐Mann & Lloyd, 1996). Thus, I define an index of
complexity, not an absolute number. Established measures of complexity from other disciplines,


such as the Lempel-Ziv complexity (Kaspar & Schuster, 1987; Lempel & Ziv, 1976), are indices
of complexity, not absolutes.
1.2.1.1. Complexity as a network phenomenon
In current theory, complexity arises from networks of interacting components (Kannampallil et
al., 2011; Kauffman, 1993). Drawing on Simon’s (1969) architecture of complexity and decades
of research on complex adaptive systems, Kannampallil et al. (2011) provide a framework that
embodies two key dimensions, as shown in Figure 1.1: components and relations. Components
correspond to the “required acts” that Wood (1986) uses to define component complexity: a task
with more “required acts” has greater component complexity.
        I can interpret the axes in Figure 1.1 in network terms. Components can be represented
by nodes in a network, as Wood (1986, p. 78) does when showing the sequence of actions
required to land an airplane. The relatedness of the components is represented by the edges in the
network. In Kauffman’s (1993) influential “nk” model of complex dynamic systems, the “n”
stands for the number of nodes in a network, and “k” stands for the degree of relatedness of those
nodes. For a given number of nodes (components), a network with more edges (relations) is
more complex.


FIGURE 1.1. COMPLEXITY AS A FUNCTION OF COMPONENTS AND RELATIONS
                             (Adapted from Kannampallil et al. 2011)
        Hærem et al. (2015) build on the network representation to extend the traditional idea of
task complexity introduced by Wood (1986) to include tasks performed by multiple actors.
Given a network that represents a task, enacted complexity can be operationalized as the number
of possible paths for getting the task done (Goh & Pentland, 2019; Pentland et al., 2020). This
definition relies on the same intuition as Wood’s (1986) concept of coordinative complexity,
which is based on the number of paths in an idealized model of a task (not the task enactments).
        This is analogous to McCabe’s (1976) concept of cyclomatic complexity, in which the
number of executable paths through a software module is used as an index of complexity. Fewer
paths mean lower complexity; more paths mean greater complexity. Goh and Pentland (2019)
note that this method is just an approximation. It does not depend on having a specific start or
stop for the process. Goh and Pentland (2019) provide the following formula, which is based on
McCabe’s (1976) metric:


        (1)    \text{Enacted Complexity} = 10^{0.08 \times (1 + \text{edges} - \text{nodes})}
        where nodes refer to the number of unique actions in the network and edges are the
number of unique sequential pairs of actions in the network. Using this metric, tasks with a
single execution path have complexity equal to one.
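As a quick check on the metric, the sketch below evaluates it for a linear chain and for the node and edge counts reported in Table 1.1. The function names are hypothetical, and the use of the natural logarithm for the "logged value" is an assumption; under that assumption the rounded results agree with the enacted complexity values in the table.

```python
import math


def enacted_complexity(nodes: int, edges: int) -> float:
    """Index of enacted complexity: 10 ** (0.08 * (1 + edges - nodes))."""
    return 10 ** (0.08 * (1 + edges - nodes))


def logged_enacted_complexity(nodes: int, edges: int) -> float:
    """Natural log of the index (assumed to match the table's logged value)."""
    return math.log(enacted_complexity(nodes, edges))


# A linear chain of 5 actions has 4 edges, hence a single execution path.
chain_value = enacted_complexity(nodes=5, edges=4)
print(chain_value)

# Node and edge counts for Visit A and Visit B as reported in Table 1.1.
visit_a = round(logged_enacted_complexity(nodes=53, edges=97), 2)
visit_b = round(logged_enacted_complexity(nodes=53, edges=127), 2)
print(visit_a, visit_b)
```

Because the exponent depends only on the difference between edges and nodes, two visits with the same number of actions can differ sharply on the index whenever one of them has more distinct transitions.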
TABLE 1.1. ACTION NETWORK COMPARISON FOR TWO DIFFERENT CLINICAL VISITS
                                             Visit A        Visit B
      # Nodes                                   53             53
      # Edges                                   97            127
      # Paths                                  926        133,484
      Enacted Complexity (logged value)       8.29          13.82
      Network Shape                           (network diagrams not reproduced)
        I visualize the narrative networks for two different patient visits from my data to show how
nodes and edges affect enacted complexity (see Table 1.1). While visits A and B have the same
number of actions (53 unique actions), they have different numbers of edges (97 vs. 127). The
difference in the number of edges changes the number of paths in the network. As a result,
there is a large gap in enacted complexity between the two clinical visits.
        The example in Table 1.1 shows the importance of understanding complexity as a
network phenomenon. In the traditional, individual-level theory of task complexity, more nodes


indicate greater complexity (Wood, 1986). However, when I consider how the nodes are
connected, they may or may not result in a greater number of possible paths. Although the
number of nodes is the same between the two visits in Table 1.1, there is a huge gap in the
number of paths as the number of edges increases. My goal in the analysis section is to
understand how specialization affects the number of paths in the process.
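The point that edges, rather than nodes alone, drive the number of paths can be illustrated with a toy example. The sketch below counts start-to-end paths exactly, under the simplifying assumption that the network is acyclic (real narrative networks may contain loops, which is one reason an approximation is used in practice). The function name `count_paths` and the four-node graphs are hypothetical.

```python
from functools import lru_cache


def count_paths(edges, start, end):
    """Count distinct start-to-end paths in an acyclic directed graph."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)

    @lru_cache(maxsize=None)
    def paths_from(node):
        # One path if we are at the end; otherwise sum over all successors.
        if node == end:
            return 1
        return sum(paths_from(nxt) for nxt in successors.get(node, []))

    return paths_from(start)


# Same four nodes in both networks; the second adds two extra edges.
chain = [("a", "b"), ("b", "c"), ("c", "d")]
denser = chain + [("a", "c"), ("b", "d")]

chain_paths = count_paths(chain, "a", "d")
denser_paths = count_paths(denser, "a", "d")
print(chain_paths, denser_paths)
```

With the node set held fixed, the two added edges raise the number of possible paths from one to three, mirroring on a small scale the contrast between the two visits in Table 1.1.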
1.2.2. Complexity in Healthcare
Complexity in healthcare has been both theoretically and practically challenging. The growth in
complexity of healthcare systems has created a challenging environment for healthcare
reform because healthcare is characterized by intersecting biological,
social, and political systems (Blanchfield et al., 2010; Long et al., 2018). As a collection of
interconnected actions of individuals and technologies, healthcare systems are recognized as
representative complex adaptive systems (Plsek & Greenhalgh, 2001).
        Many studies have warned about the growth of complexity in healthcare systems. The
biggest problem with increased complexity in healthcare systems is that it increases cost and waste
(Shrank et al., 2019). Blanchfield et al. (2010) find that excessive administrative complexity
costs about 12 percent of net patient service revenue. Administrative complexity has
been regarded as the largest source of waste in U.S. healthcare systems. To reduce it, Shrank et al.
(2019) suggest eliminating processes that do not contribute to quality improvement and/or access to
care.
        The waste from complexity derives from the increased interconnection within and across
components of systems (Simon, 1969). From a network perspective, the individuals and
technologies in healthcare systems are the nodes, and their
interrelatedness denotes the edges of the network (Kannampallil et al., 2011). As modern


healthcare systems have developed, work tasks have become more specialized and
distributed among diversified actors with new technologies. As a result, the specialized actors
and artifacts in the healthcare process make the process more complex.
         As such, previous studies address complexity in healthcare systems and describe the role
of actors and technologies in it. However, few studies have paid attention to the
interrelatedness of contextual specifics in healthcare systems or empirically examined its
impacts on the complexity of processes (Kannampallil et al., 2011). Previous studies have
demonstrated that specialization improves performance at the organizational level under similar
conditions (Clark & Huckman, 2012; Kalra & Li, 2008). For example, Clark and Huckman
(2012) find that specialization in areas related to cardiovascular care has positive impacts on
performance of cardiovascular patients (positive spillovers) and there are complementarities in
specialization across related areas. Kalra and Li (2008) show that firms signal quality to their
consumers by specialization. However, these studies have not examined the relationship between
specialization and enacted complexity. Hence, in this study, I examine how the contextual factors
affect the complexity of the healthcare process using the data on the clinical documentation
process.
1.2.3. Number of Roles
It is easy to count the number of roles in a clinical process. Figure 1.2 shows a simple example.
On the left side, I see a process with one role. On the right side, I see a process with two
additional roles. New roles will always add to the number of nodes in the network. However,
whether there are more (or fewer) possible paths will depend on how those nodes are connected
in the network.


FIGURE 1.2. ONE ROLE VS. THREE ROLES IN A PROCESS
1.2.4. Role Specialization
In addition to the number of roles, I can consider how specialized the roles are. There is a
consensus that specialization has played an important role in organizations. To illustrate the role
of specialization, I use the concepts of specialist and generalist. Prior literature shows that
specialists and generalists in organizations can be conceptualized along two dimensions: 1) the
extent of task concentration and 2) task variety (Fahrenkopf et al., 2020; Narayanan et al., 2009;
Staats & Gino, 2012; Tyler, 1973). For example, Fahrenkopf et al. (2020) define specialists as
“those who have worked in organizations with a high degree of division of work across
individuals” and generalists as “those who have worked in organizations with limited or no
division of work across individuals”.
        Specialists focus on and repeatedly execute a narrow range of tasks based on specific
knowledge for those tasks, whereas generalists can cover a broader range of tasks within an
organization (Vermeiren & Raeymaeckers, 2020). Figure 1.3 shows a network of events
that visualizes how role specialization influences the number of paths in a process. Red circles in the
network show tasks of a very specialized role, and green ones indicate actions that a generalist
performs.


FIGURE 1.3. SPECIALIST VS. GENERALIST ROLES IN PROCESS
(Legend: red = very specialized role; green = not specialized role)
1.3. Research Context
I analyze data extracted from the EPIC Electronic Medical Record (EMR) audit trail at the
University of Rochester Medical Center (URMC). A clinic provides a clear example
of a complex service organization with multiple roles with different specialties, and the audit trail
data show how the clinics work. For example, when a patient visits a clinic, multiple roles are
involved. Figure 1.4 is an actual layout of a dermatology clinic from my data². In this layout,
multiple roles work at different locations. The green squares are
workstations where individuals can input or access information on the patient. While the
patient visits the clinic, multiple individuals input information on the patient at different
locations.
         In this layout, I can observe two different contextual factors in the documentation
process: roles and workstations. The specialized roles move from one room to another,
and they create different paths in the process by using different workstations at different
locations. All workstations provide identical functions regardless of their location, but each role
uses them in distinctive ways because the roles have different specialties.
² I appreciate the layout from Dr. Julie Ryan Wolf at the University of Rochester Medical Center.
FIGURE 1.4. OUTPATIENT CLINIC LAYOUT
(Figure panel: "Outpatient Dermatology Clinic", dated April 25, 2022; roles labeled in the layout: Physician, Admin Staff, LPN, Clin Tech, Resident.)
               1.3.1. Three Kinds of Outpatient Clinics
My data is extracted from the EPIC Electronic Medical Record (EMR) audit trail from 13
different clinics with three different clinical specialties (dermatology, orthopedic surgery, and
pediatric oncology) at the University of Rochester Medical Center (URMC). Table 1.2 summarizes
the three areas of medical practice in the data. The total number of roles is not
the sum across areas because many of the roles exist in all clinics (e.g., physician, nurse, etc.).
TABLE 1.2. NUMBER OF CLINICS, VISITS, AND ROLES FOR EACH SPECIALTY
Specialty             # Clinics    # Visits    # Roles
Dermatology                   4       9,818          8
Orthopedic Surgery            8     131,345         28
Pediatric Oncology            1       6,285         22
Total                        13     143,347         29


1.3.2. Clinical Roles are Specialized
As mentioned above, each clinical role has a specialized set of skills, and this specialization can
be seen in the data. Figure 1.5 shows the similarity among the roles based on the frequency of
actions each role performs. I compute the cosine distance between roles based on their actions to
compare how similar or different their action patterns are. Red colors show that two roles
have different action patterns in their use of the workstation systems, while blue colors indicate
similar patterns. As expected, specialized providers within the same service area (e.g.,
administrative, technician, assistant, diagnosis, etc.) resemble one another, and the service areas
are clearly differentiated from each other. For example, the roles in the technologist group
(Supervisor imaging X-ray, CT-technologist, and Radiology-technologist) have very similar action
patterns with each other but differ from all other roles. Figure 1.5 thus indicates how specialized
the roles are in the clinical process and how the action patterns of each role can be
differentiated and classified. The number of roles and role specialization are the two major
variables of interest in the analysis.
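The role comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for Figure 1.5: the action counts and the action name `Imaging_Order` are hypothetical, and each role is assumed to be represented by a vector of action frequencies.

```python
import math

# Hypothetical action counts per role (illustrative values, not from the data).
action_counts = {
    "Rad_Technologist": {"Imaging_Order": 40, "Mr_Reports": 5},
    "CT_Technologist": {"Imaging_Order": 35, "Mr_Reports": 8},
    "Receptionist": {"Checkin_Time": 50, "Mr_Snapshot": 20, "Mr_Reports": 4},
}

def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two sparse count vectors."""
    actions = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in actions)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / (norm_a * norm_b)

# Pairwise distance matrix, stored as a dict keyed by (role, role).
roles = list(action_counts)
distances = {
    (r1, r2): cosine_distance(action_counts[r1], action_counts[r2])
    for r1 in roles
    for r2 in roles
}
```

In this toy example the two technologist roles are close to each other (distance near 0) and far from the receptionist (distance near 1), which mirrors the grouping described for the technologist roles.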


FIGURE 1.5. ROLES ARE SPECIALIZED
[Figure: heatmap of pairwise cosine distances between roles, with a color bar running from 0 to
3.5. Both axes list the same 29 roles in the same order: Insurance_Specialist, Admin_Assistant,
Secretary, Phys_Support_Specialist/Scheduler, Exercise_Physiologist, Health_Proj_Coordinator, OAS,
Staff, Prior_Auth_Specialist, Quality_Assurance_Liaison, Rad_Technologist, CT_Technologist,
Supervisor_ImagingXray, FELLOW, Physician, Resident, Nurse_Practitioner, Physician_Assistant,
Medical_Assistant, Clinical_Tech, Podiatry_Radiology_Assistant, Licensed_Nurse, Registered_Nurse,
DEXA_scanner, Clinic_Administrator, Financial_Coordinator, Receptionist, Administrator,
Physical_Therapist]


1.4. Hypothesis Development
I am concerned with the effect of roles and specialization on enacted complexity. For each
independent variable (number of roles and role specialization), there are competing hypotheses
about their effect on enacted complexity.
        As we know from the formula for enacted complexity, there is a balancing act between
nodes and edges in the network that represents the process. If there are more nodes (for a given
number of edges), complexity will go down. If there are more edges (for a given number of
nodes), complexity will go up. Thus, the main question is how the roles affect the number of
nodes and edges in the network.
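The balancing act between nodes and edges can be illustrated by counting paths directly in small directed graphs. The sketch below is an assumption-laden toy: it counts exact source-to-sink paths in acyclic graphs, whereas the enacted complexity index is an estimate based on nodes and edges, and the three graphs are hypothetical examples rather than networks from the data.

```python
def count_paths(edges, source, sink):
    """Count distinct directed paths from source to sink in an acyclic graph."""
    successors = {}
    for u, v in edges:
        successors.setdefault(u, []).append(v)

    memo = {}

    def walk(node):
        # A path is complete once it reaches the sink.
        if node == sink:
            return 1
        if node not in memo:
            memo[node] = sum(walk(nxt) for nxt in successors.get(node, []))
        return memo[node]

    return walk(source)

# Baseline: 4 nodes (s, a, b, t) and 5 edges.
base = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "t"), ("b", "t")]
# Same number of edges (5) spread over more nodes (5): a longer, thinner chain.
more_nodes = [("s", "a"), ("a", "b"), ("b", "c"), ("c", "t"), ("s", "c")]
# Same nodes as the baseline with one extra edge (6 edges).
more_edges = base + [("s", "t")]

paths_base = count_paths(base, "s", "t")              # 3 paths
paths_more_nodes = count_paths(more_nodes, "s", "t")  # 2 paths
paths_more_edges = count_paths(more_edges, "s", "t")  # 4 paths
```

In these examples, holding edges fixed while adding a node lowers the path count, and holding nodes fixed while adding an edge raises it, which is the intuition behind the competing hypotheses that follow.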
1.4.1. Effect of Roles on Enacted Complexity
As each role has a specialized set of skills, a process enacted by more distinct roles will tend to
include more required acts (Wood, 1986). Medical services are typically delivered by teams of
providers with differentiated roles. By role, I mean “a comprehensive pattern for behavior and
attitude that is linked to an identity, is socially identified more or less clearly as an entity, and is
subject to being played recognizably by different individuals” (Turner, 2001, p. 234). Intuitively,
as each provider provides a differentiated service from others based on their role, adding more
roles implies additional tasks in the clinical process. For example, a patient who arrives at the
clinic might engage with several roles, including office staff, insurance pre-authorization, nurse,
physician, and clinical technician. When the same patient returns to the same clinic a week later,
the clinical process for the visit might be simpler, involving only two provider roles (e.g.,
office staff and physical therapist). Because a larger number of actors creates more paths in the
action network, it can increase enacted complexity. Figure 1.2 illustrates how additional roles
can increase the number of actions in a process. When roles are added to the process, nodes are


added to the network, which could increase the number of paths by generating more relations
between the actions. For example, if a patient needs to see a clinical technician after seeing a
physician, the patient needs an additional care service before leaving the clinic.
This generates additional steps and relations in the network for the patient visit. However, as
we have seen above, the effect on enacted complexity will depend on how those steps are
connected in the network. Thus, I offer two competing hypotheses:
       H1a: Processes enacted with more roles will have more enacted complexity.
       H1b: Processes enacted with more roles will have less enacted complexity.
1.4.2. Effect of Role Specialization on Enacted Complexity
Next, I consider the effects of the degree of role specialization on enacted complexity. Previous
studies have demonstrated that specialization improves performance at the organizational level
but have not examined the effects of specialization on complexity (Clark & Huckman, 2012;
Kalra & Li, 2008). Although medical settings consist mostly of specialized tasks, the depth of
specialization differs depending on the role that each clinician plays in the
clinical process. For example, nurse practitioners generally cover a wider variety of tasks than CT
technologists, and exercise physiologists perform fewer tasks than physicians. As
such, each specialized role has a different degree of specialization, and the impact of each role
on the clinical process varies depending on how specialized the roles in a clinical visit are.
However, the effect of specialization will depend on whether the specialized roles add more
nodes or more edges to the network. The examples in Figure 1.6 suggest two possible cases. In
one case, a specialized role adds three new actions that are sparsely connected to the other
actions in the visit. In practice, this would mean that the new role has few handoffs with other
roles (e.g., an x-ray technician). In the other case, the specialized role adds three new actions that


are densely connected to the rest of the actions in the visit. In practice, this would mean that there
are a lot of handoffs between the new role (e.g., a nurse) and the other roles. These two different
cases lead us to two alternative hypotheses:
       H2a: Greater role specialization causes increased enacted complexity.
       H2b: Greater role specialization causes decreased enacted complexity.
FIGURE 1.6. THE SAME ROLE SPECIALIZATION COULD RESULT IN DIFFERENT
NUMBERS OF PATHS
1.5. Methodology
In this section, I explain how I compute each of the major variables used in testing the
hypotheses. I also explain the use of Generalized Propensity Score matching, which is used for
causal inference.
1.5.1. Computing Enacted Complexity
Enacted complexity is operationalized based on the actions in each outpatient visit. Each visit
can be represented as a narrative network and enacted complexity is indexed by the number of
paths through the network (Goh & Pentland, 2019). To operationalize, I aggregate the action
trace data at the visit level. I extract unique actions with two immediate contextual specifics:
roles and workstations, in each process and compute the time spent to input the data in the


system for each patient visit in the EMR. The extracted actions in each visit are used as nodes in
the action network for each visit.
        Next, to compute the enacted complexity, I use the concept that Hærem et al. (2015)
suggest. Based on conceptualizing patterns of action as directed graphs, this concept allows
measuring the complexity of a task as enacted by multiple actors. To estimate enacted
complexity, I use the formula in equation (1) based on the network for each visit. The nodes in
the network represent the unique, contextually specific combinations of action, role, and
workstation that are observed in the data for each visit. A typical example would be a nurse
checking medications at a workstation in the examination room. Figure 1.7 shows how the
process can be represented as a network.
FIGURE 1.7. NARRATIVE NETWORK WITH ROLE AND LOCATION
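To make the construction concrete, the following is a minimal sketch of how a visit's event log could be turned into a network. It assumes that each node is a unique (action, role, workstation) combination and that an edge links each event to the event that immediately follows it in time; the edge rule and the function name `build_network` are illustrative assumptions, and the event list is an abbreviated sequence adapted from Table 1.3.

```python
# Time-ordered events for one visit: (action, role, workstation).
events = [
    ("Checkin Time", "Admin Tech", "W1"),
    ("Mr_Snapshot", "Admin Tech", "W1"),
    ("Mr_Reports", "Admin Tech", "W1"),
    ("Mr_Snapshot", "Admin Tech", "W1"),
    ("Mr_Reports", "Admin Tech", "W1"),
    ("Ac_Visit_Navigator", "Lic.Nurse", "W3"),
    ("Mr_Histories", "Lic.Nurse", "W3"),
    ("Mr_Reports", "Lic.Nurse", "W3"),
]

def build_network(events):
    """Return (nodes, edges) for one visit.

    Nodes are the unique contextualized actions; edges are the unique
    ordered pairs of consecutive events (an assumed edge rule).
    """
    nodes = set(events)
    edges = set(zip(events, events[1:]))
    return nodes, edges

nodes, edges = build_network(events)
```

Note that `Mr_Reports` performed by the admin tech at W1 and `Mr_Reports` performed by the licensed nurse at W3 are separate nodes, which is what makes the network contextually specific. Here the visit fragment yields 6 nodes and 6 edges.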
1.5.2. Computing the Specialization Index
Next, I describe the construction of a new variable, the specialization index, which captures the
extent to which the roles involved in a patient visit are specialized. The role-level index is based
on the ratio of the unique actions that each role performs to the total unique actions performed by
all roles in the system. The index is constructed as follows:
        (2)   $S_i = -\dfrac{N(\text{unique actions performed by role } i)}{N(\text{unique actions in the system})}$
        At one extreme, $S_i = -1$ would mean that role i performs every action in the system at
least once. The index is closer to zero when role i performs fewer actions. I further
operationalize a weighted specialization index. The weight is given as $w_{ij} = n_{ij} / N_j$,
where $n_{ij}$ is the number of actions a specialized role i performs in patient visit j and $N_j$
is the total number of actions performed for patient visit j. I place the weights on each role in
the patient visits and calculate the average weighted specialization index for each patient visit as
        (3)   $\bar{S}_j = \dfrac{1}{R_j} \sum_{i=1}^{R_j} w_{ij} S_i$
        where $R_j$ is the number of specialized roles in patient visit j. Based on the visit-level
specialization index, I examine the relationship between the degree of specialization and the
enacted complexity of patient encounters.
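The two-step construction (a role-level index, then a weighted visit-level average) can be sketched as follows. This is an illustrative reading of equations (2) and (3), not the original implementation; the role names, action labels, and counts are hypothetical, and the function names are introduced here for illustration.

```python
def role_index(actions_by_role, role):
    """Equation (2): minus the share of system-wide unique actions the role performs."""
    all_actions = set().union(*actions_by_role.values())
    return -len(actions_by_role[role]) / len(all_actions)

def visit_index(visit_counts, actions_by_role):
    """Equation (3): average over roles in the visit of weight times role index.

    visit_counts maps each role in the visit to its number of actions in
    that visit; the weight is that count divided by the visit total.
    """
    total_actions = sum(visit_counts.values())
    weighted = [
        (count / total_actions) * role_index(actions_by_role, role)
        for role, count in visit_counts.items()
    ]
    return sum(weighted) / len(weighted)

# Hypothetical system with five unique actions across two roles.
actions_by_role = {
    "Physician": {"a1", "a2", "a3", "a4"},
    "CT_Technologist": {"a5"},
}
# Hypothetical visit: 6 physician actions and 2 technologist actions.
visit_counts = {"Physician": 6, "CT_Technologist": 2}

physician_index = role_index(actions_by_role, "Physician")   # -4/5
example_visit_index = visit_index(visit_counts, actions_by_role)
```

With these numbers the physician's index is -0.8, the technologist's is -0.2, the weights are 0.75 and 0.25, and the visit-level index is (0.75 × -0.8 + 0.25 × -0.2) / 2 = -0.325.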
1.5.3. Generalized Propensity Score Matching Method
I estimate causal effects using the generalized propensity score (GPS) (Hirano & Imbens, 2004).
I investigate the expected outcome at different levels of two continuous variables: 1) the number
of specialized roles and 2) specialization index in equation (3). To accommodate continuous
variables (also called “exposures”), I use the Generalized Propensity Score (GPS), which is
defined as the conditional density function of the exposure given the covariates (Hirano &
Imbens, 2004; Imbens, 2000; Wu et al., 2018). GPS is widely used for causal inference, and the
basic idea of the method is to approximate the confidence of a randomized experiment
using my observational dataset. The GPS has a balancing property conditional on observable
covariates: if subjects belong to the same GPS stratum, the exposure level can be regarded as random.
Therefore, in this study, I use a robust GPS matching approach, proposed by Wu et al. (2018), to
remove bias and estimate the exposure-response function.
        The main goal of the GPS matching method is to find matched observations by assessing
the balance of covariates across different levels of specialization in the data. Specifically, first, I
compute a GPS for each data point based on a function of the exposure and other observed
covariates. Next, I find an observation that has the closest values of exposure and GPS to E and
f(E|X). I use the outcome of this observation as the counterfactual outcome of a subject with X
and E. The matched unit is used as a valid representation of observations with the exposure level,
considering the potential confounders have been adjusted. Finally, the expected outcome at a
predetermined exposure level is estimated by averaging the outcomes of the matched units with
such an exposure value.
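As a simplified illustration of the first step, the GPS can be computed as the conditional density of the exposure given the covariates. The sketch below swaps in a normal linear model with a single covariate, which is a common textbook choice and not the estimator used in this study (the analysis in Section 1.7.3 uses SuperLearner); all data values and the function name `fit_gps` are hypothetical.

```python
import math

def fit_gps(exposure, covariate):
    """Fit exposure ~ Normal(a + b * covariate, sigma^2) by least squares.

    Returns a function gps(e, x) giving the estimated conditional density
    of exposure level e for a unit with covariate value x.
    """
    n = len(exposure)
    mean_x = sum(covariate) / n
    mean_e = sum(exposure) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    sxe = sum((x - mean_x) * (e - mean_e) for x, e in zip(covariate, exposure))
    b = sxe / sxx
    a = mean_e - b * mean_x
    residuals = [e - (a + b * x) for x, e in zip(covariate, exposure)]
    sigma = math.sqrt(sum(r * r for r in residuals) / (n - 2))

    def gps(e, x):
        z = (e - (a + b * x)) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

    return gps

# Hypothetical covariate and exposure values for six visits.
covariate = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
exposure = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]
gps = fit_gps(exposure, covariate)

likely = gps(3.0, 3.0)     # exposure close to what the covariate predicts
unlikely = gps(6.0, 3.0)   # exposure far from what the covariate predicts
```

A unit whose exposure is close to the level predicted by its covariates receives a high GPS, while an exposure far from the prediction receives a GPS near zero; matching then pairs units that are close on both exposure and GPS.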
1.6. Data Description
I used audit trail data from the Electronic Medical Record (EMR) at the University of Rochester
Medical Center (URMC). The collected data traces actions of the medical record-keeping process
for each patient from 24 clinics (4 dermatology, 19 orthopedic surgery, and 1 pediatric
oncology). The data includes 143,347 patient visits from April 2nd, 2018 to November 29th,
2018. Each observation contains contextual factors for patient visits: role, workstation, and
diagnosis group, as well as timestamps. In particular, roles and workstations are closely
interrelated with the actions because some actions can only be performed by specific roles at
specific locations. I
consider the role and workstation as immediate contextual factors, which are directly related to


actions in process (Rosemann et al., 2008). Table 1.3 shows the first five minutes of one visit
as an example of the data.
TABLE 1.3. EXAMPLE DATA
                                                                                            Clinic
     Time                Action                 Role         WorkStation       Diagnosis     ID
  2/2/15 8:53        Checkin Time            Admin Tech             W1         Neoplasm       A
  2/2/15 8:53         Mr_Snapshot            Admin Tech             W1         Neoplasm       A
  2/2/15 8:53         Mr_Reports             Admin Tech             W1         Neoplasm       A
  2/2/15 8:53         Mr_Snapshot            Admin Tech             W1         Neoplasm       A
  2/2/15 8:53         Mr_Reports             Admin Tech             W1         Neoplasm       A
  2/2/15 8:55         Mr_Snapshot            Admin Tech             W1         Neoplasm       A
  2/2/15 8:55         Mr_Reports             Admin Tech             W1         Neoplasm       A
  2/2/15 8:56         Mr_Snapshot            Admin Tech             W1         Neoplasm       A
  2/2/15 8:56         Mr_Reports             Admin Tech             W1         Neoplasm       A
  2/2/15 8:56      Ac_Visit_Navigator         Lic.Nurse             W3         Neoplasm       A
  2/2/15 8:56         Mr_Histories            Lic.Nurse             W3         Neoplasm       A
  2/2/15 8:56      Mr_Enc_Encounter           Lic.Nurse             W3         Neoplasm       A
  2/2/15 8:56        Mr_Vn_Vitals             Lic.Nurse             W3         Neoplasm       A
  2/2/15 8:56         Mr_Reports              Lic.Nurse             W3         Neoplasm       A
  2/2/15 8:56          Flowsheet              Lic.Nurse             W3         Neoplasm       A
  2/2/15 8:56      Mr_Vn_Complaint            Lic.Nurse             W3         Neoplasm       A
  2/2/15 8:56         Mr_Reports              Lic.Nurse             W3         Neoplasm       A
  2/2/15 8:56         Mr_Snapshot             Lic.Nurse             W3         Neoplasm       A
  2/2/15 8:56         Mr_Reports              Lic.Nurse             W3         Neoplasm       A
  2/2/15 8:57         Mr_Reports             Admin Tech             W1         Neoplasm       A
  2/2/15 8:57         Mr_Snapshot            Admin Tech             W1         Neoplasm       A
  2/2/15 8:58         Mr_Reports              Lic.Nurse             W2         Neoplasm       A
  2/2/15 8:58      Ac_Visit_Navigator         Lic.Nurse             W2         Neoplasm       A
  2/2/15 8:58      Mr_Enc_Encounter           Lic.Nurse             W2         Neoplasm       A
  2/2/15 8:58         Mr_Histories            Lic.Nurse             W2         Neoplasm       A
  2/2/15 8:58         Mr_Reports              Lic.Nurse             W2         Neoplasm       A
  2/2/15 8:58        Mr_Vn_Vitals             Lic.Nurse             W2         Neoplasm       A
  2/2/15 8:58          Flowsheet              Lic.Nurse             W2         Neoplasm       A
  2/2/15 8:58         Mr_Reports              Physician             W4         Neoplasm       A
  2/2/15 8:58        Mr_Vn_Vitals             Lic.Nurse             W2         Neoplasm       A
  2/2/15 8:58         Mr_Histories            Lic.Nurse             W2         Neoplasm       A
  2/2/15 8:58         Mr_Histories            Lic.Nurse             W2         Neoplasm       A
       ...                  ...                   ...               ...           ...         ...


The shaded rows in Table 1.3 show how the role and workstation change throughout a visit at the
level of individual actions. In contrast, Diagnosis and Clinic ID could be interpreted as external
factors as they have the same values throughout the visit.
        This data provides a unique opportunity to study the effects of specialization in a
narrative network. This is because it includes fine-grained, time-stamped information about
actions and roles, which vary throughout each patient visit. With two years of data, I can see how
routines change over time. It provides a detailed trace of actions that are taken in the
recordkeeping work for each clinic day. This allows us to analyze complex action patterns in
each visit.
        The number of roles is simply the number of unique roles within each patient visit. There
are 30 types of specialized roles (physician, clinical tech, licensed nurse, residents, etc.). I count
the number of unique roles that participated in the clinical process during each patient visit. I
also count workstations and other factors that could influence the complexity of the visit. These
are used as control variables in the analysis. Table 1.4 shows descriptive statistics of the
variables used for the study.
TABLE 1.4. DESCRIPTIVE STATISTICS
                            Variable                      Obs       Mean       Std. Dev.
                      Enacted Complexity                143,663      6.86         3.41
                      Specialization Index              143,663      -0.12        0.06
                    Logged Number of Roles              143,663      1.69         1.51
                Logged Number of Workstations           143,663      2.19         3.31
                 Logged Number of Procedures            143,663      0.41         0.98
                   Logged Number of Events              143,663      5.40         0.48
                     Logged Visit Duration              143,663      2.58         1.34


         I also control for the visit level observed heterogeneity by adding the number of
workstations, the number of events, performed procedures, and the duration of the visit, all of
which are visit-varying variables. The complexity of the narrative network may vary depending
on the procedures because different procedures make different actions more or less likely. The
duration of the visit also needs to be controlled because the time required differs across
patient visits. Lastly, I add the number of events
since longer visits (with more events) tend to have larger networks (more unique nodes and
edges) and greater enacted complexity.
1.7. Model Estimation and Results
To examine the effects of the contextual specifics on the enacted complexity of the clinical
process, I specify two cross-sectional models. Two models are needed because the two aspects of
specialization include overlapping information and cannot be included in the same model. In
each model, the complexity of visit j's network is a function of specialization and a set of control
variables:
  (4) $\log(C_j) = \alpha + Role_j \beta_1 + X_{workstation,j} \gamma_1 + X_{procedure,j} \gamma_2 + Duration_j \gamma_3 + X_{events,j} \gamma_4 + \mu_j + \tau_t + \varepsilon_j$
  (5) $\log(C_j) = \alpha + \bar{S}_j \beta_1 + X_{workstation,j} \gamma_1 + X_{procedure,j} \gamma_2 + Duration_j \gamma_3 + X_{events,j} \gamma_4 + \mu_j + \tau_t + \varepsilon_j$
         In both models, $C_j$ represents the enacted complexity computed based on nodes and edges
in visit j. $Role_j$ denotes the number of roles in visit j (eq. (4)), and $\bar{S}_j$ is the
visit-level weighted specialization index for visit j (eq. (5)). I also add control variables:
the number of workstations, events, and procedures, and the duration of visits in
seconds. Lastly, $\mu_j$ refers to time-invariant clinic fixed effects, $\tau_t$ are time fixed
effects that capture unobserved heterogeneity due to seasonality, and $\varepsilon_j$ is the error term.
1.7.1. OLS Estimation
In this section, I report the results of ordinary least squares (OLS) regression. Overall, I observe
strong significant effects of the number of specialists and degree of role specialization on the
enacted complexity (see Table 1.5).
TABLE 1.5. RESULTS OF REGRESSIONS ON ENACTED COMPLEXITY
                           VARIABLES                     (1)          (2)
                           Number of roles          -0.4604***
                                                      (0.0370)
                           Specialization Index                  -11.9598***
                                                                   (0.3141)
                                                      (0.0087)     (0.0084)
                           Constant                -27.2624*** -32.8934***
                                                      (0.2259)     (0.3134)
                           Observations               143,663      143,663
                           R-squared                   0.7492       0.7655
                           YM Dummies                   YES          YES
                           Workstation Control          YES          YES
                           Events Control               YES          YES
                           Procedure Control            YES          YES
                           Duration Control             YES          YES
                           Clinic Control               YES          YES
                                 Robust standard errors in parentheses
                                   *** p<0.001, ** p<0.01, * p<0.05
         Table 1.5 shows the effects of specialization in the clinical process on enacted
complexity: 1) the number of roles (column (1)) and 2) the degree of specialization
(column (2)). To address the concern of multicollinearity among the explanatory variables, I check
the variance inflation factor (VIF) (Belsley et al., 1980). The VIF value is less than
four, which suggests that multicollinearity is not a concern. As seen in column (1) in Table 1.5,


more roles are negatively associated with enacted complexity at a significant level. This result
shows that more specialists tend to simplify the process, consistent with hypothesis H1b.
         Consistent with the results for the number of roles, the role specialization index also
shows a negative and significant association with enacted complexity. This is consistent with
hypothesis H2b. Thus, both results show a negative relationship between specialization and
enacted complexity.
1.7.2. Sensitivity Analysis
         From the OLS estimation, I recognize there may be concerns about biased effects due to
unobserved or omitted confounding variables. To prevent invalid inferences, I leverage my data
and design as much as possible. Specifically, I controlled for the number of workstations, events,
and procedures and the time duration of visits in seconds, clinics, and seasonality. Nonetheless,
there may still be concerns about omitted variables. Therefore, I use the Konfound-it app to
conduct sensitivity analysis (Frank et al., 2013). I quantify how strongly an omitted confounding
variable would have to be correlated with specialization and enacted complexity to invalidate
any inferences I made (Frank, 2000) and how much bias there would have to be due to the
omitted variables or any other source (Frank et al., 2013).
1.7.2.1. Robustness of inference to case replacement (RIR)
First, I draw on Frank et al. (2013), as implemented in the Konfound-it app, to quantify how much
bias there would have to be due to omitted variables or any other source to invalidate my inference. The
results indicate that 84.249% of the estimated effect of the number of roles on enacted
complexity would have to be due to bias to invalidate the inference of an effect of the number of
roles. Correspondingly, to invalidate the inference one would have to replace 84.249% of the
observed data with null hypothesis cases of no effect of the number of roles. For the


specialization index, to invalidate an inference, 94.853% of the estimate would have to be due to
bias.
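The percentages above can be approximately reproduced from the estimates and standard errors in Table 1.5. The sketch below uses the Frank et al. (2013) idea that the share of an estimate that must be due to bias equals one minus the ratio of the significance threshold to the estimate; the two-tailed 5% level and the large-sample critical value of 1.959964 are assumptions made here, and the function name is hypothetical.

```python
def percent_bias_to_invalidate(estimate, std_error, critical_value=1.959964):
    """Percent of an estimate that must be due to bias to invalidate the inference.

    The threshold is the smallest estimate (in magnitude) that would still be
    statistically significant: critical_value * std_error.
    """
    threshold = critical_value * std_error
    return 100.0 * (1.0 - threshold / abs(estimate))

# Coefficients and robust standard errors from Table 1.5.
roles_pct = percent_bias_to_invalidate(-0.4604, 0.0370)
specialization_pct = percent_bias_to_invalidate(-11.9598, 0.3141)
```

Under these assumptions the function returns roughly 84.25 for the number of roles and 94.85 for the specialization index, in line with the reported 84.249% and 94.853%.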
1.7.2.2. Impact threshold for omitted variable
Next, I also quantify how strongly an omitted confounding variable would have to be correlated
with specialization and enacted complexity to invalidate our inference. For the number of roles,
the result indicates that an omitted variable must be correlated at 0.167 with the explanatory
variable and with enacted complexity (with opposite signs) to invalidate the inference.
Correspondingly, the impact of an omitted variable must be 0.028 to invalidate the inference.
For the specialization index, the minimum impact to invalidate an inference of an effect of
specialization on enacted complexity is based on a correlation of 0.309 with the outcome. This
implies that the impact of an omitted variable must be 0.095 to invalidate the inference.
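As a quick arithmetic check on the two thresholds above, and assuming the Frank (2000) definition in which the impact of an omitted variable is the product of its correlation with the predictor and its correlation with the outcome, an omitted variable with correlations of equal magnitude gives (in absolute value):

```latex
\[
\text{impact} = r_{x \cdot cv} \times r_{y \cdot cv}, \qquad
0.167 \times 0.167 \approx 0.028, \qquad
0.309 \times 0.309 \approx 0.095 .
\]
```

The subscript cv (for confounding variable) is notation introduced here for illustration only.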
        The results of the sensitivity analysis imply the possibility of a confounding effect,
especially for the number of roles (0.167), as a correlation coefficient lower than 0.2 is
normally considered a weak correlation by social science standards (Cohen & Cohen, 1983).
Thus, in the next section, I adjust for any potential confounding effects using the generalized
propensity score (GPS) matching method (Wu et al., 2018).
1.7.3. Causal Effect Estimation
I use the GPS matching method to adjust for the potential confounder effects and remove the
endogeneity bias. I use R package CausalGPS for the GPS matching (Wu et al., 2018). First, I
use a non-parametric, cross-validation-based SuperLearner algorithm to estimate the GPS of
specialization (the number of roles and specialization index) conditioning on all other covariates
including potential confounders. SuperLearner is an algorithm that uses cross-validation to
estimate the performance of multiple machine learning models, or the same model with different


settings (Kennedy et al., 2017; van der Laan et al., 2007). I implement and combine four
different algorithms: 1) extreme gradient boosting machines, 2) multivariate adaptive regression
splines, 3) generalized additive models, and 4) random forest, using the SuperLearner R package
(Polley & van der Laan, 2010). Next, I use the caliper matching function to approximate
randomized data points with the balanced pre-exposure covariates by jointly matching the units
on the estimated GPS and treatment. To do this, I tune 1) the caliper parameter as the radius of
the neighborhood around the exposure level and 2) the scale parameter, which assigns weight
between the exposure and the estimated GPS. The specified caliper matching function is as
follows:
        (8)        $m_{GPS}(e, w) = \arg\min_{i:\, w_i \in [w-\delta,\, w+\delta]} \left\| \left(\lambda e^{*}(w_i, c_i),\ (1-\lambda) w_i^{*}\right) - \left(\lambda e^{*},\ (1-\lambda) w^{*}\right) \right\|$
            where $w_i$ is the ith exposure level, $w_i^{*}$ and $e^{*}$ represent the standardized Euclidean
transformed exposure and GPS estimates, δ is the caliper parameter, λ is the scale parameter, and
$\|\cdot\|$ is a Manhattan distance matching method. I rely on the data-driven method to find the best
combination of the parameters that lead to the smallest absolute correlation between the
covariates and exposure. The goodness of a covariate matching is quantified by absolute
correlation: a value below 0.1 indicates a good balance of the covariate (Wu et al., 2018; Zhu et
al., 2015). After the data-driven process, I use the caliper matching function with the scale and
caliper parameters equal to 1.0 and 0.16, respectively, to match the subjects. I have assessed
absolute correlations of all the covariates across different levels of exposure and the average
absolute correlation is 0.87 (number of roles) and 0.67 (specialization index), indicating the
covariates are well balanced. Finally, I generate the matched data using the imputed outcome values
from the caliper matching function.
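        The matching and balance-checking steps described above can be sketched as follows. This is a simplified illustration, not the CausalGPS implementation: the function names and toy values are hypothetical, the GPS and exposure values are assumed to be standardized already, and the balance measure is an unweighted absolute Pearson correlation. The caliper (0.16) and scale (1.0) values are those reported in the text.

```python
import math
from typing import List, Optional, Sequence, Tuple


def caliper_match(target_gps: float, target_w: float,
                  units: List[Tuple[float, float]],
                  delta: float, lam: float) -> Optional[int]:
    """Return the index of the unit closest to (target_gps, target_w).

    units holds (gps, exposure) pairs, assumed to be standardized already.
    Only units whose exposure lies within +/- delta of target_w are eligible.
    """
    best_idx, best_dist = None, float("inf")
    for idx, (gps, w) in enumerate(units):
        if abs(w - target_w) > delta:
            continue  # outside the caliper around the exposure level
        # Manhattan distance on the lambda-weighted (GPS, exposure) pair
        dist = lam * abs(gps - target_gps) + (1 - lam) * abs(w - target_w)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def abs_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Absolute Pearson correlation, used here as a simple balance measure."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return abs(cov / (sd_x * sd_y))


# Hypothetical (GPS, exposure) pairs for four units
units = [(0.30, 0.10), (0.52, 0.20), (0.49, 0.45), (0.90, 0.22)]
matched = caliper_match(target_gps=0.50, target_w=0.25,
                        units=units, delta=0.16, lam=1.0)
print(matched)
```

In this toy example the third unit has the closest GPS but falls outside the caliper, so the second unit is selected. This shows how the caliper restricts matches to neighboring exposure levels before the distance is minimized.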
        Next, using the matched dataset, I estimate a smooth exposure-response function with
non-parametric kernel smoothing. A kernel smoother is fit on the
matched set to obtain the smoothed average exposure-response function (Wu et al., 2018).
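        To make the smoothing step concrete, the sketch below implements a Nadaraya-Watson estimator with a Gaussian kernel. The kernel, the bandwidth, and the toy data are assumptions made for illustration only. The text does not state which kernel or bandwidth selection the analysis used.

```python
import math
from typing import List, Sequence


def kernel_smooth(w_grid: Sequence[float], exposures: Sequence[float],
                  outcomes: Sequence[float], bandwidth: float) -> List[float]:
    """Nadaraya-Watson estimate of the average outcome at each grid exposure."""
    smoothed = []
    for w0 in w_grid:
        # Gaussian kernel weight for every matched observation
        weights = [math.exp(-0.5 * ((w - w0) / bandwidth) ** 2) for w in exposures]
        total = sum(weights)
        smoothed.append(sum(k * y for k, y in zip(weights, outcomes)) / total)
    return smoothed


# Hypothetical matched data: exposure levels and imputed outcomes
exposures = [1.0, 2.0, 3.0]
outcomes = [1.0, 2.0, 3.0]
curve = kernel_smooth([2.0], exposures, outcomes, bandwidth=0.5)
print(curve)
```

Each point on the estimated curve is a locally weighted average of the matched outcomes, so observations with exposure levels near the evaluation point dominate the estimate.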
        Figure 1.8 shows a negative causal relationship between the number of roles and enacted
complexity, and the magnitude of the causal effect is substantial. The
analysis implies that a lower level of complexity is expected when a greater number of specialized
roles are involved in the patient visit. This result indicates that the unexpected negative effect of the
number of roles found in the OLS regression holds as a substantial causal effect on enacted complexity.
FIGURE 1.8. CAUSAL RELATIONSHIP BETWEEN NUMBER OF ROLES-ENACTED
COMPLEXITY
[Figure: "Number of Roles and Enacted Complexity". x-axis: ln(N role); y-axis: ln(ARW Complexity).]
FIGURE 1.9. CAUSAL RELATIONSHIP BETWEEN SPECIALIZATION INDEX-
ENACTED COMPLEXITY
[Figure: x-axis: Specialization Index; y-axis: ln(ARW Complexity).]
       The result of the causal estimation for the specialization index is also consistent with the
OLS regression. Figure 1.9 shows that the higher the specialization index of a patient visit, the
less complex the process of the visit tends to be. As seen in the figure, enacted complexity increases for some visits with a lower
specialization index, but for the most part the magnitude of the effect is
substantial. This result indicates that role specialization reduces enacted complexity even after
adjusting for the confounding effect.
1.8. Discussion
Intuitively, when more roles are involved in a process, or the involved roles are more specialized,
a process seems likely to be more complex. While there may indeed be more required acts
(Wood, 1986), my results show that the workflow has lower enacted complexity. This
paradoxical result has some interesting implications.
1.8.1. Specialization Makes Workflows Simpler
This study points to a fundamental concept of specialization in organizational work structure.
Compared to generalists, who perform a large number of actions, specialists focus on a relatively
small number of distinct actions (Fahrenkopf et al., 2020; Narayanan et al., 2009). Specialists
tend to reduce enacted complexity because they have a narrow and deep task range, and there are
fewer relations between actions. In healthcare settings, all provider roles are considered
specialists because every provider has their own specialty.
        I visualize the effect of specialists on process enactment in terms of the narrative network
(see Figure 1.10). The graph based only on actions has fewer nodes than the
context-aware network, which also uses specialized roles to define nodes, but it has many
more edges between nodes, which increases enacted complexity. As such, contrary to intuition,
adding more roles tends to simplify the graph because, in a healthcare setting, roles tend to be
specialists. Specialization tends to decrease the enacted complexity of the clinical workflow.
FIGURE 1.10. THE VISUALIZED EFFECT OF SPECIALISTS ON ENACTMENT OF
PROCESS
[Figure: visit-level (1 visit) networks under two representations: action only, and context-aware (action + role).]
        My analysis also leads to important substantive findings on roles and specialization. As a
component of task complexity, traditional component complexity holds that a task becomes more
complex when there are more events (actions) because it is based only on the “content of
activity” (Campbell, 1988; Wood, 1986). While the number of events is undeniably an
important factor, many other factors can also have an impact on
complexity. For example, the technologies applied in the work process affect individuals’
work practices and the structure of organizations (Orlikowski & Barley, 2001), and this change
causes significant complexity and variation (Butler & Gray, 2006). I acknowledge the potential
influence of social and material factors on the complexity of how the process is enacted. To
address this gap, I examine the extent to which enacted complexity is influenced by the social
and material context of work. I specifically focus on the social factor, specialized workers, while
controlling for the material context, represented as digitalized systems for the tasks. The results of
this study reveal that patient visits involving more specialized roles have lower enacted complexity
than visits involving less specialized providers.
1.8.2. Enacted Complexity as a Network Phenomenon
The results of this study show that specialists tend to decrease enacted complexity. This
contradicts practitioner literature, which has argued that more touchpoints result in greater
complexity (Rawson et al., 2013; Richardson, 2010). It also contradicts the traditional theory of
individual-level task complexity, where more required acts indicate greater complexity (Wood,
1986). The critical difference is that I conceptualize enacted complexity as a network
phenomenon. The measure of enacted complexity considers how the touchpoints (or required
acts) are related (Kannampallil et al., 2011; Kauffman, 1993).
        This network perspective provides a framework for managing enacted complexity in the
process. To reduce complexity at the systemic level, it helps to untangle the network. Fewer
edges will tend to reduce the space of possible paths. To increase complexity, it helps to add
edges. The goal should be to minimize excess complexity. The contribution of this study to the
practical problem is simple: the number of touchpoints (or required acts) does not tell the entire
story. Enacted complexity grows exponentially as a function of the number of relations between
nodes in the network.
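        The relationship between edges and the space of possible paths can be illustrated with a toy example. The sketch below counts simple paths between a start and an end action in a small directed graph. The action names and both graphs are hypothetical, and the path count is only an illustration of the idea, not the measure of enacted complexity used in this study.

```python
from typing import Dict, FrozenSet, List


def count_paths(graph: Dict[str, List[str]], start: str, end: str) -> int:
    """Count simple paths (no repeated nodes) from start to end by depth-first search."""
    def dfs(node: str, visited: FrozenSet[str]) -> int:
        if node == end:
            return 1
        total = 0
        for nxt in graph.get(node, []):
            if nxt not in visited:
                total += dfs(nxt, visited | {nxt})
        return total
    return dfs(start, frozenset({start}))


# Hypothetical visit workflow with 4 edges: a single chain of actions
sparse = {
    "check_in": ["vitals"],
    "vitals": ["exam"],
    "exam": ["orders"],
    "orders": ["check_out"],
}
# Same 5 actions with 3 additional edges (7 edges in total)
dense = {
    "check_in": ["vitals", "exam"],
    "vitals": ["exam", "orders"],
    "exam": ["orders", "check_out"],
    "orders": ["check_out"],
}
sparse_paths = count_paths(sparse, "check_in", "check_out")
dense_paths = count_paths(dense, "check_in", "check_out")
print(sparse_paths, dense_paths)
```

With the same five actions, adding three edges raises the number of possible paths from one to five. The number of touchpoints is unchanged, yet the space of possible paths is larger.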
1.8.3. Limitations
This study has several limitations. First, although I investigate the relationship between
specialization and the enacted complexity of the process, I do not directly measure how specialization
affects organizational performance. For example, specialists and generalists differ considerably
in terms of learning and productivity (Narayanan et al., 2009). By concentrating on a narrow set of
tasks, specialists move along the learning curve and develop a deeper understanding of those tasks
than less specialized individuals (Dane, 2010; Flueckiger, 1976). In contrast, generalists
can be impeded in learning tasks because they are exposed to too great a variety
of tasks. Future research on the impact of specialization on the learning and productivity of
a process would help us understand the quality of the organizational process. Second, this study
examines the antecedents of enacted complexity in the clinical documentation process, but EMR
records do not represent all the clinical processes in the clinics. It would be interesting to
examine other settings of the clinical process. Third, this study examines the antecedents of
enacted complexity, but its consequences also need to be examined. Future work on
operationalizing enacted complexity could give us a better understanding of the effects of enacted
complexity on the process.
1.9. Conclusion
The findings provide a unique opportunity to theorize on the relationship between specialization
and enacted complexity in the clinical documentation process. Using simple measurements of
specialization, I find that greater specialization causes lower enacted complexity. Adding a
specialized role to the process decreases enacted complexity because each role performs a set of
distinct actions that are sparsely connected with the actions performed by other roles. As a result,
the network as a whole becomes less densely connected and less complex.
        This study deepens our understanding of the context in the organizational process. Roles
and role specialization are established aspects of organizational design and structure, but their
impact on process structure has not been examined. While prior works have focused on the
effects of specialization or generalist experience on organizational performance, I identify the
effects of specialization on enacted complexity (Fahrenkopf et al., 2020; Narayanan et al., 2009).
The results of this study suggest the potential benefit of specialization of roles and its impact on
the simplification of process.
         By focusing on roles and role specialization, this study examines the contextual
antecedents of enacted complexity. The content of an activity affects the complexity of the action
patterns, but at the same time, there needs to be a consideration of the potential influence of the
context of the activity. The traditional model of task complexity (Campbell, 1988; Wood, 1986)
explicitly excludes the effects of context and process enactment. In doing so, it overlooks the
potential influence of social factors (such as role structure) on the complexity of process
enactment. This study addresses this gap and examines the extent to which the context of the
work influences the complexity of action patterns in the clinical documentation process.
         Lastly, I also shed light on the possibility of automation for tasks in healthcare
information systems. Process mining studies have focused on automated process discovery. The
complexity of workflows has been considered as one of the biggest barriers to actualizing
automation of processes across industries because it is hard to anticipate potential errors (Fast-
Berglund et al., 2013; Lyell & Coiera, 2017; Rojo Abollado et al., 2017; Woods, 1996). Augusto
et al. (2022) show that automated process discovery can be more challenging when the event log
records a small amount of process behavior that varies greatly than when the event log records a
huge amount of process behavior that varies little. I interpret this to mean that automated
process discovery is more difficult when there are more relations among actions. Hence,
reducing complexity is the first step toward the automation of the process. To do this, I first
need to understand the structure of the process based on the contextual factors and see how
entangled the process is. Considering a process only as sequences of actions (events) may allow us to see
only the tip of the iceberg because paths cannot be revealed without considering
contextual specifics (Leopold et al., 2018). What looks like a single action can carry
different meanings, because the same action can constitute different “events” depending
on who performed it or where it was performed. Thus, context-awareness
provides a deeper level of understanding of the process.
BIBLIOGRAPHY
Adsit, R. T., Fox, B. M., Tsiolis, T., Ogland, C., Simerson, M., Vind, L. M., Bell, S. M., Skora,
        A. D., Baker, T. B., & Fiore, M. C. (2014). Using the electronic health record to connect
        primary care patients to evidence-based telephonic tobacco quitline services: a closed-
        loop demonstration project. Translational behavioral medicine, 4(3), 324-332.
Augusto, A., Mendling, J., Vidgof, M., & Wurm, B. (2022). The connection between process
        complexity of event sequences and models discovered by process mining. Information
        Sciences, 598, 196-215.
Avgerou, C. (2019). Contextual explanation: Alternative approaches and persistent challenges.
        MIS Quarterly, 43(3), 977-1006.
Batista, N., Batista, S. H., Goldenberg, P., Seiffert, O., & Sonzogno, M. C. (2005). Problem-
        solving approach in the training of healthcare professionals. Revista de Saúde Pública,
        39, 231-237.
Belsley, D. A., Kuh, E., & Welsch, R. E. (1980). Regression diagnostics: Identifying influential
        data and sources of collinearity. John Wiley.
Blanchfield, B. B., Heffernan, J. L., Osgood, B., Sheehan, R. R., & Meyer, G. S. (2010). Saving
        billions of dollars—and physicians’ time—by streamlining billing practices. Health
        Affairs, 29(6), 1248-1254.
Butler, B. S., & Gray, P. H. (2006). Reliability, mindfulness, and information systems. MIS
        Quarterly, 30(2), 211-224.
Byström, K., & Järvelin, K. (1995). Task complexity affects information seeking and use.
        Information processing & management, 31(2), 191-213.
Campbell, D. J. (1988). Task complexity: A review and analysis. Academy of Management
        Review, 13(1), 40-52.
Clark, J. R., & Huckman, R. S. (2012). Broadening Focus: Spillovers, Complementarities, and
        Specialization in the Hospital Industry. Management Science, 58(4), 708-722.
Cohen, P., & Cohen, J. (1983). Applied Multiple Regression/Correlation Analysis for the
        Behavioral Sciences (2nd ed.). Erlbaum.
Crowston, K. (1997). A coordination theory approach to organizational process design.
        Organization Science, 8(2), 157-175.
D'Adderio, L. (2011). Artifacts at the centre of routines: Performing the material turn in routines
       theory. Journal of Institutional Economics, 7(2 Spec), 197-230.
Dane, E. (2010). Reconsidering the trade-off between expertise and flexibility: A cognitive
       entrenchment perspective. Academy of Management Review, 35(4), 579-603.
Danner-Schröder, A., & Ostermann, S. M. (2022). Towards a Processual Understanding of Task
       Complexity: Constructing task complexity in practice. Organization studies, 43(3), 437-
       463.
Fahrenkopf, E., Guo, J., & Argote, L. (2020). Personnel mobility and organizational
       performance: The effects of specialist vs. generalist experience and organizational work
       structure. Organization Science, 31(6), 1601-1620.
Fast-Berglund, Å., Fässberg, T., Hellman, F., Davidsson, A., & Stahre, J. (2013). Relations
       between complexity, quality and cognitive automation in mixed-model assembly. Journal
       of manufacturing systems, 32(3), 449-455.
Flood, R. L. (1987). Complexity: A definition by construction of a conceptual framework.
       Systems research, 4(3), 177-185.
Flueckiger, G. E. (1976). Specialization, learning by doing and the optimal amount of learning.
       Economic Inquiry, 14(3), 389-409.
Frank, K. A. (2000). Impact of a confounding variable on a regression coefficient. Sociological
       Methods & Research, 29(2), 147-194.
Frank, K. A., Maroulis, S. J., Duong, M. Q., & Kelcey, B. M. (2013). What would it take to
       change an inference? Using Rubin’s causal model to interpret the robustness of causal
       inferences. Educational Evaluation and Policy Analysis, 35(4), 437-460.
Gell‐Mann, M., & Lloyd, S. (1996). Information measures, effective complexity, and total
       information. Complexity, 2(1), 44-52.
Goh, K. T., & Pentland, B. T. (2019). From Actions to Paths to Patterning: Toward a Dynamic
       Theory of Patterning in Routines. Academy of Management Journal, 62(6), 1901-1929.
Gottlieb, J. D., Shapiro, A. H., & Dunn, A. (2018). The complexity of billing and paying for
       physician care. Health Affairs, 37(4), 619-626.
Hackman, J. R. (1969). Toward understanding the role of tasks in behavioral research. Acta
       psychologica, 31, 97-128.
Hærem, T., Pentland, B. T., & Miller, K. D. (2015). Task complexity: Extending a core concept.
       Academy of Management Review, 40(3), 446-460.
Hærem, T., Yooeun, J., & Hansson, M. (2021). Complexity in routine dynamics. Cambridge
       Handbook of Routine Dynamics.
Hansson, M., Hærem, T., & Pentland, B. T. (2021). The effect of repertoire, routinization and
       enacted complexity: Explaining task performance through patterns of action.
       Organization studies, 01708406211069438.
Hirano, K., & Imbens, G. W. (2004). The Propensity Score with Continuous Treatments. Applied
       Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives, 226164,
       73-84.
Holmgren, A. J., Downing, N. L., Tang, M., Sharp, C., Longhurst, C., & Huckman, R. S. (2022).
       Assessing the impact of the COVID-19 pandemic on clinician ambulatory electronic
       health record use. Journal of the American Medical Informatics Association, 29(3), 453-
       460.
Huselid, M. A. (1995). The impact of human resource management practices on turnover,
       productivity, and corporate financial performance. Academy of Management Journal,
       38(3), 635-672.
Imbens, G. W. (2000). The Role of the Propensity Score in Estimating Dose-Response
       Functions. Biometrika, 87(3), 706-710.
Kalra, A., & Li, S. (2008). Signaling Quality Through Specialization. Marketing Science, 27(2),
       168-184.
Kannampallil, T. G., Schauer, G. F., Cohen, T., & Patel, V. L. (2011). Considering complexity in
       healthcare systems. Journal of biomedical informatics, 44(6), 943-947.
Kaspar, F., & Schuster, H. (1987). Easily calculable measure for the complexity of
       spatiotemporal patterns. Physical Review A, 36(2), 842.
Kauffman, S. A. (1993). The origins of order: Self-organization and selection in evolution.
       Oxford University Press, USA.
Kennedy, E. H., Ma, Z., McHugh, M. D., & Small, D. S. (2017). Non‐Parametric Methods for
       Doubly Robust Estimation of Continuous Treatment Effects. Journal of the Royal
       Statistical Society: Series B (Statistical Methodology), 79(4), 1229-1245.
Lempel, A., & Ziv, J. (1976). On the complexity of finite sequences. IEEE Transactions on
       information theory, 22(1), 75-81.
Leonardi, P. M. (2011). When flexible routines meet flexible technologies: Affordance,
       constraint, and the imbrication of human and material agencies. MIS Quarterly, 35(1),
       147-167.
Leopold, H., van der Aa, H., & Reijers, H. A. (2018). Identifying candidate tasks for robotic
        process automation in textual process descriptions. In Enterprise, business-process and
        information systems modeling (pp. 67-81). Springer.
Liu, P., & Li, Z. (2012). Task complexity: A review and conceptualization framework.
        International Journal of Industrial Ergonomics, 42(6), 553-568.
Long, K. M., McDermott, F., & Meadows, G. N. (2018). Being pragmatic about healthcare
        complexity: our experiences applying complexity theory and pragmatism to health
        services research. BMC medicine, 16(1), 1-9.
Lyell, D., & Coiera, E. (2017). Automation bias and verification complexity: a systematic
        review. Journal of the American Medical Informatics Association, 24(2), 423-431.
March, J. G., & Simon, H. A. (1958). Organizations. John Wiley & Sons.
McCabe, T. J. (1976). A complexity measure. IEEE Transactions on Software Engineering, SE-2(4),
        308-320.
Merali, Y. (2006). Complexity and Information Systems: the emergent domain. Journal of
        Information Technology, 21(4), 216-228.
Moldoveanu, M. C., & Bauer, R. M. (2004). On the relationship between organizational
        complexity and organizational structuration. Organization Science, 15(1), 98-118.
Narayanan, S., Balasubramanian, S., & Swaminathan, J. M. (2009). A matter of balance:
        Specialization, task variety, and individual learning in a software maintenance
        environment. Management Science, 55(11), 1861-1876.
Nelson, R., & Winter, S. (1982). An Evolutionary Theory of Economic Change. Harvard
        University Press.
Orlikowski, W. J., & Barley, S. R. (2001). Technology and institutions: What can research on
        information technology and research on organizations learn from each other? MIS
        Quarterly, 25(2), 145-165.
Pentland, B. T., & Feldman, M. S. (2007). Narrative networks: Patterns of technology and
        organization. Organization Science, 18(5), 781-795.
Pentland, B. T., Mahringer, C. A., Dittrich, K., Feldman, M. S., & Wolf, J. R. (2020). Process
        multiplicity and process dynamics: Weaving the space of possible paths. Organization
        Theory, 1(3), 2631787720963138.
Pentland, B. T., Recker, J., & Wyner, G. (2017). Rediscovering handoffs. Academy of
        Management Discoveries, 3(3), 284-301.
Plsek, P. E., & Greenhalgh, T. (2001). The challenge of complexity in health care. Bmj,
        323(7313), 625-628.
Polley, E. C., & van der Laan, M. J. (2010). Super Learner in Prediction.
Rahmati, P., Tafti, A. R., Westland, J. C., & Hidalgo, C. (2020). When All Products Are Digital:
        Complexity and Intangible Value in the Ecosystem of Digitizing Firms. MIS Quarterly,
        45(3), 1025-1058.
Rawson, A., Duncan, E., & Jones, C. (2013). The truth about customer experience. Harvard
        business review, 91(9), 90-98.
Rettig, C. (2007). The trouble with enterprise software. MIT Sloan management review, 49(1),
        21.
Richardson, A. (2010). Using customer journey maps to improve customer experience. Harvard
        business review, 15(1), 2-5.
Rivkin, J. W., & Siggelkow, N. (2007). Patterned interactions in complex systems: Implications
        for exploration. Management Science, 53(7), 1068-1085.
Rojo Abollado, J., Shehab, E., & Bamforth, P. (2017). Challenges and benefits of digital
        workflow implementation in aerospace manufacturing engineering.
Rosemann, M., Recker, J., & Flender, C. (2008). Contextualisation of business processes.
        International Journal of Business Process Integration and Management, 3(1), 47-60.
Sakowski, J. A., Kahn, J. G., Kronick, R. G., Newman, J. M., & Luft, H. S. (2009). Peering Into
        The Black Box: Billing And Insurance Activities In A Medical Group: Standardizing
        benefit plans and billing procedures might help reduce complexity and billing/insurance
        costs—but only if applied strictly. Health Affairs, 28(Suppl1), w544-w554.
Shrank, W. H., Rogstad, T. L., & Parekh, N. (2019). Waste in the US health care system:
        estimated costs and potential for savings. Jama, 322(15), 1501-1509.
Simon, H. A. (1969). The architecture of complexity, the sciences of the artificial. Cambridge,
        MA: MIT Press.
Staats, B. R., & Gino, F. (2012). Specialization and variety in repetitive tasks: Evidence from a
        Japanese bank. Management Science, 58(6), 1141-1159.
Stitzenberg, K. B., & Sheldon, G. F. (2005). Progressive specialization within general surgery:
        adding to the complexity of workforce planning. Journal of the American College of
        Surgeons, 201(6), 925-932.
Sturmberg, J. P., & Martin, C. M. (2013). Handbook of systems and complexity in health.
       Springer.
Thompson, D. S., Fazio, X., Kustra, E., Patrick, L., & Stanley, D. (2016). Scoping review of
       complexity theory in health services research. BMC health services research, 16(1), 1-16.
Turner, R. H. (2001). Role theory. In Handbook of sociological theory (pp. 233-254). Springer.
Tyler, W. B. (1973). Measuring organizational specialization: The concept of role variety.
       Administrative science quarterly, 383-392.
van der Aalst, W. M. (2019). A practitioner’s guide to process mining: Limitations of the
       directly-follows graph. Procedia Computer Science, 164, 321-328.
van der Laan, M. J., Polley, E. C., & Hubbard, A. E. (2007). Super Learner. Statistical
       Applications in Genetics and Molecular Biology, 6(1).
Vermeiren, C., & Raeymaeckers, P. (2020). Network managers as facilitators: A case study on a
       network of specialist and generalist service providers. Human Service Organizations:
       Management, Leadership & Governance, 44(4), 317-331.
Weick, K. E. (1965). Laboratory experimentation with organizations. Handbook of
       organizations, 194-260.
Wood, R. E. (1986). Task complexity: Definition of the construct. Organizational behavior and
       human decision processes, 37(1), 60-82.
Woods, D. D. (1996). Decomposing automation: Apparent simplicity, real complexity.
       Automation and human performance: Theory and applications, 3-17.
Wu, X., Mealli, F., Kioumourtzoglou, M.-A., Dominici, F., & Braun, D. (2018). Matching on
       generalized propensity scores with continuous exposures. arXiv preprint
       arXiv:1812.06575.
Zhou, Y. M. (2013). Designing for complexity: Using divisions and hierarchy to manage
       complex tasks. Organization Science, 24(2), 339-355.
Zhu, Y., Coffman, D. L., & Ghosh, D. (2015). A Boosting Algorithm for Estimating Generalized
       Propensity Scores with Continuous Treatments. Journal of Causal Inference, 3(1), 25-40.
                                          CHAPTER TWO:
  DYNAMICS OF DIGITALIZATION: MECHANISMS OF STABILITY AND CHANGE
                             IN DIGITALIZED WORK PROCESSES
2.1. Introduction
Updates, revisions, upgrades, and enhancements are pervasive aspects of digitalization.
Organizations and individuals face an ongoing barrage of changes in the digital artifacts they use.
While many of these changes go unnoticed, some can cause significant disruption. By disruption,
I mean changes to the ongoing pattern of action that is enabled/constrained by the artifact being
upgraded. Upgrades can disrupt individual habits and organizational workflows in intended and
unintended ways. While they are pervasive, they are not always visible, so their consequences
are difficult to detect and analyze.
        To address this problem, I build on Swanson’s (2019) understanding of technology as a
routine capability (p. 1008, emphasis in original):
        we argue that device-enabled routines constitute technology, in terms of
        capabilities achieved in human practices. Devices must, in effect, be “wrapped”
        in routines in the constitution of technology. Routines are seen as integral to
        technology itself.
        When digital technologies are changed or upgraded, the effect is always mediated by
routines. This insight is important because we know that routines can be difficult to change
(Becker, 2004; Cohen et al., 1996). Information technology (IT) artifacts are constantly being
upgraded, but how does this influence the routinized patterns of action that are entangled with
those artifacts? The idea of technology as a routine capability provides the theoretical foundation
for my main research question: what mechanisms shape the dynamics of digitalization? Does the
structure of the routine itself influence the dynamics of digitalization and vice versa?
        Dynamics are important because IT-induced change is not instantaneous or frictionless
(Berente et al., 2016; Goh et al., 2011; Keen, 1981; Laumer et al., 2016). Technologists
(re)design artifacts, hoping for new patterns of action (Pentland & Feldman, 2008), but they are
often disappointed, as routines buffer the “shock” of new technology (Berente et al., 2016) and
old ways of working remain in place. New technology provides an occasion for structuring
(Barley, 1986), but it also provides an occasion for workarounds (Alter, 2014; Frank et al., 2011;
Zhao & Frank, 2003) and appropriation (DeSanctis & Poole, 1994).
        Field studies show that digitalization proceeds through a process of imbrication
(Leonardi, 2011) or co-evolution (Goh et al., 2011). These are recursive, endogenous processes.
Leonardi (2011) describes imbrication as the successive layering of human and material agency.
Goh et al. (2011) describe a process of successive refinements of technology and routines. Like
Goh et al. (2011), I model routines as narrative networks (Pentland & Feldman, 2007) and
compare the network before and after a change in technology. Rather than using observational
fieldwork, I use digital trace data to construct an extremely detailed picture of how routines
change over time.
        To better understand the mechanisms that shape these dynamics, I zoom in on one
technological change, followed by one adjustment: a major upgrade of the Electronic Health
Record (EHR) system at an academic medical center in the Northeastern U.S. I use process
mining to discover and compare patterns of action pre- and post-disruption (Pentland et al.,
2021b). Process mining provides an accurate, diachronic description of routine dynamics. The
central theoretical contribution is that the structure of the routine – represented as a weighted,
directed graph – influences the tendency of the routine to persist over time. I hypothesize and test
the effect of three mechanisms that influence the tendency of action patterns to resist disruption
and re-form quickly after disruption.
        Current theory points to frequency and speed as major indicators of routinization.
Routines that are fastest and most frequently repeated should be most likely to persist. However,
my analysis indicates that speed is irrelevant, and coherence is the most important factor. By
coherent, I mean that sequentially adjacent pairs of actions tend to share the same context
(Pentland et al., 2017). Coherence points to the importance of materiality (rather than cognition)
as an explanation for the persistence of routines after a disruption.
        The theoretical contribution of this paper is made possible by the novel application of
dynamic network models (Hoff, 2005; Minhas et al., 2016) to theorize about the dynamics of
digitalization. Swanson’s (2019) theory of technology as routine capability implies that the
dynamics of digitalization are inextricably connected to the dynamics of routines. Whether we
conceptualize this as imbrication or coevolution, the dynamic network framework offers a novel
perspective on the dynamics of digitalization. Rather than relying on actor-centric or device-
centric explanations (Swanson, 2019), it provides an explanation based on the structure of the
pattern of action itself. The dynamic network lens affords a variety of practical insights, as well. It
provides a simple way to assess the impact of upgrades and other disruptions and it demonstrates
how quickly routines can form after a disruptive event.
        I begin by reviewing current research on information systems, organizational routines,
and the process of digitalization. I introduce the use of network models to study change
processes, such as upgrades and other disruptions. Based on the current theory, I develop a set of
hypotheses about the effects of disruptions. I test these hypotheses using data from five
outpatient medical clinics. I discuss the implications of this approach for research on the
dynamics of digitalization.
2.2. Background
2.2.1. Information Systems and Organizational Routines
Through observational field research, information systems researchers have begun to examine
the relationship between technology and routines, defined as “repetitive, recognizable patterns of
action carried out by multiple actors” (Feldman & Pentland, 2003, p. 95). The entanglement of
artifacts and routines is axiomatic to the current theory on routines (D’Adderio, 2011; Feldman
et al., 2022), and there is a growing body of work on information systems that builds on concepts
and methods from research on habits and routines (e.g., Beverungen, 2014; Limayem et al.,
2007; Lyytinen et al., 2010; Mendling et al., 2021; Pan et al., 2007; Polites & Karahanna, 2013;
Thummadi & Lyytinen, 2020; Zhang et al., 2021). There is also a strong tradition of practice-
based scholarship that examines patterns of technology-in-use without explicitly framing those
patterns as routines (e.g., Orlikowski, 2000).
         Within this literature, observational field studies provide the best evidence of the
recursive relationship between technology and routines. This work builds on the long-standing
theme of technology adaptation (Leonard-Barton, 1988; Majchrzak et al., 2000; Tyre &
Orlikowski, 1994), but explicitly focuses on technology and routines. I focus on three studies
that provide an especially clear picture of how changing technology is entangled with changing
routines: Goh et al. (2011), Leonardi (2011), and Berente et al. (2016).
2.2.1.1. Co-evolution of routines and technology
Goh et al. (2011) conducted a detailed field study of the implementation of a new healthcare
information technology (HIT) system for in-patient care in a hospital. Based on their fieldwork,
they “propose a dynamic, process model of adaptive routinization of HIT that explicates the
mechanisms through which HIT systems are incorporated into hospital routines” (2011, p. 566).
Goh et al. (2011) model healthcare routines as narrative networks (Pentland & Feldman, 2007).
They compare the network before and after the implementation of new systems that include
hardware and software (e.g., “computers on wheels”). Drawing on adaptive structuration theory
(DeSanctis & Poole, 1994), Goh et al. (2011) conceptualize the interaction of technology and
routines as a process of co-evolution:
        Methodologically, this study demonstrates that organizational routines viewed as
        narrative networks provide a rich and promising lens through which to understand
        the HIT adaptation process. We find that routines are not simply passively
        disrupted by technology, but rather interact through functional affordances and
        symbolic expressions. These interactions trigger agentic forces that actively
        modify the newly implemented IT artifacts. (Goh et al. 2011, p. 583)
        Goh et al. (2011) focused on the initial implementation of new systems. They mapped
changes in two key routines for in-patient care: consulting and rounds. They identify three
phases but do not put a specific time window on adaptation and subsequent refinements. They
note that after initial implementation, the technology is subject to ongoing, repeated refinement.
The system upgrade I report here could be considered as a typical refinement in their framework.
2.2.1.2. Imbrication of routines and technology
Leonardi (2011) uses a field study of automotive crash testing to illustrate the idea of
imbrication. Leonardi (2011, p. 147) argues that:
        Imbrication of human and material agencies creates infrastructure in the form of
        routines and technologies that people use to carry out their work. Routine or
        technological infrastructure used at any given moment is the result of previous
        imbrications of human and material agencies.
        Through careful qualitative fieldwork, Leonardi (2011) describes this process as a series
of steps where technical changes are followed by adaptation in the routines and vice versa. In
this way, he breaks down the co-evolutionary process described by Goh et al. (2011) into
discrete steps.
2.2.1.3. Routines as “shock absorbers”
Berente et al. (2016) studied the implementation of an enterprise resource planning system at
NASA. They documented numerous ways that routines diverged from the intent of the designers.
From these observations, they theorized that routines can act as “shock absorbers” that buffer
organizational structures and processes from changes in technology (Berente et al., 2016). Over
time, there is mutual adjustment and alignment between the systems and routines.
        Across these field studies, I identify three themes that are relevant to my inquiry in this
paper. First, as Swanson (2019) argues, I see that routines and information systems are
integrated. Technologies are wrapped in routines; the technology only functions in the context of
the routines where it is used (for treating patients, simulating car crashes, or managing budgets
and inventory).
        Second, I see the familiar gap between the systems as designed and patterns of action as
enacted (Boudreau & Robey, 2005; Pentland & Feldman, 2008; Vaast & Walsham, 2005). As
Orlikowski (2000, p. 412) notes, people “have the option, at any moment and within existing
conditions and materials, to ‘choose to do otherwise’ with the technology at hand.” Technology
shapes but does not determine how people choose to use it. Thus, when technological artifacts
change (as they do in a system upgrade), behavior does not necessarily follow.
        Third, technology and routines change in succession as a process of repetitive, stepwise
change or coevolution. This perspective adds nuance to the classic debate between technological
determinism and constructivism (Leonardi & Barley, 2008). The relationship between
technology and practice is mutually constitutive, but a closer look at the process reveals that
changes are punctuated. In the analysis that follows, I zoom in on the dynamics of one of these
punctuations.
2.2.2. The Importance of Persistence
By definition, upgrades and other disruptions happen in the context of ongoing routines. The
world does not start fresh with every new version of Windows. Field studies (such as Goh et al.,
2011, and Leonardi, 2011) have focused on what changes, but they have paid less attention to
what persists. This emphasis is appropriate because the field of information systems has an
inherent interest in innovation (Yoo et al., 2010). However, work and organization cannot continue
unless parts of the routine persist.
        When action patterns persist over time, this persistence can be interpreted in several
ways, such as inertia (Gilbert, 2005), resistance (Becker et al., 2005), persistence (Howard-
Grenville, 2005), regeneration (Birnholtz et al., 2007) or resilience (Grote et al., 2009). Inertia
and resistance seem negative, while resilience and regeneration seem positive; but either way,
the tendency of routines to persist is a crucial but under-appreciated aspect of digitalization.
Researchers have examined the effect of habits on continued use (Limayem et al., 2007;
Polites & Karahanna, 2013), but this research is framed in terms of individual-level habits and
choices. By definition, organizational routines embody patterns of action that engage multiple
individuals (Feldman & Pentland, 2003).
        Schulz (2008) offers an encyclopedic list of mechanisms that keep routines "on track",
ranging from very macro (institutional norms) to very micro (neuronal priming). Cohen and
Bacdayan (1994) present evidence that routines are stored in the procedural memory of
individuals performing the routine, so that a routine can be considered a concatenation of habits.
Theoretical explanations of routine persistence have not considered the structure of the routine
itself as a factor. I introduce and develop this central idea in the next section.
2.2.3. Routine Dynamics as Network Dynamics
Routine dynamics concerns understanding the mechanisms that influence stability or change in
action patterns (Feldman et al., 2022). An organizational routine can be represented as a valued,
directed graph where the vertices represent categories of action and the edges represent
sequential relations between those categories (Pentland et al., 2017). In process mining, this is
called a "directly follows graph" (DFG) (van der Aalst, 2019). Where a conventional social
network represents relations between actors (e.g., people), a DFG represents relations between
categories of actions. In research on organizational routines, these graphs are often referred to as
“narrative networks” (Pentland & Kim, 2021).
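        As a minimal sketch of this representation (not the tooling used in this study), the
following Python fragment counts how often one category of action directly follows another
across a set of threads. The function name `directly_follows_graph` and the two example visits
are hypothetical; the action codes merely resemble those found in an EHR audit trail.

```python
from collections import Counter


def directly_follows_graph(threads):
    """Count how often each action category directly follows another.

    `threads` is a list of action sequences (for example, one per patient visit).
    Returns a Counter mapping (source, target) pairs to edge weights.
    """
    edges = Counter()
    for actions in threads:
        # Pair each action with its immediate successor in the same thread.
        for source, target in zip(actions, actions[1:]):
            edges[(source, target)] += 1
    return edges


# Hypothetical visits, for illustration only.
visits = [
    ["MR_REVIEW_ENCOUNTER", "MR_REVIEW_ORDERS", "MR_CHART_REVIEW"],
    ["MR_REVIEW_ENCOUNTER", "MR_REVIEW_ORDERS", "MR_CHART_REVIEW", "MR_CHART_REVIEW"],
]
dfg = directly_follows_graph(visits)
for (source, target), weight in sorted(dfg.items()):
    print(f"{source} -> {target}: {weight}")
```

The keys of `dfg` are the directed edges and the values are their weights, so the result is a
valued, directed graph in the sense described above. Self-loops (an action followed by the same
action) are kept as ordinary edges.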
         In a narrative network, a path represents a possible way of getting something done (Goh
& Pentland, 2019). When a change occurs, such as a software upgrade, it may affect the
structure of the network. However, some of the edges in the network need to stay the same, or
else the work would cease because there would be no paths for getting things done. For this
reason, persistence matters.
         To model the dynamics of digitalization, I need to explain edge formation/dissolution,
which is the fundamental mechanism of network dynamics (Snijders, 2001). Pentland et al.
(2019) use this approach to simulate the dynamics of drift in digitalized processes. My goal is to
explain why the structure of the routine changes (or persists) after an upgrade or other disruption.
In social network research, models that predict edge formation or deletion are often referred to as
selection models because they predict how people select other people as interaction partners
(Steglich et al., 2010). There are well-established selection mechanisms that drive dynamics in
social networks, such as homophily and preferential attachment (Snijders, 2001). My goal here is
to identify and test generalizable mechanisms that drive the analogous persistence and
dissolution of edges in networks of routines during the dynamics of digitalization.
2.3. Hypothesis Development
Network dynamics can be defined in terms of two basic processes: edge formation and edge
dissolution (Snijders, 2001). In this paper, I focus on mechanisms that influence the persistence
(or dissolution) of existing edges. I state three simple hypotheses, all of which concern how the
structure of the routine before a disruption predicts the structure of the routine after a disruption.
Each hypothesis involves a particular way of weighting the edges in the network. Edges indicate
sequential relations between actions and each edge is part of a larger path (a way of getting
things done). The weights on the edges indicate the properties of that piece of the path: How
frequently is it followed? How fast is it, on average? How much does the context change from
one action to the next?
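        To make the outcome of interest concrete, the sketch below labels each pre-disruption
edge by whether it is observed again after the disruption. This binary labeling is a deliberate
simplification for illustration and is not the dynamic network model used to test the hypotheses;
the function name `persistence_labels` and the toy edge sets are hypothetical.

```python
def persistence_labels(edges_before, edges_after):
    """Label each pre-disruption edge 1 if it is observed again afterwards, else 0."""
    after = set(edges_after)
    return {edge: int(edge in after) for edge in edges_before}


# Toy edge sets, for illustration only. Each edge is a (source, target) pair.
before = {("A", "B"), ("B", "C"), ("C", "A")}
after = {("A", "B"), ("B", "D")}

labels = persistence_labels(before, after)
# Share of pre-disruption edges that persist.
persistence_rate = sum(labels.values()) / len(labels)
print(labels, persistence_rate)
```

Edges that appear only after the disruption, such as ("B", "D") here, are edge formation rather
than persistence and are therefore ignored by this labeling.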
2.3.1. Frequency of Edges
Repetition is definitional of routinized behavior (Becker, 2004). Edges that repeat frequently
form the "ruts in the road" (Birnholtz et al., 2007) that define routinized patterns of action.
Repetition is an indicator of behavior that minimizes search and cognitive effort (Hansson et al.,
2021; March & Simon, 1958).
        To test the effect of frequent repetition on persistence, I conceptualize the frequency of
edges in a straightforward way, like the frequency of communication in a social network
(Wasserman & Faust, 1994). For this hypothesis, the edges in the network are weighted
according to how frequently they occur each day. I expect more frequent edges to persist after a
disruption to the network:
        H1: Frequent edges are more likely to persist after a disruption.
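        One way to compute the frequency weighting is sketched below. It assumes an event log
of (visit, day, node) records in time order within each visit; the function names and the toy
records are hypothetical, and the averaging rule (total count divided by the number of days with
at least one observed edge) is an assumption rather than the exact procedure used here.

```python
from collections import Counter, defaultdict


def daily_edge_frequency(events):
    """Count edges per day from (visit_id, day, node) records.

    Records must be in time order within each visit. An edge is attributed
    to the day of its target event.
    """
    last_node = {}
    per_day = defaultdict(Counter)
    for visit_id, day, node in events:
        if visit_id in last_node:
            per_day[day][(last_node[visit_id], node)] += 1
        last_node[visit_id] = node
    return per_day


def mean_daily_frequency(per_day):
    """Average each edge's count over the days that have at least one edge."""
    totals = Counter()
    for counts in per_day.values():
        totals.update(counts)
    n_days = len(per_day)
    return {edge: total / n_days for edge, total in totals.items()}


# Toy records, for illustration only.
events = [
    ("v1", "2019-10-07", "A"), ("v1", "2019-10-07", "B"), ("v1", "2019-10-07", "C"),
    ("v2", "2019-10-07", "A"), ("v2", "2019-10-07", "B"),
    ("v3", "2019-10-08", "A"), ("v3", "2019-10-08", "B"),
]
weights = mean_daily_frequency(daily_edge_frequency(events))
print(weights)
```

Days with no observed handoffs never appear in `per_day`, so they do not enter the average in
this sketch.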
2.3.2. Speed of Edges
Speed has long been recognized as an indicator of routinization (Cohen & Bacdayan, 1994; Su et
al., 2013). Cohen and Bacdayan (1994) use the speed of response to define the routinization of
moves in a card game. Su et al. (2013) use speed of response to identify routines in human-
computer interaction. These findings align with the idea that routinized patterns of action are
important for efficiency (Becker, 2004).
        To test the effect of speed on the structure of a routine after a disruption, I compute the
mean duration of each handoff in the network, where handoff is defined as the transition from
one action to the next (Pentland et al., 2017). This definition generalizes the conventional notion
of handoff (which assumes that handoffs are between two different actors) to include actions
performed by the same actor at a later time, perhaps in a different location or using a different
technology. For example, a nurse might enter some data for a patient on one workstation in the
examination room and then review or update that data for that same patient a few minutes later
on a different workstation in another part of the clinic. Thus, clinical staff can hand work off to
themselves.
        For this hypothesis, edges in the network are weighted according to how long they take to
perform, on average, using time-stamp data from the event log. Edges with shorter mean
duration indicate faster ways of getting things done. I hypothesize that fast edges (edges with
shorter mean duration) are more likely to persist after a disruption than slower edges (edges with
longer mean duration).
        H2: Faster edges are more likely to persist after a disruption.
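        The speed weighting can be sketched in the same style. The fragment below computes the
mean duration, in seconds, of each handoff in a single thread. The function name
`mean_handoff_duration` is hypothetical, nodes are reduced to the action code for brevity, and
the time stamps follow the format of the audit trail excerpt in Figure 2.1.

```python
from collections import defaultdict
from datetime import datetime

STAMP_FORMAT = "%m/%d/%y %H:%M:%S"


def mean_handoff_duration(thread):
    """Return {edge: mean seconds} for a time-ordered list of (timestamp, node)."""
    durations = defaultdict(list)
    for (t0, source), (t1, target) in zip(thread, thread[1:]):
        # Elapsed time from one action to the next is the handoff duration.
        durations[(source, target)].append((t1 - t0).total_seconds())
    return {edge: sum(values) / len(values) for edge, values in durations.items()}


raw = [
    ("10/7/19 11:17:01", "MR_REVIEW_ENCOUNTER"),
    ("10/7/19 11:17:04", "MR_REVIEW_MEDIA"),
    ("10/7/19 11:17:05", "MR_REVIEW_ORDERS"),
    ("10/7/19 11:17:06", "MR_CHART_REVIEW"),
    ("10/7/19 11:17:07", "MR_CHART_REVIEW"),
]
thread = [(datetime.strptime(stamp, STAMP_FORMAT), action) for stamp, action in raw]
speed = mean_handoff_duration(thread)
print(speed)
```

Shorter mean durations correspond to faster edges. In a full analysis the durations would be
pooled across all visits in a clinic before averaging, which this single-thread sketch omits.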
2.3.3. Coherence of Edges
Unlike repetition and speed, coherence is not one of the classic indicators of routinization.
Coherence is defined by the extent of similarity (or difference) between the context of
sequentially adjacent pairs of actions (Pentland et al., 2017). Coherence can easily be computed
on a narrative network where the nodes are defined by multiple contextual factors (Pentland et
al., 2017). Coherence represents the number of contextual factors that remain the same across an
edge. For example, are two adjacent actions in the network performed by the same actor? Do
both actions occur in the same place? Do they involve the same tools or technology? Coherence
provides a way to quantify the effects of materiality (embodiment and embeddedness) on the
pattern of action (Feldman et al., 2022).
        Coherence can be operationalized in a narrative network, where each node is defined by a
number of contextual factors, such as place, actor, and technology. When more factors change,
the context is less coherent. When fewer factors change, the context is more coherent. Coherence
provides another way of weighting the edges in the network. The logic of this hypothesis is
similar to the logic of homophily effects in social networks ("birds of a feather…"). Thus, I
expect that more coherent edges (same actor, same place, same technology) will be more likely
to persist:
       H3: More coherent edges are more likely to persist after a disruption.
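        A minimal sketch of the coherence weighting follows. It assumes that each node carries
two contextual factors, role and workstation, and that the action itself is not counted as
context; both are assumptions made for illustration. The `Node` type and the `coherence`
function are hypothetical names, and the example nodes are taken from the audit trail excerpt
in Figure 2.1.

```python
from collections import namedtuple

# A node is an action situated in its context (role and workstation).
Node = namedtuple("Node", ["action", "role", "workstation"])

CONTEXT_FACTORS = ("role", "workstation")


def coherence(source, target):
    """Count the contextual factors that stay the same across an edge."""
    return sum(
        getattr(source, factor) == getattr(target, factor)
        for factor in CONTEXT_FACTORS
    )


same_context = coherence(
    Node("AC_VISIT_NAVIGATOR", "Physician", "B"),
    Node("VISIT_DIAGNOSES_VIEW", "Physician", "B"),
)
same_place_only = coherence(
    Node("AC_VISIT_NAVIGATOR", "Clinical Tech", "C"),
    Node("SEC_FLOWSHEET_VIEW", "Nurse", "C"),
)
full_change = coherence(
    Node("REG_SC_EARDEMOGRAPHICS", "Admin Tech", "A"),
    Node("AC_VISIT_NAVIGATOR", "Physician", "B"),
)
print(same_context, same_place_only, full_change)
```

With two contextual factors, coherence ranges from 0 (both role and workstation change) to 2
(neither changes). Adding further factors only requires extending `CONTEXT_FACTORS`.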
2.4. Illustration: Upgrading an EHR System
To test these hypotheses, I use data from a medical center in the Northeastern U.S. where there
was a major upgrade of their electronic health record (EHR) system. I examine the patterns of
action over six weeks: three weeks before and three weeks after the upgrade.
2.4.1. Upgrading the EHR User Interface
In October 2019, the medical center upgraded from EPIC v2017 to EPIC v2019. This upgrade
was considered a major system upgrade. The changes included: 1) creation of a Storyboard,
which rearranged the layout of patient information and activities; 2) use of sexual orientation
and gender identity (SOGI) and preferred name in patient interactions; 3) display of cost
for inpatient medications and testing at time of order for provider decision making; and 4) expansion
of view to widescreen mode, which can require hardware replacement to use. Two other high-impact
changes that influenced medical workflow without changing it directly were: 1) the
ability of users to view data from multiple EPIC organizations and 2) online registration for
Business Continuity Access (BCA) for faster downtime recovery.
        A campaign to bring awareness of these widespread and high-impact changes began in
April 2019 followed by detailed information sessions in July 2019. Training and practice
sessions for users were implemented in August 2019. All upgrade changes were complete and
live on October 14, 2019.
        The impact of this upgrade on clinical activity was unclear and most likely varied by
department. The widespread upgrades minimized screen jumps, consolidated important
information to be viewable from anywhere in the chart, and allowed users to accomplish more on
a single screen with fewer clicks and scrolling. It was anticipated that there would be minimal
disruption from this upgrade if all users were prepared appropriately prior to the “go-live” date in
October.
2.4.2. Data Source
I analyzed data extracted from the audit trail of the EHR system. EHR audit trail data is
increasingly being used to model clinical workflows (Adler-Milstein et al., 2020). The subset of
records used here includes detailed, time-stamped records of EHR utilization in 4885 patient
visits at five clinics from three different medical specialties (two from Dermatology, two from
Orthopedic surgery, and one from pediatric oncology). The data include all visits to each of these
clinics from September 16, 2019 (three weeks before the start of the system upgrade) to
November 10, 2019 (three weeks after), spanning the system upgrade date (October 14th).
Within this period, I excluded weekends and, for each clinic, any weekdays on which fewer than
2,000 actions were performed.
           Figure 2.1 shows a small part of the audit trail for one patient visit and illustrates how
ThreadNet (Pentland et al., 2020) can be used to convert EHR audit trails into networks. Each row
is a time-stamped action.
Each unique row becomes a node in the network and sequentially adjacent nodes become edges
in the network. The resulting network is a narrative network where each node is defined by the
combination of action, role, and workstation.
 FIGURE 2.1. CONVERTING EHR AUDIT TRAIL INTO NETWORKS
                           EHR Audit Trail
       Time Stamp          Action                    Role            Workstation
      10/7/19 10:49:03     REGHARACCTCRT             Admin Tech      A
      10/7/19 10:49:04     RGWKFLBEGIN               Admin Tech      A
      10/7/19 10:49:05     FORM_VIEWED               Admin Tech      A
      10/7/19 10:49:05     RGEPTBSCDM                Admin Tech      A
      10/7/19 10:49:06     FORM_VIEWED               Admin Tech      A
      10/7/19 10:49:07     MR_DEMOGRAPHICS_VIEWED    Admin Tech      A
      10/7/19 10:49:09     RGEPTADDRS                Admin Tech      A
      10/7/19 10:49:10     REG_SC_EPTLANGUAGE        Admin Tech      A
      10/7/19 10:49:11     REG_SC_EARDEMOGRAPHICS    Admin Tech      A
      10/7/19 11:16:44     AC_VISIT_NAVIGATOR        Physician       B
      10/7/19 11:16:48     VISIT_DIAGNOSES_VIEW      Physician       B
      10/7/19 11:16:50     MR_PROBLEM_LIST_ACCESS    Physician       B
      10/7/19 11:16:51     VISIT_DIAGNOSES_VIEW      Physician       B
      10/7/19 11:16:52     MR_LOS_ACCESS             Physician       B
      10/7/19 11:17:01     MR_REVIEW_ENCOUNTER       Physician       B
      10/7/19 11:17:04     MR_REVIEW_MEDIA           Physician       B
      10/7/19 11:17:05     MR_REVIEW_ORDERS          Physician       B
      10/7/19 11:17:06     MR_CHART_REVIEW           Physician       B
      10/7/19 11:17:07     MR_CHART_REVIEW           Physician       B
      10/7/19 11:23:14     MR_REPORTS                Clinical Tech   C
      10/7/19 11:23:16     AC_VISIT_NAVIGATOR        Clinical Tech   C
      10/7/19 12:23:42     SEC_FLOWSHEET_VIEW        Nurse           C
      10/7/19 12:23:43     UCW_RELATED_ENCOUNTERS    Nurse           C
      10/7/19 12:23:44     MR_REVIEW_ENCOUNTER       Nurse           C
      10/7/19 12:23:48     MR_REVIEW_ORDERS          Nurse           C
      10/7/19 12:23:57     MR_CHART_REVIEW           Nurse           C
             …                      …                    …           …
 [ThreadNet converts the audit trail above into a network; network diagram omitted.]
           The inclusion of contextual factors, such as role and workstation, is a departure from
standard practice in process mining, which often treats actions as decontextualized. However, I
include context here because routines are enacted from situated actions (Feldman et al., 2022).
Thus, I use unique combinations of action, role, and workstation as nodes and pairs of nodes to define
the networks in this study.
2.4.2.1. Selection of clinics
The data analyzed here were collected as part of a larger study that included three medical
specialty areas: dermatology, orthopedic surgery, and pediatric oncology. Where possible, I
present data from two clinics in each of those specialty areas to improve the generalizability of
the analysis. Pediatric oncology only had one clinic.
2.5. Descriptive Findings
Before testing my three main hypotheses, it is helpful to describe the effects of the disruption in
more detail. I present two kinds of simple, descriptive analyses to help the reader build intuition
about the EHR system upgrade and its effects.
2.5.1. Changes in the Narrative Networks
Table 2.1 shows the average number of visits per day in each clinic, as well as the size and
density of the narrative network in each clinic before and after the upgrade. These networks have
thousands of edges (between 8,760 and 25,167), but the density is low. Only a tiny fraction of the
possible edges was observed. With the exception of the orthopedic clinics, the networks had
fewer nodes after the upgrade.
TABLE 2.1. SIZE AND DENSITY OF THE NETWORK IN EACH CLINIC
                 Visits          Before Upgrade                 After Upgrade
                 per day     Nodes    Edges   Density       Nodes    Edges   Density
     DERM A       16.68      1,852   10,989   0.0032        1,596    8,760   0.0034
     DERM B       46.05      3,911   25,167   0.0016        3,494   21,281   0.0017
     ORTHO A       9.73      1,247   11,800   0.0075        1,289   11,193   0.0017
     ORTHO B      13.78      4,003   17,159   0.0011        4,647   19,844   0.0010
     PEDONC        9.45      3,376   16,152   0.0014        2,543   11,990   0.0019
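        For readers who wish to see how a density of this kind can be obtained, the sketch below
divides the number of observed edges by the number of possible directed edges. The formula,
which allows self-loops and therefore uses the squared node count as the denominator, is an
assumption; the text does not state the exact definition. The function name `density` is
hypothetical, and the inputs are the DERM A counts from Table 2.1.

```python
def density(n_nodes, n_edges):
    """Share of possible directed edges that were observed.

    Assumes self-loops are allowed, so there are n_nodes * n_nodes possible edges.
    """
    return n_edges / (n_nodes * n_nodes)


# DERM A counts from Table 2.1.
derm_a_before = density(1852, 10989)
derm_a_after = density(1596, 8760)
print(round(derm_a_before, 4), round(derm_a_after, 4))
```

Under this assumption, the DERM A values round to the densities reported in the table for that
clinic.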
        Each clinic must be analyzed separately because workstation codes (and some of the
roles) are different in each clinic. As a result, the action-role-workstation combinations in each
clinic have different labels and the networks cannot simply be aggregated.
2.5.2. Visualizing Diachronic Changes
Figure 2.2 shows the changes to the pattern of action over time using the network time-series
visualization recommended by Pentland et al. (2021a). The figure shows three weeks before and
after the upgrade on October 14th. On that date, 40 actions were added to the EPIC system that
serves all of the clinics, while 60 actions were removed from the system.
        This visualization addresses a simple question: how much is this network changing over
time? The horizontal axis represents time measured in days; each point in the figure represents
one day in one clinic. The vertical axis represents the cosine similarity of the network of each
clinic on each day compared to the first day in the time series for each clinic. This similarity
measure is based on the frequency of nodes or edges in the network, which change from day to
day. The left side of Figure 2.2 is based on the nodes; the right side of Figure 2.2 is based on the
edges. When the graph stays horizontal from day to day, the pattern of action is staying the same.
For clarity, I removed a handful of outliers with very few patient visits per day.
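        The similarity measure described above can be sketched as follows. Each day is
represented as a mapping from nodes (or edges) to frequencies, and every day is compared with
the first day in the series. The function names and the toy frequencies are hypothetical, and
treating an empty day as having similarity zero is an assumption of this sketch.

```python
import math


def cosine_similarity(freq_a, freq_b):
    """Cosine similarity between two frequency dictionaries."""
    keys = set(freq_a) | set(freq_b)
    dot = sum(freq_a.get(key, 0) * freq_b.get(key, 0) for key in keys)
    norm_a = math.sqrt(sum(value * value for value in freq_a.values()))
    norm_b = math.sqrt(sum(value * value for value in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Assumption: a day with no activity has similarity zero.
    return dot / (norm_a * norm_b)


def similarity_to_first_day(daily_freqs):
    """Compare every day in a time series with the first day."""
    baseline = daily_freqs[0]
    return [cosine_similarity(baseline, day) for day in daily_freqs]


# Toy edge frequencies for three days in one clinic, for illustration only.
days = [
    {("A", "B"): 3, ("B", "C"): 4},
    {("A", "B"): 3, ("B", "C"): 4},
    {("A", "B"): 4, ("C", "D"): 3},
]
series = similarity_to_first_day(days)
print(series)
```

A flat series near 1.0 indicates that the pattern of action is staying the same, whereas a drop
indicates that the frequencies of nodes or edges have shifted relative to the first day.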
        Figure 2.2 shows the same data at three different levels of contextual specificity. Each
row of the figure incorporates more situational context into the definition of the nodes in the
network (Pentland et al., 2020). In the top row, the nodes are defined by actions only. In the
middle row, the nodes are defined by action + role. This adds the social context of each action:
who did what. In the bottom row, the nodes are defined by action + role + workstation. This adds
the material context of each action: who did what and where. In each panel of Figure 2.2, I show
the mean value and 95% confidence interval on the mean, before and after the upgrade. This
clearly shows a significant disruption at all three levels of contextual specificity.
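        The three levels of contextual specificity can be illustrated with a short sketch. Each
record is an (action, role, workstation) triple, and a node is defined by the first one, two, or
three fields. The function name `network_size` and the toy records are hypothetical and are
not drawn from the study data.

```python
# Number of leading fields that define a node at each level of specificity.
LEVELS = {
    "action": 1,
    "action-role": 2,
    "action-role-workstation": 3,
}


def network_size(rows, n_fields):
    """Return (unique nodes, unique edges) when nodes use the first n_fields."""
    nodes = [row[:n_fields] for row in rows]
    # Sequentially adjacent nodes form the edges.
    edges = list(zip(nodes, nodes[1:]))
    return len(set(nodes)), len(set(edges))


# Toy time-ordered records, for illustration only.
rows = [
    ("CHART_REVIEW", "Nurse", "A"),
    ("ORDERS", "Nurse", "A"),
    ("CHART_REVIEW", "Physician", "B"),
    ("ORDERS", "Physician", "B"),
    ("CHART_REVIEW", "Nurse", "C"),
    ("ORDERS", "Nurse", "C"),
]
sizes = {name: network_size(rows, n_fields) for name, n_fields in LEVELS.items()}
print(sizes)
```

In this toy example the same six records yield 2, 4, and 6 unique nodes as role and then
workstation are added, which is the sense in which situating actions in their context increases
the apparent variability of the routine.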
        I show these three levels of contextual specificity for two reasons. First, it shows how
situating the pattern of action in its social and material context increases the apparent variability
of the routine. Second, all of my subsequent analysis is conducted on the actions situated in their
social and material context (the highest level of contextual specificity) because I want to
understand the effect of contextual coherence. Figure 2.2 helps convey the substantial amount of
natural variability that exists in these clinical work processes. However, as expected, there is still
a discernable difference before and after the upgrade. My goal in the analysis that follows is to
understand how the disruption affects these fine-grained, situated patterns. To address my
research question, I need to see beyond the obvious noise in Figures 2.2 (c) and 2.2 (d) and
extract signals that help us understand stability.
FIGURE 2.2. DIACHRONIC VIEW OF ROUTINES
[Figure: six panels plotting cosine similarity (vertical axis, 0.00 to 1.00) against days
(horizontal axis, Sep 15 to Nov 01) for each clinic (DERM_A, DERM_B, ORTHO_A, ORTHO_B, PEDONC).
Left column (Nodes): How much are the nodal frequencies changing? Right column (Edges): How much
are the edge frequencies changing? Rows: Action only, (a) Action Nodes and (b) Action Edges;
Action-Role, (c) AR Nodes and (d) AR Edges; Action-Role-Workstation, (e) ARW Nodes and
(f) ARW Edges.]
                                                                                     PEDONC                                                         PEDONC
                                         0.25                                                       0.25
                                         0.00                                                       0.00
                                           Sep 15    Oct 01     Oct 15   Nov 01                       Sep 15    Oct 01   Oct 15         Nov 01
                                                              Days                                                       Days
     Contrary to the literature on information systems implementation and adaptation (e.g.,
Majchrzak et al., 2000), there was no extended period of adjustment. The routines adapted to
the new software immediately after the upgrade. Using a simple OLS regression, I tested the rate
of change after the disruption and found that it is not significantly different from zero (see
Appendix A). There is considerable variation from day to day, but there is no trend. This implies that
new routines stabilized very quickly after the new system went live.
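     The trend test can be illustrated with a minimal sketch. The exact specification is reported in
Appendix A, so the code below only assumes a regression of a daily similarity score on the day index.
The function name `ols_slope_test` and the data values are hypothetical and are not taken from the study.

```python
import math

def ols_slope_test(y):
    """Fit y = a + b*t by ordinary least squares (t = 0, 1, 2, ...)
    and return the slope b, its standard error, and the t-statistic."""
    n = len(y)
    t = list(range(n))
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    residuals = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)  # residual variance
    se = math.sqrt(sigma2 / sxx)
    return slope, se, slope / se

# Hypothetical post-upgrade daily similarity values (illustration only)
daily_similarity = [0.52, 0.47, 0.55, 0.49, 0.53, 0.48,
                    0.54, 0.50, 0.51, 0.46, 0.55, 0.50]
slope, se, t_stat = ols_slope_test(daily_similarity)
print(f"slope={slope:.4f}, se={se:.4f}, t={t_stat:.2f}")
```

A t-statistic close to zero, as in this made-up series, corresponds to the pattern described above: day-to-day variation without a trend.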
2.6. Analysis
I examine my hypotheses in five different clinics with two kinds of models. Logistic regression
provides an easy-to-interpret model of edge dissolution (Minhas et al., 2019). It also provides a
simple way to test for collinearity in the independent variables. However, the standard errors
from this model are naïve because they ignore dependencies in the data (Hoff, 2005). Therefore,
to fully account for network effects, I use the dyadic prediction model for network dynamics
described by Hoff (2005, 2009), which uses latent spaces and random effects to account for
dependencies in the data. Across all five clinics, with both kinds of models, the results are
similar. I discuss the details of these analyses in the next sections.
2.6.1. Logit Models
I construct a logit regression model to examine the effects of edge characteristics on
the structure of routines. The logit model is a simple and well-defined model for examining the
relationship between directed dichotomous relations (here, the edges) and statistics of network
characteristics (Robins et al., 1999; Wasserman & Pattison, 1996). I operationalize structure as the
persistence of edges. The proposed model is as follows:
    (1) $\operatorname{logit}(\mathit{Persistence}_{ij,t}) = \beta_1 \mathit{Frequency}_{ij,t-1} + \beta_2 \overline{\mathit{Speed}}_{ij,t-1} + \beta_3 \mathit{Coherence}_{ij,t-1}$
    In this model, the time period t represents three weeks before and three weeks after the system
upgrade. The dependent variable in this model is $\mathit{Persistence}_{ij,t}$, which is a binary variable that equals 1 if
the edge between actions i and j in the network exists before and after the system upgrade, and 0 if it
only exists before the upgrade. Thus, the edges considered in this analysis include only those that
existed before the system upgrade. $\mathit{Frequency}_{ij,t-1}$ represents the frequency of the edge from
the previous time period, as in H1. $\overline{\mathit{Speed}}_{ij,t-1}$ reflects the average speed of the edge $y_{ij}$, as in
H2. $\mathit{Coherence}_{ij,t-1}$ represents the extent to which actions i and j share a coherent context, as in
H3. I estimate the model for each clinic separately because they have different sets of edges. I use
standardized variables so I can compare the relative magnitudes of the effects in the models.
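    The construction of the dependent variable and the standardization step can be sketched as
follows. This is an illustration under stated assumptions, not the code used in the study: the edge
names, the frequencies, and the helper names `persistence_labels` and `standardize_log` are
hypothetical, and the use of the population standard deviation is an assumption.

```python
import math
from statistics import mean, pstdev

def persistence_labels(edges_before, edges_after):
    """Label each pre-upgrade edge 1 if it also appears after the upgrade, else 0.
    Edges that appear only after the upgrade are excluded, as in the text."""
    return {edge: int(edge in edges_after) for edge in edges_before}

def standardize_log(values):
    """Log-transform positive values, then rescale to mean 0 and standard deviation 1.
    Assumption: the population standard deviation is used for scaling."""
    logged = [math.log(v) for v in values]
    mu, sd = mean(logged), pstdev(logged)
    return [(x - mu) / sd for x in logged]

# Hypothetical edges between actions (names are illustrative only)
before = {("check_in", "vitals"), ("vitals", "note"), ("note", "order")}
after = {("check_in", "vitals"), ("note", "order"), ("order", "sign")}

labels = persistence_labels(before, after)
z_freq = standardize_log([120, 8, 45])  # hypothetical edge frequencies
print(labels)
print(z_freq)
```

In this toy example the edge that exists only after the upgrade, ("order", "sign"), receives no label at all, which mirrors the restriction to edges that existed before the upgrade.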
2.6.2. Logistic Regression Results
Table 2.2 shows the result of the logistic regression in each clinic. I use standardized variables,
with log transformations for frequency and speed.
TABLE 2.2. LOGISTIC REGRESSION RESULT ON EDGE PERSISTENCE
                         (1)           (2)           (3)           (4)           (5)
  Variables            DERM_A        DERM_B        ORTHO_A       ORTHO_B       PEDONC
  H1: Frequency       0.7031***     0.7823***     0.6651***     0.6943***     0.6090***
                      (0.0291)      (0.0219)      (0.0300)      (0.0258)      (0.0233)
  H2: Speed          -0.0213        0.0525**      0.0343       -0.0238        0.0092
                      (0.0255)      (0.0187)      (0.0286)      (0.0226)      (0.0290)
  H3: Coherence       0.1834***     0.4296***     0.5580***     0.4944***     0.7493***
                      (0.0280)      (0.0217)      (0.0334)      (0.0270)      (0.0467)
  Constant           -1.8727***    -2.6562***    -3.3371***    -3.1807***    -4.3291***
                      (0.0874)      (0.0697)      (0.1107)      (0.0874)      (0.1570)
  Observations        10,906        24,886        10,300        16,999        16,046
  Pseudo R2           0.0906        0.131         0.124         0.127         0.114
Naïve robust standard errors in parentheses. *** p<0.001, ** p<0.01, * p<0.05
        In Table 2.2, I observe that in all clinics the probability of edge persistence increases with
the frequency of edges from the previous period. That is, the more frequently a pair of actions was
performed in sequence before the upgrade, the more likely that pair is to persist after it. In
contrast, the estimates for speed are small and not statistically significant, with the exception of
one Dermatology clinic (DERM_B), whose estimate is positive and about 2.8 times its naïve
standard error. I infer that, contrary to my hypothesis, the speed of
the edges does not increase their tendency to persist. Lastly, coherence has a significant
and positive coefficient in all clinics. This indicates that the probability of edge persistence
after the system upgrade increases when the actions are performed by the same role and at the
same workstation.
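        As a quick arithmetic check on the speed estimates, the ratio of each coefficient to its naïve
standard error can be computed directly from Table 2.2. The sketch below only restates the published
numbers; the variable names are illustrative.

```python
# Speed (H2) estimates and naive standard errors copied from Table 2.2
speed = {
    "DERM_A": (-0.0213, 0.0255),
    "DERM_B": (0.0525, 0.0187),
    "ORTHO_A": (0.0343, 0.0286),
    "ORTHO_B": (-0.0238, 0.0226),
    "PEDONC": (0.0092, 0.0290),
}

# Ratio of estimate to standard error (a z-ratio under the naive errors)
z_ratios = {clinic: est / se for clinic, (est, se) in speed.items()}
for clinic, z in z_ratios.items():
    print(f"{clinic}: z = {z:.2f}")
```

Only DERM_B has a ratio above the conventional 1.96 cutoff; the other four clinics fall well below it.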
        From the results of the logit regression models, I infer that the edge characteristics,
frequency (H1) and coherence (H3), have positive effects on edge persistence. However, the
speed of the edge (H2) does not seem to affect persistence. As explained above, the results from
logistic regression may be biased because of unobserved random effects due to the
interdependence between the nodes. Thus, I use the dyadic prediction model for network
dynamics described by Hoff (2005, 2009).
2.6.3. Dyadic Prediction Model for Network Dynamics
The dyadic prediction model accounts for interdependent patterns in the network
and makes predictions about the paths based not only on the observed characteristics of the nodes
and edges but also on unobserved random effects on the base rate of edges (Minhas et al., 2019).
Modeling this interdependence is especially important here because the actions and
edges in a process network are not independent.
        In social networks, to estimate how actors choose others with whom to interact, the
logistic selection model is generally written as
                 (2)  $\log\left(\frac{P(y_{ij})}{1 - P(y_{ij})}\right) = \beta_0 + \beta_1 |x_i - x_j|$
        where $y_{ij}$ is the interaction (here, edge persistence) between i and j, $x_i$ is a characteristic of node i
such as weight, and $\beta_0$ is the log odds of a tie occurring when $|x_i - x_j| = 0$. Lastly, $\beta_1$ represents the
change in the log odds of a tie occurring for a one-unit increase in $|x_i - x_j|$. In this context, if each
of the interactions is conditionally independent, I can write the joint likelihood function as
                 (3)  $P(Y \mid \beta, X) = \prod_{i \neq j} P(y_{ij} \mid \beta, X)$
        where Y is the network matrix of $y_{ij}$, X is the array of x, and $\beta$ is the vector of regression
coefficients. However, there exist random effects representing potential interdependence in the
process network. Because the process network is directed, each event is selected based on the
preceding path, and it also influences the next event. A mis-specified model that does not consider
the potential dependencies can leave simultaneous dependencies unmodeled, such as reciprocity between events
(Hoff, 2009; Holland & Leinhardt, 1981). Thus, it is important to specify a model that considers
potential dependencies in the data. The suggested random effect model is as follows:
                (4)  $\log\left(\frac{P(y_{ij})}{1 - P(y_{ij})}\right) = \beta_d^{T} x_{d,ij} + \beta_s^{T} x_{s,i} + \beta_r^{T} x_{r,j} + \theta_i + \theta_j$
                (5)  $\theta_i = \gamma_0 + \gamma_1 x_i + u_i$
                (6)  $\theta_j = \gamma_0 + \gamma_1 x_j + v_j$
        where $x_{d,ij}$ refers to edge covariates, $x_{s,i}$ and $x_{r,j}$ represent covariates for sending and
receiving nodal attributes, and θi and θj are the random effects of senders and receivers. In my
model, I interpret senders as predecessor actions and receivers as successor actions.
        There is another potential dependence associated with transitivity and clusterability of
nodes in the network (Hoff, 2005). This third-order dependence pattern can be accounted for
by the similarity of the relational patterns of two nodes (Minhas et al., 2019). Each node has
unobserved attributes that can affect the ties between nodes. In the latent factor model, these
unobserved attributes are represented as a vector of latent factors, so that nodes with similar
relational patterns have similar vectors. Hoff (2005, 2009) suggests adding a $u_i v_j$ term to the model, which represents
the similarity between pairs of nodes on each dimension based on the latent nodal attributes of
sending and receiving nodes. Thus, the final model proposed by Hoff (2009) is as follows:
        (7)  $y_{ij} = \beta_d^{T} x_{d,ij} + \beta_s^{T} x_{s,i} + \beta_r^{T} x_{r,j} + \theta_i + \theta_j + u_i v_j + \varepsilon_{ij}$
2.6.4. Application of the Latent Space Model
My goal is to predict the frequency of all edges in the narrative network that represents the
clinical documentation process. To do so, I use the previous state of the process (at time t-1) to
predict the current state of the process (at time t). Using this approach, I can test my three
hypotheses within the model as follows:
                 (8)  $\mathit{Persistence}_{ij,t} = \beta_1 \mathit{Frequency}_{ij,t-1} + \beta_2 \overline{\mathit{Speed}}_{ij,t-1} + \beta_3 \mathit{Coherence}_{ij,t-1} + \theta_i + \theta_j + u_i v_j + e_{ij}$
         where $\theta_i$ and $\theta_j$ are random effects relating to the base rate of actions i and j. If i and j
occur more or less often, that will directly influence how often $y_{ij}$ occurs. As I apply the model
here, $\theta_i$ and $\theta_j$ reflect the change in the repertoire of actions. I am interpreting random effects ($\theta_i$
and $\theta_j$) as control variables: Controlling for changes in base rates of the actions, what drives
changes in the pairs of actions? Lastly, $u_i v_j$ represents the similarity between pairs of nodes on
each dimension (action i and j) of a latent space and $e_{ij}$ is the error term.
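         To make the structure of equation (8) concrete, the sketch below evaluates its deterministic
part for a single edge: the three covariate terms, the two additive random effects for the
predecessor and successor actions, and the multiplicative latent term. It is an illustration only.
All numbers are hypothetical, the function name `linear_predictor` is not from the study, and the
latent term is treated as an inner product of two latent vectors, which is an assumption about how
the multiplicative term is computed.

```python
def linear_predictor(x, beta, theta_i, theta_j, u_i, v_j):
    """Deterministic part of equation (8) for one edge (i, j):
    fixed effects + additive node effects + multiplicative latent term."""
    fixed = sum(b * xk for b, xk in zip(beta, x))
    # Assumption: the latent term is the inner product of the two latent vectors
    latent = sum(a * b for a, b in zip(u_i, v_j))
    return fixed + theta_i + theta_j + latent

# All numbers below are hypothetical, chosen only to illustrate the arithmetic
beta = [0.5, 0.1, 1.2]   # coefficients for frequency, speed, coherence
x = [1.0, -0.5, 2.0]     # standardized covariates of the edge at t-1
eta = linear_predictor(x, beta, theta_i=0.3, theta_j=-0.2,
                       u_i=[0.4, 0.1], v_j=[0.5, -1.0])
print(round(eta, 4))
```

Holding the covariates fixed, a larger base-rate effect for either action or a closer alignment of the two latent vectors raises the predicted value for the edge.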
2.6.5. Results of Dyadic Prediction Models
To estimate the latent space models, I use the R package amen
(https://cran.r-project.org/web/packages/amen/amen.pdf), which uses an MCMC (Markov Chain Monte Carlo)
procedure. As with the logistic regression, I use standardized variables, with log transformations
for frequency and speed. Table 2.3 shows the results for each of the clinics in my data. The
standard errors of the coefficients are substantially smaller than those from the logit
regression, as much of the variance is explained by the random effects and latent
factors.
TABLE 2.3. RESULTS OF ANALYSIS FOR EDGE DISSOLUTION
                                     DERM_A      DERM_B      ORTHO_A     ORTHO_B     PEDONC
       H1: Frequency                 0.516***    0.968***    0.559***    0.573***    0.573***
                                     (0.003)     (0.006)     (0.004)     (0.004)     (0.004)
       H2: Speed                     0.101***    0.122***    0.026       0.077**     0.015
                                     (0.003)     (0.0006)    (0.044)     (0.003)     (0.003)
       H3: Coherence                 1.229***    1.675***    1.353***    1.300***    1.348***
                                     (0.004)     (0.005)     (0.006)     (0.005)     (0.006)
       Constant                     -6.105***   -5.833***   -6.727***   -6.246***   -6.649***
                                     (0.020)     (0.027)     (0.037)     (0.032)     (0.004)
       Random Effect: sender (i)     0.926       0.709       1.116       0.710       0.846
                                     (0.012)     (0.012)     (0.016)     (0.011)     (0.035)
       Random Effect: receiver (j)   0.805       0.557       0.819       0.517       0.553
                                     (0.018)     (0.019)     (0.025)     (0.017)     (0.019)
       # nodes                       1,851       3,910       3,090       4,002       3,375
       # edges                       10,906      24,886      16,503      16,999      16,049
2.6.6. Summary of Results
Table 2.4 summarizes the results of the analysis for both kinds of models.
TABLE 2.4. SUMMARY OF RESULTS
  Hypothesis       Logistic Regression Model     Dyadic Prediction Model       Overall
  H1: Frequency    Strong, significant           Strong, significant           Supported
                   in all clinics                in all clinics
  H2: Speed        Weak, opposite direction      Weak, opposite direction      Not supported
                   in all clinics                in all clinics
  H3: Coherence    Strong, significant           Strong, significant           Supported
                   in all clinics                in all clinics
2.6.6.1. Frequency (H1)
As expected, the frequency of an edge is a strong predictor of its tendency to persist after a
disruption. This finding aligns with what is known about repetitive patterns of action: they
tend to keep repeating (Schulz, 2008). To my knowledge, however, this is the first time this hypothesis has been
tested in empirical research.
2.6.6.2. Speed (H2)
The hypothesized effect of speed is not supported by the data. Contrary to existing theory, it
would appear that slower edges are slightly more likely to persist than faster edges. This effect is
small and not always statistically significant, so I should not overstate its implications.
Nevertheless, it is interesting because it seems to contradict the idea that speed indicates
routinization, which was introduced by Cohen and Bacdayan’s (1994) pioneering lab
experiment.
2.6.6.3. Coherence (H3)
The hypothesized effect of coherence is also supported by both models in all of the clinics. In the
dyadic prediction model, where the coefficients and standard errors are less subject to bias, the
magnitude of this effect is consistently much larger than the effect of frequency. This suggests
that relations between actions are strongly shaped by contextual factors, even after accounting for the effect
of repetition. In my data, edge persistence is shaped by the role of the person performing the
action and the workstation where it is performed.
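The relative size of the two effects can be checked with simple arithmetic on the coefficients
reported in Table 2.3. The sketch below divides the coherence coefficient by the frequency
coefficient for each clinic; it restates published numbers only, and the variable names are
illustrative.

```python
# (frequency, coherence) coefficients copied from Table 2.3
coefficients = {
    "DERM_A": (0.516, 1.229),
    "DERM_B": (0.968, 1.675),
    "ORTHO_A": (0.559, 1.353),
    "ORTHO_B": (0.573, 1.300),
    "PEDONC": (0.573, 1.348),
}

# Ratio of the coherence effect to the frequency effect in each clinic
ratios = {clinic: coh / freq for clinic, (freq, coh) in coefficients.items()}
for clinic, r in ratios.items():
    print(f"{clinic}: coherence / frequency = {r:.2f}")
```

Because the variables are standardized, these ratios are comparable across clinics; they range from about 1.7 to about 2.4.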
2.6.7. Which Edges are Most Persistent?
Contrary to the stereotype of routines as fixed patterns of action (March & Simon, 1958), these
outpatient clinical routines are quite variable. Any given edge has a substantial probability of
disappearing (or reappearing) from one time period to the next, especially after a disruption.
Nevertheless, it is interesting to examine which edges are most nearly locked in.
        The dyadic prediction model estimates the probability of each edge persisting after the
upgrade. Using this result, I can identify the edges in each clinic that are most likely to survive
(Persistenceij ≥ 0.95). In Figure 2.3, I use a simple 3-D scatter plot to show how these highly
persistent edges compare to the others. Larger red points represent edges with a
persistence probability of at least 95%. In contrast, smaller blue dots represent the edges
with less than 95% probability of persistence after the upgrade. The results are similar in all of
the clinics, so to save space I present one clinic from each medical specialty.
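The selection of highly persistent edges can be sketched as a simple threshold rule. The example
assumes that model predictions are available on the log-odds scale and are converted with the
inverse logit, which may differ from how the estimation software reports predictions. The edge
names, values, and function names are hypothetical.

```python
import math

def inv_logit(eta):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def highly_persistent(predicted, threshold=0.95):
    """Return the edges whose predicted persistence probability meets the threshold."""
    return [edge for edge, p in predicted.items() if p >= threshold]

# Hypothetical log-odds for three edges (not estimates from the study)
log_odds = {("a", "b"): 3.5, ("b", "c"): 0.2, ("c", "d"): -1.0}
predicted = {edge: inv_logit(eta) for edge, eta in log_odds.items()}
top_edges = highly_persistent(predicted)
print(top_edges)
```

Under a logit link, a 95% threshold corresponds to log odds of ln(19), roughly 2.94, so only edges with a very high predicted value are flagged.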
FIGURE 2.3. WHICH EDGES ARE MOST LIKELY TO PERSIST?
[Figure 2.3: three 3-D scatter plots, one per specialty: DERM_A (Dermatology), ORTHO_B (Orthopedic
Oncology), and PEDONC (Pediatric Clinic). Axes: ln_freq, ln_speed, and coherence (0.0 to 2.0).
Legend: predicted persistence < 0.95 versus >= 0.95.]
       Clearly, coherence dominates the picture. For all clinics, most of the persistent edges are
at the highest level of coherence. This implies that having the same or similar contextual factors
correlates with lock-in. What this means, in concrete terms, is that the most persistent pairs of
sequentially adjacent actions are performed by the same person at the same workstation. In other
words, materiality dominates the picture. Although it highlights only edges with at least a 95% probability of persistence, the
visualization in Figure 2.3 reinforces the findings from the models. The edges that are most
likely to persist have the highest frequency and coherence. In contrast, speed does not have a
clear relationship to persistence probability.
       Notice, however, that in the Dermatology clinic, 16 edges persisted with lower coherence.
When coherence is zero, the two actions in a pair are performed by different people at different
workstations. The most persistent handoffs in DERM_A are between the clinical coordinator and
the nurses or clinical technicians. At the next level of coherence, the most persistent pairs of
actions are mostly either handed off between roles at the same workstation or performed by the same
role at different locations.
2.7. Discussion
This paper provides a novel perspective on the dynamics of digitalization. The empirical
foundation for this theory is generated through process mining, which is usually used to discover
a stationary model of a process (van der Aalst, 2012). Here, I am using process mining to help
build theory about stability and change in routines, as suggested by Pentland et al. (2021). The
contributions here go beyond the specific findings in these particular clinics. The main
methodological contribution concerns the use of dynamic network models to analyze routine
dynamics. I borrow a foundational idea from social network analysis (that network structure
influences network dynamics) and apply this idea to routine dynamics. The theoretical
contribution concerns the extension of Swanson’s (2019) concept of technology as routine
capability and the use of routine dynamics to develop a new theory about the dynamics of
digitalization. In the following sections, I discuss these contributions in more detail.
2.7.1. Putting Action into Context
The essential conceptual move in this research is to locate actions in context. In a recent review,
Avgerou (2019) examines the role of context in IS research. Her key message is that context is
crucially important and enters IS-related phenomena in a host of different ways. Typically, I
think of context as outside, in the background, like the weather. However, as Rosemann et al.
(2008) point out, context can permeate to the finest-grained level of description. At this fine-
grained level, context can change constantly throughout the execution of a process or routine as
work is handed from one person to another, one place to another, one system to another, and so
on. Explicitly locating actions in their immediate context aligns with the emphasis on situated
action that has been the driving force behind the last 20 years of research on organizational routines
(Feldman et al., 2022).
         In this paper, I put action into context at this fine-grained level in two different ways.
First, I put actions into sequential context. I do this by defining sequentially adjacent pairs of
actions as the unit of analysis. These pairs of actions are the edges in the narrative network that
represents a routine. This constitutes a departure from more familiar research traditions that
emphasize isolated decisions by individual actors (e.g., psychology, behavioral economics).
Actions are never isolated; they are always part of a larger trajectory, path, or line (Ingold, 2015).
         Second, coherence puts actions into context by taking the actor (role) and location
(workstation) into account. Without a doubt, there are many other contextual factors that could
be included, but the combination of action+actor+location is indicative of the technology-in-use
(Orlikowski, 2000). When I take the technology out of context (as suggested by Figures 2.2 (a)
and 2.2 (b)), the effects of change seem straightforward and perhaps even deterministic. When I
examine actions in context, I see an entirely different picture, where the changes on October 19
are situated in a stream of continually changing networks.
2.7.2. Imbrication and Evolution
Where Goh et al. (2011), Leonardi (2011), and Berente et al. (2016) used ethnographic
fieldwork, I have used archival trace data to zoom in on one particular technological change. As
a methodology, fieldwork is well suited to the analysis of innovation and change because it can
provide a more holistic perspective. The influences of culture, power, emotion, and conflict are all
potentially on display and available for analysis. There is no way that an archival method, based
on digital trace data, can offer those kinds of insights. What trace data and process mining can
offer, however, is a complementary perspective that is not available to any human observer.
         Imbrication and evolution are conceptualized as an ongoing series of changes, so I zoom
in on one of those changes in detail. I examine the mechanisms that influence the tendency of
routines to persist. Persistence can be interpreted as an indicator of resilience (Grote et al.,
2009) or resistance (Becker et al., 2005). Either way, persistence is an essential, taken-for-granted
aspect of digitalization. As routines evolve (Goh et al., 2011) or undergo successive refinements,
changes, and re-alignments, significant parts of the overall pattern of action remain the same.
Where IS research has generally put the changes in the foreground, I have put continuity in the
foreground, as in Figure 2.3. In doing so, I see that only a small fraction of the overall pattern of
action is truly locked in. At the level of situated action, there is a great deal of variability in the
networks of action that are constitutive of this technology-in-use.
2.7.3. Routine Dynamics as Network Dynamics
In research on social networks, mechanisms like reciprocity, homophily, and preferential
attachment contribute to the formation and dissolution of network ties (Snijders, 2001). Until
now, analogous network-based mechanisms have never been defined or investigated in the
context of digitally enabled routines. It is important to recognize that Hypotheses 1-3 represent a
first attempt at defining network-based mechanisms that influence the dynamics of routines and,
therefore, the dynamics of digitalization. These mechanisms may seem simple, but so are the key
mechanisms that drive the dynamics of social networks: homophily (“birds of a feather...”),
preferential attachment (“the rich get richer...”), and transitive closure (“the friend of my
friend...”). In theory, simplicity is a virtue.
         My analysis suggests that routines persist for structural reasons, such as frequency of
repetition and coherence of context. The effect of coherence is particularly strong in these five
clinics: roughly twice as strong as the effect of repetition. In Figure 2.3, coherence is strongly
associated with the most persistent edges. As it is defined in my data, coherence refers to the
continuity of the actor and the location from one action to the next. Thus, pairs of actions with
the highest coherence are performed by the same actor in the same location. For this reason, I
can interpret the effect of coherence in terms of materiality. The metaphorical “ruts in the road”
that make routines recognizable are embodied in the actors and places where they are performed.
2.8. Limitations
This study has some obvious limitations. First, I have data from a narrow context. This is
essentially a case study of one software upgrade in a few clinics within a single medical system.
The findings would be more generalizable if they were reproduced in a broader range of settings.
        Second, I study a rather simple disruption: a system upgrade. It would be helpful to study
a broader range of disruptions. For example, the COVID-19 pandemic disrupted medical services in
a variety of ways, from interruptions (e.g., lockdowns) to new technology (e.g., telemedicine). In
this study, the routines immediately adapted to the upgrade. With more severe disruptions, I
would not expect adaptation to occur as quickly. Data from different kinds of disruptions would
provide additional tests of my hypotheses concerning the influence of frequency, speed, and
coherence on the persistence of routines.
        Third, I do not have measures of other variables (such as attitudes or incentives), nor do I
have interview or observational data about this upgrade. Such data would add richness
to the story and allow me to discuss alternative explanations and consequences. The data I report
here was collected as part of a larger study that was not specifically focused on upgrades or
disruptions. Future studies would undoubtedly benefit from a combination of fieldwork and
archival methods.
        Fourth, I only address the dissolution of existing edges, not the formation of new edges.
As a result, my analysis is limited to existing paths, not new paths. In future studies, it may be
possible to use the attributes of actions to predict edge formation, as well.
2.9. Conclusion
The entanglement of technology and human behavior has been a central concern of information
systems theory and practice for decades (Bostrom & Heinen, 1977; Mumford & Weir, 1979;
Orlikowski, 1992) and remains a central “axis of cohesion” for the IS discipline (Sarker et al.,
2019, p. 695). The theory and method I employ here offer a way to reinvigorate the
sociotechnical foundations of the information systems field by explicitly examining the systemic
connections between technology and patterns of action. As my analysis shows, this relationship
can be noisy and complex. This is especially true when I examine it with fine-grained trace data.
        The tools I demonstrate here provide a rigorous new way to analyze stability and change,
even in a setting that has a great deal of variability. As a discipline, information systems scholars
tend to focus on innovation and change (Yoo et al., 2010). In most of my research, change is the
figural part of the picture. But change always happens against a background of stability. As
digitalization continues to progress, I need to see both figure and ground if I want to understand
the whole picture.


BIBLIOGRAPHY
Adler-Milstein, J., Adelman, J. S., Tai-Seale, M., Patel, V. L., & Dymek, C. (2020). EHR audit
        logs: a new goldmine for health services research? Journal of Biomedical Informatics,
        101, 103343.
Alter, S. (2014). Theory of workarounds. Communications of the Association for Information
        Systems, 34(55).
Avgerou, C. (2019). Contextual explanation: Alternative approaches and persistent challenges.
        MIS Quarterly, 43(3), 977-1006.
Barley, S. R. (1986). Technology as an occasion for structuring: Evidence from observations of
        CT scanners and the social order of radiology departments. Administrative Science
        Quarterly, 31(1), 78-108.
Becker, M. C. (2004). Organizational routines: a review of the literature. Industrial and
        Corporate Change, 13(4), 643-678.
Becker, M. C., Lazaric, N., Nelson, R. R., & Winter, S. G. (2005). Applying organizational
        routines in understanding organizational change. Industrial and Corporate Change, 14(5),
        775-791.
Berente, N., Lyytinen, K., Yoo, Y., & King, J. L. (2016). Routines as shock absorbers during
        organizational transformation: Integration, control, and NASA’s enterprise information
        system. Organization Science, 27(3), 551-572.
Beverungen, D. (2014). Exploring the interplay of the design and emergence of business
        processes as organizational routines. Business & Information Systems Engineering, 6(4),
        191-202.
Birnholtz, J. P., Cohen, M. D., & Hoch, S. V. (2007). Organizational character: on the
        regeneration of Camp Poplar Grove. Organization Science, 18(2), 315-332.
Bostrom, R. P., & Heinen, J. S. (1977). MIS problems and failures: A socio-technical
        perspective. Part I: The causes. MIS Quarterly, 1(3), 17-32.
Boudreau, M.-C., & Robey, D. (2005). Enacting integrated information technology: A human
        agency perspective. Organization Science, 16(1), 3-18.
Cohen, M. D., & Bacdayan, P. (1994). Organizational routines are stored as procedural memory:
        Evidence from a laboratory study. Organization Science, 5(4), 554-568.
Cohen, M. D., Burkhart, R., Dosi, G., Egidi, M., Marengo, L., Warglien, M., & Winter, S.
        (1996). Routines and other recurring action patterns of organizations: contemporary
        research issues. Industrial and Corporate Change, 5(3), 653-698.
D’Adderio, L. (2011). Artifacts at the centre of routines: Performing the material turn in routines
        theory. Journal of Institutional Economics, 7(2), 197-230.
DeSanctis, G., & Poole, M. S. (1994). Capturing the complexity in advanced technology use:
        Adaptive structuration theory. Organization Science, 5(2), 121-147.
Feldman, M. S., & Pentland, B. T. (2003). Reconceptualizing organizational routines as a source
        of flexibility and change. Administrative Science Quarterly, 48(1), 94-118.
Feldman, M. S., Pentland, B. T., D’Adderio, L., Dittrich, K., Rerup, C., & Seidl, D. (2022). What
        Is Routine Dynamics? In M. S. Feldman, B. T. Pentland, L. D’Adderio, K. Dittrich, C.
        Rerup, & D. Seidl (Eds.), Cambridge Handbook of Routine Dynamics (pp. 1-18).
        Cambridge University Press.
Frank, K. A., Zhao, Y., Penuel, W. R., Ellefson, N., & Porter, S. (2011). Focus, fiddle and
        friends: Sources of knowledge to perform the complex task of teaching. Sociology of
        Education, 84(2), 137-156.
Gilbert, C. G. (2005). Unbundling the structure of inertia: Resource versus routine rigidity.
        Academy of Management Journal, 48(5), 741-763.
Goh, J. M., Gao, G., & Agarwal, R. (2011). Evolving work routines: Adaptive routinization of
        information technology in healthcare. Information Systems Research, 22(3), 565-585.
Grote, G., Weichbrodt, J. C., Günter, H., Zala-Mezö, E., & Künzle, B. (2009). Coordination in
        high-risk organizations: the need for flexible routines. Cognition, Technology & Work,
        11(1), 17-27.
Hansson, M., Hærem, T., & Pentland, B. T. (2021). The effect of repertoire, routinization and
        enacted complexity: Explaining task performance through patterns of action.
        Organization Studies.
Hoff, P. D. (2005). Bilinear mixed-effects models for dyadic data. Journal of the American
        Statistical Association, 100(469), 286-295.
Hoff, P. D. (2009). Multiplicative Latent Factor Models for Description and Prediction of Social
        Networks. Computational and Mathematical Organization Theory, 15(4), 261-272.
Holland, P. W., & Leinhardt, S. (1981). An exponential family of probability distributions for
        directed graphs. Journal of the American Statistical Association, 76(373), 33-50.
Howard-Grenville, J. A. (2005). The persistence of flexible organizational routines: The role of
        agency and organizational context. Organization Science, 16(6), 618-636.
Ingold, T. (2015). The life of lines. Routledge.
Keen, P. G. (1981). Information systems and organizational change. Communications of the
        ACM, 24(1), 24-33.
Laumer, S., Maier, C., Eckhardt, A., & Weitzel, T. (2016). Work routines as an object of
        resistance during information systems implementations: Theoretical foundation and
        empirical evidence. European Journal of Information Systems, 25(4), 317-343.
Leonard-Barton, D. (1988). Implementation as mutual adaptation of technology and
        organization. Research Policy, 17(5), 251-267.
Leonardi, P. M. (2011). When flexible routines meet flexible technologies: Affordance,
        constraint, and the imbrication of human and material agencies. MIS Quarterly, 35(1),
        147-167.
Leonardi, P. M., & Barley, S. R. (2008). Materiality and change: Challenges to building better
        theory about technology and organizing. Information and Organization, 18(3), 159-176.
Limayem, M., Hirt, S. G., & Cheung, C. M. (2007). How habit limits the predictive power of
        intention: The case of information systems continuance. MIS Quarterly, 31(4), 705-737.
Lyytinen, K., Rose, G., & Yoo, Y. (2010). Learning routines and disruptive technological
        change: Hyper‐learning in seven software development organizations during internet
        adoption. Information Technology & People.
Majchrzak, A., Rice, R. E., Malhotra, A., King, N., & Ba, S. (2000). Technology Adaptation:
        The Case of a Computer-Supported Inter-Organizational Virtual Team. MIS Quarterly,
        24(4), 569-600.
March, J. G., & Simon, H. A. (1958). Organizations. Blackwell.
Mendling, J., Berente, N., Seidel, S., & Grisold, T. (2021). Pluralism and pragmatism in the
        information systems field: the case of research on business processes and organizational
        routines. The Data Base for Advances in Information Systems, 52(2).
Minhas, S., Hoff, P. D., & Ward, M. D. (2016). A new approach to analyzing coevolving
        longitudinal networks in international relations. Journal of Peace Research, 53(3), 491-
        505.
Minhas, S., Hoff, P. D., & Ward, M. D. (2019). Inferential approaches for network analysis:
        Amen for latent factor models. Political Analysis, 27(2), 208-222.
Mumford, E., & Weir, M. (1979). Computer systems in work design--the ETHICS method:
        effective technical and human implementation of computer systems: a work design
        exercise book for individuals and groups. New York: Wiley.
Orlikowski, W. J. (1992). The Duality of Technology: Rethinking the Concept of Technology in
        Organizations. Organization Science, 3(3), 398-427.
Orlikowski, W. J. (2000). Using technology and constituting structures: A practice lens for
        studying technology in organizations. Organization Science, 11(4), 404-428.
Pan, S. L., Pan, G., Chen, A. J., & Hsieh, M. H. (2007). The dynamics of implementing and
        managing modularity of organizational routines during capability development: Insights
        from a process model. IEEE Transactions on Engineering Management, 54(4), 800-813.
Pentland, B. T., & Feldman, M. S. (2007). Narrative networks: Patterns of technology and
        organization. Organization Science, 18(5), 781-795.
Pentland, B. T., & Feldman, M. S. (2008). Designing routines: On the folly of designing
        artifacts, while hoping for patterns of action. Information and Organization, 18(4), 235-
        250.
Pentland, B. T., & Kim, I. (2021). Narrative Networks in Routine Dynamics. In M. S. Feldman,
        B. T. Pentland, L. D’Adderio, K. Dittrich, C. Rerup, & D. Seidl (Eds.), Cambridge
        Handbook of Routine Dynamics. Cambridge University Press.
Pentland, B. T., Recker, J., Wolf, J. R., & Wyner, G. (2020). Bringing Context inside Process
        Research with Digital Trace Data. Journal of the Association for Information Systems,
        21(5), 5.
Pentland, B. T., Recker, J., & Wyner, G. (2017). Rediscovering handoffs. Academy of
        Management Discoveries, 3(3), 284-301.
Pentland, B. T., Vaast, E., & Wolf, J. R. (2021a). Theorizing Process Dynamics with Directed
        Graphs: A Diachronic Analysis of Digital Trace Data. MIS Quarterly, 45(2).
Pentland, B. T., Vaast, E., & Wolf, J. R. (2021b). Theorizing Process Dynamics with Directed
        Graphs: A Diachronic Analysis of Digital Trace Data. MIS Quarterly, 45(2), 967-984.
Polites, G. L., & Karahanna, E. (2013). The embeddedness of information systems habits in
        organizational and individual level routines: Development and disruption. MIS Quarterly,
        37(1), 221-246.
Robins, G., Pattison, P., & Wasserman, S. (1999). Logit models and logistic regressions for
        social networks: III. Valued relations. Psychometrika, 64(3), 371-394.
Rosemann, M., Recker, J. C., & Flender, C. (2008). Contextualisation of business processes.
        International Journal of Business Process Integration and Management, 3(1), 47-60.
Sarker, S., Chatterjee, S., Xiao, X., & Elbanna, A. (2019). The sociotechnical axis of cohesion
        for the IS discipline: Its historical legacy and its continued relevance. MIS
        Quarterly, 43(3), 695-A695.
Schulz, M. (2008). Staying on Track: a Voyage to the Internal Mechanisms of Routine
        Reproduction. In M. C. Becker (Ed.), Handbook of Organizational Routines (pp. 228-
        257). Edward Elgar.
Snijders, T. A. (2001). The statistical evaluation of social network dynamics. Sociological
        Methodology, 31(1), 361-395.
Steglich, C., Snijders, T. A., & Pearson, M. (2010). Dynamic Networks and Behavior:
        Separating Selection from Influence. Sociological Methodology, 40(1), 329-393.
Su, N. M., Brdiczka, O., & Begole, B. (2013). The routineness of routines: Measuring rhythms
        of media interaction. Human–Computer Interaction, 28(4), 287-334.
Swanson, E. B. (2019). Technology as routine capability. MIS Quarterly,
        43(3), 1007-1024.
Thummadi, B. V., & Lyytinen, K. (2020). How much method-in-use matters? A case study of
        agile and waterfall software projects and their design routine variation. Journal of the
        Association for Information Systems, 21(4), 7.
Tyre, M. J., & Orlikowski, W. J. (1994). Windows of opportunity: Temporal patterns of
        technological adaptation in organizations. Organization Science, 5(1), 98-118.
Vaast, E., & Walsham, G. (2005). Representations and actions: the transformation of work
        practices with IT use. Information and Organization, 15(1), 65-89.
van der Aalst, W. M. (2019). A Practitioner’s Guide to Process Mining: Limitations of the
        Directly-Follows Graph. Procedia Computer Science, 164, 321-328.
Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications.
        Cambridge University Press.
Wasserman, S., & Pattison, P. (1996). Logit models and logistic regressions for social networks:
        I. An introduction to Markov graphs and p*. Psychometrika, 61(3), 401-425.
Yoo, Y., Henfridsson, O., & Lyytinen, K. (2010). Research commentary—the new organizing
        logic of digital innovation: an agenda for information systems research. Information
        Systems Research, 21(4), 724-735.
Zhang, Z., Lee, H., Yoo, Y., & Choi, Y. (2021). Theorizing routines with computational
       sequence analysis: a critical realism framework. Journal of the Association for
       Information Systems.
Zhao, Y., & Frank, K. A. (2003). An ecological analysis of factors affecting technology use in
       schools. American Educational Research Journal, 40(4), 807-840.


                                        CHAPTER THREE:
  PREDICTING NEXT ACTION BASED ON CONTEXTUAL SPECIFICS: EVIDENCE
                          FROM ELECTRONIC MEDICAL RECORDS
3.1. Introduction
As the number of paths in a process increases, the process becomes more complex and it becomes
more difficult to predict what happens next. This growing complexity has made process monitoring
and prediction a significant concern in both industry and the disciplines that study organizations
and business processes (Allen & Varga, 2006; Augusto et al., 2022; Rettig, 2007; Russell et al., 2006).
In process mining, the sequence of events is essential in determining the “flow of control”, which
provides a model for the expected sequence of actions in a process (Bozkaya et al., 2009; van der
Aalst et al., 2005; van der Aalst & Weijters, 2004; van der Werf et al., 2008). However, relying
only on the sequence itself may not provide enough clues for prediction when organizational
processes are more complex. It is especially hard to understand contextualized processes, where
the control flow may depend on contextual factors. When a firm tries to adopt a new business
process, it often fails when there is no consideration of contextual factors (vom Brocke et al.,
2016). Prior studies discuss the importance of contextual factors in the design of the business
process (Ploesser et al., 2009; Rosemann et al., 2008; van der Aalst & Dustdar, 2012), but few
studies focus on contextual factors in process prediction.
        Context is particularly important in healthcare, where very specific procedures and
specialties exist. For example, when clinical employees enter patient information into the
electronic medical record (EMR) system at a workstation as part of the recordkeeping process, a
particular action (e.g., check_meds) takes on a different meaning depending on who performs it
and where it is performed. The office staff can check_meds at the workstation in the front office.
This might be in response to a patient question (e.g., can I refill this prescription?). This might
occur as the patient is checking in or checking out. Alternatively, a nurse, resident or doctor
might check_meds in the examination room, or outside the examination room, in order to
confirm the dosage, look for conflicts, or write a new prescription. These examples point out that
when the physician checks the patient’s medication, it has a different significance than when the
office staff does so. It looks like the same action in the event log, but it is not, because the
immediate context is different.
        While the adoption of EMR systems is intended to make recordkeeping processes more
efficient, studies argue that EMR systems cause entanglements of processes that can increase
process complexity (Frankel et al., 2005). To understand such an entangled process, the sequence
of events must be considered together with its context. Without consideration of context, the
entangled process cannot be grasped clearly.
        To address this, I investigate prediction models based on contextual specifics as well as
the sequence of actions, using clinical documentation process data. Specifically, I examine whether
context can yield better prediction results with fewer parameters, simpler models, or less
training. The research questions I address here are as follows: 1) Can contextual specifics make
patterns more recognizable and predictable? 2) Can context yield better results with fewer
parameters or simpler models?
        To address these questions, I use a Long Short-Term Memory (LSTM) network, a kind of
Recurrent Neural Network (RNN), to model both the observed sequence of actions and their
contextual factors in the process. I build on work by Camargo et al. (2019), who trained LSTM
networks to predict sequential process patterns. In this study, I extend their use of associated
resource pools as contextual factors to assess the importance of context in business process
prediction.
        For the analysis, I compare models using different types of variables: 1) the sequence of
actions alone, and 2) the action sequence together with contextual specifics. First, I predict the
next action in the clinical documentation process based on the sequence of actions only. Next, I
add different contextual factors (role, workstation, diagnosis group, and others) and examine how
prediction accuracy changes with each factor. Lastly, I examine how the results change under
different hyperparameter settings. The results show that some contextual specifics improve
process prediction more than others, and that prediction becomes more accurate as more relevant
contextual information is included.
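        The comparison described above can be illustrated with a minimal Python sketch of how
training samples might be constructed. This is an illustration under stated assumptions, not the
pipeline used in this study: the function name make_samples, the window length, and the toy events
are hypothetical. Each event is represented either by its action code alone or by the action code
together with its role and workstation; the input is a window of preceding events and the target is
always the next action code.

```python
# Hypothetical toy log for one visit: (action, role, workstation).
# The values are illustrative only.
visit = [
    ("Rgwkflbegin", "OAS", "Bcabrkderm"),
    ("Form_Viewed", "OAS", "Bcabrkderm"),
    ("Ac_Visit_Navigator", "Physician", "Brkdermdt6"),
    ("Visit_Diagnoses_View", "Physician", "Brkdermdt6"),
]


def make_samples(events, window, use_context):
    """Return (input window, next action) pairs for next-action prediction."""
    samples = []
    for i in range(window, len(events)):
        history = events[i - window:i]
        if use_context:
            # Keep the full (action, role, workstation) tuple for each step.
            x = [tuple(e) for e in history]
        else:
            # Sequence-only variant: keep the action code alone.
            x = [e[0] for e in history]
        y = events[i][0]  # the target is always the next action code
        samples.append((x, y))
    return samples


seq_only = make_samples(visit, window=2, use_context=False)
with_ctx = make_samples(visit, window=2, use_context=True)
```

Both variants share the same targets, so any difference in prediction accuracy between them can be
attributed to the added contextual inputs rather than to a change in the prediction task.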
        I organize the rest of this study as follows. In the next section, I review the literature on
how RNN has been used for process prediction and the relations between actions and their
contextual specifics. Then I describe the data sources used for the study in section 3. In section 4,
the model is developed to predict actions in the clinical documentation process. I report the
results of the estimates in the subsequent section and conclude the paper by discussing the
contribution of the results and limitations of the study.
3.2. Theoretical Background
Predicting what happens next is no longer a futuristic technology. Imagine that you often have
dinner with a friend and you are about to text him to ask him to join you for dinner tonight. You
have been adding dinner events with your friend to the calendar on your phone for a few weeks.
Based on this “context”, when you text, your phone will automatically suggest words related to
the dinner invitation, such as the time, the location, or even the menu. This is a common example
of the convenience of prediction. As this example suggests, it is often possible to
predict the next event more accurately based on context. In this section, I explain what role
contextual factors play in the organizational process and introduce prediction models in process
management.
3.2.1. Process and Contextual Factors
Recognizing patterns in business processes is not a new research domain. Numerous studies in the
business process discipline have investigated business process mining to decompose entangled
patterns of business processes (Gacitua-Decar & Pahl, 2009; Mejia Bernal et al., 2010; van der
Aalst et al., 2007). However, the importance of context in process management was overlooked
in many process mining analyses (Kronsbein et al., 2014; Li et al., 2010; van der Aalst &
Dustdar, 2012). Even where the importance of contextual factors is acknowledged, many studies
either omit these factors from the prediction model or consider them only from a narrow
perspective. Prior studies show how to classify contextual factors based on their characteristics.
Contextual factors are largely divided into two dimensions: internal and external factors
(Kronsbein et al., 2014). Internal factors are important for recognizing patterns because they are
directly related to events (e.g., particular roles or locations in the process), whereas external
factors influence the occurrence of events from outside the process. In the onion model for
contextual factors (Rosemann et al., 2008), these two dimensions are segmented into more specific
types of context, depending on how frequently the factors change during the execution of the
process. For example, while suppliers and customers are somewhat controllable in the
organizational process, climate or seasonality cannot be controlled, but their impact on the
patterns of actions can be substantial (vom Brocke et al., 2016). Extending the internal and
external contextual factors to more specific types of context helps identify which types of
contextual factors influence prediction accuracy in a process.
        Many studies on monitoring and managing processes discuss the importance of
contextual specifics, but those factors are seldom used in predictive process models. For
this study, following Rosemann et al. (2008), I use immediate and external contextual factors for
the prediction. As immediate context, I use the actor (who), the workstation as location (where),
and the patient’s diagnosis group for each visit. As external factors, I use flu season information
and whether the system has been upgraded.
3.2.1.1. Prediction Models in Process Management
Prior to the introduction of RNN, predictive process models were generally based on diverse
probabilistic models (Breuker et al., 2016; Pravilovic et al., 2013; van Dongen et al., 2008).
However, since RNN was introduced, most of the studies on process prediction models have
depended on it because of its enhanced features in processing sequential data (Lipton et al.,
2015).
        Unlike a Convolutional Neural Network (CNN), an RNN is designed to handle and model
sequence data (Graves et al., 2006). Simply put, an RNN helps predict what comes next in a
sequence. The RNN architecture suits predictive models for process monitoring because an RNN can
learn order dependence in the input sequence. In other words, an RNN can encode information from
the events in all previous steps, which makes it suitable for constructing a predictive model for
the next actions in the clinical documentation process. However, a standard RNN suffers from the
vanishing gradient problem, which prevents it from capturing long-term dependencies in
sequences. To alleviate this problem, modified RNN architectures have been proposed, such as the
LSTM, which uses gates, including a forget gate, to retain information over longer spans and to
mitigate the vanishing gradient of the RNN (Gers et al., 1999; Hochreiter et al., 2001).
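        To make the role of the forget gate concrete, the following is a minimal sketch of one
time step of a single-unit LSTM in plain Python. It follows the standard LSTM formulation with a
forget gate; the parameter values and the dictionary layout are hypothetical choices for
illustration and are not taken from the models estimated in this study. With the forget gate held
nearly open and the input gate nearly closed, the stored cell value survives many time steps,
which is the mechanism that lets an LSTM carry information across long sequences.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single-unit LSTM with scalar input x.

    p maps each gate name to (input weight, recurrent weight, bias).
    """
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))   # share of the old cell state to keep
    i = sigmoid(pre("input"))    # share of the candidate value to write
    o = sigmoid(pre("output"))   # share of the cell state to expose
    g = math.tanh(pre("cand"))   # candidate value
    c = f * c_prev + i * g       # additive cell-state update
    h = o * math.tanh(c)         # new hidden state
    return h, c


# Hypothetical parameters: forget gate nearly open, input gate nearly closed.
params = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "output": (0.0, 0.0, 0.0),
    "cand": (1.0, 0.0, 0.0),
}

h, c = 0.0, 1.0  # start with a stored cell value of 1.0
for x in [0.5, -0.3, 0.8, 0.1, -0.9] * 10:  # 50 time steps
    h, c = lstm_step(x, h, c, params)
# After 50 steps, c is still close to its initial value of 1.0.
```

In a trained network the gate values are learned from data rather than fixed, so the model itself
decides which parts of the history to retain and which to overwrite.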
        In the business process management (BPM) discipline, studies show how deep learning
techniques allow us to predict the next events in the business process (Becker & Intoyoad, 2017;
Camargo et al., 2019; Tax et al., 2017; Tello-Leal et al., 2018). RNN, especially the LSTM
network, is frequently used for business process monitoring because it has been developed to
deal with sequential data (Gers et al., 1999; Gers et al., 2002). Using the LSTM network,
numerous studies propose approaches for predictive business process monitoring (Di
Francescomarino et al., 2017; Evermann et al., 2017; Tello-Leal et al., 2018). For example, Tax
et al. (2017) model a predictive process monitoring function. This approach predicts the next
activity and its timestamp based on the event logs. Mehdiyev et al. (2020) propose a multi-stage
business process prediction model for a loan application process and show the improvement of
the prediction performance for rare case events.
         Previous studies used a history of events and related information to predict the next
event, but few studies focus on how contextual information influences prediction accuracy.


TABLE 3.1. REPRESENTATIVE PROCESS PREDICTIVE MODELS
| Authors | Prediction Object | Predictive Model | Dataset | Inputs |
|---|---|---|---|---|
| van Dongen et al. (2008) | Cycle time prediction | Non-parametric regression | bezwaar WOZ | Occurrences of events, case attributes, duration |
| Pravilovic et al. (2013) | Next event log and its attributes | Predictive clustering trees | Event logs in Process Mining book | Events, resource, lifecycle, time |
| Breuker et al. (2016) | Next event | RegPFA predictor | 2012, 2013 BPI challenges | Events |
| Choi et al. (2016) | Next clinical events (diagnosis and medication categories) | LSTM | Historical EHR data | Diagnosis, medication codes, and procedure codes |
| Evermann et al. (2016, 2017) | Next event with resources or organizational group in a process | LSTM | 2012, 2013 BPI challenges | Events, event life cycle, resource name, organizational group |
| Tax et al. (2017) | Next event and its timestamp | LSTM | Helpdesk, 2012 BPI challenge | Events, timestamp |
| Tello-Leal et al. (2018) | Next activity in manufacturing process | LSTM | Executed production process data | Events, resources, time-stamp |
| Mehdiyev et al. (2020) | Next activity process | LSTM and CNN | Helpdesk, 2012, 2013 BPI challenge | Events as n-gram, organizational information |


         Process predictive models from previous studies generally report relatively high accuracy
(0.6-0.8) without considering contextual factors. Applying the models in Table 3.1 might therefore
be expected to yield similarly high performance. However, previous studies train and test their
models using event log data extracted from relatively simple processes. These processes have a
relatively small lexicon and a small number of possible paths.
In process mining, process complexity correlates with the quality of the automated process
discovery (Augusto et al., 2022). This implies that simple event logs make it easy to find patterns
and predict the next events. However, a complex process like clinical documentation has a large
lexicon and billions of possible paths (Pentland et al., 2020), so it is harder to discover and model
the process.
         In this study, I show that even with complex event logs, the quality of the predictive
models can be improved with contextual factors. By adding diverse types of contextual factors, I
expect the neural network to predict more accurately in complex processes. Hence, I compare a
network trained on the sequence of actions only with a network trained on the sequence of actions
together with its contextual factors.
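         The intuition behind this comparison can be shown with a deliberately simple stand-in.
The sketch below is not the LSTM used in this study; it is a frequency-count predictor that
guesses the most common next action, first conditioning on the current action alone and then on
the current action together with the role. All transitions and all action names other than
check_meds are hypothetical, and accuracy is computed in-sample for brevity.

```python
from collections import Counter, defaultdict

# Hypothetical toy transitions: (role, current action, next action).
transitions = [
    ("OAS", "check_meds", "print_summary"),
    ("OAS", "check_meds", "print_summary"),
    ("OAS", "check_meds", "schedule_followup"),
    ("Physician", "check_meds", "write_prescription"),
    ("Physician", "check_meds", "write_prescription"),
    ("Physician", "check_meds", "print_summary"),
]


def accuracy(key_fn):
    """In-sample accuracy of predicting the most frequent next action per key."""
    counts = defaultdict(Counter)
    for role, action, nxt in transitions:
        counts[key_fn(role, action)][nxt] += 1
    hits = 0
    for role, action, nxt in transitions:
        predicted = counts[key_fn(role, action)].most_common(1)[0][0]
        hits += predicted == nxt
    return hits / len(transitions)


# Condition on the current action only.
acc_action_only = accuracy(lambda role, action: action)
# Condition on the current action together with the role (context).
acc_with_role = accuracy(lambda role, action: (role, action))
```

In this toy example, conditioning on the role raises accuracy from one half to two thirds, because
the same action code is followed by different actions depending on who performs it. A realistic
evaluation would use held-out data rather than in-sample accuracy.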
3.3. Data Description
For the analysis, I use EMR audit trail data, which list sequential touchpoint event logs for the
clinical documentation process. Each touchpoint refers to an event that occurs when a “specific
clinic staff” member accesses a “specific patient record” at a “specific workstation”. An event
represents the execution of a specific action. The event logs include 529 distinct actions of the
clinical documentation process. Each event carries attributes for event timestamp, role,
workstation, flu season, system upgrade, and clinic information.


TABLE 3.2. SAMPLE OF RAW DATA
   Tstamp         Flu Season   Visit ID   Workstation_ID   Role            Action Code
   4/2/18 10:49   Non_Flu      1          Bcabrkderm       OAS             Regharacctcrt
   4/2/18 10:49   Non_Flu      1          Bcabrkderm       OAS             Rgwkflbegin
   4/2/18 10:49   Non_Flu      1          Bcabrkderm       OAS             Form_Viewed
   4/2/18 10:49   Non_Flu      1          Bcabrkderm       OAS             Rgeptbscdm
   4/2/18 10:49   Non_Flu      1          Bcabrkderm       OAS             Form_Viewed
   4/2/18 10:49   Non_Flu      1          Bcabrkderm       OAS             Mr_Demographics_Viewed
   4/2/18 10:49   Non_Flu      1          Bcabrkderm       OAS             Rgeptaddrs
   4/2/18 10:49   Non_Flu      1          Bcabrkderm       OAS             Reg_Sc_Eptlanguage
   4/2/18 10:49   Non_Flu      1          Bcabrkderm       OAS             Reg_Sc_Eardemographics
   4/2/18 12:16   Non_Flu      1          Brkdermdt6       Physician       Ac_Visit_Navigator
   4/2/18 12:16   Non_Flu      1          Brkdermdt6       Physician       Visit_Diagnoses_View
   4/2/18 12:16   Non_Flu      1          Brkdermdt6       Physician       Mr_Problem_List_Access
   4/2/18 12:16   Non_Flu      1          Brkdermdt6       Physician       Visit_Diagnoses_View
   4/2/18 12:16   Non_Flu      1          Brkdermdt6       Physician       Mr_Los_Access
   4/2/18 12:17   Non_Flu      1          Brkdermdt6       Physician       Mr_Review_Encounter
   4/2/18 12:17   Non_Flu      1          Brkdermdt6       Physician       Mr_Review_Media
   4/2/18 12:17   Non_Flu      1          Brkdermdt6       Physician       Mr_Review_Orders
   4/2/18 12:17   Non_Flu      1          Brkdermdt6       Physician       Mr_Chart_Review
   4/2/18 12:17   Non_Flu      1          Brkdermdt6       Physician       Mr_Chart_Review
   4/2/18 12:23   Non_Flu      2          Brkdermproc      Admin Tech      Mr_Reports
   4/2/18 12:23   Non_Flu      2          Brkdermproc      Admin Tech      Ac_Visit_Navigator
   4/2/18 12:23   Non_Flu      2          Brkdermproc      Clinical Tech   Sec_Flowsheet_View
   4/2/18 12:23   Non_Flu      2          Brkdermproc      Clinical Tech   Ucw_Related_Encounters
   4/2/18 12:23   Non_Flu      2          Brkdermproc      Clinical Tech   Mr_Review_Encounter
   4/2/18 12:23   Non_Flu      2          Brkdermproc      Clinical Tech   Mr_Review_Orders
   4/2/18 12:23   Non_Flu      2          Brkdermproc      Clinical Tech   Mr_Chart_Review
   4/2/18 12:23   Non_Flu      3          Brkdermproc      Nurse           Mr_Reports
   4/2/18 12:28   Non_Flu      3          Brkdermproc      Nurse           Mr_Reports
   4/2/18 12:33   Non_Flu      3          Brkdermproc      Nurse           Mr_Reports
   4/2/18 12:38   Non_Flu      3          Brkdermproc      Nurse           Mr_Reports
   …              …            …          …                …               …
        Table 3.2 shows a sample subset of raw data for the clinical documentation process. The
raw dataset consists of a list of actions with their specific attributes as described, but the data
must be reshaped before analysis. Thus, prior to analysis, I pre-process the data by
transforming them from the level of individual actions to sequences of consecutive actions with
contextual factors (Table 3.3). Each row in Table 3.3 shows a series of actions performed at one
touchpoint (Visit ID + Role + Workstation), together with the contextual information.
TABLE 3.3. EXAMPLE OF TOUCHPOINTS
Visit ID   Role            Workstation     Diagnosis Group        Flu Season   Action
1          Clinical_Tech   Bcabrkderm      Uncertain Neoplasm     No_flu       As_Appt_Desk
1          Physician       Brkdermproc1    Actinic Keratosis      No_flu       Mr_Review_Encounter, Mr_Chart_Review_Viewed…
1          Clinical_Tech   Haikugenericw   Seborrheic Keratosis   No_flu       Rgwkflbegin, Form_Viewed, Rgeptbscdm…
2          Clinical_Tech   Brkdermproc1    Dermatitis             Flu          Mr_Reports, Mr_Synopsis, Ac_Visit_Navigator…
2          Clinical_Tech   Clisup          Rosacea                Flu          As_Appt_Desk
2          Clinical_Tech   Dermfromisdt5   Psoriasis              Flu          Mr_Reports, Mr_Reports, Sec_Flowsheet_View…
2          Physician       Dermfromisdt5   Nevi                   Flu          Ac_Visit_Navigator, Ucw_Related_Encounters…
3          Physician       Bcabrkderm      Nevi                   No_flu       Ac_Visit_Navigator, Sec_Flowsheet_Report
…          …               …               …                      …            …
        Table 3.4 summarizes the characteristics of the attributes used in this study. The data
contain 47 distinct roles and 1,343 distinct workstations. In this essay, I use only categorical
contextual factors for the comparison.


TABLE 3.4. VARIABLE DESCRIPTION
                    Variable Name         Variable Type              # of Values (Mean for Numeric)
                    Actions               Categorical                529
                    Role                  Categorical                47
                    Workstation           Categorical                1,343
                    Diagnosis Group       Categorical                160
                    Clinic                Categorical                12
                    Flu Season            Dummies (Categorical)
                    System Upgrade        Dummies (Categorical)
          In the next stage, I eliminate consecutively duplicated actions because I regard them as
uninformative. After removing the duplicates, I list all the events in one column for each
touchpoint and create data points that consist of five consecutive actions.³ For
example, if an event chunk contains seven sequential actions e = [A,B,C,D,E,F,G], it generates
three observations, [A,B,C,D,E], [B,C,D,E,F], and [C,D,E,F,G], each of which consists of four
input variables and one target variable.
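
The windowing step described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names and the letter-coded touchpoint are hypothetical, and the only assumptions taken from the text are that consecutive duplicates are dropped first and that each observation is five consecutive actions.

```python
from itertools import groupby


def drop_consecutive_duplicates(actions):
    """Collapse each run of identical consecutive actions into one action."""
    return [action for action, _ in groupby(actions)]


def sliding_windows(actions, window=5):
    """Return every run of `window` consecutive actions, shifted by one step."""
    return [actions[i:i + window] for i in range(len(actions) - window + 1)]


# Hypothetical touchpoint; letters stand in for action codes.
touchpoint = ["A", "B", "B", "C", "D", "E", "F", "F", "G"]
deduped = drop_consecutive_duplicates(touchpoint)
windows = sliding_windows(deduped, window=5)

for w in windows:
    # The first four actions are inputs; the fifth is the target.
    print(w[:4], "->", w[4])
```

A touchpoint with fewer than five actions after de-duplication yields no observation under this sketch.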
          Next, I add contextual factors as additional attributes to train the model. To add the
factors to the model, I place the contextual factors before the sequence of actions (e.g., [factor 1,
factor 2, …, A,B,C,D,E]). In this way, the context sets the stage for each sequence of actions.
          Next, I encode the input sequences. This step converts each character string (a specific
action in this study) into a unique integer. For the encoding, I use a tokenizer to find all the
unique values in the entire dataset and map each of them to a numeric feature. Based on the
dataset of sequential event logs, I split each sequence into two parts: input variables and a target
variable. The first four actions and the contextual factors are the inputs used to train the
model, and the last action is the expected value that corresponds to those inputs. In other
words, the model is trained to predict the target variable from the input variables.

³ Predicting sequences within touchpoints represents an important simplification in the analysis. If we tried to predict
the sequence between touchpoints, we would need to include contextual factors for each action, so there would be a
combinatorial explosion in the size of the lexicon (529 actions * 47 roles * 1343 workstations…). It would be
impossible to train a model of this complexity with the available data.
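
As an illustration of the context prefix, integer encoding, and input/target split, the following sketch uses a plain-Python stand-in for the tokenizer. The helper names are hypothetical, and the integer assignment (order of first appearance, with 0 left free for padding) is an assumption; a library tokenizer may number tokens differently. The role, workstation, and action codes are taken from the sample rows in Table 3.2.

```python
def build_vocabulary(sequences):
    """Map every distinct token to a positive integer (0 is left free for padding)."""
    vocab = {}
    for seq in sequences:
        for token in seq:
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab


def encode(sequences, vocab):
    """Replace each token by its integer code."""
    return [[vocab[token] for token in seq] for seq in sequences]


def split_inputs_target(encoded):
    """The last element of each sequence is the target; the rest are inputs."""
    inputs = [seq[:-1] for seq in encoded]
    targets = [seq[-1] for seq in encoded]
    return inputs, targets


# Context for one touchpoint (role, workstation), placed before the actions.
context = ["Physician", "Brkdermdt6"]
windows = [
    ["Ac_Visit_Navigator", "Visit_Diagnoses_View", "Mr_Problem_List_Access",
     "Visit_Diagnoses_View", "Mr_Los_Access"],
    ["Visit_Diagnoses_View", "Mr_Problem_List_Access", "Visit_Diagnoses_View",
     "Mr_Los_Access", "Mr_Review_Encounter"],
]

sequences = [context + w for w in windows]
vocab = build_vocabulary(sequences)
X, y = split_inputs_target(encode(sequences, vocab))

print(X)  # context codes followed by four action codes
print(y)  # code of the action to be predicted
```

Because contextual values and actions share one vocabulary in this sketch, each contextual factor adds its distinct values to the lexicon rather than multiplying it.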
3.4. Model
3.4.1. Long Short-Term Memory Network
A recurrent neural network (RNN) is a class of deep artificial neural networks designed for
sequential data (Baziotis et al., 2017). At each time step, the network computes a new hidden
state from the previous hidden state and the current input, as follows.
                 (1)                    $h_t = f_W(h_{t-1}, x_t)$
         In eq. (1), $h_t$ denotes the new state at time t, computed by a function $f_W$ with
parameters W from the previous state $h_{t-1}$ and $x_t$, the input vector at time t. At each step,
the model reads the embedding of the current action name, passes on only the useful information
through the weights W, and makes a prediction on the label assigned to the current action name.
         However, a standard RNN suffers from a vanishing gradient problem over long sequences,
which makes the RNN difficult to train (Pascanu et al., 2013). Applying an RNN to text analysis
requires overcoming this issue because long sentences (lists of words) are loaded as the dataset. To
overcome the gradient issue, the Long Short-Term Memory (LSTM) network adds
three types of gates (input gate, output gate, and forget gate) and a memory cell state.
         The word vector (a type of action in this study), $w_i$, in a sentence with length N
(sequence of actions in this study) is generated from word embeddings as dense vector
representations of words (Nakov et al., 2019). Each LSTM unit contains an input gate $i_t$, a forget
gate $f_t$, an output gate $o_t$, a memory cell $c_t$, a hidden state $h_t$, and the word embedding
input, $x_t$, at time step t.


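                 (2)                   $X = \begin{bmatrix} h_{t-1} \\ x_t \end{bmatrix}$
                 (3)                   $f_t = \sigma(W_f \cdot X + b_f)$
                 (4)                   $i_t = \sigma(W_i \cdot X + b_i)$
                 (5)                   $o_t = \sigma(W_o \cdot X + b_o)$
                 (6)                   $\tilde{c}_t = \tanh(W_c \cdot X + b_c)$
                 (7)                   $c_t = f_t \circ c_{t-1} + i_t \circ \tilde{c}_t$
                 (8)                   $h_t = o_t \circ \tanh(c_t)$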
         Each gate consists of a weight matrix ($W_i$, $W_f$, $W_o$) and a bias ($b_i$, $b_f$, $b_o$)
that are learned in the training process. The weight matrices and biases parameterize the
transformations of the three gates with the embedding inputs, respectively (Xu et al., 2016). $\sigma$ is
the sigmoid function and the operator ∘ denotes element-wise multiplication.
         In an LSTM, each gate plays an important role. The input gate decides
how each unit of the memory cell is updated with new information. Next, the forget gate controls
the extent to which the previously stored information in the memory cell is forgotten. Lastly, the
output gate controls the exposure of the internal memory state. Through this process, the hidden
state captures and stores the past information required for later predictions. For the prediction
model, I use an LSTM and train it on the sequences of actions in clinics.
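
To make the gate equations concrete, the sketch below carries out a single LSTM time step, following eqs. (2) to (8), using only the Python standard library. It is an illustration under stated assumptions rather than the study's implementation: the function and parameter names are hypothetical, the concatenated vector of eq. (2) is called `X` here, and the zero-valued weights are chosen only so that the result is easy to check by hand.

```python
import math


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def tanh(v):
    return [math.tanh(a) for a in v]


def affine(W, X, b):
    """Compute W . X + b for a matrix W (list of rows) and vectors X and b."""
    return [sum(w * x for w, x in zip(row, X)) + bias for row, bias in zip(W, b)]


def hadamard(u, v):
    """Element-wise product of two vectors."""
    return [a * b for a, b in zip(u, v)]


def lstm_step(x_t, h_prev, c_prev, p):
    X = h_prev + x_t                                    # eq. (2): concatenate
    f_t = sigmoid(affine(p["W_f"], X, p["b_f"]))        # eq. (3): forget gate
    i_t = sigmoid(affine(p["W_i"], X, p["b_i"]))        # eq. (4): input gate
    o_t = sigmoid(affine(p["W_o"], X, p["b_o"]))        # eq. (5): output gate
    c_tilde = tanh(affine(p["W_c"], X, p["b_c"]))       # eq. (6): candidate
    c_t = [a + b for a, b in
           zip(hadamard(f_t, c_prev), hadamard(i_t, c_tilde))]  # eq. (7)
    h_t = hadamard(o_t, tanh(c_t))                      # eq. (8): hidden state
    return h_t, c_t


# Hypothetical sizes: hidden state of 2 units, input embedding of 3 values.
hidden, inputs = 2, 3
zeros_W = [[0.0] * (hidden + inputs) for _ in range(hidden)]
zeros_b = [0.0] * hidden
params = {name: zeros_W for name in ("W_f", "W_i", "W_o", "W_c")}
params.update({name: zeros_b for name in ("b_f", "b_i", "b_o", "b_c")})

h_t, c_t = lstm_step(x_t=[1.0, 0.0, 0.0], h_prev=[0.0, 0.0],
                     c_prev=[1.0, -2.0], p=params)
print(h_t, c_t)
```

With all weights at zero, every gate equals 0.5 and the candidate is 0, so the new cell state is half of the previous one. This shows the forget gate scaling the stored memory.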
         For the analysis, I implement the LSTM network using the Keras framework,
since it provides the required functionality to model an LSTM network (Keras-team, 2019). First, I
set the input dimension of the embedding layer to 529, the number of unique actions in the clinical
documentation process, and the length of the sequence to 5, that is, four sequential actions
as input and one action to be predicted (the embedding dimension itself is listed in Table 3.5). The
basic model is trained for 50 epochs in batches of size 128. To encode input vectors to the hidden
layer, I adopt the Rectified Linear Unit (ReLU) as the encoding activation function (Ketkar &
Santana, 2017). ReLU is one of the most popular activation functions and has several advantages
over other activation functions in terms of computation time and efficiency of gradient propagation
(Xu et al., 2016). The ReLU activation function is defined as follows:
        (9)                     $h = f_{ReLU}(x) = \max(0, x), \quad h \in [0, \infty)$
        This activation function is linear (it returns x unchanged) if $x \geq 0$; otherwise it
outputs 0. For the classification, I employ the Softmax activation function as the last layer. Softmax
is generally used for multi-class classification (Mehdiyev et al., 2020). To estimate the discrete
probability of class i, the Softmax layer is defined as:
        (10)                    $P(y = i \mid x) = \frac{\exp(w_i^{\top} x)}{\sum_j \exp(w_j^{\top} x)}$
        where $w_i$ is the weight vector for class i and x indicates the input vector. Based on the
probability distribution over classes, the class with the highest predicted probability is selected.
Table 3.5 shows the hyperparameter configurations for this study.
TABLE 3.5. CONFIGURATION PARAMETERS OF THE LSTM NETWORK
                        Parameters                                     Value
        Sequence length of actions for prediction                          4
                  Embedding dimension                                     50
                          Epoch                                           50
                         Batch size                                      128
                         Activation                                     ReLU
              Activation for classification                           Softmax
                           Loss                               Categorical_Crossentropy
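
The configuration in Table 3.5 could be expressed in Keras roughly as follows. This is a hedged sketch, not the study's actual code: the number of LSTM units, the optimizer, the use of ReLU as the LSTM activation, and the extra vocabulary slot for padding are all assumptions that the text does not report, and `build_model` and `steps_per_epoch` are hypothetical helper names. TensorFlow is imported inside the function so that the configuration can be inspected without it installed.

```python
import math

CONFIG = {
    "n_actions": 529,        # distinct actions (Table 3.4)
    "input_length": 4,       # actions used to predict the fifth
    "embedding_dim": 50,     # Table 3.5
    "epochs": 50,            # Table 3.5
    "batch_size": 128,       # Table 3.5
    "lstm_units": 64,        # assumption: not reported in the text
    "optimizer": "adam",     # assumption: not reported in the text
}


def steps_per_epoch(n_samples, batch_size=CONFIG["batch_size"]):
    """Number of weight updates per epoch for a given number of observations."""
    return math.ceil(n_samples / batch_size)


def build_model(config=CONFIG):
    """Embedding -> LSTM (ReLU) -> Softmax classifier over the action lexicon."""
    from tensorflow import keras  # imported lazily; requires TensorFlow

    # Assumption: index 0 is reserved for padding. With contextual factors
    # in the input, the vocabulary would also include their distinct values.
    vocab_size = config["n_actions"] + 1
    model = keras.Sequential([
        keras.layers.Embedding(input_dim=vocab_size,
                               output_dim=config["embedding_dim"]),
        keras.layers.LSTM(config["lstm_units"], activation="relu"),
        keras.layers.Dense(vocab_size, activation="softmax"),
    ])
    model.compile(loss="categorical_crossentropy",
                  optimizer=config["optimizer"],
                  metrics=["accuracy"])
    return model


print(steps_per_epoch(1000))
```

With the categorical cross-entropy loss, the integer targets would need to be one-hot encoded (for example with `keras.utils.to_categorical`) before calling `model.fit` with the epochs and batch size from the configuration.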


3.5. Results
Table 3.6 summarizes the overall performance on the next-action prediction task in the clinical
documentation process. I use weighted average accuracy, precision, recall, and F-score for
the comparison. Overall, the suggested approach with contextual factors performs better
than the model with the sequence of actions only, and each of the factors has a different impact on
the prediction level. The initial result of the study shows the capacity to predict the next action in
the clinical documentation process. I test four types of models: 1) the sequence
of actions model, 2) the model with the internal contextual factors, 3) the model with the external
contextual factors, and 4) the model considering all the contextual factors.
TABLE 3.6. RESULTS FROM PROPOSED APPROACH
     Model                                                  Accuracy    Precision   Recall    F-score
     No Contextual Factor
         One Action                                         0.283       0.26        0.04      0.05
         Two Actions                                        0.373       0.57        0.14      0.20
         Three Actions                                      0.423       0.61        0.22      0.30
         Four Actions                                       0.454       0.66        0.26      0.36
     Internal Contextual Factors (Four Actions +)
         Role                                               0.461       0.68        0.27      0.37
         Workstation                                        0.471       0.69        0.29      0.38
         Role + Workstation                                 0.478       0.69        0.30      0.40
     External Contextual Factors (Four Actions +)
         Diagnosis Group                                    0.458       0.68        0.27      0.36
         Flu Season                                         0.455       0.67        0.29      0.36
         System Upgrade                                     0.469       0.67        0.29      0.38
         Diagnosis Group + Flu Season + System Upgrade      0.475       0.69        0.29      0.39
     All Contextual Factors                                 0.494       0.70        0.32      0.42
        In the first model, the base model, I predict the next action based only on the sequence
of actions. To examine the effect of the length of the sequence, I run the model with different
numbers of actions. For the internal contextual factors, I add role and workstation, as those
immediate contexts are the attributes that directly facilitate the execution of the process (Rosemann
et al., 2008). Next, I use the diagnosis group of patients, flu season, and system upgrade as
external context-specific covariates, since they influence the process but lie
beyond the controllable boundary of the organization. Lastly, I include all the factors in the
prediction to see the extent to which contextual factors affect the prediction level.
         The average validation accuracy for all learning rates of each model shows that, as I
assumed, the sequence of actions is the most important factor for the process predictive model.
However, the marginal gain shrinks as more actions are added to the sequence, so I add the
contextual factors as additional attributes in the model. The internal contextual factors generally
have slightly higher predictive power than the external factors (0.478 vs. 0.475). Specifically, the
workstation works better than the role (0.471 vs. 0.461), and the combination of role and
workstation adds little over the workstation alone (0.478 vs. 0.471). This result
implies that workstation as location (where) is more informative because clinicians perform
specific tasks at a specific location. Although the role identifies what kind of actor each clinician
is, the location information could provide much more detailed
information.
         In the case of the external factors, whereas most of the factors do not boost accuracy much
(diagnosis group = 0.458 and flu season = 0.455), system upgrade increases accuracy nearly as
much as workstation (0.469). This makes sense because the system upgrade changes the lexicon
of the actions. After the system upgrade, some of the actions are no longer available and new
actions are added. These new and removed actions could create new habits of system use.
Thus, the system upgrade attribute is informative for predicting the next events, as it signals that
new patterns of actions are created or some paths are removed from the process. In the case of
diagnosis group and flu season, in contrast, there is no dramatic change in accuracy for either
model. I expected that the system usage patterns of the users might change depending on whether
or not it is the flu season or on patients’ diagnoses, but these factors do not seem to be very
informative. These results show that although the internal contextual factors generally boost
accuracy more, there are still important external factors that may affect the quality of the process
predictive model.
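
The comparisons discussed above reduce to simple differences between the accuracy values in Table 3.6. The short sketch below recomputes them. The accuracy figures are copied from the table; the dictionary keys and the helper function are illustrative names only.

```python
# Accuracy values copied from Table 3.6.
accuracy = {
    "One Action": 0.283,
    "Two Actions": 0.373,
    "Three Actions": 0.423,
    "Four Actions": 0.454,
    "Role": 0.461,
    "Workstation": 0.471,
    "Role + Workstation": 0.478,
    "Diagnosis Group": 0.458,
    "Flu Season": 0.455,
    "System Upgrade": 0.469,
    "All External": 0.475,
    "All Contextual Factors": 0.494,
}


def gain(model, baseline):
    """Accuracy difference between two models, rounded to three decimals."""
    return round(accuracy[model] - accuracy[baseline], 3)


# Diminishing gain from each additional preceding action.
sequence_gains = [
    gain("Two Actions", "One Action"),
    gain("Three Actions", "Two Actions"),
    gain("Four Actions", "Three Actions"),
]

# Gain of each contextual model over the four-action baseline.
context_gains = {
    name: gain(name, "Four Actions")
    for name in accuracy
    if "Action" not in name
}

print(sequence_gains)
for name, value in context_gains.items():
    print(f"{name}: +{value:.3f}")
```

Read this way, the fourth action adds about three percentage points of accuracy, while all contextual factors together add about four points over the four-action baseline.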
3.6. Discussion
This essay represents a first step toward revealing the importance of contextual factors in process
prediction. I use an RNN to model the observed sequence of actions together with their contextual
factors in the process. Specifically, I use Long Short-Term Memory (LSTM) networks to find
recognizable patterns and predict events (Gers et al., 2002; Tello-Leal et al., 2018).
The main contribution of this study concerns the role of contextual information in process
prediction. There is no doubt that the most essential attribute of the predictive process model is
the sequence of actions. However, adding more actions does not fully reflect the structure of a
complex process because it takes no account of context. The results of this study show that
the internal contextual factors increase the prediction level more than the external contextual
factors.
         Among the internal contextual factors, the influence of the workstation is particularly
interesting. In the EPIC EMR system, every workstation provides the same functions for users. So,
the workstations can be regarded as identical from the point of view of the users. However, every
workstation is located in a different place, so the workstation indicates the location of the work
(e.g., in the examination room, at the nurses’ station in the hall, in the front office, etc.). From
this perspective, the effect of the workstation may not be surprising, because the physical
environment in which work takes place can shape how it is performed. A busy hallway is different
from a private office. Of course, these contextual differences are not generally conceptualized as
relevant to process execution, but this study suggests that they can be.
        The system upgrade, on the other hand, is an important external factor that increases the
prediction accuracy. This variable is a simple indicator of whether the system has been upgraded
when a patient visits a clinic, but it seems to play an important role in the prediction model. This
implies that the patterns of system use may change when the system is upgraded. Habitual
patterns of actions can change depending on the system the users use, and this affects the
prediction level considerably. This suggests that although the external contextual factors are not
as controllable as the internal factors, they still need to be considered when it comes to
predicting the next events in a process.
        This study extends our understanding of the entangled relationship between contextual
factors (features of nodes) and actions (nodes) and the extent to which the factors could affect
the prediction of the next actions in EMR settings. The clinical process has become more
complex because of entangled relationships among numerous stakeholders and new
technologies. The complexity of the process influences the quality of the model, so
understanding this relationship could provide clues to disentangling complex relationships and
finding recognizable patterns (Augusto et al., 2022). The recognizable patterns are useful for
organizing actions in the clinical documentation process.
        My results show lower accuracy and precision than studies that use simpler event
logs for training and testing (e.g., the studies in Table 3.1). However, the purpose of this essay
differs from that of other process predictive frameworks in two ways. First, my analysis shows that
the suggested approach can be applied to real process datasets that are extremely
complex. Second, I extend the idea of a process predictive model based on LSTM. To the best of
my knowledge, prior studies suggest predictive models based on the previous events only or
with a few contextual factors, but few studies examine the effects of contextual factors.
The main goal of the study is to see how the contextual factors affect the prediction, rather than
to introduce a higher-performance prediction model using LSTM.
        Another contribution of this study is its practical implication for the clinical
documentation process in terms of text suggestion. Currently, clinical documentation is regarded
as a time-consuming process (Friedman et al., 2004; Lin et al.,
2018). Predicting the next actions suggests what comes next and helps clinicians complete the
documentation process faster. Applying my approach with contextual factors could reduce the
number of suggested actions and increase human accuracy. In other words, using suggested actions
in the documentation process could even reduce the chance that clinical practitioners input wrong
information by mistake. I expect that considering contextual factors in the prediction model for
the process could help interdependent organizational processes become more efficient and effective.
3.7. Conclusion
This essay uses a deep learning approach to predict the next actions in the clinical documentation
process and investigates the effectiveness of contextual factors in predicting events. To examine
the effects of contextual factors on predictive performance, I apply a deep learning model
based on LSTM recurrent neural networks and compare models with different
combinations of attributes. The essay shows how the LSTM-based approach performs in
predicting the sequence of actions in the clinical documentation process. As expected, the results
show that context can improve predictive models. In the case of outpatient medical clinics, the
strongest improvement in accuracy comes from two attributes: 1) the workstation (location)
where work is performed and 2) whether or not the system has been upgraded. This result
points to the significance of contextual factors in predictive
models for the clinical documentation process.


BIBLIOGRAPHY
Allen, P. M., & Varga, L. (2006). A co–Evolutionary Complex Systems Perspective on
        Information Systems. Journal of Information Technology, 21(4), 229-238.
Augusto, A., Mendling, J., Vidgof, M., & Wurm, B. (2022). The connection between process
        complexity of event sequences and models discovered by process mining. Information
        Sciences, 598, 196-215.
Baziotis, C., Pelekis, N., & Doulkeridis, C. (2017). Datastories at semeval-2017 task 4: Deep
        lstm with attention for message-level and topic-based sentiment analysis. Proceedings of
        the 11th international workshop on semantic evaluation (SemEval-2017).
Becker, T., & Intoyoad, W. (2017). Context aware process mining in logistics. Procedia Cirp,
        63, 557-562.
Bozkaya, M., Gabriels, J., & van der Werf, J. M. (2009). Process diagnostics: a method based on
        process mining. 2009 International Conference on Information, Process, and Knowledge
        Management.
Breuker, D., Matzner, M., Delfmann, P., & Becker, J. (2016). Comprehensible Predictive Models
        for Business Processes. MIS Quarterly, 40(4), 1009-1034.
Camargo, M., Dumas, M., & González-Rojas, O. (2019). Learning accurate LSTM models of
        business processes. International Conference on Business Process Management.
Choi, E., Bahadori, M. T., Schuetz, A., Stewart, W. F., & Sun, J. (2016). Doctor ai: Predicting
        clinical events via recurrent neural networks. Machine Learning for Healthcare
        Conference.
Di Francescomarino, C., Ghidini, C., Maggi, F. M., Petrucci, G., & Yeshchenko, A. (2017). An
        eye into the future: leveraging a-priori knowledge in predictive business process
        monitoring. International Conference on Business Process Management.
Evermann, J., Rehse, J.-R., & Fettke, P. (2017). Predicting process behaviour using deep
        learning. Decision Support Systems, 100, 129-140.
Frankel, R., Altschuler, A., George, S., Kinsman, J., Jimison, H., Robertson, N. R., & Hsu, J.
        (2005). Effects of exam-room computing on clinician-patient communication. Journal of
        general internal medicine, 20(8), 677-682.
Friedman, C., Shagina, L., Lussier, Y., & Hripcsak, G. (2004). Automated encoding of clinical
        documents based on natural language processing. Journal of the American Medical
        Informatics Association, 11(5), 392-402.


Gacitua-Decar, V., & Pahl, C. (2009). Automatic business process pattern matching for
         enterprise services design. 2009 World Conference on Services-II.
Gers, F. A., Schmidhuber, J., & Cummins, F. (1999). Learning to forget: Continual prediction
         with LSTM.
Gers, F. A., Schraudolph, N. N., & Schmidhuber, J. (2002). Learning precise timing with LSTM
         recurrent networks. Journal of machine learning research, 3(Aug), 115-143.
Graves, A., Fernández, S., Gomez, F., & Schmidhuber, J. (2006). Connectionist temporal
         classification: labelling unsegmented sequence data with recurrent neural networks.
         Proceedings of the 23rd international conference on Machine learning.
Hochreiter, S., Bengio, Y., Frasconi, P., & Schmidhuber, J. (2001). Gradient flow in recurrent
         nets: the difficulty of learning long-term dependencies. In J. F. Kolen & S. C. Kremer
         (Eds.), A field guide to dynamical recurrent neural networks. IEEE Press.
Keras-team, K. D. (2019). The python deep learning library. Available (accessed 5 May 2019):
         https://keras.io.
Ketkar, N., & Santana, E. (2017). Deep Learning with Python (Vol. 1). Springer.
Kronsbein, D., Meiser, D., & Leyer, M. (2014). Conceptualisation of contextual factors for
         business process performance. Proceedings of the International MultiConference of
         Engineers and Computer Scientists.
Li, J., Bose, R. J. C., & van der Aalst, W. M. (2010). Mining context-dependent and interactive
         business process maps using execution patterns. International Conference on Business
         Process Management.
Lin, S. Y., Shanafelt, T. D., & Asch, S. M. (2018). Reimagining clinical documentation with
         artificial intelligence. Mayo Clinic Proceedings.
Lipton, Z. C., Berkowitz, J., & Elkan, C. (2015). A critical review of recurrent neural networks
         for sequence learning. arXiv preprint arXiv:1506.00019.
Mehdiyev, N., Evermann, J., & Fettke, P. (2020). A novel business process prediction model
         using a deep learning method. Business & information systems engineering, 62(2), 143-
         157.
Mejia Bernal, J. F., Falcarin, P., Morisio, M., & Dai, J. (2010). Dynamic context-aware business
         process: a rule-based approach supported by pattern identification. Proceedings of the
         2010 ACM Symposium on Applied Computing.


Nakov, P., Ritter, A., Rosenthal, S., Sebastiani, F., & Stoyanov, V. (2019). SemEval-2016 task 4:
        Sentiment analysis in Twitter. arXiv preprint arXiv:1912.01973.
Pascanu, R., Mikolov, T., & Bengio, Y. (2013). On the difficulty of training recurrent neural
        networks. International conference on machine learning.
Pentland, B. T., Mahringer, C. A., Dittrich, K., Feldman, M. S., & Wolf, J. R. (2020). Process
        multiplicity and process dynamics: Weaving the space of possible paths. Organization
        Theory, 1(3), 2631787720963138.
Ploesser, K., Peleg, M., Soffer, P., Rosemann, M., & Recker, J. C. (2009). Learning from context
        to improve business processes. BPTrends, 6(1), 1-7.
Pravilovic, S., Appice, A., & Malerba, D. (2013). Process mining to forecast the future of
        running cases. International Workshop on New Frontiers in Mining Complex Patterns.
Rettig, C. (2007). The trouble with enterprise software. MIT Sloan management review, 49(1),
        21.
Rosemann, M., Recker, J., & Flender, C. (2008). Contextualisation of business processes.
        International Journal of Business Process Integration and Management, 3(1), 47-60.
Russell, N., Ter Hofstede, A. H., van der Aalst, W. M., & Mulyar, N. (2006). Workflow control-
        flow patterns: A revised view. BPM Center Report BPM-06-22, BPMcenter. org, 06-22.
Tax, N., Verenich, I., La Rosa, M., & Dumas, M. (2017). Predictive business process monitoring
        with LSTM neural networks. International Conference on Advanced Information Systems
        Engineering.
Tello-Leal, E., Roa, J., Rubiolo, M., & Ramirez-Alcocer, U. M. (2018). Predicting Activities in
        Business Processes with LSTM Recurrent Neural Networks. 2018 ITU Kaleidoscope:
        Machine Learning for a 5G Future (ITU K).
van der Aalst, W. M., De Medeiros, A. A., & Weijters, A. J. (2005). Genetic process mining.
        International conference on application and theory of Petri nets.
van der Aalst, W. M., & Dustdar, S. (2012). Process mining put into context. IEEE Internet
        Computing, 16(1), 82-86.
van der Aalst, W. M., Reijers, H. A., Weijters, A. J., van Dongen, B. F., De Medeiros, A. A.,
        Song, M., & Verbeek, H. (2007). Business process mining: An industrial application.
        Information Systems, 32(5), 713-732.
van der Aalst, W. M., & Weijters, A. J. (2004). Process mining: a research agenda. Computers in
        industry, 53(3), 231-244.
                                                118


van der Werf, J. M. E., van Dongen, B. F., Hurkens, C. A., & Serebrenik, A. (2008). Process
       discovery using integer linear programming. International conference on applications and
       theory of Petri nets.
van Dongen, B. F., Crooy, R. A., & van der Aalst, W. M. (2008). Cycle time prediction: When
       will this case finally be finished? OTM Confederated International Conferences" On the
       Move to Meaningful Internet Systems".
vom Brocke, J., Zelt, S., & Schmiedel, T. (2016). On the role of context in business process
       management. International Journal of Information Management, 36(3), 486-495.
Xu, Y., Huang, Q., Wang, W., & Plumbley, M. D. (2016). Hierarchical learning for DNN-based
       acoustic scene classification. arXiv preprint arXiv:1607.03682.
                                               119