
This is to certify that the
dissertation entitled

DIGITALLY-ENABLED ORGANIZATIONAL ROUTINES
AT THE ORGANIZATION-ENVIRONMENT BOUNDARY:
BUFFERING AND THE ROLE OF TECHNOLOGY

presented by

Derek William Hillison

has been accepted towards fulfillment
of the requirements for the

PhD degree in Business Information Systems

 

Major Professor’s Signature

Date

MSU is an Affirmative Action/Equal Opportunity Employer

 

 

 


 

DIGITALLY-ENABLED ORGANIZATIONAL ROUTINES AT THE
ORGANIZATION-ENVIRONMENT BOUNDARY: BUFFERING AND THE ROLE
OF TECHNOLOGY
By

Derek William Hillison

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY
Business Information Systems

2009

ABSTRACT

DIGITALLY-ENABLED ORGANIZATIONAL ROUTINES AT THE
ORGANIZATION-ENVIRONMENT BOUNDARY: BUFFERING AND THE ROLE
OF TECHNOLOGY
By

Derek William Hillison

Boundary units of an organization uniquely experience the tension between adapting to environmental variation and maintaining stable outcomes for the rest of the organization. In our world of just-in-time supply chain systems, lot sizes of one, lean manufacturing, and an increasing focus on services, traditional forms of buffering such as queuing and warehousing are unavailable or less effective. This tension between stability and flexibility must be reconciled in the actions and processes of the boundary unit, and the reconciliation is reflected in recognizable patterns of action. In addition, the development and organizational adoption of workflow technologies has reduced coordination costs while automating and reifying business rules, both enabling and constraining organizational action. The assimilation of these workflow systems may fundamentally alter the flexibility and rigidity with which organizational routines are performed, consequently altering organizational flexibility and adaptation.

Copyright by
Derek William Hillison

2009

I dedicate this dissertation to my wife, who stood by me and supported me through all the snow, dark days, and cold of East Lansing winters as well as the gloriously beautiful summers. She still loves me. I also dedicate this to my Uncle Gary, who always wanted to see me become an author.

ACKNOWLEDGEMENTS

I want to thank Dr. Brian Pentland for his tireless support and patience in advising me through the dissertation process. I don't think I could have done this with a different advisor. I also want to thank my committee for ensuring the quality and impact of my work, while allowing me the time and freedom to explore. Dr. V. Sambamurthy was always willing to guide me to find my story and support it within the literature. Dr. Calantone provided technical, political and social guidance that was very helpful and always appreciated. Dr. Thorvald Haerem provided much of the data collection support in Norway, opened his home to me and even took a valuable Saturday to show me around the more charming places within Oslo. Michigan State University, the Center for Leadership of the Digital Enterprise, the Norwegian School of Management and the Norwegian Research Council provided funding and support for this dissertation. I could not have obtained the data without Thorvald and the help of Compello AS, its president and employees, and Alf Slettemoen. I want to thank my wife, parents, and brother for all the love, dedication and help they provided; it is immeasurable. To all those deserving of thanks whom I leave out solely in the interest of saving letters, you have my appreciation and gratitude.

TABLE OF CONTENTS

LIST OF TABLES ......................................................................................................... VIII
LIST OF FIGURES ........................................................................................................... X
CHAPTER 1—EXECUTIVE SUMMARY ....................................................................... 1
CHAPTER 2—INTRODUCTION ..................................................................................... 4
Overview ......................................................................................................................... 4
Impact ............................................................................................................................. 7
Research Design .............................................................................................................. 8
Results ........................................................................................................................... 10
Outline of Chapters ....................................................................................................... 11
CHAPTER 3—LITERATURE REVIEW AND THEORY DEVELOPMENT ............... 13
Introduction ................................................................................................................... 13
Review of Literature ..................................................................................................... 13
The Organization as an Open System ........................................................................... 15
Boundaries and Buffering ............................................................................................. 16
Patterns of Action ......................................................................................................... 18
The Role of Technology ............................................................................................... 20
Theory and Model Development .................................................................................. 22
Process Variety at the Boundary ................................................................................... 23
The Relationship between Process Variation and Outcomes ....................................... 25
Technology Enablement and Constraint ....................................................................... 26
Application to Mail Handling ....................................................................................... 30
CHAPTER 4—METHODOLOGY AND RESEARCH DESIGN ................................... 32
Introduction ................................................................................................................... 32
Research Design ............................................................................................................ 32
Operationalized Variables ............................................................................................. 39
Markov Transition Matrix Method ............................................................................... 42
String Distance Approach ............................................................................................. 43
CHAPTER 5—MARKOV ANALYSIS AND FINDINGS ............................................. 45
Introduction ................................................................................................................... 45
Determining the Order of the Processes ....................................................................... 45
Examining Stationarity of the Process .......................................................................... 46
Group Comparisons ...................................................................................................... 48
Does the Process Vary with Differential Inputs? .......................................................... 51
Are Processes that Have Different Outcomes Similar? ................................................ 55
Does the Process Vary with the Amount of Automation Present? ............................... 58
Discussion ..................................................................................................................... 62


CHAPTER 6—STRING DISTANCE ANALYSIS AND FINDINGS ............................ 66

Introduction ................................................................................................................... 66
Model Variables ............................................................................................................ 67
Preparing the Data ......................................................................................................... 67
Initial Exploration ......................................................................................................... 68
Extracting Metric Relationships between Sequences ................................................... 70
Choosing a Distance Function ...................................................................................... 71
Results ........................................................................................................................... 74
Interpreting the MDS Sequence Dimensions ................................................................ 74
Qualitative Analysis of MDS Dimensions .................................................................... 75
Using Multiple Regression to Understand MDS Dimensions ...................................... 78
Multiple Regression and Visualization ......................................................................... 81
What do these Dimensions Mean? ................................................................................ 82
Evaluating the Research Questions ............................................................................... 83
Construct Formation ..................................................................................................... 84
Inputs, Process, Outcome: Buffering ........................................................................... 85
Discussion ..................................................................................................................... 90
Introduction ................................................................................................................... 94
Theoretical Implications ............................................................................................... 94
Organizational Theory .................................................................................................. 94
IT Impact ....................................................................................................................... 98
Methodological Implications ...................................................................................... 100
Practical Implications .................................................................................................. 101
Managerial Impacts ..................................................................................................... 101
Impacts on Information System Design ...................................................................... 103
Impact on Information System Users ......................................................................... 104
Limitations .................................................................................................................. 105
APPENDIX 1: STRING MATCHING DISTANCE ...................................................... 111
APPENDIX 2: MARKOV ANALYSIS
APPENDIX 3: STRING DISTANCE ANALYSIS ....................................................... 119
REFERENCES ............................................................................................................... 128


LIST OF TABLES

Table 1: Summary of scholarly contributions ..................................................................... 8

Table 2: Definitions of organizational routines with ostensive and performative aspects of the organizational routine ............................................................................................. 19
Table 3: Methods for measuring sequential variety .......................................................... 34
Table 4: Example event log from workflow system ........................................................ 37
Table 5: Sequences extracted from the workﬂow event log ............................................ 37
Table 6: Variables contextualized for the invoicing business process ............................ 39
Table 7: Variables operationalized and linked to the analysis model .............................. 40
Table 8: Descriptive information for invoice variables .................................................... 41
Table 9: Descriptive information automation percentage ................................................ 42
Table 10: Constructs and variables used to segment sequences ....................................... 50
Table 11: Descriptive information for groups of invoice amount ................................... 51
Table 12: Descriptive information for groups of invoice amount ................................... 52
Table 13: Group comparisons for invoice amount ........................................................... 53
Table 14: Groups and descriptives for total vendor count ............................................... 54
Table 15: Groups and descriptives for vendor experience ............................................... 54

Table 16: Group comparisons for total number of invoices for a vendor and the vendor experience, for each invoice ............................................................................................ 55
Table 17: Descriptives for the length (in days) of the process ........................................ 56
Table 18: Comparing processes with similar outcomes .................................................. 57

Table 19: Groups and descriptives for automation (automated actions, sequence length) ........................................................................................................................................... 58

Table 20: Groups and descriptives for automation (automation percent) ........................ 59


Table 21: Group comparisons for the amount of automation present within a process .. 59

Table 22: Group comparisons for the amount of automation present within a process,
visual stratification process ............................................................................................... 62

Table 23: Summary of results—bullets indicate similarity, or an insignificant statistical
test ..................................................................................................................................... 64

Table 24: Qualitative results from examining raw sequences and extracted variables 76

Table 25: Sample size and basis for samples and data .................................................... 77
Table 26: Descriptive information for invoice variables .................................................. 78
Table 27: Descriptive information for sequence variables ................................................ 78

Table 28: Two and three dimensional standardized regressions on input variables, entry
and approval phase ............................................................................................................ 79
Table 29: Two and three dimensional standardized regressions on outcome variables,
entry and approval phase .................................................................................................. 79
Table 30: Two and three dimensional standardized regressions on automation variables,
entry and approval phase .................................................................................................. 80
Table 31: Improvement in explained variance (adj r-square) in 3-dimensional solution
over the 2-dimensional solution ........................................................................................ 81
Table 32: Transition matrix for entry phase .................................................................. 113
Table 33: Entry phase—Order of the process ................................................................ 114
Table 34: Approval phase—Order of the process .......................................................... 115
Table 35: Group statistics for visual stratification based on automation ....................... 117
Table 36: Omnibus test of stationarity results ............................................................... 118
Table 37: Subsequent tests of homogeneity ................................................................... 118
Table 38: Entry covariance matrix for input, process, outcome, and automation ......... 126
Table 39: Entry correlation matrix for input, process, outcome, and automation ......... 126

Table 40: Approval covariance matrix for input, process, outcome, and automation... 127

Table 41: Approval correlation matrix for input, process, outcome, and automation 127


LIST OF FIGURES

Figure 1: Examining an organizational boundary unit and its routines more closely: It
can be seen how the business process absorbs variety through the impact of
organizational routines and technology only if we reject the ‘black box’ perspective of
organizational and business processes. ............................................................................... 6

Figure 2: Melão and Pidd (2000) show a general model of a business process that also
represents the open system view of the organization ........................................................ 15

Figure 3: Research Model—the proposed relationships between environmental variation,
sequential variety, and technology and outcomes from a business process. .................... 23
Figure 4: Analysis model, showing contextualized variables and relationships ............. 33
Figure 5: Flowchart of the invoice processing routine .................................................... 35
Figure 6: Distribution of automation for entry phase. ...................................................... 60
Figure 7: Distribution of automation for approval phase ................................................. 60
Figure 8: Entry (square) and approval (triangle) scaled in two dimensions. .................... 69

Figure 9: Entry scree plot—Top line is raw, next line down is Jaccard, bottom line is
Euclidean. Stress is the y-axis, number of dimensions on the x-axis ............................... 72

Figure 10: Approval scree plot—Top line is raw, next line down is Euclidean, bottom
line is Jaccard. Stress is the y-axis, number of dimensions on the x-axis ........................ 73

Figure 11: Coefficient and explained variance for input-process-outcome buffering
model, entry phase ............................................................................................................ 85

Figure 12: Coefficient and explained variance for input-process-outcome buffering
model, approval phase ...................................................................................................... 86

Figure 13: Coefficient and explained variance for input-process-outcome with
automation model, entry phase ......................................................................................... 88

Figure 14: Coefficient and explained variance for input-process-outcome with automation
model, approval phase ...................................................................................................... 89
Figure 15: Scree plot for H_i (H_0 = log c, H_i = H_symbol) for entry phase ................. 116
Figure 16: Scree plot for H_i (H_0 = log c, H_i = H_symbol) for approval phase ......... 116

Figure 17: Entry and approval Shepard plots, 1 through 5 dimensions, continued next
two pages ........................................................................................................................ 119

Figure 18: V1 and V2 of 3-d projection, entry phase showing lines representing the
regression coefficients of variables of interest ................................................................ 122

Figure 19: V1 and V3 of 3-d projection, entry phase showing lines representing the
regression coefficients of variables of interest ................................................................ 123

Figure 20: V2 and V3 of 3-d projection, entry phase showing lines representing the
regression coefficients of variables of interest ................................................................ 123

Figure 21: V1 and V2 of 3-d projection, approval phase showing lines representing the
regression coefficients of variables of interest ................................................................ 124

Figure 22: V2 and V3 of 3-d projection, approval phase showing lines representing the
regression coefficients of variables of interest ................................................................ 124

Figure 23: V1 and V2 of 3-d projection, approval phase showing lines representing the
regression coefficients of variables of interest ................................................................ 125


Chapter 1—Executive Summary

Title: Digitally-Enabled Organizational Routines at the Organization-Environment Boundary: Buffering and the Role of Technology

This research seeks to answer two research questions:

Research Question 1: How does a business process at the organization-environment boundary utilize various patterns of action to moderate the impact of environmental variation on process outcomes?

Research Question 2: What is the impact of information technology use on the
variety found within a digitally-enabled business process?

These research questions are viewed from a perspective that integrates theory from these three traditions:

• Systems theory and cybernetics advance the ideas of buffering environmental input at the interface between organization and environment and of the necessary regulation of variety.

• Organizational routines provide a perspective from which to study empirically the emergent patterns of action within a business process.

• The appropriation and assimilation of information systems, with its focus on the use of technology within a business process, allows the investigation of the immediate antecedents and consequences of information technology.

These perspectives are integrated into a general model describing the inputs to a process, the activities that are undertaken within the process, and the outcomes of that process. This model will be evaluated using two methods of analysis on data collected from an invoice processing workflow system.

Viewing the sequences of action generated within the business process as a Markov model allows familiar statistical techniques based on chi-square tests of homogeneity to evaluate the impact of input variation on the process and its outcomes. Alternatively, using sequential variety measures developed from string distance, the input-process-outcome model can be evaluated through regression, path analysis, or structural equation modeling.
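To make the Markov approach concrete, here is a minimal sketch in plain Python, assuming each invoice has already been reduced to an ordered list of action labels (the labels "enter", "approve", and "pay" are hypothetical) and assuming a first-order process; the dissertation determines the order of the process empirically, so first order is only a working assumption here. The functions `transition_counts` and `homogeneity_chi_square` are illustrative names, not the author's code. For each "from" state, the groups form a contingency table over destination states, and the chi-square contributions and degrees of freedom are summed across states, which is one standard way to test whether groups share the same transition matrix.

```python
from collections import Counter


def transition_counts(sequences):
    """Count first-order transitions: counts[(a, b)] = times action a is followed by b."""
    counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
    return counts


def homogeneity_chi_square(groups):
    """Chi-square test of homogeneity of first-order transition matrices across groups.

    `groups` is a list of groups, each a list of action sequences.
    Returns (statistic, degrees_of_freedom), summed over "from" states.
    """
    per_group = [transition_counts(g) for g in groups]
    from_states = sorted({a for counts in per_group for (a, _) in counts})
    statistic, df = 0.0, 0
    for i in from_states:
        to_states = sorted({b for counts in per_group for (a, b) in counts if a == i})
        # Rows are groups, columns are destination states reached from state i.
        table = [[counts[(i, j)] for j in to_states] for counts in per_group]
        row_totals = [sum(row) for row in table]
        col_totals = [sum(col) for col in zip(*table)]
        n = sum(row_totals)
        # Groups that never leave state i contribute nothing to this row.
        active = [g for g, total in enumerate(row_totals) if total > 0]
        for g in active:
            for k, col_total in enumerate(col_totals):
                expected = row_totals[g] * col_total / n
                statistic += (table[g][k] - expected) ** 2 / expected
        df += (len(active) - 1) * (len(to_states) - 1)
    return statistic, df


# Usage: two hypothetical groups of invoice sequences.
group_a = [["enter", "approve", "pay"]]
group_b = [["enter", "pay"]]
print(homogeneity_chi_square([group_a, group_b]))  # (2.0, 1)
```

A p-value would then come from the chi-square distribution with `df` degrees of freedom (for example, `scipy.stats.chi2.sf(statistic, df)`). With real data, sparse cells with small expected counts are a practical concern that this sketch does not address.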

Practically, the results of this research increase managerial understanding of the sources and consequences of variety in their processes. Designers of information systems that support business and organizational processes such as workflow, supply chain, or ERP also benefit from a study of the impact of technology on these processes. The methodologies employed in this work can be reapplied in other areas, allowing outcome-based selection and retention of specific characteristics of business process performances.

Theoretically, this research enriches at least three traditions of scholarship:

Evaluation of costs and benefits of Information Technology. By studying the immediate antecedents and consequences of digitally-enabled business processes, we gain a better understanding of how the use of technology produces economic and organizational benefits and costs.

Consideration of variety in Organizational Routines. While there is an understanding that routines necessarily exhibit variety in their execution, the drivers and consequences of this essential variety are less understood. From some perspectives, process variation is bad (control, audit, TQM), but from others, process variation is good (service quality, responsiveness).

Extension of sequence methodology. Methodologically, I am applying sequence methods developed in sociology, biology, information theory and social psychology. This research represents both an application and an extension of these methodologies, and should result in novel insights and further research in other areas using these techniques.

The main findings of this work are based on data from the processing of 2000 invoices in one organization. In this particular workflow:

Support for buffering at the boundary was obtained. Inputs have little relationship to outcomes, but do affect how the process unfolds.

Automation shows a differential effect on two subprocesses of invoicing. In the data entry phase, it is a substitute for process-based buffering, while in the approval phase, automation is a complement.

Markov and string distance methods complement each other in that they study the antecedents and outcomes of temporal structures in different ways.
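Where the Markov view looks at transitions between adjacent actions, the string distance view compares whole sequences. The sketch below uses unit-cost Levenshtein (edit) distance, the simplest form of the optimal matching family used in sociology, purely as an illustration: the dissertation compares several distance functions (its figures mention raw, Jaccard, and Euclidean variants), so these costs are an assumption rather than the author's choice. The action labels and the function names `levenshtein` and `distance_matrix` are hypothetical. A pairwise distance matrix like this is the kind of input that multidimensional scaling (MDS) turns into sequence dimensions.

```python
def levenshtein(a, b):
    """Unit-cost edit distance (insertions, deletions, substitutions) between two sequences."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute (free when actions match)
            ))
        previous = current
    return previous[-1]


def distance_matrix(sequences):
    """Symmetric matrix of pairwise edit distances, e.g. as input to MDS."""
    n = len(sequences)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = levenshtein(sequences[i], sequences[j])
    return matrix


# Usage with three hypothetical invoice sequences.
seqs = [
    ["enter", "approve", "pay"],
    ["enter", "approve", "approve", "pay"],
    ["enter", "pay"],
]
print(levenshtein(seqs[0], seqs[1]))  # 1
print(distance_matrix(seqs))          # [[0, 1, 1], [1, 0, 2], [1, 2, 0]]
```

Because edit distance treats each sequence as a whole, two invoices with the same transitions in a different order can be far apart here while looking alike to a first-order transition matrix, which is one way the two methods see temporal structure differently.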

Chapter 2—Introduction

“. . .we must accept the coexistence of mutually contradictory phenomenon without trying
to resolve the contradiction. . .new technologies will permit customized manufacture on a
mass basis. Rather than being limited by the paradox, they seem to embrace and
transcend it” (Davis, 1989).

Overview

Classically, organizational scholars describe mechanisms of insulating and
protecting the ‘technical core’ of an organization from environmental variety and
uncertainty by warehousing, demand leveling, and contingent action plans (Thompson,
1967). Boundary-spanning units of an organization experience the challenge of
absorbing, managing, and controlling environmental variety (Lynn, 2005; Meznar &
Nigh, 1995; Thompson, 1967; Yan & Louis, 1999). The rise of information technology
and innovative business models such as just-in-time inventory and custom manufacturing
make many traditional methods of insulation less useful.

Actions that are contingently performed based on context or stimulus allow processes to embrace and absorb a given amount of variation, yet remain manageable and achieve controlled outcomes (Ashby, 1956, 1968; Davis, 1989; March & Simon, 1958; Simon, 1996). At the same time, the virtualization of workflow and the advent of inter-organizational information systems have increased the technical structure within which boundary-spanning processes must operate (Basu & Kumar, 2002; Chen, Chen, & Shao, 2003; Georgakopoulos, Hornick, & Sheth, 1995), challenging our understanding of process variety and stability under these conditions.

I develop a worldview that integrates the open system perspective of the organization with the necessary regulation of variety from the environment. I focus upon contingently firing actions and the use of technology within business processes at the boundary of the organization and environment. Viewing these business processes as performative aspects of organizational routines (Feldman & Pentland, 2003) allows the concept of sequential variety to describe the nature of the patterns of action that are expressed (Pentland, 2003a, 2003b). Combining this perspective with cybernetic homeostasis and the use of technology helps describe how a boundary business process can absorb variable inputs from the environment, and helps explore the impact of technology on the process and its outcomes.
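As a hedged illustration of putting a number on variety, and not a reconstruction of Pentland's (2003a, 2003b) measures, the sketch below computes the Shannon entropy (in bits) of action frequencies pooled across sequences, together with its upper bound log2 c for c distinct actions; the list of figures refers to a quantity H_0 = log c in this spirit. The function names and action labels are hypothetical. Pooled frequencies ignore order, so this is a crude proxy; order-sensitive measures such as string distance capture sequential structure that entropy of single actions cannot.

```python
import math
from collections import Counter


def action_entropy(sequences):
    """Shannon entropy (bits) of action frequencies pooled over all sequences."""
    counts = Counter(action for seq in sequences for action in seq)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def max_entropy(sequences):
    """Upper bound log2(c) for c distinct actions, reached when all are equally frequent."""
    c = len({action for seq in sequences for action in seq})
    return math.log2(c) if c > 0 else 0.0


# Usage with one hypothetical invoice sequence.
seqs = [["enter", "approve", "approve", "pay"]]
print(action_entropy(seqs))          # 1.5
print(round(max_entropy(seqs), 3))   # 1.585
```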

The boundary process shown in Figure 1 is a regulator or mediator of variety, represented by the relative sizes of arrows on the input and output sides of the routines. Through the contingent expression of actions, boundary processes obtain a controlled and managed flow of outputs despite the incursion of environmental variety without utilizing traditional forms of buffering such as warehousing, demand leveling, quotas and rationing. Technology and actions become the mechanism through which the buffering of variety occurs.
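One compact way to read the regulator role in Figure 1 is through Ashby's (1956) law of requisite variety, stated here in its usual logarithmic form, where the variety V of a set is the logarithm of its number of distinguishable states. Mapping D to environmental inputs (disturbances), R to routines plus workflow technology (the regulator), and O to process outcomes is an interpretive gloss offered for illustration, not a formula taken from this dissertation's analysis.

```latex
V(O) \;\ge\; V(D) - V(R), \qquad V(X) = \log_2 \lvert X \rvert
```

For example, with hypothetical numbers: if invoices arrive in 16 distinguishable configurations (4 bits) and the routines can respond in 8 distinguishable ways (3 bits), outcomes retain at least 1 bit of variety, that is, at least two distinguishable outcomes.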

   

    
   

[Figure 1 diagram: The Environment supplies inputs from the environment (variety) to the Interface to Environment (Boundary Organizational Unit), where workflow technology and routines regulate variation; the unit produces output/outcome from the process with reduced variety.]

Figure 1: Examining an organizational boundary unit and its routines more closely: It can be seen how the business process absorbs variety through the impact of organizational routines and technology only if we reject the ‘black box’ perspective of organizational and business processes.

I examine actions performed within an information system poised at the intersection of an organization and its environment. I use process data gleaned from a workflow information system designed for the processing and decision-making surrounding the invoice payment business process. This study gives a rare glimpse into the antecedents and consequences of technology use at the business-process level by isolating the automation aspects of the system and studying their drivers and consequences.
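Isolating automation starts from the workflow event log: events are grouped per invoice and ordered in time to form sequences, and each sequence's automated share can then be computed. The list of tables refers to an example event log, extracted sequences, and an automation percentage built from automated actions and sequence length; the sketch below is one plausible reading of that pipeline, not the system's actual schema. The row layout `(invoice_id, timestamp, action, performed_by)`, the `"system"` marker for automated actions, and the function names are all hypothetical assumptions.

```python
from collections import defaultdict


def extract_sequences(event_log):
    """Group event-log rows by invoice and order each invoice's actions by timestamp.

    Each row is a hypothetical (invoice_id, timestamp, action, performed_by) tuple.
    """
    by_invoice = defaultdict(list)
    for invoice_id, timestamp, action, performed_by in event_log:
        by_invoice[invoice_id].append((timestamp, action, performed_by))
    return {
        invoice_id: [(action, who) for _, action, who in sorted(events)]
        for invoice_id, events in by_invoice.items()
    }


def automation_percent(sequence, automated_marker="system"):
    """Percent of actions in one sequence performed by the system (assumed marker)."""
    if not sequence:
        return 0.0
    automated = sum(1 for _, who in sequence if who == automated_marker)
    return 100.0 * automated / len(sequence)


# Usage with a hypothetical two-invoice log (timestamps as simple integers).
log = [
    ("INV-1", 2, "approve", "j.doe"),
    ("INV-1", 1, "register", "system"),
    ("INV-2", 1, "register", "system"),
    ("INV-2", 2, "match_order", "system"),
]
sequences = extract_sequences(log)
print(sequences["INV-1"])                      # [('register', 'system'), ('approve', 'j.doe')]
print(automation_percent(sequences["INV-1"]))  # 50.0
print(automation_percent(sequences["INV-2"]))  # 100.0
```

Keeping the actor alongside each action lets the same sequences feed both the automation measure and the sequence analyses, which only need the action labels.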

This dissertation represents an innovative study of the antecedents and consequences of sequential variety. Workflow mining techniques give an unprecedented view into the performative aspects of an organizational routine. The workflow system structures and enables the type and sequence of activities that are performed, while providing necessary data for research. This focus on the actual behaviors and use of technology allows the novel use of sequential methods to study the drivers and consequences of temporal action structures. This study represents a unique opportunity to answer empirically the following research questions:
Research Question 1: How does a business process at the organization-environment boundary utilize various patterns of action to moderate the impact of environmental variation on process outcomes?

Research Question 2: What is the impact of information technology use on the
variety found within a digitally-enabled business process?

Impact

This research is aimed at two main groups of scholars. First, given the rise of interest among organizational scholars in agility, simultaneous exploration and exploitation (ambidexterity), and hypercompetitive environments, a reexamination of the classic ideas of buffering and environmental adaptation is warranted. Second, as business processes are increasingly virtualized and digitized, those who study the organizational impact of information technology will have a keen interest in the outcomes of this dissertation and resulting research. Methodologically, this dissertation uses novel methods to measure and analyze sequential processes. Researchers studying the sequential structure of processes such as negotiation, organizational change, or auditing, among others, benefit from the evaluation and extension of these methods. Table 1 shows these scholarly contributions, along with connections to extant literature.

 

Contribution | Reference
Focus on a technologically-enabled business process and its immediate antecedents and consequences | (Kohli & Hoadley, 2006; Mukhopadhyay et al., 1997)
Isolation of the automation aspect of IT use | (Mooney, Gurbaxani, & Kraemer, 1996)
Antecedents and consequences of sequential variety of a work process | (Pentland, 2003a, 2003b)
Application of workflow mining to organizational research questions | (van der Aalst et al., 2003; van der Aalst & Weijters, 2004; van der Aalst, Weijters, & Maruster, 2004)

Table 1: Summary of scholarly contributions

Similarly, there are two main groups of practitioners who benefit from this
research: managers and information systems professionals. As managers seek to improve
the performance of business processes in the face of environmental change and variety, a
better understanding of the antecedents and consequences of process variety may give
insight into their attempts to creatively control these processes. Managers, designers, and
users of workflow technologies need to understand the complex impacts of technology
use on the structure and resilience of business processes. For example, real-time
workflow analysis systems that are organized around extracting and correlating patterns
of action with their antecedents and consequences are core to the development of a
‘digital dashboard’ for processes (Weske, van der Aalst, & Verbeek, 2004). In addition,
managers may need to be reminded that variety is inherent even in tasks designed to be
performed in the ‘one best way’.
Research Design

The acquisition business process sits at the interface between an organization and
its vendors, and contains a subprocess of invoicing (Dunn, Cherrington, & Hollander,
2005). Processing invoices is a well-suited setting for studying the buffering of
environmental variety because of the conflicting pressures of institutional norms and
rational management against the flexibility necessary to meet the needs of the vendor and
internal constituencies. For example, managers require a process that is controlled and
manageable, but variations in the requirements of the vendors and contracts may reduce
both consistency and the ability to direct action from above (Baird & Weisberg, 1982).

I use data extracted from an invoice processing system at a construction company
in Norway to evaluate the relationships between inputs, sequential variety, outcomes, and
technology. I obtain a log of all the actions, and their parameters, that take place within
the flow of work surrounding each invoice as it is scanned, entered in the system, and
approved. The event log is processed into a list of sequentially ordered actions
associated with each invoice. These sequences form the basis of the analysis in this
dissertation, allowing the use of multiple complementary methods to explore the research
questions.
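As a rough illustration of this preprocessing step, the sketch below groups event-log records by invoice and orders each invoice's actions by timestamp. It is a minimal sketch, not the extraction code used in this study: the record layout (invoice_id, timestamp, action), the function name sequences_from_log, and the action labels are all hypothetical, and it assumes that timestamps are mutually comparable (for example, ISO-8601 strings or datetime objects).

```python
from collections import defaultdict


def sequences_from_log(events):
    """Turn hypothetical (invoice_id, timestamp, action) records into one
    time-ordered list of actions per invoice."""
    by_invoice = defaultdict(list)
    for invoice_id, timestamp, action in events:
        by_invoice[invoice_id].append((timestamp, action))
    # Sorting on the timestamp alone is stable, so simultaneous events keep
    # the order in which they appear in the log.
    return {
        invoice_id: [action for _, action in sorted(records, key=lambda r: r[0])]
        for invoice_id, records in by_invoice.items()
    }


log = [
    ("INV-1", "2008-03-01T09:00", "scan"),
    ("INV-2", "2008-03-01T09:05", "scan"),
    ("INV-1", "2008-03-03T14:30", "approve"),
    ("INV-1", "2008-03-01T11:15", "enter"),
]
print(sequences_from_log(log))
# {'INV-1': ['scan', 'enter', 'approve'], 'INV-2': ['scan']}
```

ISO-8601 strings sort chronologically as plain text, which is why the toy log can use them directly; a real log with mixed timestamp formats would need parsing first.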

Using a multi-method approach increases the amount of effort, but generates a
richer, more comprehensive picture of a phenomenon. The two methods I have chosen
have traditional uses in their respective areas (Abbott, 1990b, 1995; Sankoff & Kruskal,
1983) and have been applied to workflow and business processes (Cook & Wolf, 1998; van
der Aalst, 2003; van der Aalst et al., 2003; van der Aalst & Weijters, 2004), yet remain
novel in application to the theoretical areas I am targeting. I first view the process as a
Markov chain of probabilistically determined actions. Then, I use string distance and
multidimensional scaling to understand the sequence of actions within the process and to
address the research questions.

The Markov approach views each set of process sequences as a matrix of transition
probabilities between actions (Gottman & Roy, 1990). The sequential structure is
determined by evaluating the information that a given action provides about future
actions within the sequence (Anderson & Goodman, 1957). Sequences are stratified by
the variables relating to inputs, outcomes, and the use of automational technology.
Log-linear contingency table tests (chi-square) assess the impact of these variables on the
temporal structure of the process and the impact of changes in the process on outcomes
(Anderson & Goodman, 1957; Bishop, Fienberg, & Holland, 1975; Gottman & Roy,
1990).
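To make the Markov approach concrete, the sketch below estimates first-order transition probabilities from action sequences and computes a Pearson chi-square statistic for whether several strata share the same transition probabilities, in the spirit of the homogeneity test of Anderson and Goodman (1957). It is a minimal sketch under stated assumptions, not the analysis code of this dissertation: the function names and action labels are hypothetical, only next-actions actually observed are used as table columns, and start/end states and higher-order dependence are ignored.

```python
from collections import Counter, defaultdict


def transition_counts(sequences):
    """Count first-order transitions (current action -> next action)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return counts


def transition_probabilities(sequences):
    """Row-normalize the counts into estimates of P(next | current)."""
    probs = {}
    for current, row in transition_counts(sequences).items():
        total = sum(row.values())
        probs[current] = {nxt: n / total for nxt, n in row.items()}
    return probs


def homogeneity_chi_square(groups):
    """Pearson chi-square for H0: every group (stratum) of sequences shares
    the same first-order transition probabilities.

    For each 'from' action, the groups x next-action table is tested for
    homogeneity; statistics and degrees of freedom are summed over actions.
    """
    group_counts = [transition_counts(g) for g in groups]
    from_states = set().union(*(c.keys() for c in group_counts))
    chi2, df = 0.0, 0
    for i in from_states:
        rows = [c.get(i, Counter()) for c in group_counts]
        row_totals = [sum(r.values()) for r in rows]
        n_i = sum(row_totals)
        next_states = set().union(*(r.keys() for r in rows))
        col_totals = {j: sum(r[j] for r in rows) for j in next_states}
        active = [k for k, t in enumerate(row_totals) if t > 0]  # groups that visit i
        for k in active:
            for j in next_states:
                expected = row_totals[k] * col_totals[j] / n_i
                chi2 += (rows[k][j] - expected) ** 2 / expected
        df += (len(active) - 1) * (len(next_states) - 1)
    return chi2, df


# Hypothetical strata: routine invoices vs. invoices returned once for correction.
routine = [["scan", "enter", "approve"]] * 2
returned = [["scan", "enter", "return", "enter", "approve"]] * 2
stat, dof = homogeneity_chi_square([routine, returned])
```

The statistic would then be referred to a chi-square distribution with `dof` degrees of freedom (for example via `scipy.stats.chi2.sf(stat, dof)`). The Pearson form is only one option within the log-linear framework; the likelihood-ratio (G²) form is a common alternative, and cells with small expected counts make either approximation unreliable.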

For the second approach, the sequences are analyzed using string-distance
techniques (Abbott, 1990b; Pentland, 2003b; Sankoff & Kruskal, 1983) and then scaled
for interpretation (Kruskal & Wish, 1978). Each sequence is regarded as a string of
symbols, and the distance between two strings is computed by counting the minimum
number of steps (insertions, deletions, or substitutions) it takes to convert the first
sequence into the second. The resulting distances are then scaled using non-metric
multidimensional scaling, visualized, and then correlated with variables of interest to
explore their relationships (Kruskal & Wish, 1978). These scaled distances are used to
represent the process in a partial least squares analysis of the relationships between
inputs, the process, outcomes, and automation.
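The string-distance step can be sketched as follows. The code computes the Levenshtein distance, a simple form of the optimal-matching distances discussed by Sankoff and Kruskal (1983) and Abbott (1990b), between action sequences, and assembles the pairwise distance matrix that a scaling procedure would take as input. Unit costs for every insertion, deletion, and substitution are an assumption made here (optimal-matching analyses often weight these operations differently), the action labels are hypothetical, and the non-metric scaling step itself is not shown.

```python
def edit_distance(a, b):
    """Levenshtein distance between two action sequences: the minimum number
    of insertions, deletions, and substitutions (each costing 1) that turns
    sequence a into sequence b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute x with y (free if equal)
            ))
        prev = curr
    return prev[-1]


def distance_matrix(sequences):
    """Symmetric matrix of pairwise edit distances (zero on the diagonal)."""
    n = len(sequences)
    d = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = edit_distance(sequences[i], sequences[j])
    return d


sequences = [
    ["scan", "enter", "approve"],
    ["scan", "enter", "return", "enter", "approve"],
    ["scan", "approve"],
]
print(distance_matrix(sequences))
# [[0, 2, 1], [2, 0, 3], [1, 3, 0]]
```

Because the function only compares elements for equality, it works on lists of action labels and on plain strings alike.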

Results

Overall, I find support for the input-process-outcome model, with the process buffering
the rest of the organization from the variety found in the inputs. Sequences are driven by
inputs, and are loosely linked to outcomes. This means that the process is absorbing some
of the variance introduced by the inputs. The two methods I use complement each other in
that they explore the relationships between antecedents and consequences of the temporal
structures expressed in the performance of an organizational routine.

The Markov analysis indicates that vendor experience and invoice amount drive
heterogeneity in the process. When the sequences are stratified by these variables, the
resulting transition matrices indicate a lack of similarity between the groups, each of
which is formed from sequences with similar inputs. Using this method, the entry and
approval phases of the invoicing routine show differences in their relationship to
outcomes. When stratified by the length of time between scanning and full approval,
entry sequences are homogeneous, but the approval phase sequences differ from each
other. The Markov analysis also indicates automation as a strong driver of heterogeneity
for both the entry and approval phases. For example, none of the transition matrices from
the four groups of sequences stratified by automation was similar to any other.

While the scaled string-distance approach resulted in dimensions of the process
that were difficult to interpret, there was some indication that the process was at least
partially driven by the input variables. There were marked differences in the patterns of
significance and magnitude of coefficients between the entry and approval phases of the
invoicing routine, and this was also seen in the partial least squares analysis of the full
model. The entry phase uses automation as a substitute for buffering through
contingent action, while the approval phase uses technology as an adjunct to process
buffering.

Outline of Chapters
The remainder of this dissertation is structured as follows. In chapter 3, I review the
relevant literature and develop the theoretical model. I then describe the design of this
research and the data that I have collected in chapter 4. This is followed by chapters
detailing the results of each of the two methods I use: Markov (chapter 5) and
string-distance (chapter 6). In chapter 7, I discuss the implications and limitations of the
results, and describe future avenues of inquiry.

Chapter 3—Literature Review and Theory Development

Introduction

In this chapter I review the extant literature and describe and support my theory. I
begin with a brief orientation to the literature and the motivation for my research
questions. Then, I discuss the specific relationships between technology use,
environment, process, and outcome, supporting these with examples from the literatures
of organizational theory and technology impact. I include a reinterpretation of a
technology impact study (Mukhopadhyay, Rajiv, & Srinivasan, 1997) to show how the
proposed theory can be applied to processes in other organizations, such as the United
States Postal Service.
Review of Literature

Modern organizations are challenged by conflicting pressures of flexibility and
stability as they moderate the flow of resources they use to add value to their outputs.
Business processes that exist on the boundary must perform much of this regulation, but
many strategies such as just-in-time (JIT) inventory and mass customization preclude the
use of classic buffering techniques such as demand leveling and queuing. This leaves
boundary processes to contingently express different action plans or subroutines in an
attempt to reduce the variety present in organizational inputs. If these business processes
are viewed as patterns of action, the perspective of organizational routines provides a
structure to explore the performative aspects of these processes as they respond to variant
inputs and moderate the variety passed on to the rest of the organization.

While there have been several examples of empirical research into buffering
(Brown & Eisenhardt, 1997; Koberg, 1988), none has focused specifically on sequential
variety (and performances of an organizational routine) as a measure of contingently
driven actions from external stimuli. We know that buffering is a well-accepted feature
of successful open systems, and that theory predicts the use of contingent actions in
response to the environment, but there have been few empirical studies (Culnan, 1992, is
an exemplar) that explicitly study how organizations respond to stimuli at the level of
the business process. This leads to research question 1:

Research Question 1: How does a business process at the organization-environment
boundary utilize various patterns of action to moderate the impact of environmental
variation on process outcomes?

Answering this research question begins with our perspective of the organization.
As models of the organization evolved toward open system perspectives, theorists
recognized a need to include features of the environment that affect the internal
operation and management of the organization (Scott & Davis, 2007). Scott and Davis
(2007) define open systems as those “capable of self-maintenance on the basis of
throughput of resources from the environment”.

“That a system is open means, not simply that it engages in interchanges with the
environment, but this interchange is an essential factor underlying the system’s
viability” (Buckley, 1967, quoted in Scott & Davis, 2007).

The first implication of the open system perspective is that there must be a
boundary between what we identify as the organization and what exists outside. Second,
this boundary acts as a buffer that protects the internal workings of the organization by
managing uncertainty and variant inputs from the environment. Third, principles of
cybernetic systems can be applied to an open system to give it life and allow the system
to learn and react to changes in the environment. Finally, locating the mechanisms of
buffering and applying principles of homeostasis at the boundary brings these features of
organizational theory down to the level of business processes.

The Organization as an Open System
Figure 2 shows a representation of the open systems view of an organization. At
a lower level of analysis, this same ﬁgure also can characterize a model of a single
business process, as the conversion of inputs into outputs by the interaction of technology
and human action (Melao & Pidd, 2000). This is a simpliﬁcation of the same worldview
model presented in the previous chapter.
[Figure 2 diagram: Environment, Boundary, Inputs, Outputs]

Figure 2: Melao and Pidd (2000) show a general model of a business process that also
represents the open system view of the organization.

As the organization is opened to the wider environment, managers and workers
must deal with a set of events and attributes that is necessarily wider and more varied
than what exists within the organization. At the same time, norms of rationality drive
managers to develop and utilize structures that work toward efficiency and effectiveness
in the conversion of inputs to outputs (Spender & Kessler, 1995; Thompson, 1967). This
tension complicates the manager’s ability to successfully meet the expectations of
stakeholders in the face of changes in the environment. For example, large variations in
demand for technical customer service over the phone make it difficult for managers to
achieve target hold times when new products are launched or during widespread
outages. It is this variation from the environment that creates uncertainty for managers.

Uncertainty creates a question for the organization to answer: How does the need
for optimization and control balance against the need for adaptation and ﬂexibility within
organizational processes? This question is key to understanding the structures of
ﬂexibility and stability in action, and the consequences of enabling and constraining
technologies. The conﬂict arises as uncertainty reduces the ability of the manager to
optimize, but the pressures of rationality and efﬁciency drive decisions and structure
towards those that reduce uncertainty and cognitive load (March & Simon, 1958;
Thompson, 1967; Weick, 1979). At the same time, continued performance in the face of
changing environments requires learning, change, and adaptation (March, 1991).
Boundaries and Buffering

The interaction between the organization and the surrounding environment
requires a boundary to organize and identify what is ‘inside’ and what is ‘outside’. As an
interface, this boundary is the location of interrelations to “other entities through
processes of resource (inputs) acquisition and product/service (output) disposal” (Yan &
Louis, 1999). The organizational interface, or ‘skin’, serves not only as a demarcation or
identification, but also allows only those inputs that are desired to cross (Simon, 1996),
in effect protecting and buffering the organization from the uncertainty and full variety
of the environment. This protection from the environment is what allows homeostatic
systems to exist.

Following Thompson (1967), Lynn (2005) defines buffering as “the regulation
and/or insulation of organizational processes, functions, or individuals from the effects of
environmental uncertainty or scarcity”. Koberg (1988) directly studies how two types of
organizations buffer the technical aspects of production from environmental
uncertainty. Koberg focused on only some of Thompson’s types of buffering, relating
them to items developed by Khandwalla (1974). These items measured the degree to
which units “maintained buffer stocks and reserve supplies of essential material”, such as
spare parts or educational materials (Koberg, 1988).

Koberg (1988) also found that in school settings, decentralization was
signiﬁcantly related to buffering, forecasting, and smoothing, indicating that these
techniques were taking place at a lower level than in oil companies, the other
organization type she studied. By moving the buffering techniques down to the work unit
that can best control uncertainty and requires the most uncertainty management, Koberg
suggests schools can succeed despite the lack of technical structure.

Ashby (1956, 1958, 1968) focuses on the cybernetic principles of homeostasis
when discussing the behavior of complex systems such as organizations or functional
areas within organizations. The law of requisite variety (Ashby, 1956, 1958, 1968) states
that the amount of environmental variety that can be dealt with by a system is directly
related to the amount of variety in its possible responses. This feature of homeostatic
systems applies at multiple levels including the organizational level and the business
process level, as an organization can be seen as a nested structure of these processes.
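Ashby's law is commonly stated in a counting form. In the illustrative notation below (introduced here, not taken from the sources cited above), V_D is the variety of disturbances reaching a regulating system, V_R the variety of its possible responses, and V_O the variety of outcomes that remains. Taking logarithms gives the additive form used in information-theoretic treatments.

```latex
V_O \ge \frac{V_D}{V_R}
\qquad\Longleftrightarrow\qquad
\log_2 V_O \ge \log_2 V_D - \log_2 V_R
```

For example, if invoices arrived in V_D = 8 distinguishable conditions and a process had only V_R = 2 distinct responses, at least 8/2 = 4 distinguishable outcomes would pass through to the rest of the organization; holding outcomes to a single state would require at least 8 responses.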

Thompson (1967, p. 81) discusses how buffering techniques occur within
boundary-spanning units, but more recently, Yan and Louis (1999) describe how
buffering, spanning, and uncertainty management techniques have been pushed down to
the work-unit level due to business process reengineering programs, the advent of
cross-functional teams, and the introduction of advanced information technologies. To
move our understanding beyond warehousing and queuing as buffering mechanisms, we
must examine the variety of specific action sequences that a process employs to buffer the
organization. A helpful perspective to observe and analyze repeating organizational
actions can be found in the concept of the organizational routine. This is considered in
the next section.
Patterns of Action

Causal research into organizational outcomes from processes typically has
focused on the variance properties of inputs and outputs, treating the generating process
as a ‘black box’ (Melao & Pidd, 2000, 2008; Pentland, 2003b). This happens because the
process is often seen as fixed, as it is in many manufacturing contexts. In many cases,
this assumption of a fixed process should be challenged, as it removes learning,
adaptation, and variability from study. If we reject the ‘black box’ perspective, we must
adopt a view of organizational processes that focuses on the actions that take place, rather
than solely on their inputs or outcomes. The organizational routines literature provides a
well-suited perspective for the investigation of organizational actions.

Becker (2004) notes that organizational routines have been characterized as
patterns of action. He continues by describing how several authors have defined
organizational routines, concentrating on those definitions that embrace a pattern focus.
Feldman and Pentland (2003) also discuss organizational routines as patterns of action
and, like Winter (1964) and Koestler (1967), highlight their changeable nature. Table 2
lists these definitions.

Source | Definition
Winter (1964), quoted in (Becker, 2004) | “Pattern of behavior that is followed repeatedly but is subject to change if conditions change”
Koestler (1967), quoted in (Becker, 2004) | “Flexible patterns offering a variety of alternative choices”
Feldman and Pentland (2003) | “Repetitive, recognizable patterns of interdependent actions, carried out by multiple actors, but they cannot be understood as static, unchanging objects.”
Cohen et al. (1996) | “A routine is an executable capability for repeated performance in some context that has been learned by an organization in response to selective pressures”

Table 2: Definitions of organizational routines with ostensive and performative
aspects of the organizational routine

Feldman and Pentland further develop the duality of organizational routines into
ostensive and performative aspects:
“Organizational routines consist of two aspects: the ostensive and the
performative. The ostensive aspect is the ideal or schematic form of a routine. It is
the abstract, generalized idea of the routine, or the routine in principle. The
performative aspect of the routine consists of specific actions, by specific people,
in specific places and times. It is the routine in practice. Both of these aspects are
necessary for an organizational routine to exist” (Feldman & Pentland, 2003).
The ostensive aspects are those understandings of an abstract nature that define
the identity of the routine, often including its purpose. The performative aspects of the
routine are what happens when the routine is ‘executed’. The performances are the
sequences of action performed by various actors, located within a particular time and
place.

In this research, I focus exclusively on the performative aspects of the invoicing
routine. Even though the performative aspects of a routine are linked to the ostensive
aspects and hence can be identified or named with a singular (the invoicing routine),
there is an essential variation in the execution of the routine due to the agency of its
participants (Feldman & Pentland, 2003).

The idea of essential variation has been applied to organizational routines (Pentland,
2003a) and was developed from the concept of sequence variety (Abbott, 1990b, 1995;
Abbott & Tsay, 2000). Sequential variety is the property of a set of action sequences, or
performances, to exhibit differences in their selection and order of tasks. While this
concept holds promise to help scholars understand flexibility in organizational routines
and processes, few studies of the antecedents and consequences of sequential variation
exist (Pentland, Haerem, & Hillison, 2007).
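One simple way to make sequential variety concrete is sketched below: it reports how many distinct sequences a set of performances contains and the mean pairwise edit distance between them. These two measures are offered only as illustrative operationalizations, assumed here rather than taken from Pentland (2003a); the sketch uses unit edit costs, and the action labels are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance with unit insertion, deletion, substitution costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def sequential_variety(sequences):
    """Return (number of distinct sequences, mean pairwise edit distance)."""
    distinct = len({tuple(s) for s in sequences})
    pairs = list(combinations(sequences, 2))
    if not pairs:
        return distinct, 0.0
    mean_distance = sum(edit_distance(a, b) for a, b in pairs) / len(pairs)
    return distinct, mean_distance


performances = [
    ["scan", "enter", "approve"],
    ["scan", "enter", "approve"],
    ["scan", "enter", "return", "enter", "approve"],
]
print(sequential_variety(performances))
# (2, 1.3333333333333333)
```

The count of distinct sequences ignores how different the sequences are, while the mean distance also registers small deviations; reporting both gives a fuller picture than either alone.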

The tension between variation and stability in business processes is found within
organizational routines as well. To reduce coordination costs and allow for consistency
in outcomes, routines must be stable. At the same time, routines must allow for
contingent adaptation to immediate conditions and deal with participants and tools that
may vary in performance and ability. In this way, organizational routines face the same
tension between reliability of outcomes and variability in action as business processes
and organizations themselves (Feldman, 2000; Feldman & Pentland, 2003; March, 1991;
Nelson & Winter, 1982; Pentland, 2003a; Weick, 1979, 1998). This ‘stability in action’
perspective represents an important connection between business process management
and organizational theory (Singh, Pentland, Yakura, & Hillison, 2009).

The Role of Technology
The impact of IT has often been studied at the firm and industry levels, but
business-process level studies are less common (Wagner, Beimborn, Franke, & Weitzel,
2006). Focusing on a specific business process and technology allows for a more
efficient study of the impact of information technology, as the variance introduced by
exogenous or intervening factors is reduced. Kohli and Hoadley (2006) also note the
practical value of intermediate or process-level measurement, describing how more
detailed measurement allowed firms to better understand the consequences of IT-driven
business process reengineering projects. One reason for the scarcity of such studies may
be the complex interaction between technology and organizational structure. This
complexity is highlighted in the literature, where technology is seen as simultaneously an
enactment and a consequence of organizational structure (Orlikowski, 1992).

Organizational processes become digitally-enabled through the use of
communication technologies such as the internet, email, phone, fax, OCR, and XML, or
coordination technologies such as workflow systems. By virtualizing the processes, these
technologies increase the ability of the organization to coordinate temporally and
spatially disparate actions within the business process, increasing the possible span and
scope of a digitally-enabled organizational routine, while also increasing the transparency
and managerial control that is possible over the routine (Overby, 2008). This tension
between enablement and constraint of organizational action is another source of
complexity in the study of technology impact.

The balance of forces driving flexibility, stability, enablement, and control of
organizational routines is complicated by the introduction of information technologies.
Organizational technologies incorporate implicit models of work that structure future
performance and can reify business rules (van der Aalst et al., 2003). Business process
reengineering perspectives typically focus on the technological imperative and the
enabling aspects of technological adoption and assimilation (Davenport & Short, 1990;
Hammer, 1990), while other perspectives focus on the constraints that technologies place
on organizational action (Benders, Batenburg, & van der Blonk, 2006; Gosain, 2004).
This divergence in the literature makes it difficult to predict the outcomes of technology
use without empirical study. A focus on expressed patterns of action allows the
integration of both perspectives in the study of technology use.

Instead of focusing solely on the antecedents and consequences of organizational
IT use, this study sees the patterns of action and use of the information system as an
intermediate consequence between assimilation and impact. Given the centrality of
process variety to theories of organizational routines, learning and buffering, and the
complicated and contextualized ﬁt-driven impact of information technologies, it becomes
vitally important for researchers to understand the micro-level consequences of IT use.
Locating this in a framework of input-process-outcome allows a much ﬁner grained
understanding of what IT use really is, and allows research to explore consequences and
antecedents of use at the same time. This leads to the second research question:

Research Question 2: What is the impact of information technology use on the
variety found within a digitally-enabled business process?

Theory and Model Development

In this section, I develop the general worldview and theory of input-process-output
that guides my research. I begin with one of the archetypical process perspectives
given by Melao and Pidd (2000), as shown in Figure 2. I then detail the support for my
propositions, continuing the literature review and discussing how the model should
operate in the context of workflow and invoice processing. I start with the model shown
in Figure 3, and move left to right, then top to bottom in my discussion. Beginning with
the boundary of the organization and its business processes, I describe how inputs drive
contingent actions and the impact of these contingent actions on sequential variety. I
then discuss how various levels of sequential variety drive outputs. I conclude with an
exploration of how the environment affects technology use, and the impact of technology
on process variety and outcomes.
[Figure 3 diagram: Input Dissimilarity, Sequential Variety, Outcomes, and Technology Use, linked by paths labeled 1a (+), 1b (+), 2a, and 2b]

Figure 3: Research Model. The proposed relationships between environmental
variation, sequential variety, technology, and outcomes from a business process.
Process Variety at the Boundary

Thompson (1967) lists several forms of buffering the organization from the
environment, such as queuing, warehousing, rationing, and demand leveling. While these
are appropriate for some manufacturing organizations, they are becoming less available
or appropriate for many organizations due to the rise of just-in-time supply chains,
custom manufacturing, lot sizes of one, and service provision. Lynn (2005) discusses the
ideal relationships between buffering, requisite actions, centralization, and uncertainty.
His conceptualization (p. 90) describes traditional forms of buffering as a way for
organizational systems to deal with variety in excess of the system’s ability to respond
through action. In the context of this dissertation, queuing is appropriate for the system
to deal with a large influx of invoices at a given time, but demand leveling, rationing, and
forecasting seem to be less relevant. Along with Lynn’s (2005) decentralized manner of
dealing with uncertainty, Yan and Louis (1999) note that many of the functions
previously thought to occur at an organizational level have been pushed downward to the
level of the business process by advances in technology and changes in organizational
demographics and structure. For the workflow system under study, a perspective that
examines the contingent actions undertaken as regulators of variety seems more
appropriate than Thompson’s (1967) traditional conception of buffering at the
organizational level.

March and Simon (1958, p. 45) recognized the inherent flexibility in routines
through the contingent use of subroutines, noting that even routines such as those
performed on manufacturing assembly lines may have “the character of a strategy rather
than a fixed program”. These subroutines or selected activities would be chosen
according to appropriate signals. Since the routines within a boundary unit would be
largely driven by stimuli from outside the organization, we should see variation in these
routines based on changes in these signals from the environment. The performances of
these routines, enacted in the boundary organizational unit, should achieve a variety
equal to the variety of those features in the environment important enough to warrant
differences in execution of the routine (Ashby, 1956, 1958, 1968). In this way, the
environment of a business process or routine may introduce variation in the actions or
their order within the process, leading to the following proposition.

P1a: Higher levels of environmental variation will be associated with higher
levels of sequential variation within a business process.

The Relationship between Process Variation and Outcomes

Theory that predicts the consequences of process variation also may indicate a
complex relationship with outcomes, given the diversity of ways variation is viewed in
the literature (Pentland, 2003a, 2003b). It can be a harbinger of lower quality in
manufacturing contexts (Oakland, 1999) or an indicator of higher quality in service
provision (Leidner, 1993). The nature and results of this study may help to resolve these
contradictions.

The processing of invoices may be more similar to services than to manufacturing
contexts, so variation in this process should be positively associated with outcomes
related to qualitative measures such as success, failure, or meeting a deadline. At the
same time, institutional and legal norms such as accounting standards and internal
controls would tend to make extremely variant processes more costly (Dunn et al., 2005).
The balance between control and efficiency of a business process often determines its
performance, reliability, and cost.

There is a distinct difference here between predicted systemic and
performance-level effects. At the systemic level, more process variety indicates a greater
ability for the routine to deal with environmental variation. At the level of the individual
performance, I predict qualitative measures of outcome to be higher with more sequential
variety. These qualitative and systemic benefits from increased sequential variety might
come at a quantitative expense of increased processing time or cost. This is an interesting
implication: the direction of the effect may differ depending on the level of analysis. This
leads to a second proposition:

P1b: Higher levels of sequential variety will be weakly associated with a longer
completion time when measured at the level of individual sequences.

Technology Enablement and Constraint

At the operational level of the business process, technologies typically are given,
often beyond the discretion of the individual actor, and yet there must be enough
flexibility to meet the needs of that particular business process or routine. An
information system has inputs that can be handled automatically because of their
formatting and qualities, but there may be inputs that fall outside of this specification and
call for special handling as exceptions. The quality and characteristics of inputs can
impact an organization’s ability to use technology in a business process (Mukhopadhyay
et al., 1997). For example, automated mail sorting technology can read the addresses on a
variety of machine-written and bar-coded mail, but has difficulty reading hand-written
addresses and mail damaged by water. These must be hand-sorted because of the joint
characteristics of the input and the technology.

Culnan (1992) found a similar situation with mail handling in the U.S. Senate.
Form letters were devised for specific issues and constituencies, and it took many letters
regarding new issues to generate a new response format. The Senate used several
information systems to generate the form letters and also for other correspondence. After
the incoming mail was categorized, a combination of information systems and
organizational work was completed to respond to or ignore the various mail inputs.

In Culnan’s (1992) case, inputs that were common and recognized could be
processed by existing form letters within the system. If the letters were outside of this
specification, they were ignored, collected for future use, or handled as an exception.
The number of responses matched the different types of inputs that the joint features of
the organization and technology allowed within the correspondence function. Senators
were protected from too much mail by a system that categorized similar inputs,
responded appropriately, and learned to add new responses as needed.

The cybernetic view of systems and information theory can also shed light on this issue. The law of requisite variety holds that the variety in a system’s responses must be at least equal to the variety found in its environment (Ashby, 1956, 1958, 1968). The information content of the set of available responses is a function of how many responses there are and how many types of environmental signals are needed to determine the correct response. Automational technologies, such as those found in workflow software, use these signals to trigger responses without human decision-making.
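To make the idea concrete, variety can be counted in bits, as the base-2 logarithm of the number of distinguishable states. The minimal sketch below uses hypothetical mail categories, loosely modeled on the sorting example above, and a hypothetical mapping from input type to response. It reports the variety of the inputs and of the responses and lists the input types for which no response has been defined, which are the inputs a system would have to treat as exceptions.

```python
import math

def variety_bits(states):
    """Variety of a collection of states in bits: log2 of the number of distinct states."""
    n = len(set(states))
    return math.log2(n) if n > 0 else 0.0

def uncovered_inputs(input_types, response_map):
    """Input types with no predefined response; these would need exception handling."""
    return sorted(t for t in set(input_types) if t not in response_map)

# Hypothetical input categories and predefined responses (illustration only).
inputs = ["printed", "barcoded", "handwritten", "water_damaged"]
responses = {
    "printed": "machine_sort",
    "barcoded": "machine_sort",
    "handwritten": "hand_sort",
}

print(variety_bits(inputs))                 # 2.0 (four distinct input types)
print(variety_bits(responses.values()))     # 1.0 (two distinct responses)
print(uncovered_inputs(inputs, responses))  # ['water_damaged']
```

In this hypothetical, several input types share a response, so response variety can be lower than input variety; the gap that matters is the input type with no response at all.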

To conserve human attention (Simon, 1973), the information system should automate those tasks for which the information needed for decision-making can be determined in advance and matched exactly to the needed systemic response. In other words, the tasks that should be automated are those that require no additional information or decision-making to direct the correct action. At a steady state, this set of automatable actions must, by definition, be smaller than the set available to a human-automation hybrid system, because the variation in the environment and inputs is wider than can be predicted given the limits of human prediction and cognition. This suggests another proposition:

P2a: Environmental variation decreases the use of technology for a given
business process.
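The reasoning behind P2a can be illustrated with a toy calculation. The sketch assumes, hypothetically, that rules were written in advance only for the input types designers anticipated and that every other input is handed to a person; the input types and streams are invented. The share of inputs handled automatically then falls as the environment produces more unanticipated types.

```python
def automation_share(inputs, anticipated):
    """Fraction of inputs whose type has a predefined rule and can be handled automatically."""
    if not inputs:
        return 0.0
    automated = sum(1 for t in inputs if t in anticipated)
    return automated / len(inputs)

# Hypothetical: rules were written only for these input types.
anticipated = {"standard", "recurring_vendor"}

stable_env = ["standard", "recurring_vendor", "standard", "standard"]
varied_env = ["standard", "new_vendor", "foreign_currency", "recurring_vendor"]

print(automation_share(stable_env, anticipated))  # 1.0
print(automation_share(varied_env, anticipated))  # 0.5
```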

As noted earlier in this chapter, the relationship of technology to sequential variety is complex. Technology both enacts and is shaped by organizational structure (Orlikowski, 1992). The classic tension between organizational flexibility and stability is complicated by the introduction of technologies that may fundamentally constrain and enable organizational action. This is in addition to the impact this tension has on organizational structure.

This complexity is also present in the literature, as different groups of scholars highlight the equivocal nature of technology’s impact on organizational processes. For example, Poole and DeSanctis (1990) describe how users can subvert an information system for purposes not in keeping with its spirit, while Gosain (2004) highlights the isomorphic pressures arising from the controlling aspects of an information system.

Business process reengineering perspectives typically focus on the technological imperative (Orlikowski, 1995) and the enabling aspects of technological adoption and assimilation (Davenport & Short, 1990; Hammer, 1990; Lee & Dale, 1998; O'Neill & Sohal, 1999), while other perspectives focus on the constraints that technologies impose on organizational action (Benders et al., 2006; Gosain, 2004). Typically, each perspective’s core focus is on either enablement or control (Singh et al., 2009). This complicates prediction and requires additional context to understand how these effects will bear on this study. The design of features within a specific class of information system can help in developing a contextually relevant prediction; I discuss these features next.

Business processes that use workflow systems experience more structuring of sequence and action because an implicit model of the process becomes the basis of work design (Georgakopoulos et al., 1995; van der Aalst et al., 2003). While recent advances have increased flexibility in handling exceptional cases and variable processes (Carlsen, 1997; Narendra, 2004), not all workflow systems are designed this way. In fact, managers may wish to specify the workflow pattern explicitly to exert control over the process. Business rules may be implemented in the system by automating tasks based on predetermined criteria. For example, an invoice may be approved automatically if it is under a certain amount, or invoices from specified vendors may be passed directly to the required approver. This suggests that the use of technology would be reflected in the patterns of action that are expressed, specifically as a reduction of process variance. Thus, the following proposition:

P2b: Increased use of technology will be associated with lower sequential variety
for a given business process.
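The invoice rules described above can be sketched as a small routing function. The threshold, vendor and approver names, and action labels are hypothetical, and the manual paths stand in for choices clerks might make when no rule applies. The sketch counts distinct action sequences with and without the rules, illustrating the mechanism P2b proposes rather than any result from the research site.

```python
AUTO_APPROVE_LIMIT = 5_000                      # hypothetical amount threshold
VENDOR_APPROVER = {"vendor_a": "site_manager"}  # hypothetical vendor-to-approver rule

def route(invoice, use_rules=True):
    """Return the action sequence for one invoice (hypothetical action labels)."""
    if use_rules and invoice["amount"] < AUTO_APPROVE_LIMIT:
        return ("enter", "auto_approve")
    if use_rules and invoice["vendor"] in VENDOR_APPROVER:
        return ("enter", "distribute:" + VENDOR_APPROVER[invoice["vendor"]], "approve")
    # No rule fired: follow the path a clerk chose, recorded on the invoice.
    return ("enter",) + tuple(invoice["manual_path"])

def distinct_sequences(invoices, use_rules):
    """Number of different action sequences produced across the invoices."""
    return len({route(inv, use_rules) for inv in invoices})

invoices = [
    {"amount": 1_200, "vendor": "vendor_b", "manual_path": ["distribute:clerk", "approve"]},
    {"amount": 800, "vendor": "vendor_c", "manual_path": ["notify", "distribute:clerk", "approve"]},
    {"amount": 9_000, "vendor": "vendor_a", "manual_path": ["distribute:controller", "approve", "approve"]},
    {"amount": 12_000, "vendor": "vendor_a", "manual_path": ["distribute:site_manager", "approve"]},
]

print(distinct_sequences(invoices, use_rules=True))   # 2
print(distinct_sequences(invoices, use_rules=False))  # 4
```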

Brynjolfsson and Hitt (2000) implicitly recognize that it is the structures of use that generate value, because achieving benefit from information technology requires complementary organizational action. Soh and Markus (1995) outline a process model of IT business value creation, synthesized from several extant research models. They distill three overall information technology processes: IT conversion, IT use, and competition (p. 37). The use process focuses on the connection between IT assets and IT impacts, contingent on whether the IT is used appropriately in context.

From the perspective of the business process, Mooney et al. (1996) describe three main dimensions of IT business value: the automational, informational, and transformational aspects of the technology in use. Workflow systems may be installed as part of a business process reengineering transformation, but once in place their effects come mainly from the automational and informational features of the technology. This research focuses primarily on the impact of a technology’s automational features.

When studied at the level of the business process, information system use has been linked with organizationally important outcomes, such as revenue and mortality in a healthcare setting (Devaraj & Kohli, 2003). Mukhopadhyay, Rajiv, and Srinivasan (1997) also study the impact of IT use on a business process, finding that IT use increased the throughput of mail when inputs were appropriate. All of these studies point to increased value from IT use, contingent on context and appropriate inputs.

P2c: Technology use will be associated with lower cost and shorter completion times for a given business process.

Application to Mail Handling

An empirical illustration of the theory can be found in a reinterpretation of Mukhopadhyay et al. (1997), who examined the impact of a new mail reading and sorting technology at a United States Post Office. This technology increased the speed of sorting and reduced errors in almost all cases except mail that was wet or poorly hand-addressed. The technology was unable to machine-read the address on mail with these properties, so such mail had to be sorted by hand.

Viewed as a system, the post office developed a different response for each type of variant input, as predicted by the law of requisite variety. Extending the example given by Mukhopadhyay et al., we can gain further insight from the impact of later events on the postal system in the United States. In 2001, someone mailed anthrax spores to a number of people, and some recipients became infected and died. The impact on the mail system was immediate and widespread: people were afraid to open mail, and some mail sorting services were halted while workers in biohazard suits completed decontamination. In response, the U.S. Postal Service implemented testing for anthrax at all mail processing centers (Klatell, 2006). Again, in terms of inputs, the anthrax-contaminated mail varied beyond the system’s ability to handle it, and a new set of actions was implemented to allow the system to operate safely by detecting and quarantining contaminated mail.

This scenario illustrates the theory developed in this chapter. It is an example of a system in which inputs drive changes in process, resulting in improved outcomes. In addition, it clearly shows technology as one of the factors that both enables and constrains action. In the next chapter, I describe the research design used to test a contextualized model derived from the theory shown in Figure 3.

Chapter 4—Methodology and Research Design

Introduction

This chapter outlines the collection and processing of data, connecting the theoretical model from the previous chapter to the context of invoice processing and to the methods I use to evaluate the research questions (Figure 4). I seek to understand how signals from the environment (inputs) drive heterogeneity in processes, and how this heterogeneity affects process outcomes. I also examine how automation affects the heterogeneity of the process, and what impact information technology use has on outcomes.

I conclude this chapter with an overview of each analysis performed in subsequent chapters. One important methodological contribution of this work is the development and evaluation of different measures of sequential variety in organizational processes. Table 3 lists these in order of their increasing inclusion of sequential information. These measures have been used in a stream of related work with similar data from a larger set of four organizations (Pentland, Haerem, & Hillison, 2009a, 2009b, 2009c, 2009d).

Research Design

The research design calls for comparing variation in several sets of variables: those related to environmental factors, the sequence of actions in the routine, the amount of automation, and measures of outcome. The environment must be suitably defined and measured; the routine must be situated in an organizational unit that acts as an interface between the environment and the rest of the organization; and the outcome variables must be measurable and linked to the specific behaviors found in the routine. To infer causation, temporal precedence must be established, some form of mathematical relationship must be found, and this relationship must be supported by a plausible account of how the variables are connected (Bollen, 1990; J. Cohen, Cohen, West, & Aiken, 2003; Kenny, 1979).

To explore the relationships between environment, process, outcomes, and technology, I begin by using conditional probabilities of the events in the sequences, treating the organizational routine as a Markov process that varies in response to contextual variables (Gottman & Roy, 1990). This approach views the variety in the process as the set of possible states and the transitions between them (Ashby, 1976). For the second analysis, I use optimal string matching techniques to calculate the distance of each sequence from each of the others, creating a distance matrix (Sankoff & Kruskal, 1983) and measuring the amount of sequential variety by analyzing each sequence of actions in relation to the others (Pentland, 2003a, 2003b). Variety in this analysis is closer to a measure of dispersion around a set of modal sequences and may, in some respects, be thought of as the ‘standard deviation’ of a set of sequences. As noted above, Table 3 shows these methods and others in order of the increasing amount of sequential information they use.

[Figure 4 diagram: Dissimilarity of Inputs (Invoice Amount, Vendor Count, Vendor Experience) → 1a (+) → Heterogeneity of Process / Sequential Variety → 1b (+) → Outcomes (Length of Time to Complete); Amount of Automation (actions undertaken by system / total actions in the sequence), linked to the other constructs by paths 2a to 2c]

Figure 4: Analysis model, showing contextualized variables and relationships

Method | Strengths | Weaknesses | Citations
Lexicon Size | Easy to obtain | Very coarse measure of variety | (Pentland, 2003a, 2003b)
Lexicon Distribution | Can discriminate between processes | No sequence information used | (Pentland, 2003a, 2003b)
Entropy | Uses relative probability of execution | No sequence information used | (Ashby, 1956, 1958, 1968; Shannon & Weaver, 1949)
First Order Markov | Well established, used in a variety of fields | Omits information about longer sequences | (Ashby, 1976; Gottman & Roy, 1990; Pentland, 2003a, 2003b)
Higher Order Markov | Uses additional information about longer sequences | Math gets difficult; harder to prove better fit than lower order | (Gottman & Roy, 1990)
String Distance | Well established in a variety of fields; uses the entire sequence of actions | May be less appropriate for study of organizational processes (conceptually, what do insertions, deletions, and substitutions mean for organizational processes?) | (Abbott, 1990b; Abbott & Hrycak, 1990; Abbott & Tsay, 2000; Pentland, 2003a, 2003b; Sankoff & Kruskal, 1983)

(Rows are ordered by increasing use of sequential information.)

Table 3: Methods for measuring sequential variety
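As an illustration of the first three rows of Table 3, the sketch below computes lexicon size, the lexicon distribution (relative frequency of each action), and Shannon entropy in bits for a set of sequences, using the three entry-phase sequences from Table 5. The function names are my own, and the entropy shown is the zero-order version described in the table, which ignores the order of actions.

```python
import math
from collections import Counter

def lexicon_size(sequences):
    """Number of distinct actions observed across all sequences."""
    return len({a for seq in sequences for a in seq})

def lexicon_distribution(sequences):
    """Relative frequency of each action, pooling all sequences."""
    counts = Counter(a for seq in sequences for a in seq)
    total = sum(counts.values())
    return {action: n / total for action, n in counts.items()}

def entropy_bits(sequences):
    """Zero-order Shannon entropy (bits) of the lexicon distribution."""
    return -sum(p * math.log2(p) for p in lexicon_distribution(sequences).values())

# Entry-phase sequences from Table 5.
entry = [
    [8, 7, 6, 3, 4, 5, 20, 10, 12, 1, 9],
    [8, 7, 6, 3, 4, 5, 20, 10, 12, 1, 9],
    [8, 7, 6, 3, 4, 5, 20, 10, 12, 1],
]
print(lexicon_size(entry))            # 11
print(round(entropy_bits(entry), 3))  # 3.452
```

Shuffling the actions within each sequence would leave all three measures unchanged, which is the weakness the table notes for these rows.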

Research Site and Data Collection

The invoice processing routine lies at the interface between vendors and an

organization, making it a perfect opportunity to evaluate the antecedents and

consequences of process variance. Despite institutional and legal norms regarding the

form and general process that must be followed, there is a large amount of variability in

how organizations can make and document the decisions regarding payment of the

invoice (Dunn et al.. 2005).

 

This research uses data collected from an invoice processing workflow system in use at a construction company in Norway. Invoices most commonly enter the system on paper, and less often via an electronic portal. If the invoice is on paper, it is scanned and optical character recognition is performed for initial data entry, although some information must be entered manually. Some invoices can be sent immediately to the financial system for payment; others require multiple approvals, so the number of approvals can exceed the number of invoices. Once the approvals are complete, the invoice is paid. An overview of the invoice process as conceived by the designers of the workflow system is shown in Figure 5 (Compello Software, 2007). The data obtained from the invoice processing system are not immediately ready for analysis; a significant amount of extraction, conversion, and transformation is needed to obtain useful data.

[Figure 5 flowchart; recoverable stages: Scanning, OCR, OCR Form def., e-faktura, Electronic Registration/Distribution, Ready? (Yes/No), Financial system, Authorization, Fully posted]

Figure 5: Flowchart of the invoice processing routine
I apply workflow mining techniques to the action logs that the software provides (Agrawal, Gunopulos, & Leymann, 1998; Agrawal & Srikant, 1995; van der Aalst et al., 2003; van der Aalst & Weijters, 2004; van der Aalst, Weijters, & Maruster, 2004). The resulting data provide the input variables, sequential information, amount of automation, and outcome variables for both the string-distance and Markov analyses. To obtain the sequences, I parsed the event log, extracted each entry and approval sequence, and linked it to the related invoice. An example event log is shown in Table 4, and the processed sequences are shown in Table 5.
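A minimal sketch of the extraction step, assuming the event log has already been flattened into (invoice, phase, action code) rows in the order the actions occurred. The rows for invoice 202132 follow Table 4; the interleaved row from invoice 999999 is hypothetical. Real logs would also need timestamps for ordering and the conversion work described above, which this sketch does not attempt.

```python
def extract_sequences(rows):
    """Group (invoice, phase, action_code) rows into one ordered sequence per (invoice, phase).

    Assumes rows are already in the order the actions were logged.
    """
    sequences = {}
    for invoice, phase, code in rows:
        sequences.setdefault((invoice, phase), []).append(code)
    return sequences

# Action codes for invoice 202132, as listed in Table 4.
LOG = [(202132, "Entry", c) for c in (8, 7, 6, 3, 4, 5, 20, 10, 12, 1)]
LOG += [(202132, "Approval", c) for c in (10, 4, 3, 12, 23, 23, 25, 2, 11, 13, 1, 1)]
# Hypothetical row from another invoice, interleaved as it might be in a real log.
LOG.insert(3, (999999, "Entry", 8))

for key, seq in extract_sequences(LOG).items():
    print(key, seq)
```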

Invoice # | Phase | Action | Code
202132 | Entry | Enter invoice no. | 8
202132 | Entry | Enter invoice date | 7
202132 | Entry | Enter due date | 6
202132 | Entry | Enter amount | 3
202132 | Entry | Enter currency | 4
202132 | Entry | Enter document type | 5
202132 | Entry | Enter vendor account | 20
202132 | Entry | Enter period | 10
202132 | Entry | Enter text | 12
202132 | Entry | Approve | 1
202132 | Approval | Enter period | 10
202132 | Approval | Enter currency | 4
202132 | Approval | Enter amount | 3
202132 | Approval | Enter text | 12
202132 | Approval | Distribute to approver | 23
202132 | Approval | Distribute to approver | 23
202132 | Approval | Notify | 25
202132 | Approval | Enter account | 2
202132 | Approval | Enter Tax-code | 11
202132 | Approval | Enter value dim. 1 (DP) | 13
202132 | Approval | Approve | 1
202132 | Approval | Approve | 1

Table 4: Example event log from workflow system

Identifier | Phase | Action Sequence
112568 | Entry | 8, 7, 6, 3, 4, 5, 20, 10, 12, 1, 9
112569 | Entry | 8, 7, 6, 3, 4, 5, 20, 10, 12, 1, 9
112573 | Entry | 8, 7, 6, 3, 4, 5, 20, 10, 12, 1
112568 | Approval | 10, 4, 3, 12, 23, 23, 25, 2, 11, 28, 16, 1, 13, 23, 27, 25, 15, 1
112572 | Approval | 10, 4, 13, 3, 12, 14, 23, 23, 25, 2, 11, 23, 16, 25, 1, 1
112573 | Approval | 10, 4, 3, 12, 13, 14, 23, 23, 23, 23, 25, 2, 11, 15, 25, 1, 1, 23, 16, 1, 1

Table 5: Sequences extracted from the workflow event log


I have data from the system’s first installation in 2001 through July 2007, resulting in 58,000 invoice sequences available for analysis. A random sample of 2000 invoices from June 2005 through May 2006 was selected for further analysis in this dissertation. I chose this period for two reasons: to avoid start-up learning effects and to center the data on year end (December 31). Because the data source was a construction company in Northern Europe, I expected strong seasonal effects and wanted to minimize and control their impact while still allowing a large sample. Centering the sample on year end allowed both the split-half and four-group tests in the Markov analysis to help isolate seasonal effects from the systemic variation I was seeking. The sample size was set at 2000 invoices because this was close to the technical limits of the string-distance analysis.
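The sampling step and the size constraint can be sketched as follows. The invoice population is hypothetical (identifiers and scan dates are generated), and the seed is arbitrary. The sketch filters to the June 2005 to May 2006 window, draws 2000 invoices without replacement, and computes the number of pairwise string distances such a sample implies, n(n - 1)/2.

```python
import random
from datetime import date, timedelta

WINDOW_START = date(2005, 6, 1)
WINDOW_END = date(2006, 5, 31)

def sample_invoices(invoices, k=2000, seed=1):
    """Draw k invoices without replacement from those scanned inside the study window."""
    in_window = [inv for inv in invoices if WINDOW_START <= inv["scanned"] <= WINDOW_END]
    return random.Random(seed).sample(in_window, k)

def pairwise_comparisons(n):
    """Number of distinct sequence pairs, n(n - 1)/2, that a string-distance analysis compares."""
    return n * (n - 1) // 2

# Hypothetical population: invoices spread evenly over roughly 2001 to mid-2007.
population = [
    {"id": i, "scanned": date(2001, 1, 1) + timedelta(days=i % 2400)}
    for i in range(60_000)
]

sample = sample_invoices(population)
print(len(sample))                        # 2000
print(pairwise_comparisons(len(sample)))  # 1999000
```

Because the number of pairs grows with the square of the sample size, doubling the sample would roughly quadruple the distance computations.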

Evaluating the theoretical model presented in chapter 3 requires that its concepts and relationships be contextualized and operationalized, as presented in Figure 4. Table 6 defines and outlines the variables for environment, technology use, and outcomes specifically for the invoice process and research site. I discuss their operationalization (Table 7) in the next section. Since I am interested in how ‘different’ each invoice process is from the others, scaling, clustering, or stratification techniques are used on the set of environmental and outcome variables. Because I use multiple analysis methods, I include both heterogeneity of the process (Markov approach) and sequential variety (string distance approach) as measures of the process itself.


 

Variable Type | Definition
Environment | How many times has this particular vendor provided an invoice for payment? (total and incremental)
Environment | Invoice amount
Technology Use | Number of automated actions that were undertaken during this process
Process | Markov method
Process | Scaled string distance
Outcomes | Length of time spent processing this invoice
Outcomes | Number of people that ‘touched’ the invoice

Table 6: Variables contextualized for the invoicing business process

Operationalized Variables

The relationship between characteristics of the invoice and characteristics of the sequences was complex. For example, there was always one entry sequence per invoice, but there could be many approval sequences. Moreover, many invoices generated identical sequences, so invoice-level variables such as amount and elapsed time could be linked to sequences that also occurred on other invoices. Table 7 shows how I operationalized the variables and where each contextualized variable in Table 6 fits into the analysis model in Figure 4.

 

 

Construct | Variable | Variable Name
Input | Invoice Amount: the log of the invoice amount (per invoice) | LogAngmt
Input | Vendor Count: how many times in total the vendor had invoices in the entire data set (per invoice) | TotalVendorCount
Input | Vendor Experience: how many times a particular vendor had invoices prior to the current invoice (per invoice) | VendorExperience
Automation | Number of actions undertaken by the system divided by the total number of actions within a sequence (per sequence) | AutoPCT
Outcome | Length of time between invoice scanning and the completion of the workflow (per invoice) | SC_to_AA

Table 7: Variables operationalized and linked to the analysis model

Invoices sometimes had several amounts entered in the system, apparently to update the record as more information was obtained. I used the average of these on each invoice to compute the variable “Invoice Amount”. This variable showed a very wide range (20 Norwegian kroner (NOK) to 1.5 million NOK) and was strongly skewed to the right, so I took the base-10 logarithm to obtain a more normal distribution (LogAngmount).
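A minimal sketch of this transformation, assuming the amounts entered on each invoice are available as a list of positive numbers; the function name is mine. Applied to the maximum amount in Table 8 (1,529,886 NOK), it gives about 6.18, consistent with the reported maximum of LogAngmount.

```python
import math
from statistics import mean

def log_amount(entered_amounts):
    """Average the amounts entered on one invoice, then take the base-10 logarithm."""
    avg = mean(entered_amounts)
    if avg <= 0:
        raise ValueError("log10 requires a positive average amount")
    return math.log10(avg)

print(round(log_amount([900.0, 1100.0]), 2))  # 3.0 (average 1000 NOK)
print(round(log_amount([1_529_886]), 2))      # 6.18
```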

I used two methods to explore the concept of vendor experience. First, for each invoice I counted the total number of times that particular vendor appears in all of the data I have (1/1/2002 through 5/31/2006) (TotalVendorCount). I also calculated the number of times a particular vendor had appeared on an invoice before the current invoice (VendorExperience). This second measure increases by 1 every time an invoice from that vendor occurs: the first invoice from a vendor would have an experience level of 1, the second 2, and so on. These are two similar yet different ways to measure the amount of experience the organization had processing invoices from a particular vendor, and they are highly correlated (R² = .987, p < .001).
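Both vendor measures can be computed in one pass over invoices sorted by date, as in the sketch below. The vendor labels are hypothetical. Table 7 phrases VendorExperience as the number of prior invoices, which would start at 0; the sketch follows the convention stated in this paragraph (first invoice = 1), and subtracting 1 gives the other.

```python
from collections import Counter

def vendor_counts(vendor_ids):
    """Return (TotalVendorCount, VendorExperience) per invoice, given vendor ids in date order.

    VendorExperience follows the "first invoice = 1" convention.
    """
    totals = Counter(vendor_ids)   # total appearances of each vendor in the data
    seen = Counter()               # running count of appearances so far
    result = []
    for v in vendor_ids:
        seen[v] += 1
        result.append((totals[v], seen[v]))
    return result

print(vendor_counts(["A", "B", "A", "A", "B"]))
# [(3, 1), (2, 1), (3, 2), (3, 3), (2, 2)]
```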

Automation was measured at the sequence level, as it was a property of the sequence rather than the invoice. I calculated this variable by dividing the number of automated actions by the total number of actions in the sequence (AutoPCT). Because this is a relative measure, the percentage of automation can be compared across sequences of different lengths.
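A minimal sketch of AutoPCT. Which action codes are performed by the workflow system is not listed in this chapter, so SYSTEM_ACTIONS below is a hypothetical placeholder that treats "Distribute to approver" (23) and "Notify" (25) as system actions purely for illustration. The example uses the approval sequence from Table 4.

```python
SYSTEM_ACTIONS = {23, 25}  # hypothetical: codes assumed to be executed by the system

def auto_pct(sequence, system_actions=SYSTEM_ACTIONS):
    """Share of actions in one sequence that were performed by the system."""
    if not sequence:
        raise ValueError("AutoPCT is undefined for an empty sequence")
    return sum(1 for action in sequence if action in system_actions) / len(sequence)

approval = [10, 4, 3, 12, 23, 23, 25, 2, 11, 13, 1, 1]
print(auto_pct(approval))  # 0.25 (3 of 12 actions)
```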

Since every invoice in the sample was paid, payment itself had no variance to explain and could not serve as an outcome measure. Moreover, ‘good’ performance of the invoicing process would only become known later, for example if a correct invoice was not paid or a fraudulent invoice was paid. I therefore used the length of time it took the organization to approve the invoice as a proxy for how much organizational effort was expended. The time used was the number of days from when the invoice was scanned to when it was marked ‘all approved’ or ‘fully posted’, as noted in Figure 5.
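The outcome variable reduces to a date difference. In the sketch below (function name mine), a missing completion date yields None rather than 0, so such invoices drop out of the outcome analysis instead of appearing as instantly approved; this matches how missing completion dates reduce N for SC_to_AA in Table 8.

```python
from datetime import date

def sc_to_aa(scanned, all_approved):
    """Days from scanning to 'all approved'; None when either date is missing."""
    if scanned is None or all_approved is None:
        return None
    return (all_approved - scanned).days

print(sc_to_aa(date(2005, 12, 20), date(2006, 1, 3)))  # 14
print(sc_to_aa(date(2005, 12, 20), None))              # None
```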

Variable | N | Min | Max | Mean | Std. Deviation
Angmount | 1990 | 20.66 | 1529886 | 20073.1 | 74779.704
LogAngmount | 1990 | 1.31 | 6.18 | 3.6143 | .73301
TotalVendorCount | 1991 | 1 | 2392 | 516.57 | 658.595
VendorExperience | 2000 | 0 | 2117 | 416.56 | 552.989
SC_to_AA | 1867 | 0 | 233 | 6.17 | 8.234

Table 8: Descriptive information for invoice variables

Table 8 details descriptive information for the variables that relate to the invoices, while Table 9 shows information based on sequences, such as the automation percentage. The 2000 invoices in the sample generated 2000 entry sequences and 2852 approval sequences. Most of the missing N comes from the outcome variable for processing time. In many of these cases the final date for ‘all approved’ was not present, yet the invoice was marked paid. I assume that a problem in converting or extracting the dates during the workflow mining phase of the project led to this issue.

Phase | N | Min | Max | Mean | Std. Deviation
Entry | 2000 | 0 | 1.00 | .4524 | .23241
Approval | 2852 | 0 | 0.78 | .1493 | .14698

Table 9: Descriptive information for automation percentage (AutoPCT)

Markov Transition Matrix Method

My first analysis views each sequence as a series of transitions between symbols, based on knowledge of previous symbols in the sequence. In this case, the symbols represent actions that take place within the sequence. I include an overview here; full details and results can be found in chapter 5.

Gottman and Roy (1990) describe a technique using Markov transition matrices to test the effects of contextual variables on the structure of sequences. First, the appropriate model order is determined by examining the inclusion of additional actions in the sequence, using a nested calculation of entropy. A moving window of varying size is passed through the sequence to calculate the amount of information gained from knowledge of the differently sized subsequences. A likelihood ratio chi-square test then compares the hypothesis that the order is r against the null hypothesis that the order is r - 1 (Anderson & Goodman, 1957; Gottman & Roy, 1990, p. 62). This tells us how long the subsequences used in further analysis should be.
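The moving-window entropy can be sketched as below: slide a window of size i through each sequence, count the distinct subsequences it produces, and compute the Shannon entropy H_i of that distribution. The successive differences H_i - H_(i-1) estimate the uncertainty that remains about the next action once the previous i - 1 actions are known. The function name and the use of the Table 5 approval sequences are my own choices for illustration; Gottman and Roy (1990) and Chatfield (1973) define the exact quantities tested.

```python
import math
from collections import Counter

def block_entropy(sequences, i):
    """Shannon entropy (bits) of the length-i windows observed across all sequences."""
    windows = Counter(
        tuple(seq[k:k + i]) for seq in sequences for k in range(len(seq) - i + 1)
    )
    total = sum(windows.values())
    return -sum((n / total) * math.log2(n / total) for n in windows.values())

# Approval-phase sequences from Table 5.
approval = [
    [10, 4, 3, 12, 23, 23, 25, 2, 11, 28, 16, 1, 13, 23, 27, 25, 15, 1],
    [10, 4, 13, 3, 12, 14, 23, 23, 25, 2, 11, 23, 16, 25, 1, 1],
    [10, 4, 3, 12, 13, 14, 23, 23, 23, 23, 25, 2, 11, 15, 25, 1, 1, 23, 16, 1, 1],
]

for i in (1, 2, 3):
    print(i, round(block_entropy(approval, i), 3))
```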


Once the order of the Markov process has been determined, the stability of the
conditional probabilities between sets of events is measured over time. An omnibus test
for stationarity is performed by splitting the sample in half, and a likelihood ratio chi-
square test is performed to test for equality of mean probabilities of transition for each
sample. Other tests can be performed by splitting the sample into relevant time periods,
chosen arbitrarily or as suggested by theory. Trend or cyclic effects can also be tested in
this way.
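The split-half comparison can be sketched as a likelihood-ratio chi-square test of whether first-order transition probabilities are the same across groups of sequences, in the spirit of Anderson and Goodman (1957). This is a minimal sketch, not the computation behind the reported results: the function names are mine, and degrees of freedom are counted only over origin states and destinations actually observed, which is one common way to handle the sparse matrices discussed in chapter 5. A p-value can be obtained from the chi-square survival function, for example scipy.stats.chi2.sf(g2, df).

```python
import math
from collections import Counter, defaultdict

def transition_counts(sequences):
    """First-order transition counts n[i][j], pooled over one group of sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts

def homogeneity_lr(groups):
    """LR chi-square that transition probabilities are equal across groups; returns (G2, df)."""
    per_group = [transition_counts(g) for g in groups]
    origins = set().union(*(c.keys() for c in per_group))
    g2, df = 0.0, 0
    for i in origins:
        pooled = Counter()
        for c in per_group:
            pooled.update(c.get(i, Counter()))
        pooled_total = sum(pooled.values())
        # Rows for origin i in the groups where i was actually observed.
        active = [c[i] for c in per_group if i in c and sum(c[i].values()) > 0]
        for row in active:
            row_total = sum(row.values())
            for j, n in row.items():
                expected = row_total * pooled[j] / pooled_total
                g2 += 2 * n * math.log(n / expected)
        df += (len(active) - 1) * (len(pooled) - 1)
    return g2, df

first_half = [[1, 2, 1, 2, 1, 2]]   # toy sequences, not study data
second_half = [[1, 1, 1, 2, 2, 2]]
g2, df = homogeneity_lr([first_half, second_half])
print(round(g2, 3), df)  # 9.364 2
```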

After examining the temporal stability of the transition probabilities, I investigate the impact of context variables. This is done by splitting the sequences into sets based on these context variables and testing the resulting matrices for homogeneity. I split the sample into four groups of similar size, stratified by each variable, and test the four groups against each other for homogeneity. The results are detailed in chapter 5.
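Splitting the sequences into four groups of similar size, stratified by a context variable, can be sketched as below. The sketch sorts by the variable and cuts the ordered list into k slices whose sizes differ by at most one; records with tied values can fall on either side of a cut, which is a simplification. The (log amount, sequence) records are hypothetical. Each group would then be converted into a transition matrix and tested for homogeneity as described above.

```python
def stratify(items, key, k=4):
    """Sort items by key and split them into k groups whose sizes differ by at most one."""
    ordered = sorted(items, key=key)
    n = len(ordered)
    cuts = [g * n // k for g in range(k + 1)]
    return [ordered[cuts[g]:cuts[g + 1]] for g in range(k)]

# Hypothetical (log invoice amount, action sequence) records.
records = [(3.1, [8, 7, 1]), (4.7, [8, 7, 6, 1]), (2.2, [8, 1]), (3.9, [8, 7, 1]),
           (5.0, [8, 6, 1]), (2.8, [8, 1]), (4.1, [8, 7, 6, 1]), (3.5, [8, 7, 1])]
groups = stratify(records, key=lambda r: r[0])
print([len(g) for g in groups])  # [2, 2, 2, 2]
```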

String Distance Approach

The second analysis I perform uses relative distances between the sequences to
obtain a set of metric locations for each sequence in a scaled space. I detail the procedure
and the results of the analysis in chapter 6, but I provide a brief overview here.

First, I obtain a string distance (see Appendix 1) between every sequence and every other sequence in the sample. I chose this distance measure because it has been widely used in a variety of fields, including by researchers in the information technology (Sabherwal & Robey, 1993) and organizational routines literatures (Pentland, 2003a, 2003b). String distance, also known as Levenshtein distance, has been used to good effect in sociology and other fields (Abbott, 1990b, 1995; Abbott & Hrycak, 1990; Abbott & Tsay, 2000; Dijkstra, 2001; Dijkstra & Taris, 1995; Sankoff & Kruskal, 1983; van Driel & Oosterveld, 2001).
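Appendix 1 describes the distance actually used. As a point of reference, the sketch below is the textbook Levenshtein distance with unit costs for insertion, deletion, and substitution, computed by dynamic programming over two sequences of action codes; the cost settings in this study’s analysis may differ. Applied to the first and third entry sequences in Table 5, it returns 1, since they differ only by the final action 9.

```python
def levenshtein(a, b):
    """Minimum number of unit-cost insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free if the actions match)
            ))
        prev = curr
    return prev[-1]

entry_1 = [8, 7, 6, 3, 4, 5, 20, 10, 12, 1, 9]
entry_3 = [8, 7, 6, 3, 4, 5, 20, 10, 12, 1]
print(levenshtein(entry_1, entry_3))  # 1
```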

The resulting matrix lists the sequences as both row and column headings, with the distance between each pair of sequences in the cells. Summing across a row yields a measure of how different a given sequence is from all of the others (Abbott & Hrycak, 1990; Sabherwal & Robey, 1993). Alternatively, the entire matrix of relational distances can be used as the input to a multidimensional scaling algorithm. Given the ordinal nature of the data, non-metric scaling is most appropriate. This technique extracts the underlying metric relational structure between the sequences from the structure of their ordinal relationships (Kruskal & Wish, 1978). The extracted dimensions are then examined and explored using multiple correlation and visualization techniques for interpretation. For details and results, see chapter 6.
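The row-sum idea can be sketched on a small hypothetical distance matrix: averaging each row gives how far a sequence lies from all others, and the row with the smallest total marks the most typical (medoid) sequence, a rough analogue of the modal sequence mentioned earlier. The matrix is invented for illustration. The non-metric scaling step itself is not shown; a library implementation, such as the MDS class in scikit-learn, which has a non-metric option and accepts a precomputed dissimilarity matrix, could be applied to the same matrix.

```python
def distinctiveness(D):
    """Mean distance from each sequence to all the others, from a symmetric distance matrix."""
    n = len(D)
    return [sum(row) / (n - 1) for row in D]

def medoid(D):
    """Index of the sequence with the smallest total distance to the others."""
    totals = [sum(row) for row in D]
    return totals.index(min(totals))

# Hypothetical distance matrix for four sequences (zeros on the diagonal).
D = [
    [0, 1, 5, 6],
    [1, 0, 4, 5],
    [5, 4, 0, 3],
    [6, 5, 3, 0],
]
print([round(x, 2) for x in distinctiveness(D)])  # [4.0, 3.33, 4.0, 4.67]
print(medoid(D))                                   # 1
```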


Chapter 5—Markov Analysis and Findings

Introduction

This chapter describes the first analysis I used to evaluate the relationships between inputs, the process, and outcomes, as well as the impact of technology. I find support for most of the model, concluding that vendor experience and invoice amount drive heterogeneity in the process. The entry and approval phases have different relationships with outcomes: entry-phase sequences were homogeneous with respect to elapsed time, but approval-phase sequences differed when stratified by elapsed time. For both phases, automation emerged as a strong driver of heterogeneity.

After performing the workflow mining discussed in the previous chapter, I convert the sequences into matrices representing the counts of transitions between temporally adjacent actions. An example is presented in Table 32. These matrices are then analyzed with a class of discrete statistics based on the expected and observed probabilities of these transitions, mathematically similar to the analysis of contingency tables (Anderson, 1957; Chatfield, 1973; Bishop, Fienberg, & Holland, 1975; Gottman & Roy, 1990).
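Converting sequences into a first-order transition matrix can be sketched as follows: count each pair of adjacent actions, then divide each row by its total to estimate the probability of moving from one action to the next. The function name is mine, and the example uses the approval-phase sequence for invoice 202132 from Table 4; it is not a reproduction of Table 32.

```python
from collections import Counter

def transition_matrix(sequences):
    """First-order transition counts and row-normalized probabilities between adjacent actions."""
    pairs = Counter((a, b) for seq in sequences for a, b in zip(seq, seq[1:]))
    states = sorted({s for seq in sequences for s in seq})
    counts = [[pairs[(i, j)] for j in states] for i in states]
    probs = []
    for row in counts:
        total = sum(row)
        # A state that is only ever the last action has no outgoing transitions.
        probs.append([c / total if total else 0.0 for c in row])
    return states, counts, probs

# Approval-phase sequence for invoice 202132 (Table 4).
states, counts, probs = transition_matrix([[10, 4, 3, 12, 23, 23, 25, 2, 11, 13, 1, 1]])
row = states.index(23)
print({s: p for s, p in zip(states, probs[row]) if p})  # {23: 0.5, 25: 0.5}
```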

Determining the Order of the Processes

Following Gottman and Roy (1990), the order of the Markov process must be determined at the outset of the analysis. This helps the researcher understand the temporal structure present in the set of sequences, based on the conditional probabilities of prior steps. According to the likelihood ratio tests suggested by Chatfield (1973), both phases have a digram and trigram structure (Tables 33 and 34). A digram structure means that the previous symbol helps predict the current symbol, so the process is of first order. A trigram structure uses the previous two symbols to predict the current symbol, so the process would be of second order. This does not mean that two or three symbols explain the actual temporal structure, but that, given the data, those structures best predict the transitions between actions.
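The order tests can be sketched as likelihood-ratio chi-square statistics in the Anderson and Goodman (1957) style: to test order r against order r - 1, group the transitions by the r - 1 most recent actions and, within each group, test whether the action r steps back is independent of the next action. For r = 1 this is a single independence test on the digram table. The function name is mine, degrees of freedom are counted over observed rows and columns only, and a p-value could come from scipy.stats.chi2.sf(g2, df); this is a sketch rather than the exact procedure behind Tables 33 and 34.

```python
import math
from collections import Counter, defaultdict

def order_lr_test(sequences, r):
    """LR chi-square for H0: order r - 1 versus H1: order r (r >= 1); returns (G2, df)."""
    tables = defaultdict(Counter)
    for seq in sequences:
        for k in range(r, len(seq)):
            context = tuple(seq[k - r + 1:k])          # the r - 1 most recent actions
            tables[context][(seq[k - r], seq[k])] += 1  # (action r steps back, next action)
    g2, df = 0.0, 0
    for cells in tables.values():
        n = sum(cells.values())
        rows, cols = Counter(), Counter()
        for (a, b), c in cells.items():
            rows[a] += c
            cols[b] += c
        for (a, b), c in cells.items():
            g2 += 2 * c * math.log(c * n / (rows[a] * cols[b]))
        df += (len(rows) - 1) * (len(cols) - 1)
    return g2, df

alternating = [1, 2] * 10  # toy sequence: each action fully determines the next
g2, df = order_lr_test([alternating], 1)
print(round(g2, 3), df)                     # 26.287 1
print(order_lr_test([alternating], 2))      # (0.0, 0)
```

In the toy sequence the previous action carries all the information, so the first-order test is large while the second-order test adds nothing.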

Higher-order (second-order, third-order, etc.) Markov transition matrices are often ‘sparse’, meaning that many cells of the matrix are 0 or have few observations. This can cause problems for testing the stationarity of the sequence (Gottman & Roy, 1990; Cappella, 1980). I also examine the scree plots of H_i (Figures 15 and 16) and determine that the additional information from the trigram structure may not be as great as the likelihood ratio tests indicate (Chatfield, 1973, pp. 16-17). H_i is the amount of information given by the sequence, based on a moving window of size i. The scree plot shows little benefit from moving beyond a trigram structure, but the digram structure still shows an improvement over no temporal information. For mathematical efficiency, I treat the entry and approval phase sequences as having digram structure and first-order Markov properties for the remainder of the analysis.

Examining Stationarity of the Process

I test whether the processes are stable over time, that is, whether the transition probabilities change between time periods. The statistical test gives a binary result: either the processes are similar (H0) or they are significantly different (Ha). Gottman and Roy (1990) caution readers not to view this omnibus test as an evaluation of the validity of future tests.

According to the omnibus test of stationarity, neither set of sequences is stable over time (entry phase: LR = 2664.487, df = 255, p < .001; approval phase: LR = 5726.618, df = 399, p < .001; see Table 36). Given that the research site is a construction company, there may be a seasonal effect on the processes. It may also be that some of the variables of interest in this study, such as vendor experience, move with time, increasing the heterogeneity of the processes along this dimension.

This represents an interesting implication for organizational theory. Organizational routines are widely believed to be rigid and unchanging (M. D. Cohen, 2007), and singular in response to stimuli once search has been eliminated from the process (March & Simon, 1958). A competing perspective highlights organizational routines as a generative structure (Howard-Grenville, 2005; Pentland & Rueter, 1994): one that generates performances that vary in response to learning (March, 1991; Nelson & Winter, 1982) and agency (Feldman, 2000), and that is consistent with continuous organizational change (Sorenson, 2003). In a related working paper, Pentland et al. (2009b) found this heterogeneity over time in three of four organizations with similar data sets.

I also perform a follow-up test, splitting the sequences into four equal groups to see
whether there is homogeneity over time at an interval smaller than 6 months, and to
explore the source of the overall difference (Table 37). For the entry phase, periods 1 and
2 were similar and periods 3 and 4 were similar, consistent with the omnibus test above
and indicating a natural split at the halfway point of the sample, which corresponds to
year end. The approval phase showed similarity between the first two periods, and these
were distinct from the remaining periods. Periods 3 and 4 were not alike, and each was
distinct from periods 1 and 2. This supports seasonality as a possible explanation for
heterogeneity over time in the approval phase.

This finding also connects to the debate between the stable and changing
perspectives on organizational routines. As I examine the groups of process executions
(the performative aspects of the organizational routines), I find natural variation. One
might expect these variations to 'average' themselves out over time, but this is evidence
that organizational routines may change over time in a drifting, endogenous fashion,
rather than through externally driven selection and retention. These ideas are developed
further in a longitudinal analysis of the performances at four sites in a working paper
currently in development (Pentland et al., 2009b).
Group Comparisons

Viewing the processes as sequences with Markov properties allows the comparison
of subgroups within the total set of sequences. In addition to time (stationarity), I also
evaluate the overall input-process-output model. Gottman and Roy (1990) suggest
segmenting the sequences by variables of interest and deriving the Markov transition
matrices of the appropriate order for each segment. These matrices can then be tested
with the usual chi-square or likelihood ratio statistics used for contingency tables.
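The following Python sketch shows one way such a comparison can be computed. It is an assumption, not necessarily the procedure used here: each group's first-order transitions are tallied into one row of a groups × (k × k) contingency table, where k is the number of distinct actions, and the likelihood ratio (G²) statistic is computed with df = (G − 1)(k² − 1). This formulation is consistent with the degrees of freedom reported in this chapter (255 = 16² − 1 per pair of entry groups, 399 = 20² − 1 per pair of approval groups, and three times those values for the four-group overall tests), which would correspond to 16 entry and 20 approval actions if the assumption holds. The function names are hypothetical.

```python
from collections import Counter
from itertools import product
import math


def transition_counts(sequences):
    """Tally first-order transitions (adjacent action pairs) over all sequences."""
    counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
    return counts


def lr_homogeneity(groups, states):
    """Likelihood-ratio (G^2) test that transition frequencies are equal across groups.

    groups: list of groups, each a list of action sequences
    states: the full alphabet of k actions
    Returns (LR statistic, degrees of freedom).
    """
    cells = list(product(states, repeat=2))  # all k*k possible transitions
    table = []
    for g in groups:
        counts = transition_counts(g)
        table.append([counts[c] for c in cells])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    lr = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            if observed > 0:  # empty cells contribute nothing to G^2
                expected = row_totals[r] * col_totals[c] / n
                lr += 2 * observed * math.log(observed / expected)
    df = (len(groups) - 1) * (len(cells) - 1)
    return lr, df
```

A p-value can then be read from the chi-square distribution, for example with `scipy.stats.chi2.sf(lr, df)`. The same function would cover the stationarity test by passing time periods, rather than strata, as the groups.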

The group comparisons performed here are ones of similarity or differentiation:
are groups of processes stratified by a given variable alike or different? Table 10 shows
the specific hypotheses that are tested to evaluate the research model, and describes the
connections between constructs and variables. All the variables of interest were binned,
or stratified, with a fixed percentage of cases in each group (25% for four groups). For
example, the log of the invoice amount was calculated for each invoice, and the invoices
were ordered from smallest to largest. The bottom 25% of the invoices were assigned to
group 1, the next 25% to group 2, and so on. For the entry phase, this means
approximately 500 sequences are in each group. The approval phase has more than 500
sequences per group because an invoice can have many approvals, and its groups are not
exactly equal in size because not every invoice had the same number of approvals.
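The stratification can be sketched in Python as follows. The logarithm is assumed to be base 10, which reproduces Table 11's minimum (1.315) and maximum (6.185) from the smallest and largest amounts in Table 12 (20.67 and 1,529,886 NOK). Assigning groups by rank, as below, is one simple way to obtain a fixed 25% of cases per group; how the original analysis broke ties is not stated, the amounts are illustrative values taken from Table 12, and the function name is hypothetical.

```python
import math


def quartile_groups(values):
    """Assign each value to group 1-4 by rank, so each group holds ~25% of cases.

    Ties are broken by original position, which is an arbitrary choice.
    """
    n = len(values)
    order = sorted(range(n), key=lambda idx: values[idx])
    groups = [0] * n
    for rank, idx in enumerate(order):
        groups[idx] = rank * 4 // n + 1
    return groups


amounts = [605.95, 20.67, 1529886.00, 4176.00, 12128.75, 1211.07, 7451.68, 2460.49]
log_amounts = [math.log10(a) for a in amounts]
groups = quartile_groups(log_amounts)  # [1, 1, 4, 3, 4, 2, 3, 2]
```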

The variables measuring input (invoice amount, vendor) and outcome (number of
days for the process) are linked to the invoice, while the measure for automation is linked
specifically to the sequence. In all cases, the entry and approval sequences are tested
separately, but the bins or strata for the variables derived from the invoice are linked to
all of the sequences that the invoice generates. Table 10 describes these variables, noting
which research question and specific hypothesis will be tested in this chapter.

Construct | Variable | Specific Hypotheses
Input | Invoice Amount: the log of the invoice amount (per invoice) | H0: Invoices for large amounts will generate sequences similar to those for small amounts. Ha: Invoices for large amounts will generate sequences different from those for small amounts.
Input | Vendor Count: how many times in total the vendor had invoices in the entire data set (per invoice) | H0: Invoices from common vendors will generate sequences similar to those from uncommon vendors. Ha: Invoices from common vendors will generate sequences different from those from uncommon vendors.
Input | Vendor Experience: how many times a particular vendor had invoices prior to the current invoice (per invoice) | H0: Invoices from common vendors will generate sequences similar to those from uncommon vendors. Ha: Invoices from common vendors will generate sequences different from those from uncommon vendors.
Automation | Number of actions undertaken by the system divided by the total number of actions within a sequence (per sequence) | H0: Highly automated sequences will be similar to those that are less automated. Ha: Highly automated sequences will be different from those that are less automated.
Outcome | Length of time between invoice scanning and the completion of the workflow (per invoice) | H0: Invoices that take longer to complete will generate sequences similar to those that take less time. Ha: Invoices that take longer to complete will generate sequences different from those that take less time.

Table 10: Constructs and variables used to segment sequences

 

Does the Process Vary with Differential Inputs?

Some organizations have specialized plans in place for different vendors, while
others may set up contingent plans that respond to the currency amount of a particular
invoice. In addition, the organization may have routinized responses to the vendors it
deals with most, so the amount of experience with a particular vendor may affect the
process as well.

Phase | Group | N | Avg Log Amt | Min Log Amt | Max Log Amt
Entry | 1 | 498 | 2.704 | 1.315 | 3.080
Entry | 2 | 497 | 3.364 | 3.083 | 3.619
Entry | 3 | 498 | 3.852 | 3.621 | 4.083
Entry | 4 | 497 | 4.558 | 4.084 | 6.185
Approval | 1 | 678 | 2.717 | 1.315 | 3.080
Approval | 2 | 648 | 3.372 | 3.083 | 3.619
Approval | 3 | 657 | 3.857 | 3.621 | 4.083
Approval | 4 | 869 | 4.591 | 4.084 | 6.185

Table 11: Descriptive information for groups of invoice amount, stratified on the
log of the amount

First, I examine the impact of the invoice total amount on the process (Table 12).
Given the large range of currency amounts (20 NOK to 1,529,886 NOK) and the skewness
of their distribution, I took the logarithm of each amount and used it to stratify the
invoices (Table 11). Table 12 also shows the actual amounts that correspond to the
different groups. The likelihood ratio tests indicated heterogeneity among the four groups
for both the entry and approval phases (Table 13).

Phase | Group | Avg Amt (NOK) | Min Amt (NOK) | Max Amt (NOK)
Entry | 1 | 605.95 | 20.67 | 1,201.00
Entry | 2 | 2,460.49 | 1,211.07 | 4,162.67
Entry | 3 | 7,451.68 | 4,176.00 | 12,095.47
Entry | 4 | 69,838.89 | 12,128.75 | 1,529,886.00
Approval | 1 | 613.97 | 20.67 | 1,201.00
Approval | 2 | 2,508.92 | 1,211.07 | 4,162.67
Approval | 3 | 7,498.74 | 4,176.00 | 12,095.47
Approval | 4 | 73,743.40 | 12,128.75 | 1,529,886.00

Table 12: Descriptive information for groups of invoice amount, stratified on the
log of the amount (actual amounts in NOK)

For the entry phase, the processes for group 1 (20 NOK through 1,201 NOK) were
different from the remainder, and there were no discernible differences among the entry
processes that handled the larger invoice amounts (1,211 NOK through 1,529,886 NOK).
The approval phase had a similar pattern, but a more complex result. The group with the
smallest invoices was different from the rest, but among groups 2, 3 and 4 the similarities
were not transitive: group 2 resembled group 3 and group 3 resembled group 4, yet groups
2 and 4 differed. The conclusion is that the smallest invoices had patterns of action
distinct from the remaining groups, and there are some similarities among the groups of
sequences for the higher invoice amounts in both the entry and approval phases.

Phase | Test | LR | df | p-value
Entry | Overall | 1323.34 | 765 | <.001
Entry | 1 v 2 | 668.39 | 255 | <.001
Entry | 1 v 3 | 698.25 | 255 | <.001
Entry | 1 v 4 | 859.75 | 255 | <.001
Entry | 2 v 3 | 95.47 | 255 | >.999
Entry | 2 v 4 | 112.69 | 255 | >.999
Entry | 3 v 4 | 125.16 | 255 | >.999
Approval | Overall | 2015.69 | 1197 | <.001
Approval | 1 v 2 | 543.05 | 399 | <.001
Approval | 1 v 3 | 844.06 | 399 | <.001
Approval | 1 v 4 | 1196.79 | 399 | <.001
Approval | 2 v 3 | 233.26 | 399 | >.999
Approval | 2 v 4 | 468.96 | 399 | 0.009
Approval | 3 v 4 | 395.2 | 399 | 0.544

Table 13: Group comparisons for invoice amount

I use two procedures to measure vendor experience and stratify the sequences by
it. First, for each invoice, I count the total number of times that the invoice's vendor
appears in all of the data (1/1/2002 through 5/31/2006) (Table 14). Second, I count the
number of invoices from that vendor that occurred before the current invoice (Table 15).
This second count increases by 1 with every invoice from that vendor, so the first invoice
from a vendor has an experience level of 0, the second 1, and so on. These are two
similar ways to measure the organization's experience processing invoices from a
particular vendor, and they correlate highly (R² = .987).

Phase | Group | N | Avg of Total Vendor Count | Min | Max
Entry | 1 | 515 | 26.074 | 1 | 66
Entry | 2 | 485 | 142.054 | 68 | 247
Entry | 3 | 533 | 468.407 | 250 | 763
Entry | 4 | 458 | 1,520.766 | 772 | 2392
Approval | 1 | 656 | 27.407 | 1 | 66
Approval | 2 | 788 | 141.201 | 68 | 247
Approval | 3 | 667 | 456.772 | 250 | 763
Approval | 4 | 741 | 1,682.767 | 772 | 2392

Table 14: Groups and descriptives for total vendor count

Phase | Group | N | Avg of Vendor Experience | Min | Max
Entry | 1 | 502 | 15.787 | 0 | 42
Entry | 2 | 498 | 100.325 | 43 | 187
Entry | 3 | 502 | 330.711 | 188 | 546
Entry | 4 | 498 | 1,223.329 | 547 | 2117
Approval | 1 | 612 | 16.333 | 0 | 42
Approval | 2 | 780 | 97.759 | 43 | 187
Approval | 3 | 683 | 323.316 | 188 | 546
Approval | 4 | 777 | 1,381.834 | 547 | 2117

Table 15: Groups and descriptives for vendor experience
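Both vendor measures can be computed in one pass over the invoices. The Python sketch below assumes the invoices are already sorted chronologically and are represented only by a vendor identifier (a hypothetical simplification of the real records). Prior experience counts earlier invoices from the same vendor, so it starts at 0, matching the minimum in Table 15, while the total count starts at 1, matching the minimum in Table 14. The function name is hypothetical.

```python
from collections import Counter


def vendor_measures(vendors):
    """Return (total vendor count, prior vendor experience) for each invoice.

    vendors: vendor IDs of the invoices, in chronological order.
    """
    totals = Counter(vendors)   # appearances of each vendor in the whole data set
    seen = Counter()            # invoices from each vendor processed so far
    measures = []
    for vendor in vendors:
        measures.append((totals[vendor], seen[vendor]))
        seen[vendor] += 1
    return measures


example = vendor_measures(["V1", "V2", "V1", "V1"])  # [(3, 0), (1, 0), (3, 1), (3, 2)]
```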

Taken together, these two related measures paint a similar overall picture that
differs in only a few details (Table 16). The overall tests show that the four groups are
heterogeneous in their processes, and the subgroup tests are mostly consistent across the
two measures. There is some indication that the membership of groups 2 and 3 differed
between the two variables for the approval phase. Together, these results suggest that the
experience an organization has with a vendor does affect the process for that vendor's
invoices. The organization has similar processes for invoices from uncommon vendors,
and different processes for common vendors, a finding that is consistent with the law of
requisite variety.

Phase | Test | Total # of Invoices: LR | df | p-value | Experience with Vendor: LR | df | p-value
Entry | Overall | 2215.45 | 765 | <.001 | 1878.10 | 765 | <.001
Entry | 1 v 2 | 212.21 | 255 | 0.976 | 259.65 | 255 | 0.407
Entry | 1 v 3 | 1083.81 | 255 | <.001 | 332.15 | 255 | <.001
Entry | 1 v 4 | 1119.26 | 255 | <.001 | 1095.83 | 255 | <.001
Entry | 2 v 3 | 746.51 | 255 | <.001 | 425.17 | 255 | <.001
Entry | 2 v 4 | 795.5 | 255 | <.001 | 682.57 | 255 | <.001
Entry | 3 v 4 | 378.92 | 255 | <.001 | 361.03 | 255 | <.001
Approval | Overall | 3313.87 | 1197 | <.001 | 3182.70 | 1197 | <.001
Approval | 1 v 2 | 383.96 | 399 | 0.697 | 437.78 | 399 | 0.088
Approval | 1 v 3 | 486.38 | 399 | 0.002 | 504.55 | 399 | <.001
Approval | 1 v 4 | 1481.98 | 399 | <.001 | 1475.37 | 399 | <.001
Approval | 2 v 3 | 485.69 | 399 | 0.002 | 387.19 | 399 | 0.655
Approval | 2 v 4 | 1642.83 | 399 | <.001 | 1517.57 | 399 | <.001
Approval | 3 v 4 | 1382.16 | 399 | <.001 | 1266.63 | 399 | <.001

Table 16: Group comparisons for the total number of invoices for a vendor and for
vendor experience, for each invoice

This analysis does not reveal exactly what the differences between sets of process
executions are. The use of order statistics, introduced in the discussion section, would
allow the extraction of a 'primal' pattern for each set of invoices, so that the differences
between the patterns for common vendors and those for uncommon vendors could be
enumerated. This would be an interesting extension of this dissertation.

Are Processes that Have Different Outcomes Similar?

The next step was to determine whether invoices with similar outcomes were
generated by similar processes. I calculated the time that each invoice spent in process
within the workflow, from the time it was scanned to when it was flagged 'all-approved'.
Some of the invoices were missing either the start or the end date, so the length of time for
their process could not be calculated. Table 17 shows the number of these invoices, along
with the characteristics of the groups stratified by process time.

Phase | Group | N | Avg Days of Process | Min | Max
Entry | Missing | 133 | | |
Entry | 1 | 576 | 1.286 | 0 | 2
Entry | 2 | 487 | 4.061 | 3 | 5
Entry | 3 | 419 | 6.899 | 6 |
Entry | 4 | 385 | 15.366 | 9 | 233
Approval | Missing | 326 | | |
Approval | 1 | 779 | 1.287 | 0 | 2
Approval | 2 | 654 | 4.013 | 3 | 5
Approval | 3 | 591 | 6.90 | 6 | 8
Approval | 4 | 502 | 15.09 | 9 | 233

Table 17: Descriptives for the length (in days) of the process
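A minimal Python sketch of the outcome measure follows, assuming scan and 'all-approved' dates are available as `date` objects, or `None` when missing. The group cut-offs (0 to 2, 3 to 5, 6 to 8, and 9 or more days) are read off the approval rows of Table 17 rather than taken from the original analysis code, and the function names are hypothetical.

```python
from datetime import date


def process_days(scanned, all_approved):
    """Days from scanning to the 'all-approved' flag, or None if a date is missing."""
    if scanned is None or all_approved is None:
        return None
    return (all_approved - scanned).days


def time_group(days):
    """Map elapsed days to the four strata of Table 17 (None stays 'Missing')."""
    if days is None:
        return "Missing"
    for group, upper in ((1, 2), (2, 5), (3, 8)):
        if days <= upper:
            return group
    return 4


elapsed = process_days(date(2005, 3, 1), date(2005, 3, 10))  # 9 days, so group 4
```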

There was a marked difference between the entry and approval phases in how the
processes related to outcome (Table 18). When the entry phase processes were stratified
according to the length of time the invoice spent in the workflow, they were
homogeneous overall. The approval phase processes were different from each other,
even under the same categorization scheme. This may be because most of the time that
elapses between scanning and approval is spent in the approval phase; the entry phase on
its own almost never takes more than one day.

Phase | Test | LR | df | p-value
Entry | Overall | 686.71 | 765 | 0.98
Entry | 1 v 2 | 168.19 | 255 | >.999
Entry | 1 v 3 | 188.66 | 255 | 0.999
Entry | 1 v 4 | 371.72 | 255 | <.001
Entry | 2 v 3 | 134.92 | 255 | >.999
Entry | 2 v 4 | 290.21 | 255 | 0.064
Entry | 3 v 4 | 181.91 | 255 | >.999
Approval | Overall | 2149.54 | 1197 | <.001
Approval | 1 v 2 | 653.23 | 399 | <.001
Approval | 1 v 3 | 1100.11 | 399 | <.001
Approval | 1 v 4 | 1386.22 | 399 | <.001
Approval | 2 v 3 | 264.61 | 399 | >.999
Approval | 2 v 4 | 476.79 | 399 | 0.004
Approval | 3 v 4 | 320.54 | 399 | 0.999

Table 18: Comparing processes with similar outcomes

The subgroup tests tell a more complicated story. Even though the entry sequences
were similar overall, groups 1 and 4 differed enough in the entry phase for the test to be
significant. The approval phase subtests show an interesting result: groups 2 and 3 were
similar, as were groups 3 and 4, yet group 2 was different from group 4. I believe that
transitivity may not apply to the tests of homogeneity for these processes, in the sense that
group 2 is 'enough' like group 3, and group 3 'enough' like group 4, to pass the test, while
groups 2 and 4 differ 'enough' in transition probabilities to be significantly different when
stratified by the elapsed time of the process.

This brings to mind the paradox of the 'ship of Theseus', and is related to similar
discussions in the philosophy of change and identity. In the legend, as pieces of the ship
wore out, they were replaced with new timber and planks until none of the original pieces
remained in the ship. The question arises: is this still the ship of Theseus? The lack of
transitivity between processes stratified by their outcome brings this question to the fore,
even though here change is not measured against time. At one level, it is all invoicing,
and almost all the invoices are processed within 15 days of scanning. The Markov method
allows me to discern when processes that I 'identify' as the same exhibit differences in
their sequence and choice of actions, and to conclude that there are differences at a lower
level of abstraction and a higher level of detail. Since the processing time of invoices may
be a variable of interest for managers seeking to control costs, the issue of transitivity and
the question of 'same or different' represent another significant contribution of this work.

Does the Process Vary with the Amount of Automation Present?

The amount of automation present in a given sequence is measured as the number
of actions undertaken by the workflow system in a sequence divided by the length of that
sequence. This measure is tied to the sequence rather than to the invoice. Thus, a given
invoice has one automation score for its entry sequence and a separate score for each of its
approval sequences. Table 19 and Table 20 show these groups and their characteristics,
including the number of automated actions, the length of the sequences, and finally the
ratio of automated actions to sequence length.

Phase | Group | N | Avg Auto Actions | Min | Max | Avg Length | Min | Max
Entry | 1 | 551 | 1.89 | 0 | 2 | 10.26 | 9 | 14
Entry | 2 | 469 | 3.64 | 3 | 5 | 10.15 | 9 | 20
Entry | 3 | 596 | 5.52 | 5 | 6 | 10.14 | 10 | 14
Entry | 4 | 384 | 8.20 | 6 | 10 | 10.12 | 9 | 13
Approval | 1 | 620 | 0.29 | 0 | 3 | 11.22 | 2 | 45
Approval | 2 | 924 | 1.53 | 1 | 4 | 13.63 | 7 | 31
Approval | 3 | 607 | 2.17 | 1 | 4 | 13.44 | 6 | 27
Approval | 4 | 701 | 4.74 | 1 | 13 | 15.21 | 4 | 37

Table 19: Groups and descriptives for automation (automated actions, sequence
length)

Phase | Group | N | Avg Automation % | Min | Max
Entry | 1 | 551 | 18.43% | 0.00% | 20.00%
Entry | 2 | 469 | 35.92% | 25.00% | 40.00%
Entry | 3 | 596 | 54.48% | 41.67% | 60.00%
Entry | 4 | 384 | 81.06% | 63.64% | 100.00%
Approval | 1 | 620 | 1.96% | 0.00% | 7.41%
Approval | 2 | 924 | 10.97% | 7.69% | 14.29%
Approval | 3 | 607 | 16.14% | 14.81% | 18.18%
Approval | 4 | 701 | 31.95% | 18.75% | 78.57%

Table 20: Groups and descriptives for automation (automation percent)
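The automation measure is a simple ratio. In this Python sketch, each action in a sequence is assumed to be tagged with the actor that performed it; the tuple layout and the `system` label for the workflow system are hypothetical, since the event log format is not described here.

```python
def automation_pct(sequence, system_actor="system"):
    """Share of the actions in one sequence that the workflow system performed.

    sequence: list of (actor, action) pairs for a single entry or approval sequence.
    """
    if not sequence:
        raise ValueError("empty sequence")
    automated = sum(1 for actor, _ in sequence if actor == system_actor)
    return automated / len(sequence)


entry = [("system", "scan"), ("clerk", "enter"), ("system", "route"), ("clerk", "submit")]
share = automation_pct(entry)  # 2 automated actions / 4 actions = 0.5
```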

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Phase | Test | LR | df | p-value
Entry | Overall | 11625.23 | 765 | <.001
Entry | 1 v 2 | 3557.55 | 255 | <.001
Entry | 1 v 3 | 5901.05 | 255 | <.001
Entry | 1 v 4 | 3797.48 | 255 | <.001
Entry | 2 v 3 | 1845.35 | 255 | <.001
Entry | 2 v 4 | 2806.09 | 255 | <.001
Entry | 3 v 4 | 2975.18 | 255 | <.001
Approval | Overall | 5532.13 | 1197 | <.001
Approval | 1 v 2 | 1892.11 | 399 | <.001
Approval | 1 v 3 | 2660.79 | 399 | <.001
Approval | 1 v 4 | 3189.06 | 399 | <.001
Approval | 2 v 3 | 770.61 | 399 | <.001
Approval | 2 v 4 | 1452.01 | 399 | <.001
Approval | 3 v 4 | 998.02 | 399 | <.001

Table 21: Group comparisons for the amount of automation present within a
process

The results of the likelihood ratio tests indicate that, in each phase, the four
stratified groups all differ from one another in their processes (Table 21). I explore this
further by examining histograms, and then use an alternate stratification process. Figures
6 and 7 show that both the average and the distribution of the automation percentage
differ markedly between phases.

Figure 6: Distribution of automation for entry phase (histogram of the percent of each
sequence that is automated, from 0.00 to 1.00; Mean = 0.45, Std. Dev. = 0.232, N = 1,999).

Figure 7: Distribution of automation for approval phase (histogram of the percent of each
sequence that is automated, from 0.00 to 0.80; Mean = 0.18, Std. Dev. = 0.144, N = 2,407).

The approval phase has a lower overall proportion of automated actions, judged by
both mode and mean. I also binned the approval phase into chunks that separate the modal
set from the grouping of processes at .55 automation and higher, obtaining similar results.
Instead of stratifying by quartiles, I performed a visual stratification based on the
distributions (Figures 6 and 7). There were still four groups (Table 35), but they capture
modal areas together instead of being of equal size. As seen in Table 22, the results are
qualitatively identical to those in Table 21. Taken together, this gives strong evidence that
processes that are more highly automated have patterns of action that are distinct from
those with less automation.

The implication is that automation affects the temporal and task structure that
digitally-enabled organizational routines exhibit. This is consistent with the literature on
technology impact, but the analysis performed here represents a new way of looking at
technology and its effect on business processes: it allows a much finer-grained measure of
technology use, namely the percent of each execution that is automated. The use of order
statistics as an adjunct to this analysis could extend it and allow further discovery.

Phase | Test | LR | df | p-value
Entry | Overall | 9150.95 | 765 | <.001
Entry | 1 v 2 | 935.93 | 255 | <.001
Entry | 1 v 3 | 2417.66 | 255 | <.001
Entry | 1 v 4 | 2705.73 | 255 | <.001
Entry | 2 v 3 | 3493.46 | 255 | <.001
Entry | 2 v 4 | 4508.71 | 255 | <.001
Entry | 3 v 4 | 2380.91 | 255 | <.001
Approval | Overall | 7064.57 | 1197 | <.001
Approval | 1 v 2 | 2354.58 | 399 | <.001
Approval | 1 v 3 | 2863.56 | 399 | <.001
Approval | 1 v 4 | 4276.91 | 399 | <.001
Approval | 2 v 3 | 655.23 | 399 | <.001
Approval | 2 v 4 | 1324.42 | 399 | <.001
Approval | 3 v 4 | 1242.19 | 399 | <.001

Table 22: Group comparisons for the amount of automation present within a
process, visual stratification process

Discussion

When a routine is viewed as a Markov process, there are several connections to
how organizational routines are theorized. The probabilities of transition between actions
within a sequence can be thought of as representing the dispositions or habitual nature of
routines (Pentland et al., 2009d). This makes the Markov approach especially useful for
studying what the performances are, how many different types there are, and how changes
unfold. This perspective also fits with the theoretical concepts of patterns and essential
variety that are found in the performance of organizational routines.

Tests of homogeneity allow the researcher to discover how alike or different two
sets of performances are, and may change how we identify routines as the same or
different, or how we know whether we have one routine or many. The finding that
similarity between performance sets was not transitive was unexpected, and may
challenge how we think of the concept of routine identity.

The concept of stationarity parallels our understanding of how changes in
organizational routines can be seen in situ. Routines that exhibit change, even
incremental change, can be seen as alterations in the choice and temporal structure of
actions. The order of the process can be seen as the amount of temporal interdependence
between actions, but the sparseness of the transition matrix makes this mathematically
difficult to apply to the data collected for this dissertation.

While the analysis employed here only allows the investigation of similarity or
difference as a binary decision, the overall research model was well supported (Table 23).
There are similarities between processes with similar inputs (no group 1 was statistically
similar to any group 4). For the approval phase, there are differences between processes
with differential outcomes. This makes sense, since the majority of the 'work' of
approving an invoice does not take place during the entry phase.

Automation seems to drive heterogeneity in the process: when the processes were
stratified by the percent of actions that were automated, no group was similar in
transitions to any other. On the other hand, this may be an indication of the endogeneity
of technology within the process: actions are automated with different frequency because
some tasks are easier than others for the system to perform without human interaction. In
the next chapter, I complement this analysis with the results from the string matching
analysis.

Research Question | Variable | Results (pairs of groups that are similar)
Does the process vary with differential inputs? | Invoice Amount: the log of the invoice amount (per invoice) | Entry: group 1 is different from groups 2, 3, 4; similar pairs are 2-3, 2-4, 3-4. Approval: similar pairs are 2-3, 3-4.
 | Vendor Count: how many times in total the vendor had invoices in the entire data set (per invoice) | Entry: groups 1 and 2 are different from 3 and 4; similar pair is 1-2. Approval: similar pair is 1-2.
 | Vendor Experience: how many times a particular vendor had invoices prior to the current invoice (per invoice) | Entry: similar pair is 1-2. Approval: similar pairs are 1-2, 2-3.
Does the process vary with the amount of automation present? | Number of actions undertaken by the system divided by the total number of actions within a sequence (per sequence) | Entry: all strata are different. Approval: all strata are different.
Are processes that have different outcomes similar? | Length of time between invoice scanning and the completion of the workflow (per invoice) | Entry: similar pairs are 1-2, 1-3, 2-3, 2-4, 3-4. Approval: similar pairs are 2-3, 3-4.

Table 23: Summary of results; listed pairs indicate similarity, or an insignificant
statistical test

 

 

Chapter 6—String Distance Analysis and Findings

Introduction

In this chapter, I describe the analysis utilizing string distance and
multidimensional scaling techniques to explore how inputs drive sequential variety and
its impact on outcomes. I also examine the impact of automation on the process. This
second method views each process execution as an ordered set of symbols, and calculates
distances between them based on the number of insertions, deletions and substitutions of
symbols that it takes to convert one sequence into another. I begin this chapter with a
description of how I prepared the data for analysis. Then, I discuss multidimensional
scaling as a way to extract the relationships between sequences, and then analyze these
relationships by using a partial least squares technique.

Overall, I ﬁnd support for the research model, but there are also differences
between the entry and approval phases of the invoicing business process in the nature and
form of both buffering and automation. A key ﬁnding is the differing role of automation
between the entry and approval phases in that automation serves as a substitute for
action-based buffering in the entry phase, and a complement in the approval phase.
Using scaled string distance holds promise for empirically studying organizational
routines. With this data, the method correctly discriminates between types of sequences
from entry and approval, and extracts variables that describe facets of the process.

Despite the confirmatory nature of the research design, the application of MDS to
string distance in this dissertation represents an exploration. Given idiosyncrasies of the
data and the structure of the model, PLS was not the best choice for analysis. These data
will be reanalyzed using GLM, logit or probit models to deal with problems of
non-normality, identification and endogeneity. Initial work with these methods shows
promise and may lead to more robust findings than those presented here.
Model Variables

The invoice parameters are inputs to the invoice handling process, and the process
generates outputs and outcomes. I use the invoice amount (LogAngmt) and the
experience that the organization has with the invoice vendor (TotalVendorCount,
VendorExperience) as the input variables. Because of the extremely skewed distribution
of invoice amount, I use its logarithm instead of the raw amount. As with the Markov
analyses, automation percent (AutomationPCT) is the number of automated actions
within a sequence divided by the length of the sequence. Outcomes are defined as the
length of time spent processing from data entry to final approval (SC_to_AA). The main
difference between this analysis and the Markov analysis lies in how I represent the
sequence, and in the techniques I use to model my theory.
Preparing the Data

As with the Markov analyses, I process the event log into a set of sequences
related to each invoice. This results in a list of entry sequences (one for every invoice)
and a list of approval sequences. I then remove the duplicate sequences from each set,
leaving a list of sequence types for both entry and approval. This is necessary because the
multidimensional scaling algorithms I employ do not support items with zero distance
between them in the matrix.
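These preparation steps can be sketched in Python. The event log is assumed to be a list of (invoice_id, timestamp, action) rows for a single phase, with sortable timestamps; the real log's fields and the split into entry and approval phases are not shown, and the function names and sample rows are hypothetical.

```python
from collections import defaultdict


def sequences_by_invoice(event_log):
    """Order each invoice's events by time and return one action sequence per invoice."""
    by_invoice = defaultdict(list)
    for invoice_id, timestamp, action in sorted(event_log, key=lambda row: (row[0], row[1])):
        by_invoice[invoice_id].append(action)
    return {invoice_id: tuple(actions) for invoice_id, actions in by_invoice.items()}


def sequence_types(sequences):
    """Drop duplicate sequences, keeping first-seen order, so no two items are at distance zero."""
    return list(dict.fromkeys(sequences))


log = [
    (2, "2005-03-01T09:05", "enter"),
    (1, "2005-03-01T08:00", "scan"),
    (1, "2005-03-01T08:30", "enter"),
    (2, "2005-03-01T09:00", "scan"),
    (3, "2005-03-02T10:00", "scan"),
    (3, "2005-03-02T10:10", "correct"),
]
seqs = sequences_by_invoice(log)
types = sequence_types(seqs.values())  # [("scan", "enter"), ("scan", "correct")]
```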

Then, I create an N by N matrix, where N is the number of sequence types in the
set. Each cell in the matrix holds the minimum number of insertions, deletions and
substitutions that it takes to convert one sequence into the other, that is, the string
distance between the two sequences. This measure, also known as the Levenshtein
distance (Sankoff & Kruskal, 1983), has been used in studies across various literatures
(Abbott, 1983, 1990b, 1995; Sabherwal & Robey, 1993). I chose this distance measure
because it has been used in the realm of information systems development (Sabherwal &
Robey, 1993), was suggested for use by social science researchers (Abbott, 1983, 1990a,
1995), and specifically has been used to describe the concept of sequential variety in
organizational routines (Pentland, 2003a, 2003b). Another commonly used distance
measure is cosine angle distance, typically used for interpreting search queries and in AI
language processing, but it lacks the rich theoretical connections of the Levenshtein
distance.
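A compact Python sketch of the Levenshtein distance and the resulting N by N matrix follows, using the standard dynamic-programming recurrence with unit costs for insertion, deletion and substitution. It works on any sequences, so actions can be characters, strings or other symbols; the function names and the sample sequence types are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute x -> y (free if equal)
            ))
        prev = curr
    return prev[-1]


def distance_matrix(seqs):
    """N x N matrix of string distances between N sequence types."""
    return [[levenshtein(s, t) for t in seqs] for s in seqs]


types = [("scan", "enter", "approve"), ("scan", "approve"), ("scan", "enter", "reject")]
matrix = distance_matrix(types)  # [[0, 1, 1], [1, 0, 2], [1, 2, 0]]
```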

This matrix is a dissimilarity matrix: a zero represents identity, and higher
numbers indicate sequences that are more dissimilar. These relationships between
sequences are ordinal rather than metric, suggesting that non-metric multidimensional
scaling should be used (Kruskal & Wish, 1978).

Initial Exploration

I performed several analyses with various scaling algorithms and techniques,
starting with the same initial dissimilarity matrix. I knew from the system and the data
that there were two phases of the invoice approval process. I decided to put both sets of
sequences into the same matrix and determine where the scaling would locate them. This
is shown in Figure 8.

Figure 8: Entry (square) and approval (triangle) sequences scaled in two dimensions
(scatter plot; x-axis X1, from -0.4 to 0.4).

For this application, two dimensions of scaled string distance clearly group all of
the entry sequences (squares) together, and the approval sequences (triangles) together in
another location. Even if it were not known that there were two groups of sequences, this
analysis would have revealed this heterogeneity. The technique discerns different
subroutines within an overall routine, an important demonstration of its value when
applied to performances of organizational routines. Also, note that the entry sequence
cluster is much more compact than the approval sequence cluster. This indicates that the
entry sequences are more homogeneous than the approval sequences, and is consistent
with the sequential variety measures for each group (Pentland et al., 2009d).
Extracting Metric Relationships between Sequences

To perform multidimensional scaling, the researcher must make some choices based
on the purpose of the analysis and the idiosyncrasies of the data. First, there are several
algorithms available to convert the dissimilarity matrix into a distance matrix. Second, the
number of dimensions onto which the underlying data will be projected must be
determined. Two criteria are suggested to help the researcher make these decisions, one
based on a fit measure, the other on the pragmatic use of the dimensional data.

$$S = \sqrt{\frac{\sum_{i \neq j} \left[\theta(\tilde{d}_{ij}) - d_{ij}\right]^2}{\sum_{i \neq j} d_{ij}^2}}$$

Equation 1: Stress

When the non-metric multidimensional scaling (NMDS) algorithm converges, it
gives a measure of the fit of the solution called 'stress' (Equation 1). A researcher should
seek to minimize the stress while preserving his or her ability to understand or use the data
in analysis. Stress, then, is a measure of 'badness of fit' (Kruskal & Carroll, 1969;
Kruskal & Wish, 1978). Two sets of visualization plots help make this decision, along
with the calculated value of stress. A Shepard plot graphs ordination distances against
original dissimilarities, and also gives a goodness-of-fit measure (Oksanen, 2009b).
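To illustrate Equation 1, the Python sketch below computes stress for a given configuration. It reads d̃_ij as the observed dissimilarities, d_ij as the distances between points in the scaled configuration, and θ as a monotone (non-decreasing) function of the dissimilarities, fitted here by least squares with the pool-adjacent-violators algorithm. Tied dissimilarities are simply ordered by position, and a full NMDS would also move the configuration iteratively to reduce this value, which is omitted; the function names are hypothetical. Software may report the same quantity on another scale (R's `MASS::isoMDS`, for example, reports stress in percent).

```python
import math


def monotone_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to values."""
    blocks = []  # each block is [sum, count] of pooled values
    for v in values:
        blocks.append([v, 1])
        # Merge while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def stress(dissimilarities, distances):
    """Equation 1 over paired lists of observed dissimilarities and configuration distances.

    Listing each pair once gives the same ratio as summing over all i != j.
    """
    order = sorted(range(len(distances)), key=lambda k: dissimilarities[k])
    fitted = monotone_fit([distances[k] for k in order])
    theta = [0.0] * len(distances)
    for pos, k in enumerate(order):
        theta[k] = fitted[pos]  # theta(d~_ij), aligned back to the original pairs
    numerator = sum((t - d) ** 2 for t, d in zip(theta, distances))
    denominator = sum(d ** 2 for d in distances)
    return math.sqrt(numerator / denominator)
```

A configuration whose distances already rise with the dissimilarities has zero stress; any rank reversal is penalized by the gap between the configuration distances and their monotone fit.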

Choosing a Distance Function

I examined how some other fields use multidimensional scaling. Vegetation ecology
researchers use non-metric multidimensional scaling to understand the relationships
between the ecological content of a given area and variables such as altitude, orientation,
rainfall, and soil composition. Researchers in this field suggest that Euclidean and Jaccard
distance functions tend to do the best job of extracting meaningful dimensions from data
similar to mine (Oksanen, 2009a). The statistical package I used in R was developed
specifically for vegetation ecologists, but its MDS functions are generic and have been
used by researchers in other fields as well (Oksanen, 2009a).

 

$$d_{jk} = \sqrt{\sum_{i=1}^{N} \left(x_{ij} - x_{ik}\right)^2}$$

Equation 2: Euclidean Distance Calculation

I evaluate three conversions: a non-transformational conversion (raw), one based on Euclidean distance (Equation 2), and one based on Jaccard distance (Appendix 3, Equation 5). The raw distance matrix is simply the matrix of string distances between sequences. Figure 9 and Figure 10 show scree plots of the stress for each of the distance functions when extracting 1 through 5 dimensions. The shape of the curve is diagnostic: it should slope monotonically downward. Figure 10 shows that the approval phase raw distance data caused some difficulty for the MDS algorithm when moving from 2 to 3 dimensions. In both the entry and approval phases, the Jaccard and Euclidean distance functions performed better than the raw distance, and there was little difference between the absolute stress levels for these two distance calculations.
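The diagnostic described above, that stress should fall monotonically as dimensions are added, can be checked mechanically. The sketch below flags any dimensionality at which stress fails to decrease; the function name is hypothetical and the stress values are placeholders, not the values plotted in Figures 9 and 10.

```python
from typing import Dict, List


def non_decreasing_steps(stress_by_dim: Dict[int, float]) -> List[int]:
    """Return each dimensionality k whose stress is not lower than at the previous k."""
    dims = sorted(stress_by_dim)
    return [
        k for prev, k in zip(dims, dims[1:])
        if stress_by_dim[k] >= stress_by_dim[prev]
    ]


# Hypothetical scree values with a problem when moving from 2 to 3 dimensions.
example = {1: 20.0, 2: 14.0, 3: 15.5, 4: 9.0, 5: 7.0}
print(non_decreasing_steps(example))  # [3]
```

A non-empty result suggests the algorithm may have stopped in a local optimum at that dimensionality, which is why NMDS is often rerun from several random starts.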


The Shepard plots for the MDS projections of the Euclidean distance matrix in 1, 2, 3, 4, and 5 dimensions are shown in Appendix 3. I also evaluated the Shepard plots of the Jaccard and raw matrix MDS solutions, but these are not shown. Given the higher stress values for the raw matrix, and the lack of a meaningful difference between Jaccard and Euclidean in the scree plots and Shepard diagrams, I chose to use the Euclidean distance for the remainder of the analysis.

Figure 9: Entry scree plot. Top line is raw, next line down is Jaccard, bottom line is Euclidean. Stress is on the y-axis, number of dimensions (1 to 5) on the x-axis.

Figure 10: Approval scree plot. Top line is raw, next line down is Euclidean, bottom line is Jaccard. Stress is on the y-axis, number of dimensions (1 to 5) on the x-axis.

Determining the Appropriate Number of Dimensions

Examining a scree plot of the stress over multiple dimensions also helps in deciding how many dimensions to use. As seen in Figure 9, a two- or three-dimensional solution appears acceptable for the entry phase, and the marginal benefit of moving to four or more dimensions may be low. The approval phase scree plot (Figure 10) indicates that three or four dimensions may be needed to ensure a better fit of the data with the projection.
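Reading a scree plot by eye can be supplemented by computing the stress reduction gained from each added dimension and stopping once that gain falls below a chosen threshold. This is a heuristic sketch, not a procedure taken from Kruskal and Wish (1978); the function name, the threshold, and the stress values are all hypothetical.

```python
from typing import Dict


def suggest_dimensions(stress_by_dim: Dict[int, float], min_gain: float) -> int:
    """Smallest k after which adding a dimension reduces stress by less than min_gain."""
    dims = sorted(stress_by_dim)
    for k, nxt in zip(dims, dims[1:]):
        gain = stress_by_dim[k] - stress_by_dim[nxt]  # benefit of moving from k to nxt
        if gain < min_gain:
            return k
    return dims[-1]


example = {1: 30.0, 2: 18.0, 3: 12.0, 4: 10.5, 5: 9.8}
print(suggest_dimensions(example, min_gain=2.0))  # 3
```

The threshold only formalizes the elbow of the curve; as the text notes, interpretability should still drive the final choice.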

In many cases, the choice of dimensionality should be driven pragmatically by the interpretability of the solution (Kruskal & Wish, 1978). This involves examining several projections, visualizing the locations of the points in space, running multiple regressions, and evaluating how well the dimensions explain the data. The three-dimensional projections can be viewed two dimensions at a time or visualized interactively in a 3-D plot. Visualization beyond three dimensions is difficult.
Results

In the next section, I discuss the results of the analyses performed in this chapter. I first qualitatively examine a list of sequences and the dimensions obtained from the MDS algorithm. Then, I use multiple regression to help interpret these dimensions in relation to the other variables in my model. These regressions can also be used to visualize how the MDS dimensions relate to the input, outcome, and automation variables. Next, I employ these dimensions as formative measures of the process in a partial least squares (PLS) structural model using smartPLS. Finally, I discuss the relationships between inputs, process, outcome, and automation based on this path analysis.
Interpreting the MDS Sequence Dimensions

To complement the Markov analysis, I attempt to discover which input variables drive differences in the process. Kruskal and Wish (1978) suggest several techniques to help the researcher understand what the dimensions of a specific MDS projection mean. First, they suggest considering the dimensions and relating them to the original data. In my case, this means qualitatively examining a list of sequences together with each sequence's scores on each of the extracted dimensions. A variety of visualizations, such as plotting the points representing the sequences and coloring them according to other variables, can also be helpful. Finally, they suggest regressing each variable of interest on the projected MDS dimensions and using the significance and explained variance of each regression to evaluate how the variable relates to the solution. Kruskal and Wish (1978) imply that candidate covariates should have as high an R² as possible, and they indicate a .01 alpha level for testing regression significance. They give a rule of thumb of .7 for acceptable explained variance (R²), but they note that this is not always achievable in practice.
Qualitative Analysis of MDS Dimensions

I place the sequences in a list with the dimension scores extracted for each sequence. I first evaluate sequence length: there appears to be no relationship with the entry phase dimensions, but V2 in the approval phase shows a noisy relationship with sequence length. I then sort the sequences by each dimension to determine whether I can discern any additional patterns. This was difficult in some cases, because the underlying relationships may lie at an angle to the listed dimensions: an MDS solution is determined only up to rotation, so there are infinitely many projections, each a rotation of the others, that show the same structure. Table 24 lists some of the regularities that I was able to observe in these sequences, along with the corresponding action sequences.
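The rotation point can be illustrated directly: rotating an MDS configuration changes every coordinate but leaves every inter-point distance, and therefore the stress, unchanged, which is why the listed axes need not line up with interpretable patterns. A small Python check with a hypothetical 2-D configuration and arbitrary angle:

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rotate(points: List[Point], angle: float) -> List[Point]:
    """Rotate 2-D points about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def pairwise(points: List[Point]) -> List[float]:
    """Euclidean distance for every unordered pair of points."""
    return [
        math.dist(points[a], points[b])
        for a in range(len(points))
        for b in range(a + 1, len(points))
    ]


config = [(0.0, 0.0), (1.0, 0.0), (0.5, 2.0)]
rotated = rotate(config, math.radians(37))
print(all(math.isclose(p, q) for p, q in zip(pairwise(config), pairwise(rotated))))  # True
```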

 

 

 

 

 

 

 

Phase      Dim.  Observations
Entry      V1    High values seem to start with 5, 8, 7, 6 more often than lower values.
                 This is the sequence of activities "Enter document type", "Enter invoicno",
                 "Enter invoicedate", "Enter duedate". Lower values did not seem to have
                 a common pattern.
           V2    Higher values seemed to start with 8, 7, 6, 3, 4, often with the action
                 sequence "Enter invoicno", "Enter invoicedate", "Enter duedate",
                 "Enter amount", "Enter currency". Lower values seemed to start with
                 5, 4, 20 and 5, 3, 4, often giving the action sequence "Enter document
                 type", "Enter currency", "Enter vendor account", "Enter value dim. 7".
           V3    I did not identify any regularities.
Approval   V1    Loosely affiliated with sequence length. I did not identify any
                 differences between high and low values.
           V2    Fairly good relationship with sequence length. Low values (< ~35) tended
                 to start with 2, 11, 10, the action sequence "Enter account",
                 "Enter Tax-code", "Enter period". Higher values (> ~35 and < 40) were more
                 likely to start with 10, 4, 23, the action sequence "Enter period",
                 "Enter amount", "Enter account".
           V3    I did not identify any regularities here.

Table 24: Qualitative results from examining raw sequences and extracted variables

Implications of Sample Size

In Table 25, I show the number of sequences, the number of sequence types, and the number of invoices that are the basis for the sequences. I sampled 2000 invoices, which yield 2000 entry and 2853 approval sequences. Of these sequences, there were 206 entry phase types and 929 approval phase types. There were some missing values in the outcome variable (SC_to_AA), so the valid N is smaller than the number of sequences. Variables related to the invoice, such as the inputs (TotalVendorCount, VendorExperience, LogAngmount) and the outcome (SC_to_AA), are measured per invoice. The MDS variables, which locate the sequences by their scaled string distance, are based on the sequence types. Automation is measured separately for each sequence, so there are as many values for this variable as there are valid sequences.
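Because MDS is run on sequence types while the other variables are measured per invoice or per sequence, each sequence has to inherit the coordinates of its type. A minimal sketch, assuming sequences are stored as tuples of action codes; the function name, action codes, and coordinates below are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Seq = Tuple[int, ...]


def sequence_types(sequences: Sequence[Seq]) -> Tuple[List[Seq], List[int]]:
    """Return distinct sequence types (first-seen order) and each sequence's type index."""
    index: Dict[Seq, int] = {}
    types: List[Seq] = []
    labels: List[int] = []
    for seq in sequences:
        if seq not in index:
            index[seq] = len(types)
            types.append(seq)
        labels.append(index[seq])
    return types, labels


observed = [(5, 8, 7, 6), (8, 7, 6, 3, 4), (5, 8, 7, 6), (5, 8, 7, 6)]
types, labels = sequence_types(observed)
print(len(observed), len(types))  # 4 2

# MDS would be fitted to the distinct types only; the coordinates are then
# expanded so that every sequence carries the location of its type.
type_coords = [(0.4, -0.1), (-0.3, 0.2)]  # hypothetical MDS output, one row per type
sequence_coords = [type_coords[t] for t in labels]
print(sequence_coords[2])  # (0.4, -0.1)
```

This expansion is also why the MDS variables can take at most as many distinct values as there are sequence types, even when N is much larger.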

 

 

 

Phase      Invoice  Number of   Sequence  Valid  Automation
           Data     Sequences   Types     N      Data
Entry      2000     2000        206       1869   1869
Approval   2000     2853        929       2528   2528

Table 25: Sample size and basis for samples and data

The total information (variance) available in the sample was less than the total amount that was possible (e.g., 206 sequence types in 2000 sampled entry sequences). Given the lexicon and length of sequences, the reduction in information content is quite large. The total number of possible sequences in this system is infinite, because there is no upper limit on the length of a sequence and the lexicon is editable. R-square is calculated as the ratio of explained variance to total variance (explained + unexplained). I believe that the unexplained variance is inflated for one main reason: redundancy of sequence types in my sample.

I could at most find 2000 different entry sequences and 2853 different approval sequences. I actually found 206 different entry and 908 different approval sequences. This means that the MDS algorithm had only a small set of different string distances compared to the variance that could be observed. I believe this smaller set represents the actual amount of unexplained variance relating to the location of the sequences in space. I can explain at most the variance based on 206 sequences in the entry phase, even though my N is based on a sample of 2000. Simply adjusting N to 206 or 929 will not solve the problem, as I really do have samples of 2000 and 2853 for entry and approval, respectively. As I note in the discussion of this chapter, there is a calculable correction for this, but it was not performed.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Variable           N     Min     Max       Mean      Std. Deviation
Angmount           1990  20.66   1529886   20073.1   74779.704
LogAngmount        1990  1.31    6.18      3.6143    .73301
TotalVendorCount   1991  1       2392      516.57    658.595
VendorExperience   2000  0       2117      416.56    552.989
SC_to_AA           1867  0       233       6.17      8.234

Table 26: Descriptive information for invoice variables

Phase      N     Min   Max    Mean    Std. Deviation
Entry      2000  .00   1.00   .4524   .23241
Approval   2852  0     0.78   .1493   .14698

Table 27: Descriptive information for sequence variables
Using Multiple Regression to Understand MDS Dimensions

Tables 26 and 27 above show the descriptive information for the variables relating to inputs, outcomes, and automation. I evaluate each of the distance metrics for two- and three-dimensional projections by performing a regression with the variable of interest as the dependent variable and the MDS dimensions as the independent variables. This was done for both the entry phase and the approval phase. The coefficients can then be used to map a line, or 'gradient', onto the scatterplot of scaled sequences to show the direction in which the variable moves within sequence space, as a way to visualize these relationships. These gradients are shown in Appendix 3, Figures 18 through 21. Table 28, Table 29, and Table 30 show regressions of each variable of interest on the two- and three-dimensional MDS Euclidean solutions. Each table focuses on a separate set of variables relating to inputs, automation, and outcome.
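The regressions reported in Tables 28 to 30 use standardized coefficients (Z.v1, Z.v2, Z.v3). The sketch below shows one way to obtain such coefficients and R² in plain Python by solving the normal equations in correlation form; it is an illustration, not the statistical software used in the study, the helper names are hypothetical, and the .01 significance tests are omitted. The example data are invented so that the variable is exactly linear in the two dimensions.

```python
import math
from typing import List, Sequence, Tuple


def zscores(values: Sequence[float]) -> List[float]:
    """Standardize to mean 0 and sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def standardized_regression(
    y: Sequence[float], dims: Sequence[Sequence[float]]
) -> Tuple[List[float], float]:
    """Regress y on MDS dimensions; return standardized betas and R^2."""
    n, p = len(y), len(dims)
    zy = zscores(y)
    zx = [zscores(d) for d in dims]
    # Correlations among the dimensions, and between each dimension and y.
    r_xx = [[sum(a * b for a, b in zip(zx[i], zx[j])) / (n - 1) for j in range(p)] for i in range(p)]
    r_xy = [sum(a * b for a, b in zip(zx[i], zy)) / (n - 1) for i in range(p)]
    betas = solve(r_xx, r_xy)  # normal equations in correlation form
    r2 = sum(b * r for b, r in zip(betas, r_xy))
    return betas, r2


v1 = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical dimension scores
v2 = [2.0, 1.0, 4.0, 3.0, 5.0]
y = [2 * a + b for a, b in zip(v1, v2)]  # hypothetical variable, exactly linear in v1, v2
betas, r2 = standardized_regression(y, [v1, v2])
print([round(b, 3) for b in betas], round(r2, 3))
```

The standardized betas are also the components of the gradient direction drawn over the scatterplot.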

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

                                   Dims  Z.v1     Z.v2     Z.v3     R²     Adj. R²
Environment
  Entry     Vendor Experience      2     -0.096*  0.291*            0.089  0.088
                                   3     -0.022*  0.299*   -0.265*  0.159  0.157
            Log Amount             2     0.086*   -0.174*           0.034  0.033
                                   3     0.087*   -0.180*  -0.054*  0.038  0.037
  Approval  Vendor Experience      2     -0.090*  -0.275*           0.090  0.089
                                   3     0.093*   -0.260*  -0.079*  0.096  0.095
            Log Amount             2     0  0.006  0.071  0.005 (column alignment unclear in source)
                                   3     0.002    0.094*   -0.121*  0.019  0.018

Table 28: Two- and three-dimensional standardized regressions on input variables, entry and approval phases. * indicates significance at the .01 level

For the entry phase, the two-dimensional sequence space explains roughly 9% of the variance in vendor experience, and this rises to 15.7% when the third extracted dimension is included, as seen in Table 28. The approval phase showed similar results for the two-dimensional solution, but adding a third dimension did not add much explanatory power. Log amount was explained weakly by the extracted variables for the entry phase, while the approval phase showed a poor fit and little explained variance.

 

 

 

 

 

 

 

 

 

 

 

 

 

                          Dims  Z.v1     Z.v2     Z.v3     R²     Adj. R²
Outcome
  Entry     SC_to_AA      2     -0.265*  0.155*            0.068  0.067
                          3     -0.258*  0.154*   -0.014   0.067  0.066
  Approval  SC_to_AA      2     -0.223*  0.118*            0.054  0.053
                          3     -0.220*  0.109*   0.054*   0.056  0.055

Table 29: Two- and three-dimensional standardized regressions on outcome variables, entry and approval phases. * indicates significance at the .01 level

Table 29 shows how well the extracted dimensions explain the outcome of the process, the amount of time it takes from scanning to approval (SC_to_AA). The pattern of significance is good overall, but the r-square indicates a weak relationship between the process variables and outcomes. Interestingly, adding a third dimension to the process added little explanatory power, as shown by the negligible change in adjusted r-square.

 

 

 

 

 

 

 

 

 

 

 

 

 

                          Dims  Z.v1     Z.v2     Z.v3     R²     Adj. R²
Automation
  Entry     AutoPCT       2     -0.270*  0.333*            0.169  0.168
                          3     -0.103*  0.365*   -0.645*  0.570  0.569
  Approval  AutoPCT       2     -0.197*  -0.002            0.039  0.038
                          3     -0.194*  -0.023*  0.109*   0.051  0.050

Table 30: Two- and three-dimensional standardized regressions on automation variables, entry and approval phases. * indicates significance at the .01 level

Automation seems to be the variable (of those tested) that is best explained by the location of sequences in multidimensional scaled space. For the entry phase, the three-dimensional solution explains more of the variance in the amount of automation than does the two-dimensional projection (Table 30). While an adjusted R² of .569 does not meet Kruskal and Wish's rule of thumb of .7, this is the best result obtained, suggesting that automation is the variable most strongly associated with the location of entry sequences in the three-dimensional scaled space. The same result was not seen for the approval phase, where the amount of explained variance was low and did not improve much with an additional dimension.

I summarize the differences in adjusted R² due to the addition of the third extracted dimension in Table 31 for all of the variables evaluated in Table 28, Table 29, and Table 30. Because there was at least some improvement in most cases, I decided to use the three-dimensional scaling solution for visualization and path analysis.
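Adjusted r-square penalizes the extra predictor in the three-dimensional solution, which is why Table 31 compares adjusted rather than raw values. The sketch applies the standard formula to the entry-phase AutoPCT values from Table 30. The choice of n = 1869 is an assumption taken from the valid N in Table 25; n = 2000 gives the same values to three decimals.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for n observations and p predictors (plus an intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Entry-phase AutoPCT from Table 30, assuming n = 1869.
adj_2d = adjusted_r2(0.169, n=1869, p=2)
adj_3d = adjusted_r2(0.570, n=1869, p=3)
print(round(adj_2d, 3), round(adj_3d, 3), round(adj_3d - adj_2d, 3))  # 0.168 0.569 0.401
```

The recomputed values match the reported adjusted r-square and the 0.401 improvement in Table 31. With N in the thousands the penalty is small, so the improvement is driven almost entirely by the change in raw r-square.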


 

 

 

 

 

                                     Entry    Approval
Environment     Vendor Experience    0.069    0.006
                Log Amount           0.004    0.013
Outcome         SC_to_AA             -0.001   0.002
Technology Use  AutoPCT              0.401    0.012

Table 31: Improvement in explained variance (adjusted r-square) in the 3-dimensional solution over the 2-dimensional solution

Multiple Regression and Visualization

I examined visualizations of the projected dimensions, along with the regressions of these dimensions on the other variables of interest. It is difficult, at least on paper, to show a vector or surface for how the variables relate to the 'cloud' of points representing the sequences. Sometimes it is useful to look at pairs of dimensions at a time, but the best way is through an interactive 3-dimensional graphing program. In Appendix 3, Figures 18 through 23 show the 3-dimensional solutions, with a pair of dimensions represented in each graph. In each graph, a red circle represents each sequence in scaled string-distance space, as located by the MDS algorithm. Each line represents the relationship between the pair of extracted MDS dimensions and an input (TotalVendorCount, VendorExperience), outcome (SC_to_AA), or automation (AutoPCT) variable.
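A gradient line of this kind can be derived from the regression coefficients: for a pair of dimensions, the fitted variable increases fastest along the direction of its coefficient vector. The sketch below computes that direction; it resembles vector fitting in ordination but is not necessarily how the appendix figures were produced, and the function name and coefficients are hypothetical.

```python
import math
from typing import Tuple


def gradient_direction(b_x: float, b_y: float) -> Tuple[float, float, float]:
    """Unit vector and angle (degrees from the x-axis) of steepest increase.

    b_x and b_y are regression coefficients of a variable on the dimensions
    shown on a plot's x- and y-axes.
    """
    length = math.hypot(b_x, b_y)
    return b_x / length, b_y / length, math.degrees(math.atan2(b_y, b_x))


# A variable driven almost entirely by the y-axis dimension plots as a near-vertical line.
print(gradient_direction(0.02, 0.60))
```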

For example, the entry phase graphs (Figure 18, Figure 19, Figure 20) show that the MDS algorithm located each of the sequences according to their string distance in a coordinate plane. Automation seems to have a nearly vertical slope in all three graphs, indicating that it has a similar relationship in all three pairs of dimensions. Total Vendor Count seems to have a similar relationship with the (V1, V3) pair as with the (V2, V3) pair, but a different one with (V1, V2). These lines also do not explain much of the variation in these spaces. It would be difficult to imagine a good fit for any drawn line for the entry phase sequences, but the approval phase seems to be linear at least across some of the pairs of dimensions.

What do these Dimensions Mean?

The qualitative examination of sequences and the three-dimensional solution indicated some regularities in the way that the algorithm located the sequences, at least at the extreme points of each dimension. Some dimensions, namely the third dimension for both the entry and approval phases, might have a relationship that is difficult to discern, or no relationship at all. Other dimensions, specifically in the approval phase, seemed to be associated with the length of the sequence. In some cases, I was able to observe patterns that were more common at one end of a dimension than at the other, but I was still unable to understand what the dimensions mean. The scaled string distance appeared to be identifying differences in the sequences, but the ordering of the sequences themselves was difficult to interpret.

Exploring the relationships between the sequence dimensions and other variables using multiple regression and visualization suggests several things. First, the explanatory power of these dimensions may be lower than the literature advises. Some of this may be due to information requirements, an omitted variable, or, as indicated in many of the scatter plots, the lack of any linear relationship in the underlying data. Second, automation appears to have the strongest relationship with the extracted dimensions, especially for the entry phase.


Third, there may be divergence in how these relationships are expressed between the entry and approval phases. The regressions differed between the phases in every parameter estimated for the variables of interest: the amount of explained variance, the pattern of significance, and, most importantly, the coefficients indicate that the relationships between inputs, outcomes, and automation may be different for the entry phase than for the approval phase. Next, I test the theory presented in chapter three by examining these relationships using path analysis.
Evaluating the Research Questions

In this section, I posit answers to my research questions, exploring the concepts of buffering and the role of automation. I use the three-dimensional MDS solution and locate the extracted variables in a structural model estimated using partial least squares analysis. This method allows me to include all of the variables of interest and evaluate each research question as a whole. This technique also allows the creation of constructs to represent the concepts of environmental input, dimensions of the process, outcomes, and automation. I first examine the input-process-outcome model, separately for the entry and approval phases. Then I add automation and discuss how the use of technology changes the impact and nature of buffering.

This analysis is complicated by the fact that I have two processes, or subroutines, that take place within the overall invoicing routine at the research site. For each model, I have two sets of data for consideration. This is interesting, because it allows the model to express different relationships between inputs, process, outcomes, and automation for the entry phase and the approval phase. As noted in Figure 8, an MDS projection correctly clustered these subroutines as being different from each other, and the results from the PLS analysis point to differences between their relationships in the model as well.
Construct Formation

In this analysis, I define the input construct as a linear combination of the log of the average invoice amount, the total vendor count, and vendor experience. The process construct is a combination of the three extracted variables locating each sequence in scaled string distance space. The outcome is measured by a single variable, the length of time from scanning to complete approval (SC_to_AA). I have included covariance and correlation matrices for all of these variables, with the measures for each process, in Appendix 3, Table 38 through Table 41.
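In PLS, a formative construct score is a weighted linear combination of its standardized indicators. The sketch below computes such a score for the input construct given a set of indicator weights, before any rescaling of the score itself; the weights and invoice values are hypothetical, and estimating the weights (which smartPLS does iteratively) is not shown.

```python
import math
from typing import Dict, List, Sequence


def standardize(values: Sequence[float]) -> List[float]:
    """Standardize to mean 0 and sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def formative_score(
    indicators: Dict[str, Sequence[float]], weights: Dict[str, float]
) -> List[float]:
    """Construct score per case = sum of weight * standardized indicator."""
    z = {name: standardize(vals) for name, vals in indicators.items()}
    n = len(next(iter(z.values())))
    return [sum(weights[name] * z[name][i] for name in z) for i in range(n)]


inputs = {  # hypothetical invoices
    "LogAngmount": [2.0, 3.0, 4.0],
    "TotalVendorCount": [10.0, 20.0, 30.0],
    "VendorExperience": [5.0, 5.0, 20.0],
}
weights = {"LogAngmount": 0.2, "TotalVendorCount": 0.3, "VendorExperience": 0.5}  # hypothetical
scores = formative_score(inputs, weights)
print([round(s, 3) for s in scores])
```

Because the indicators are combined rather than treated as interchangeable reflections of one trait, nothing here requires them to correlate with each other.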

Because of the way that these constructs are defined for this analysis, they are considered formative. This means that each of the variables may tap into a different facet of the construct, and that each construct may in fact be multidimensional (Petter, Straub, & Rai, 2007). Reflective constructs are different in that each item for a construct is expected to move with the other items for that construct: together they are one-dimensional.

This means that the items that form my constructs need not correlate, and also that they are not measured with error. For example, there was no instrument that measured the amount of the invoice, so there is no way to introduce measurement error. Instead of measurement error, formative constructs have an error term associated with the construct itself that represents misfit of the items with the construct, misspecified items, and items that may be missing from the construct (Diamantopoulos & Winklhofer, 2001; Jarvis, Mackenzie, & Podsakoff, 2003; Peters & Saidin, 2000). The arrows in Figure 11 through Figure 14 point from the items toward the construct, as is customary (Diamantopoulos & Winklhofer, 2001; Jarvis et al., 2003; Peters & Saidin, 2000), and smartPLS treats them as formative.
Inputs, Process, Outcome: Buffering

Figure 11 shows the coefficients and R² for the input-process-outcome buffering model. The explained variance in the outcome variable is low, and the input and process constructs do not load well from their components. The three internal paths are significant, but overall the model fits poorly. I also evaluated splitting the input construct into Invoice Amount and Vendor Experience. This produced a better fit (significant loadings) for the input construct, but the internal paths were insignificant. For comparison with the approval phase, I decided to keep the model as presented here.

 

[Path diagram: LogAngmt, TotalVendorCount, and VendorExperience form the Input construct; the MDS dimensions form the Process construct; SC_to_AA forms the Outcome construct.]

Figure 11: Coefficient and explained variance for input-process-outcome buffering model, entry phase

The approval phase buffering model shows a number of improvements over the entry phase model. First, the amount of explained variance in outcomes is approaching practical significance, albeit at a small level. Interestingly, the model explains little of the variance in the extracted process variables. The paths between input and outcome, input and process, and process and outcome are all significant.

The relative sizes of the coefficients suggest a theoretically relevant story. The input-process and process-outcome paths are almost double the direct path between input and outcome. Comparing this model to the entry phase model, one can conclude that the entry phase has little connection to outcomes to begin with, and that its process is not a core buffering mechanism. In the approval phase, the process appears to be a partial mediator of the relationship between input and outcome, implying that the process buffers the outcomes of the routine from variance in the inputs.
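The partial-mediation reading rests on comparing the indirect effect (input-process coefficient times process-outcome coefficient) with the direct input-outcome coefficient. A minimal sketch of that arithmetic with hypothetical standardized coefficients, not the values in Figure 12; the function name is also hypothetical.

```python
from typing import Dict


def mediation_summary(a: float, b: float, c_direct: float) -> Dict[str, float]:
    """a: input -> process, b: process -> outcome, c_direct: input -> outcome."""
    indirect = a * b
    total = c_direct + indirect
    return {
        "indirect": indirect,
        "total": total,
        "proportion_mediated": indirect / total,
    }


print(mediation_summary(a=0.5, b=0.4, c_direct=0.1))
```

A non-zero direct path alongside a substantial indirect path is what distinguishes partial from full mediation; formal tests of the indirect effect (for example bootstrapped confidence intervals) are not shown.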

 

[Path diagram: LogAngmt, TotalVendorCount, and VendorExperience form the Input construct; the MDS dimensions form the Process construct; SC_to_AA forms the Outcome construct.]

Figure 12: Coefficient and explained variance for input-process-outcome buffering model, approval phase

The Role of Automation and Information Technology Use

Having evaluated how processes can buffer environmental variance, I now explore how automation fits into the picture. Figure 13 shows the results of adding the percentage of automation within a process to the entry phase model. Interestingly, the amount of explained variance in outcomes dropped, but this model explains much more of the process. The pattern of construct loadings was superior to that of the entry model without automation. The path coefficients suggest that automation, rather than the process alone, may be a buffer within this routine.

Inputs drive the amount of automation in a process, and automation strongly drives the process, while the input-process and input-outcome relationships are very weak. Given the size of the process-outcome and automation-outcome relationships, the process appears to act as a buffer most strongly through automation rather than via the inputs. This model does support buffering, but it highlights a very complex information system impact through the use of automation features. The entry phase does loosely couple inputs and outcomes, but the mechanism is automation's impact on the process rather than simple contingent actions. This indicates that buffering mechanisms can use a combination of actions and technology, rather than relying solely on the expression of actions.

[Path diagram: LogAngmt, TotalVendorCount, and VendorExperience form the Input construct; the MDS dimensions form the Process construct; SC_to_AA forms the Outcome construct; AutoPct forms the Automation construct.]

Figure 13: Coefficient and explained variance for input-process-outcome with automation model, entry phase

The approval phase held additional surprises and implications from the addition of automation. The outcome variance explained (R²) improved, indicating that automation adds information to the model. On the other hand, this model explains less of the process variance. All paths were significant, including both construct loadings and internal paths, indicating that this may be an acceptable model for this data structure. This model is consistent with the approval buffering model without automation, but in this phase of the invoicing routine, inputs affect processes both directly and through automation. Looking at the relative sizes of the path coefficients, the process emerges as a stronger buffer, in that the input-outcome connection is less than half the size of the input-process and process-outcome relationships.


Interestingly, the automation-outcome coefficient is smaller than the input-outcome coefficient, but the connection between input and automation remains high. This implies that the process acts as a buffer, but that automation also has some buffering content in how it affects the process. A study of only the direct impact of automation on outcomes would completely miss its impact through the process as a mediator. This highlights the value of the methods used in this study and points to a class of IT impacts that has not yet been investigated or theorized.
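The point that a study of automation's direct effect on outcomes would miss its effect through the process can be made concrete with path tracing: in a recursive (acyclic) linear path model, the total effect of one construct on another is the sum, over all directed paths between them, of the products of the path coefficients. The sketch applies this to a hypothetical model with the same structure as Figures 13 and 14; the coefficients and the function name are invented for illustration.

```python
from typing import Dict, Tuple

Edges = Dict[Tuple[str, str], float]


def total_effect(edges: Edges, source: str, target: str) -> float:
    """Sum of coefficient products over all directed paths (graph must be acyclic)."""
    if source == target:
        return 1.0
    return sum(
        coef * total_effect(edges, nxt, target)
        for (frm, nxt), coef in edges.items()
        if frm == source
    )


model: Edges = {  # hypothetical standardized path coefficients
    ("Input", "Automation"): 0.4,
    ("Input", "Process"): 0.2,
    ("Input", "Outcome"): 0.1,
    ("Automation", "Process"): 0.6,
    ("Automation", "Outcome"): 0.05,
    ("Process", "Outcome"): 0.3,
}
direct = model[("Automation", "Outcome")]
total = total_effect(model, "Automation", "Outcome")
print(direct, round(total, 3))  # 0.05 0.23
```

In this invented example most of automation's total effect on the outcome runs through the process, which a direct-effect-only analysis would not see.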

 

[Path diagram: LogAngmt, TotalVendorCount, and VendorExperience form the Input construct; the MDS dimensions form the Process construct; SC_to_AA forms the Outcome construct; AutoPct forms the Automation construct.]

Figure 14: Coefficient and explained variance for input-process-outcome with automation model, approval phase

Comparing the effects of automation in the entry model with those in the approval model provides additional insights. The role of automation appears to be markedly different between the two phases of the invoice process. In the entry phase, the process does mediate the variance between the inputs and outcomes, but only through the mechanisms of automation. In the approval phase, automation seems to be an adjunct or secondary buffer, as the process buffers independently of automation.

This indicates some of the complexities of understanding the impacts of information technology use. In general, the entry phase has a higher mean automation, but the distribution in the approval phase is much wider. Even though I have only one measure of automation, calculated the same way for both subroutines, there is heterogeneity in what automation means for each process. The task context is different, and the nature of use may also differ between the two phases. For example, many of the actions that are automated in the entry phase are data entry and forwarding. In the approval phase, many of the automated tasks are notifications and forwarding for further approval.

Discussion

The number of choices given to the researcher with this set of methods results in a complex picture to interpret. While the dimensions extracted from the process were difficult to interpret with the available variables through regression and visualization, the path analysis was consistent with the Markov results. The low amount of explained variance may be due to the information content in the sequences compared to the total information available based on the lexicon and sequence length.

In 2000 sequences I could have at most 2000 different sequences, but there were only 241 distinctly different ones. This means that the sample of sequences contains only a fraction of the information in the whole set of possibilities. This smaller set of information is the only basis that the MDS algorithm can use to find differences and variance between the sequences. Normal calculations of r-square divide the explained variance by the total variance. I believe that the total variance in my case is smaller than normally found, leading to a smaller SST and an artificially low r-square. This may be correctable statistically, but doing so is beyond the scope of my study.

Evaluating model quality is difficult in models that include formative constructs, because measures of internal consistency are useful only when applied to reflective constructs (Diamantopoulos & Winklhofer, 2001; Jarvis et al., 2003; Peters & Saidin, 2000). Typically, one must either correlate the formative construct with connected reflective scales or consider the relative sizes of coefficients and explained variance to ascertain the value and quality of a given model. Given the low explained variance and the difficulty with significance in some paths, it would be helpful to validate the model with regression techniques other than PLS. When my model was estimated using seemingly unrelated regression (SUR), similar results for significance and explained variance were obtained. When my model was estimated using three-stage least squares (3SLS), the estimation failed because of endogeneity and identification issues.

The fact that all of the constructs in my model are formative has implications for the quality of the model. It means that I have an identification problem, because there is some indeterminacy between the error terms and the scale of measurement (Jarvis et al., 2003). Possible solutions include fixing one of the indicator paths to 1 or adding a reflective construct to the model. The best time to solve this identification problem is at the research design stage, before analysis has begun (Diamantopoulos & Winklhofer, 2001; Petter et al., 2007). Since I do not have the option of adding a new reflective construct, my options with the current analysis may be limited, but a respecification of the model or the use of a different regression technique may prove useful.

There are significant issues with non-linearity and non-normality in the data. The ultimate DV is not continuous; rather, it is a count of the number of days it takes to process an invoice. The nature of this variable suggests that probit or logit analysis, with different distributional assumptions, may be appropriate. The MDS dimensions may exacerbate this issue, given that there are only 206 different values being applied to 2000 entry sequences.

An additional reason the explained variance is low is the likely omission of other variables that may explain these relationships better. For example, I was unable to categorize the vendors or their products because they were described in Norwegian. There may also be organizationally relevant variables that the workflow system does not capture. Despite these challenges, I believe that these analyses lead to some important findings and to implications that can be generalized.

The results have implications for the concept of buffering. The entry and approval phases differ in how the process protects outcomes from environmental variety, which suggests that we may need to rethink how contingent actions act as a buffering mechanism within business processes. Rather than occurring uniformly across the business process as a whole, buffering occurs differently in the subprocesses of the invoicing routine. This suggests that the environment may impinge upon different sections of a routine and create buffering points within a given sequence. Routines may have internal heterogeneity in how variety in a routine is harnessed to buffer environmental variety. We may have to look deeper within the set of sequences a routine generates to find sections that act as buffers within the routine itself, rather than treating a whole routine as a buffering mechanism.

The results also indicate some changes in how we view the concept of information system use. We already know that there is heterogeneity in how individuals and organizations use information technology, and this challenges our ability to study the impact of IT in a generalizable way. This study shows that heterogeneity is also found within an organization: the use of IT has different impacts on the different subprocesses it supports, adding a previously unexamined layer of complexity to studies of IT impact. For example, in the entry process IT acts as a buffering mechanism that substitutes for contingent actions, whereas in the approval process it complements contingent actions as a buffer. In this way, information technology may have a different class of impact from what has been previously theorized and empirically tested.

The implication is that applying technology in different subsections of a business process can produce different results. If technology use has a negative impact on the front half of a process but a positive impact on the back half, a study that looked only at the immediate consequences for the process as a whole may find no relationship, when in fact there are two effects that cancel each other out. This highlights how complex the relationship between IT use, processes, and outcomes may actually be, and also some of the difficulties that researchers and managers have in evaluating the impact of IT investment, adoption, assimilation, and even use.
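
The cancellation argument can be illustrated with a small simulation. This is a minimal sketch with invented numbers, not an analysis of the invoice data: it assumes a hypothetical two-phase process in which IT use adds two days to the entry phase and removes two days from the approval phase. The phase-level comparisons reveal both effects, while the end-to-end comparison shows almost none.

```python
import random
import statistics

def simulate(n=10_000, effect=2.0, seed=1):
    """Simulate cycle times (days) for a hypothetical two-phase process.

    IT use adds `effect` days to the entry phase and removes `effect`
    days from the approval phase, so the two effects cancel in the total.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        it_use = rng.random() < 0.5            # about half the cases use IT
        shift = effect if it_use else 0.0
        entry = 5 + shift + rng.gauss(0, 1)     # front half of the process
        approval = 5 - shift + rng.gauss(0, 1)  # back half of the process
        rows.append((it_use, entry, approval, entry + approval))
    return rows

def mean_difference(rows, column):
    """Mean of `column` for IT-use cases minus the mean for the other cases."""
    with_it = [row[column] for row in rows if row[0]]
    without = [row[column] for row in rows if not row[0]]
    return statistics.fmean(with_it) - statistics.fmean(without)

rows = simulate()
for label, column in (("entry", 1), ("approval", 2), ("total", 3)):
    print(f"{label:>8}: {mean_difference(rows, column):+.2f} days")
```

The entry and approval differences come out near +2 and -2 days, while the total difference is close to zero, even though IT use clearly changes both phases.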

Chapter 7—Discussion and Limitations

Introduction

In this chapter I integrate the results with theory and discuss the implications of the study for literature and practice. I then describe some limitations of the approach I undertook, and close with some directions for future research, including a new research design. Overall, this research is designed to be explanatory: I begin with theory, a set of a priori assumptions about the world, and adopt a confirmatory approach. The methods I used to evaluate the research questions within my theoretical framework, however, indicate a more exploratory approach. While these methods have been applied in other areas relating to processes and sequences, they have not previously been applied to workflow data to test theory. There is a tension between exploration and explanation in this work that future work and further study will need to resolve.
Theoretical Implications

In this section I connect the results of my work with the larger conversations
taking place in the literatures of organization theory, routines, business process
management, and information technology impact. I also explore extensions and
improvements to this research in the various areas that help us understand buffering,
process management, and the impact of information technology.
Organizational Theory

Organizational theorists have been interested in how the environment affects organizational systems ever since they recognized the open nature of organizations (Scott & Davis, 2007). While there have been several empirical studies of Thompson’s (1967) theory of buffering (Cooper & Smith, 1992; Koberg, 1988; Sorenson, 2003) and recent theoretical developments (Lynn, 2005; Yan & Louis, 1999), there have not been any studies of buffering at the business process level. More importantly, none has examined the actual actions that take place as a buffering mechanism. The results of this dissertation confirm the buffering of environmental variety: outcomes are weakly related to inputs. Interestingly, inputs and automation drive changes in the process that are transmitted to the outcomes.

This dissertation serves as an exemplar of applying the theory of organizational routines to improve our understanding of organizational actions and structure. Empirical studies of organizational routines are rare, mainly because of the difficulty and cost of obtaining and analyzing data that track actual events across hundreds of process executions (Pentland et al., 2009d). The work presented here adds significantly to the literature on organizational routines in at least three ways.

First, the Markov approach gives a measure of the probabilistic relationship between temporally connected actions in the routine. This connects to the conceptualization of organizational routines as habits or ‘dispositions’ (Schulz, 2008), and represents one of the methods we suggest for empirically comparing routines (Pentland et al., 2009a). Second, measures of sequential variety indicate the amount of variation in the choice and order of actions within a sequence. If we look at this attribute of an organizational routine over time, we can explore aspects of endogenous change within a routine (Pentland et al., 2009b, 2009c), as well as the effects of managerial intervention on a process.
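
A first-order Markov description of a routine can be estimated by counting how often each action is immediately followed by each other action and normalizing each row. The Python sketch below assumes each performance is a list of action labels; the labels are hypothetical, and this illustrates the general technique rather than the exact estimation procedure used in this dissertation.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequences):
    """Estimate first-order Markov transition probabilities.

    Each sequence is a list of action labels. Transitions between
    consecutive actions are counted across all sequences, and each
    row of counts is normalized to sum to one.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for current, row in counts.items()
    }

# Hypothetical invoice-handling performances; action names are illustrative.
log = [
    ["register", "code", "approve", "pay"],
    ["register", "code", "return", "code", "approve", "pay"],
    ["register", "approve", "pay"],
]
probs = transition_probabilities(log)
print(probs["register"])  # code about 0.67, approve about 0.33
```

Each row of the result sums to one; an action that only ever ends a performance (here `pay`) has no row.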

Third, organizational learning can have an equivocal effect on the variety in a process. On one hand, as systems learn which sequences do not work or are undesirable, these sequences are pruned from the set of possibilities, leading to a reduction in sequential variety. On the other hand, learning new ways of performing an organizational routine involves trials of candidate sequences, leading to an increase in sequential variety. Sequential variety may represent the natural ‘repertoire’ of the different ways an organizational routine can be performed under various stimuli or conditions, in addition to improvisation or errors.

Business Process Management

This work rejects the black-box approach to understanding and managing business processes. While a few scholars in the field are beginning to understand the implications of this change in perspective (Melão & Pidd, 2000, 2008), it is not yet widespread. By understanding the actions that take place in situ, and studying how people and technology interact, scholars of business process management can connect to other related literatures such as organizational routines and management.

The synthesis of flexibility and stability represents an extension of the BPR/BPM literatures, and can be found in areas such as lean and custom manufacturing, services, and high-reliability organizations. Despite the rise of these innovative strategies, the literature on managing business processes often begins from a perspective of conformance, matching process executions to documented standards (Singh et al., 2009). This dissertation begins from a different perspective: embracing variety in execution to understand its antecedents and consequences. In this way, I seek to build a bridge between the organizational routines literature and the business process management literature.

Another common feature of business process management research is the use of ‘typical’ rather than ‘actual’ representations of the process (Singh et al., 2009). This means a focus on the abstract features of the usual process, or on what steps should be performed within the process, usually obtained through interviews. Research that uses ‘actual’ representations relies on observational data in some way to discern what actions are expressed within the business process. A highly prolific group of BPM scholars proposes the use of workflow mining to automatically extract and visualize patterns of a process based on the action logs recorded by the workflow software (Agrawal et al., 1998; Agrawal & Srikant, 1995; van der Aalst, Desel, & Oberweis, 2000; van der Aalst, ter Hofstede, & Dumas, 2005; van der Aalst & van Dongen, 2002; van der Aalst et al., 2003; van der Aalst & Weijters, 2004; van der Aalst et al., 2004). Van der Aalst and his colleagues suggest using workflow mining to investigate organizationally relevant research questions in addition to conformance checking and pattern extraction (van der Aalst et al., 2003; van der Aalst & Weijters, 2004), and this dissertation answers their call.
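
At its simplest, workflow mining starts from an event log in which each record carries a case identifier, a timestamp, and an activity. Grouping the records by case and ordering them by time yields one trace per process execution, and counting which activity directly follows which gives the relation that discovery algorithms such as van der Aalst's alpha algorithm build on. The tuple layout and activity names below are assumptions for illustration, not the schema of the workflow system studied here.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

def directly_follows(events):
    """Count directly-follows pairs (a, b) in an event log.

    `events` holds (case_id, timestamp, activity) tuples in any order.
    Records are grouped into one trace per case and ordered by timestamp.
    """
    ordered = sorted(events, key=itemgetter(0, 1))
    pairs = Counter()
    for _case_id, case_events in groupby(ordered, key=itemgetter(0)):
        activities = [activity for _, _, activity in case_events]
        pairs.update(zip(activities, activities[1:]))
    return pairs

# Hypothetical log: two invoices whose records arrive out of order.
events = [
    ("inv-2", 3, "approve"),
    ("inv-1", 1, "register"),
    ("inv-2", 1, "register"),
    ("inv-1", 2, "approve"),
    ("inv-2", 2, "code"),
    ("inv-1", 3, "pay"),
]
for (a, b), n in sorted(directly_follows(events).items()):
    print(f"{a} -> {b}: {n}")
```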

The results of this dissertation point to the management of variety through contingently expressed action as a method of protecting the core. While the systemic properties of buffering are widely understood, there have not been any examples of an empirical test that can be applied directly to a business process. The sequential variety analysis supports the view of sequential variety as an expression of contingently expressed actions. Queuing models and other management science techniques represent one method of analyzing and designing business processes for buffering. This dissertation demonstrates a different view: specific actions as the central feature of business process management, and technology as a core feature of digitally enabled routines.
IT Impact

Studying the immediate antecedents and consequences of technology situated in a single business process allows the isolation of specific use effects. This approach explores the moderating effect of the use construct, and complements the firm-level, organizational, and behavioral impact literatures. Research on IT impact can move beyond studies of investment (Brynjolfsson & Hitt, 1995) or adoption (Venkatesh, Morris, Davis, & Davis, 2003) to research questions about exactly how IT drives value. This can be done by studying the enablement and constraint of organizational actions as a primary impact of information technology.

The results of this dissertation point to automation as a discriminator among patterns of action. In the Markov results, the process was heterogeneous across groups of sequences that differed in the amount of automation expressed in the performance. The sequential variety analysis confirmed this result and revealed automation as a strong contributor to the buffering of environmental variety, beyond its impact on the outcomes of the process. This was a surprising result, and it points to a new finding: IT use can substitute for labor-based buffering. We have known for some time that IT is a substitute for other inputs such as labor and ordinary capital (Dewan & Min, 1997), but less was known about how this substitution occurs.

What is interesting here is that automational technologies are typically seen as a substitute for human labor when decision-making needs can be anticipated and relevant stimuli identified. From the cybernetic worldview, a system must exhibit a variety of responses that matches the variety in its inputs if it is to keep its outcomes stable. When tasks are automated, the decision as to which response is appropriate is designed into the system: the match between stimuli and responses is predetermined, and the sets of stimuli and responses must be specified before the rules are built into the system. In most cases the set of automated responses to stimuli is much smaller than would be possible if decisions were made intelligently at the moment of execution. What I am proposing from the findings in this dissertation is that automational technologies, despite their reduced flexibility compared with a manual system, can still buffer a process from variety in inputs. This can occur in several ways.

First, the tasks that are automated can be general purpose, so that a given response is appropriate for many different kinds of stimuli. This was observed in the mail-sorting example, where the OCR used by mail-sorting machines did not discriminate among Helvetica, Arial, and Times Roman fonts but applied equally to each. Second, given the ability of the system to correctly discriminate between types of stimuli, automation may allow a more consistent application of rules and lead to a more easily manageable organizational system. These two features of automation in an organizational information system show how automation can be used to buffer and protect the technical core of an organization. Taken together with the earlier observation that sections of an organizational routine can act as a buffer, automational IT can also be seen as a substitute for process-based buffering.
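
The cybernetic argument above can be made concrete with a toy version of Ashby's law of requisite variety: if each response maps different disturbances to different outcomes, a regulator with R possible responses facing D disturbances cannot hold outcomes to fewer than D/R distinct values. The sketch enumerates every response policy for a hypothetical outcome table (disturbance plus response, modulo 6); the table and sizes are invented for illustration.

```python
from itertools import product

def min_outcome_variety(n_disturbances, n_responses, outcome):
    """Smallest number of distinct outcomes a regulator can achieve.

    For every disturbance d the regulator picks one response r, and the
    result is outcome(d, r). All response policies are enumerated, so
    this brute-force search is only feasible for tiny examples.
    """
    best = n_disturbances
    for policy in product(range(n_responses), repeat=n_disturbances):
        achieved = {outcome(d, r) for d, r in enumerate(policy)}
        best = min(best, len(achieved))
    return best

def table(d, r):
    """Hypothetical outcome table: disturbance and response combine mod 6."""
    return (d + r) % 6

for n_responses in (1, 2, 3, 6):
    print(n_responses, min_outcome_variety(6, n_responses, table))
```

For one, two, three, and six available responses, the minimum outcome variety is 6, 3, 2, and 1, matching D/R: a single fixed response passes all six disturbances through to the outcome, while a sufficiently varied response set can absorb them completely.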

Future research could explore other aspects of information technology impact, such as how the features of a given system support or improve informating up, down, or sideways.

It may be possible to examine the actions taking place within the current dataset to discern a theoretically interesting typology of actions. In general terms, I see actions falling into three categories: information processing, decision-making, and coordination. There may be other, more theoretically driven categorization schemes.

Methodological Implications

The methods used in this dissertation can be used to understand organizational behavior phenomena in many other areas. I use them in an attempt to chart a middle path between qualitative and quantitative research on organizations. While workflow mining does not provide the richness and depth of understanding of causality in organizational processes, it does allow the development of statistical conclusions that focus on what really happens rather than on mathematical relationships between numerical proxies for actions. One way to describe this approach might be the “variance of processes” or “process-oriented variance analysis”.

In most cases, research is either process-based or variance-based (Markus & Robey, 1988). There have been some proponents of studying the properties of processes (Monge, 1990) and others who suggest different ways to study processes themselves (Langley, 1999; Sabherwal & Robey, 1995; van de Ven, Angle, & Poole, 1989; van de Ven & Poole, 1990). These appear to be competing perspectives, studying similar phenomena from different directions. I view them as complementary rather than mutually exclusive within the same research plan. By associating different patterns of action (representations of the process) with the variance of inputs and outputs, I integrate the quantitative strategy outlined by Langley (1999, p. 697) with the evaluation of properties of processes over time proposed by Monge (1990).

Practical Implications

In this section, I discuss how the results of this dissertation can help the practice
of management, information systems design, and information systems use. I also explore
implications for the education of managers and IS professionals.
Managerial Impacts

Managers, especially those of boundary business processes, must understand the complex interaction of environment, process, and outcomes. Their ability to manage uncertainty is challenged by the need for stability and control over the process. As they seek creative ways to improve quality and efficiency simultaneously, a focus on the specific causes of expressed patterns of action offers a different way of managing processes from what is taught in business schools today.

Focused on the black-box approach to managing processes, programs such as TQM and Six Sigma measure and statistically analyze the outcomes of a process, with an emphasis on control. Process standards such as ISO 9000 treat processes as fixed, and deviation from documentation is considered a sign of poor process execution. As business schools (and the managers they train) follow these programs, they forgo the opportunity to dynamically monitor and adjust the processes themselves, both in advance and at the time of execution. The worldview described in this dissertation represents an opportunity for managers to shift their thinking to new paradigms of managing processes. This has an impact beyond traditional management perspectives and can be applied to supply chain management, remanufacturing, and service provision, among other areas of managerial practice.

Without an understanding of the drivers and consequences of specific patterns expressed within processes, managers must continue to use statistical process control tools such as TQM and Six Sigma to achieve some measure of regulation. The discovery of these drivers and outcomes within a business process represents a new mode of management that was previously unavailable. In addition, in areas where statistical process control regimes are less useful, such as service provision, managing the sequence and choice of actions within the process holds special promise as a new tool for the practice of management.
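
For contrast, the outcome-only monitoring that statistical process control relies on can be sketched in a few lines: limits are set at the baseline mean plus or minus three standard deviations, and later observations outside them are flagged. The processing times are invented, and using the sample standard deviation is a simplification (individuals charts commonly estimate sigma from moving ranges). The chart can say that an invoice is unusual, but nothing about which sequence of actions produced it.

```python
import statistics

def control_limits(baseline):
    """Shewhart-style limits: baseline mean plus or minus three standard deviations."""
    center = statistics.fmean(baseline)
    sigma = statistics.stdev(baseline)  # simplification; see lead-in
    return center - 3 * sigma, center, center + 3 * sigma

def out_of_control(values, limits):
    """Indices of observations falling outside the control limits."""
    lower, _, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical processing times (days): a baseline period, then new invoices.
baseline = [4, 5, 6, 5, 4, 6, 5, 5, 4, 6]
limits = control_limits(baseline)
print(limits)
print(out_of_control([5, 7, 12, 3], limits))
```

Here the limits are roughly 2.55 and 7.45 days around a center of 5, so only the 12-day invoice (index 2) is flagged.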

Also, this focus on the expression of specific actions increases the ability of managers to discover and learn from their processes. In a world of information overload, discerning patterns and their antecedents and consequences allows managers to better make sense of the organizational system. Organizational and individual learning can be bolstered by greater understanding and retention of process-based knowledge that is usually tacit or hidden in the spatially and temporally diffuse business processes executed in modern organizations.

Finally, managers can now more fully realize the benefits of continuous auditing and assurance (Vasarhelyi & Halper, 1989). This requires, at a minimum, a good set of IT controls, some form of real-time or near real-time monitoring capability, and the ability to release timely reports detailing the impact and performance of an assurance and control regime (ISACA Standards Board, 2002). Much of this information becomes available to managers through workflow mining and related technologies, as well as through managerial intervention (Alles, Brennan, Kogan, & Vasarhelyi, 2006).

Impacts on Information System Design

Designers of information systems need to better understand the consequences of their decisions. Given the tradeoffs between flexibility and control in designed user interactions, and the natural tendency of users to innovate and to use tools for unforeseen purposes, the design of information systems is difficult. The perspective in this dissertation, namely studying the actual paths of user behavior within the system, allows IS designers to develop more flexible use cases and to achieve synergy between control and elasticity in the IS-enabled process.

There has been some research related to the design of web sites involving the collection and interpretation of ‘clickstream’ data mined from the logs of web servers (Kosala & Blockeel, 2000). Typically, researchers have focused on discovering patterns of user interaction to categorize users (Buchner, Baumgarten, Anand, Mulvenna, & Hughes, 1999; Cooley, 2000; Cooley, Mobasher, & Srivastava, 1999) and to improve the user experience (El-Ramly, Stroulia, & Sorenson, 2002). This literature has shown less interest in managing the web usage process than in the practical aspects of designing and developing usable systems. The research presented in this dissertation offers a complementary view of web usage as a process in which the user and their characteristics are inputs and the outcomes can be measured in terms of success, failure, effort expended, or satisfaction.

This dissertation explores automation as the core feature of workflow systems. Given that technology use had differential effects on the process, information system designers may need to look more closely at sections or subprocesses within a business process to arrive at appropriate system designs. For example, processes should have automation and control where appropriate, yet retain flexibility where it is needed within sections of the process. There may be reasons to implement controls in the system that reflect physical constraints, business rules, and institutional and social norms, but these controls should constrain the business process only at specific times and places, or within specific action sequences.

In general, this research supports the following principles of organizational and
information system design:

• Automate where information needs are sufficient to determine the appropriate actions without human decision-making (Ashby, 1958, 1968; Cyert & March, 1963).

• Make information available to support human decision-making and conserve the scarce resource of attention (Simon, 1973).

• Coordinate between individuals when resources (especially knowledge or information) are interdependent (Crowston, 1997; Grant, 1996; Malone et al., 1999).

Workflow and other organizational technologies can be designed for monitoring and control over a business process. If this perspective is followed too far, the loss of flexibility may cost more than the benefits enabled through the use of the system. This understanding should be central to the design of information systems, especially those with organization-wide effects such as ERP and workflow systems. This is not a new insight (Merton, 1936), and some scholars see ERP as the new ‘iron cage’ (Gosain, 2004), but these ideas have not become widespread in ISD education.

Impact on Information System Users

Similarly, users of information systems must understand what they give up in terms of flexibility when they adopt a particular information systems solution. While some vendors are starting to add exception handling and more flexibility to workflow systems, the organizational costs of too much control over a process are less well understood. These findings suggest that standards such as ISO 9000 may have hidden costs, beyond documentation and certification, through the restriction of flexibility in organizational action.

Limitations

The main limitation of this study is the difficulty of interpreting and integrating the analyses. The multiple analyses were qualitatively consistent, yet highlighted different perspectives on the performance of routines. While using contingency table tests to compare the transition matrices is an effective way to explore the group membership of various sets of sequences, these tests give only binary responses (the matrices differ or they do not); there is no measure of how different the sequences are from each other. The scaled string distance approach provides a much better measure of sequence distance, but the extracted dimensions are difficult to interpret; there is no measure of what the differences mean. Additional analyses, or extensions of these methods that can integrate perspectives and give a more complete picture of the antecedents and consequences of sequential variety, should be performed across a variety of contexts.

Another limitation relates to the source and characteristics of the data. There may be actions that are part of the invoicing entry and approval subroutines but occur outside the purview of the workflow system. As with any observation of organizational actions, research design choices, politics, cognitive limits to inspection, and many other factors determine what is available for analysis by researchers. This does not invalidate the findings of any study, but it may limit the types of inferences and conclusions that can be drawn from such analysis. I believe that additional data and analysis would complement my findings, not contradict them.

I make no claims to understanding the intentions and feelings of participants, the socio-political structure of the organization, or many other aspects of organizational routines that have been theorized or shown to exist. I can make no use of the ostensive aspects of organizational routines at my research site, but this does not affect my ability to answer the research questions. This study focuses on the actual actions as recorded by a workflow system, that is, on the technologically feasible and practically available data for large-scale statistical analysis. I recognize that much of the rich detail that is the hallmark of many studies of organizational routines, such as those by Barley (1986; 1990) and Pentland (1992; 1999; Pentland & Rueter, 1994), is not present, but this research represents a complementary rather than contradictory perspective.

Future Research

Given the theoretical, methodological, and practical impacts of this dissertation, there are several natural paths for future research. Some of these could be pursued using the same or similar data, but there are implications beyond organizational theory, business process management, and IT impact. The general form of the Input-Process-Outcome model is easily applied to a number of areas. I now realize that I had already developed this worldview and applied it in previous research searching for disturbances in the software development routines of open source software. It has extensions beyond the management of processes, and could be used to study organizational behavior, psychology, accounting (auditing), supply chain, and even non-business fields like biology. Any field that takes a systemic view and holds some process at the center of inquiry can use this basic model.

I would be interested in extending the methods used in this dissertation to analyze different business processes. The connection between inputs, processes, and outcomes should be clearest in specific, targeted contexts within organizations. Reverse supply chain analysis, the study of how businesses accept returns from customers, has received recent scholarly attention. I could investigate how the characteristics of the customer and the product drive how the business process handles each returned item.

Similarly, a remanufacturing business process would exhibit a variety of actions depending on the qualities and characteristics of the input. Finally, the technical support function may also change its process in response to the joint characteristics of the problem, the attitude of the customer, and the training of the technician. To the extent that these processes are digitally enabled through a technology that allows automatic logging and data collection, they may be especially well suited to studying buffering and the impact of technology, and could extend my findings.

Studies could be conducted to better understand the connection of specific inputs to specific patterns of action. One method that has been suggested is related to the Markov approach but uses order statistics (David & Nagaraja, 2004; Rényi, 1953). This approach would model the most probable path through the actions to obtain a ‘primal’ routine or set of routines. The most probable initial transition becomes the start of the chain, followed by the most probable transition from that starting point, and so on. In this manner, a path through the transitions is drawn, based on the probabilities of each one. Then, the inputs associated with the performances that match this path exactly can be studied, and the distance of other performances from the primal routine can be computed. Extracting a primal or modal sequence in this way could increase the power of the scaled sequential variety approach by visualizing the distribution of the routine around this centroid. Also, integrating order statistics with string distance and multidimensional scaling may allow the development of a method with the discriminating power of the Markov approach and the visualization and interpretation potential of the sequential variety approach.
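
The greedy construction described above can be sketched as follows. This is a simplification: it picks the most frequent next action from first-order transition counts (with start and end markers added to each sequence) rather than implementing the order-statistics machinery cited above, and a greedy path is not guaranteed to be the single most probable complete sequence. The action labels are hypothetical.

```python
from collections import Counter, defaultdict

START, END = "<start>", "<end>"

def primal_routine(sequences, max_length=50):
    """Greedy most-probable path through first-order transition counts.

    Each sequence is padded with START and END markers. Starting from
    START, the most frequent next action is taken at every step until
    END is reached or `max_length` actions have been added (a guard
    against cycles). Ties go to the transition encountered first.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        padded = [START, *seq, END]
        for current, nxt in zip(padded, padded[1:]):
            counts[current][nxt] += 1
    path, current = [], START
    while len(path) < max_length:
        current = counts[current].most_common(1)[0][0]
        if current == END:
            break
        path.append(current)
    return path

# Hypothetical performances of an invoice routine.
log = [
    ["register", "code", "approve", "pay"],
    ["register", "code", "return", "code", "approve", "pay"],
    ["register", "approve", "pay"],
]
primal = primal_routine(log)
exact_matches = [seq for seq in log if seq == primal]
print(primal, len(exact_matches))
```

The performances in `exact_matches` are the ones whose inputs could then be studied, and the remaining performances could be scored by their string distance from `primal`.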

There is more information within the workflow log related to the inputs that was not used in this dissertation. I have used the vendors simply as a vehicle of experience, but the nature or line of business of these vendors could be discerned and associated with the processes. I also have information about the detail lines on each invoice, such as the number and type of goods that were ordered. These, like the vendor names, are in Norwegian; they would require a native speaker to translate them, and they would then need to be categorized and coded. If translation is not an option, semantic models might be used, focusing not on meaning but on the connection of symbols to processes.

Another extension using the same data (and similar methods) would be to explore the impact and interaction of the action network with the social network that completed the work. In some ways, this would simply add a mode of connection between social actors for every action. Both methods used in this dissertation could be applied to these data. Three Markov matrices could be extracted: action-action transitions, social-social transitions, and action-social (role) transitions. The string-distance and MDS approach could be applied to sequences of people, and also to sequences of people-actions. Candidate research questions are easy to envision. What has a stronger effect on variation within the process? Is variation driven by changing actions, changing people, changing roles over time, or various combinations thereof?
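
The three matrices mentioned above could be tabulated from traces in which each event records both an action and the person who performed it. In this sketch, action-action and actor-actor transitions are counted between consecutive events, and the action-social (role) table is read, as an assumption, as the pairing of each action with its performer within an event; the people, roles, and actions are hypothetical.

```python
from collections import Counter

def three_transition_counts(traces):
    """Tabulate action-action, actor-actor, and action-actor counts.

    Each trace is a list of (action, actor) events in time order.
    Transitions are counted between consecutive events; the action-actor
    table pairs each action with the person who performed it (one
    possible reading of an action-social "role" matrix).
    """
    action_action, actor_actor, action_actor = Counter(), Counter(), Counter()
    for trace in traces:
        actions = [action for action, _ in trace]
        actors = [actor for _, actor in trace]
        action_action.update(zip(actions, actions[1:]))
        actor_actor.update(zip(actors, actors[1:]))
        action_actor.update(trace)  # counts each (action, actor) event
    return action_action, actor_actor, action_actor

# Hypothetical traces with invented people and actions.
traces = [
    [("register", "clerk_a"), ("code", "clerk_a"), ("approve", "manager_b")],
    [("register", "clerk_b"), ("approve", "manager_b")],
]
aa, ss, roles = three_transition_counts(traces)
print(aa, ss, roles, sep="\n")
```

Each table can be normalized row by row, as for a single Markov matrix, before the matrices are compared.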

Finally, learning effects can be examined. Since I have data from the initial installation of the software, I could examine how patterns of action change over time in relation to efficiency. This examination of the learning curve could consider the relationship between sequential variety and efficiency over time. The intuition is that people initially try different things and, with experience, learn what not to do, leading to a drop in sequential variety. Interestingly, sequential variety seems to be increasing over time, meaning that the repertoire of organizational routines may be growing with experience. This highlights the importance of understanding the impact of sequential variety on learning and learning models, both in theory and in practice.

Another extension of this work into organizational learning allows a much more micro focus on how individual ‘learnings’ are combined to form the traditional logarithmic form of the learning curve. Because learning occurs at several levels of analysis, from the individual, to interactions between individuals, to the group and the organization, it would be interesting to map out the experience curves at each level and how they interact across levels to allow the organization to learn from its environment and prosper. Also, economies of scope in learning can be explored, moving beyond the experience curve (economies of scale) to examine the transfer and retention of different types of knowledge within the organization.
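
For reference, the learning curve referred to here is commonly written as a power law in cumulative experience, which becomes a straight line on logarithmic axes. In this sketch, y_x is the time (or cost) of the x-th repetition, a is the time of the first repetition, and b > 0 is a learning parameter; these symbols are introduced for illustration and are not estimates from this dissertation.

```latex
y_x = a\,x^{-b}
\qquad\Longleftrightarrow\qquad
\log y_x = \log a - b\,\log x ,
\qquad
\frac{y_{2x}}{y_x} = 2^{-b}
```

Each doubling of experience multiplies the time per repetition by 2^{-b}; for example, an "80% curve" corresponds to b = -log2(0.8) ≈ 0.32. Mapping individual and group curves onto this form is one way the multi-level comparison described above could be framed.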

I can envision an application of March’s (1991) learning model to the data. Docking his environment-organization-outcome simulation model to the data in this dissertation could be attempted. This would represent a contingent-fit approach to learning, and would connect to related recent research such as that by Miller, Zhao, and Calantone (2006). I can also envision the analysis of data obtained through experiments similar to those of Cohen and Bacdayan (1994).

The methods I have used are not limited to the current set of data; they can be
applied in many other areas. The rise of organization-wide information systems such as
ERP may make much more process data available. If the right site could be found, I would
like to apply these methods to the entire organizational system, examining how the
different business processes interact. This type of analysis would be complex, and probably
beyond the ability of personal computing technology to implement, but it would allow the
investigation of many interesting research questions.

Appendix 1: String Matching Distance

adapted from (Pentland, 2003b; Pentland et al., 2009d)

This measure is defined as the average distance between each pair of observed
sequences. A standard technique for measuring the distance between two sequences that
may differ in length is called optimal string matching (Sankoff & Kruskal, 1983; Abbott &
Hrycak, 1990; Gribskov & Devereux, 1992; Sabherwal & Robey, 1993). String matching
has been used extensively in molecular biology to compare biological sequences such as
DNA and proteins. Abbott (1995) provides a review of applications in the social sciences.

The distance between two strings can be computed by counting the number of
operations needed to transform one string into the other. The operations are substituting
one element for another, inserting an element, and deleting an element. Each operation has
a cost, and the distance between the strings is the total cost. In this dissertation, all of these
costs were set equal to one, but they could be adjusted to account for the similarity of
actions, as discussed below. The technique is called ‘optimal’ string matching because it
finds the lowest-cost set of operations to accomplish the transformation, thus ensuring that
the computed distances are unique and well-behaved (e.g., they obey the triangle inequality:
d(A,B) + d(B,C) >= d(A,C)). Distances computed in this way with unit costs are called
Levenshtein distances (Sankoff & Kruskal, 1983).
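The dynamic-programming computation described above can be sketched in Python as follows. This is an illustration only, not the software used for the analysis in this dissertation; the function name `levenshtein` and the optional `sub_cost` hook (a hypothetical way of weighting substitutions by the similarity of actions) are assumptions introduced here.

```python
def levenshtein(a, b, sub_cost=None):
    """Minimum total cost of insertions, deletions, and substitutions
    needed to turn sequence `a` into sequence `b`.

    With the default unit costs this is the Levenshtein distance.
    `sub_cost(x, y)` is a hypothetical hook for weighting a substitution
    by how similar actions x and y are; insertions and deletions cost 1.
    """
    if sub_cost is None:
        def sub_cost(x, y):
            return 0 if x == y else 1

    # prev[j] = distance between the first i-1 elements of a and the first j of b
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)  # i deletions turn a[:i] into the empty prefix
        for j in range(1, len(b) + 1):
            curr[j] = min(
                prev[j] + 1,                                 # delete a[i-1]
                curr[j - 1] + 1,                             # insert b[j-1]
                prev[j - 1] + sub_cost(a[i - 1], b[j - 1]),  # substitute or match
            )
        prev = curr
    return prev[-1]


# Works on strings or on lists of action codes.
print(levenshtein("aaa", "aba"))
print(levenshtein(["enter", "approve", "pay"], ["enter", "pay"]))
```

Keeping only two rows of the cost table makes memory proportional to the length of the shorter comparison axis rather than to the product of the two lengths.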

Observations can be represented in an N × M array of events, where each row
corresponds to one iteration of the process, as seen in Equation 3:

$$
\text{Observed sequences} = S =
\begin{bmatrix}
e_{11} & e_{12} & e_{13} & e_{14} & \cdots & e_{1M} \\
e_{21} & e_{22} & e_{23} & e_{24} & \cdots & e_{2M} \\
\vdots & \vdots & \vdots & \vdots &        & \vdots \\
e_{N1} & e_{N2} & e_{N3} & .      &        &
\end{bmatrix}
\qquad (3)
$$

where N = the number of observed sequences and M = the number of events in the longest
sequence. Since the length of the observed sequences may vary, this array can have a
‘ragged’ edge (signified in Equation 3 by ‘.’). This representation includes each
observation in its entirety.

To estimate the variation in a set of sequences like those in Equation 3, we can
compute the distance between each sequence and every other sequence. If the sequences
were all identical, the distances would all be zero. If the sequences diverged from each
other in a single element (e.g., ‘aaa’, ‘aba’), the distances would all be one. As the
differences between the sequences become more pronounced, the distances increase.
Thus, a convenient and meaningful measure of variety in a set of sequences is simply the
average distance over all pairs of observations, shown in Equation 4:

$$
\text{Average distance} = \frac{1}{N(N-1)/2} \sum_{i=1}^{N} \sum_{j=i+1}^{N} d(i,j)
\qquad (4)
$$

where N is the number of observed sequences and d(i,j) is the Levenshtein distance
between sequences i and j. The factor N(N-1)/2 is simply the number of pairs in a set of
N sequences. Alternatively, as in this dissertation, the entire matrix of pairwise distances
between sequences can be used.
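Equation 4 can be computed directly from a matrix of pairwise distances. The sketch below assumes the distances have already been computed and stored in a symmetric list-of-lists matrix with zeros on the diagonal; the function name `average_distance` is hypothetical.

```python
def average_distance(d):
    """Equation 4: mean of d[i][j] over the N(N-1)/2 unordered pairs i < j.

    `d` is a symmetric N x N matrix of pairwise distances with zeros on
    the diagonal (for example, Levenshtein distances between sequences).
    """
    n = len(d)
    if n < 2:
        raise ValueError("need at least two sequences")
    n_pairs = n * (n - 1) // 2
    total = sum(d[i][j] for i in range(n) for j in range(i + 1, n))
    return total / n_pairs


# Pairwise Levenshtein distances for 'aaa', 'aba', 'abb', worked out by hand:
# aaa-aba = 1, aaa-abb = 2, aba-abb = 1, so the average is 4/3.
d = [
    [0, 1, 2],
    [1, 0, 1],
    [2, 1, 0],
]
print(average_distance(d))
```

Summing only over j > i counts each unordered pair once, which is why the divisor is N(N-1)/2 rather than N².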


Appendix 2: Markov Analysis

 


Entry phase: p(LR = 1140.463, df = 255) = 0
DF = (T - 1) * s^r * (s - 1) (according to Gottman & Roy, 1990)
T = 2 (segments)
s = 16 (number of codes)
r = 1 (order of sequence)
DF = 240
DF = 255 (according to loglin in R)
The data are judged non-stationary for the entry phase.

Approval phase: p(LR = 5247.366, df = 399) = 0
DF = (T - 1) * s^r * (s - 1) (according to Gottman & Roy, 1990)
T = 2 (segments)
s = 20 (number of codes)
r = 1 (order of sequence)
DF = 380
DF = 399 (according to loglin in R)

Table 36: Omnibus test of stationarity results
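The degrees-of-freedom formula and the likelihood-ratio statistic behind Table 36 can be illustrated with a short sketch. It computes DF = (T - 1) s^r (s - 1) and a G² statistic testing whether first-order transition probabilities are the same in every segment, in the spirit of Anderson and Goodman (1957). It is not the loglin model that was actually fit in R (which, as reported above, yields different degrees of freedom); the function names and the count layout `counts[t][i][j]` are assumptions.

```python
import math


def stationarity_df(T, s, r=1):
    """Degrees of freedom (T - 1) * s**r * (s - 1) for T segments,
    s codes, and a Markov chain of order r."""
    return (T - 1) * s ** r * (s - 1)


def stationarity_g2(counts):
    """Likelihood-ratio statistic G^2 for first-order stationarity.

    counts[t][i][j] = number of transitions from code i to code j
    observed in segment t (hypothetical layout). The null model uses
    transition probabilities pooled over all segments.
    """
    T, s = len(counts), len(counts[0])
    pooled = [[sum(counts[t][i][j] for t in range(T)) for j in range(s)]
              for i in range(s)]
    g2 = 0.0
    for t in range(T):
        for i in range(s):
            row_total = sum(counts[t][i])
            pooled_row_total = sum(pooled[i])
            for j in range(s):
                n = counts[t][i][j]
                if n > 0:  # empty cells contribute nothing to G^2
                    expected = row_total * pooled[i][j] / pooled_row_total
                    g2 += 2.0 * n * math.log(n / expected)
    return g2


print(stationarity_df(T=2, s=16))  # entry phase: 240
print(stationarity_df(T=2, s=20))  # approval phase: 380

# Toy example: two segments with opposite transition patterns.
counts = [
    [[10, 0], [0, 10]],
    [[0, 10], [10, 0]],
]
print(stationarity_g2(counts))  # equals 80 * ln 2 for these counts
```

A p-value could then be taken from the chi-square upper tail, for example with `scipy.stats.chi2.sf(g2, df)` if SciPy is available.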

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Phase      Test      LR        df     p-value
Entry      Overall   3227.09   765    0.000
           1 v 2     214.66    255    0.969
           1 v 3     1201.03   255    0.000
           1 v 4     1921.60   255    0.000
           2 v 3     1060.50   255    0.000
           2 v 4     1770.55   255    0.000
           3 v 4     228.23    255    0.885
Approval   Overall   6990.04   1197   0.000
           1 v 2     276.46    399    1.000
           1 v 3     2568.14   399    0.000
           1 v 4     3946.30   399    0.000
           2 v 3     2617.73   399    0.000
           2 v 4     4036.11   399    0.000
           3 v 4     624.95    399    0.000

 

 

 

 

 

 

 

Table 37 : Subsequent tests of homogeneity


Appendix 3: String Distance Analysis

$$
J = \sum_{i} \min(x_{ij}, x_{ik}), \qquad A = \sum_{i} x_{ij}, \qquad B = \sum_{i} x_{ik}
$$

$$
d_{jk} = \frac{A + B - 2J}{A + B - J}
$$

Equation 5: Jaccard distance calculation
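Equation 5 is the quantitative form of the Jaccard distance; it appears to match the "jaccard" option of `vegdist` in the R package vegan (Oksanen, 2009b), though that correspondence is an assumption here. The sketch below treats x_j and x_k as non-negative vectors of equal length, for example counts of each action type in two sequences; what the vectors held in the actual analysis is not stated in this appendix, and the function name is hypothetical.

```python
def jaccard_distance(x_j, x_k):
    """Quantitative Jaccard distance (Equation 5) between two
    non-negative vectors of equal length."""
    if len(x_j) != len(x_k):
        raise ValueError("vectors must have the same length")
    J = sum(min(a, b) for a, b in zip(x_j, x_k))  # amount the two share
    A = sum(x_j)
    B = sum(x_k)
    denominator = A + B - J
    if denominator == 0:  # both vectors are all zeros
        return 0.0
    return (A + B - 2 * J) / denominator


print(jaccard_distance([1, 2, 0], [2, 1, 1]))  # J = 2, A = 3, B = 4 -> 3/5
print(jaccard_distance([1, 1, 0], [0, 1, 1]))  # presence/absence: 1 - 1/3
```

For presence/absence (0/1) data the formula reduces to the familiar 1 - |shared| / |union|, and in general it equals 2D/(1 + D), where D = (A + B - 2J)/(A + B) is the Bray-Curtis dissimilarity.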

 

 

 

 

 

 

 

 

 

   

 

 

[Shepard plots of ordination distance against observed dissimilarity, 1 dimension.
Entry: non-metric fit R² = 0.946, linear fit R² = 0.909. Approval: non-metric fit
R² = 0.95, linear fit R² = 0.911.]

 

 

 

Figure 17: Entry and approval Shepard plots, 1 through 5 dimensions, continued on the
next two pages


 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

[Shepard plots of ordination distance against observed dissimilarity, 2 dimensions.
Entry: non-metric fit R² = 0.982, linear fit R² = 0.95. Approval: non-metric fit
R² = 0.979, linear fit R² = 0.95.]

[Shepard plots of ordination distance against observed dissimilarity, 3 dimensions.
Entry: non-metric fit R² = 0.99, linear fit R² = 0.965. Approval: non-metric fit
R² = 0.99, linear fit R² = 0.973.]

 

Figure 17 continued


 

 

[Shepard plots of ordination distance against observed dissimilarity, 4 dimensions.
Entry: non-metric fit R² = 0.995, linear fit R² = 0.98. Approval: non-metric fit
R² = 0.996, linear fit R² = 0.986.]

[Shepard plots of ordination distance against observed dissimilarity, 5 dimensions.
Entry: non-metric fit R² = 0.997, linear fit R² = 0.986. Approval: non-metric fit
R² = 0.998, linear fit R² = 0.994.]

 

 

 

Figure 17 continued
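The two fit statistics reported in Figure 17 can be illustrated with a small sketch. It assumes the definitions documented for Shepard diagrams in the R package vegan (cited in the references): the non-metric fit is R² = 1 - S², where S is Kruskal's stress computed against a monotone (isotonic) regression. The linear fit is computed here as the squared Pearson correlation between observed dissimilarities and ordination distances, which is an assumption about how that statistic was obtained. The function names are hypothetical and this is not vegan's implementation.

```python
import math


def isotonic_fit(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to y."""
    blocks = []  # each block is [sum of values, count]
    for value in y:
        blocks.append([value, 1])
        # Merge backwards while an earlier block's mean exceeds the next one's.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def shepard_fit(dissim, dist):
    """Return (stress, non-metric R^2, linear R^2) for one ordination.

    dissim: observed dissimilarities; dist: ordination distances for the
    same pairs. Ties in dissim are kept in input order.
    """
    order = sorted(range(len(dissim)), key=lambda i: dissim[i])
    x = [dissim[i] for i in order]
    d = [dist[i] for i in order]
    d_hat = isotonic_fit(d)  # monotone regression of dist on dissim
    stress = math.sqrt(sum((a - b) ** 2 for a, b in zip(d, d_hat))
                       / sum(a ** 2 for a in d))  # Kruskal's stress-1
    nonmetric_r2 = 1 - stress ** 2

    # Assumed "linear fit": squared Pearson correlation of dissim and dist.
    mx, md = sum(x) / len(x), sum(d) / len(d)
    sxy = sum((a - mx) * (b - md) for a, b in zip(x, d))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - md) ** 2 for b in d)
    linear_r2 = sxy ** 2 / (sxx * syy)
    return stress, nonmetric_r2, linear_r2


# Toy data: distances are monotone but not linear in dissimilarity,
# so the non-metric fit is perfect while the linear fit is not.
stress, nm_r2, lin_r2 = shepard_fit([1, 2, 3, 4], [1, 4, 9, 16])
print(round(stress, 3), round(nm_r2, 3), round(lin_r2, 3))  # 0.0 1.0 0.969
```

The toy case shows why the non-metric R² in Figure 17 always sits at or above the linear R²: a monotone curve can absorb curvature that a straight line cannot.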


 

[Plot: axes v1 (horizontal) and v2 (vertical), with vectors for Automation Pct,
VendorExperience, LogAngmt, and TotalVendorCount]

 

 

 

 

Figure 18: V1 and V2 of 3-d projection, entry phase showing lines representing the
regression coefficients of variables of interest


 

 

 

 

 

 

[Plot: horizontal axis V1, with vectors for Automation Pct, Log Avg Amt,
TotalVendorCount, and VendorExperience]

Figure 19: V1 and V3 of 3-d projection, entry phase showing lines representing the
regression coefﬁcients of variables of interest

 

 

 

 

[Plot: vectors for VendorExperience, LogAngmt, Automation Pct, and TotalVendorCount]

 

Figure 20: V2 and V3 of 3-d projection, entry phase showing lines representing the
regression coefﬁcients of variables of interest


[Plot: horizontal axis v1, with vectors for VendorExperience, TotalVendorCount, and
Automation Pct]
Figure 21: V1 and V2 of 3-d projection, approval phase showing lines representing
the regression coefﬁcients of variables of interest

 

[Plot: horizontal axis v1, with vectors for VendorExperience, TotalVendorCount,
Automation Pct, and LogAngmt]
Figure 22: V1 and V3 of 3-d projection, approval phase showing lines representing
the regression coefficients of variables of interest


 

[Plot: vectors for Automation Pct, LogAngmt, VendorExperience, and TotalVendorCount]
Figure 23: V2 and V3 of 3-d projection, approval phase showing lines representing
the regression coefficients of variables of interest


 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Covariance          AutomationPCT   LogAngmount    SC_to_AA       TotalVendorCount
AutomationPCT       0.054           16.052         45.418         87.547
LogAngmount         16.052          498009.694     467128.465     469830.676
SC_to_AA            45.418          467128.465     6217371.840    460210.856
TotalVendorCount    87.547          469830.676     460210.856     927401.005
v1                  -0.294          2214.461       3025.708       1717.014
v2                  0.547           91.851         -40.992        1477.257
v3                  -0.656          195.146        -307.234       -503.161
VendorExperience    59.881          17399.337      22468.069      377922.924

Covariance          v1              v2             v3             VendorExperience
AutomationPCT       -0.294          0.547          -0.656         59.881
LogAngmount         2214.461        91.851         195.146        17399.337
SC_to_AA            3025.708        -40.992        -307.234       22468.069
TotalVendorCount    1717.014        1477.257       -503.161       377922.924
v1                  46.976          3.636          5.046          -167.441
v2                  3.636           46.097         0.485          1099.693
v3                  5.046           0.485          18.475         -627.847
VendorExperience    -167.441        1099.693       -627.847       305797.254

 

 

 

 

 

Table 38: Entry covariance matrix for input, process, outcome, and automation variables

Correlation         AutomationPCT   LogAngmount    SC_to_AA       TotalVendorCount
AutomationPCT       1.000           0.098          0.078          0.391
LogAngmount         0.098           1.000          0.265          0.691
SC_to_AA            0.078           0.265          1.000          0.192
TotalVendorCount    0.391           0.691          0.192          1.000
v1                  -0.185          0.458          0.177          0.260
v2                  0.346           0.019          -0.002         0.226
v3                  -0.657          0.064          -0.029         -0.122
VendorExperience    0.466           0.045          0.016          0.710

Correlation         v1              v2             v3             VendorExperience
AutomationPCT       -0.185          0.346          -0.657         0.466
LogAngmount         0.458           0.019          0.064          0.045
SC_to_AA            0.177           -0.002         -0.029         0.016
TotalVendorCount    0.260           0.226          -0.122         0.710
v1                  1.000           0.078          0.171          -0.044
v2                  0.078           1.000          0.017          0.293
v3                  0.171           0.017          1.000          -0.264
VendorExperience    -0.044          0.293          -0.264         1.000

 

 

 

 

 

Table 39: Entry correlation matrix for input, process, outcome, and automation variables
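Each entry in Table 39 should equal the corresponding covariance in Table 38 divided by the product of the two standard deviations. The sketch below checks this for three of the entry-phase variables, using values copied from Table 38; the helper name `cov_to_corr` is hypothetical, and agreement is only expected to about three decimals because the printed covariances are rounded.

```python
import math


def cov_to_corr(cov):
    """Convert a covariance matrix to a correlation matrix:
    r_ij = cov_ij / (sd_i * sd_j)."""
    n = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]


# Subset of Table 38 (entry phase).
labels = ["AutomationPCT", "LogAngmount", "TotalVendorCount"]
cov = [
    [0.054, 16.052, 87.547],
    [16.052, 498009.694, 469830.676],
    [87.547, 469830.676, 927401.005],
]

corr = cov_to_corr(cov)
for label, row in zip(labels, corr):
    print(f"{label:18s}", " ".join(f"{v:6.3f}" for v in row))
# Table 39 reports 0.098, 0.391, and 0.691 for these three pairs.
```

The same check applied to Tables 40 and 41 is one way to detect sign or decimal-point errors introduced when the tables were transcribed.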


 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Covariance          AutomationPCT   LogAngmount    SC_to_AA       TotalVendorCount
AutomationPCT       0.022           -0.031         81.303         51.141
LogAngmount         -0.031          0.586          -483.426       -106.754
SC_to_AA            81.303          -483.426       10137830.000   184550.039
TotalVendorCount    51.141          -106.754       184550.000     557791.952
v1                  -0.807          0.305          16120.020      -3054.643
v2                  -0.068          0.976          9235.476       -3681.087
v3                  0.201           -1.018         7180.519       -1269.223
VendorExperience    38.412          -78.109        139958.200     467752.043

Covariance          v1              v2             v3             VendorExperience
AutomationPCT       -0.807          -0.068         0.201          38.412
LogAngmount         0.305           0.976          -1.018         -78.109
SC_to_AA            16120.018       9235.476       7180.519       139958.226
TotalVendorCount    -3054.643       -3681.087      -1269.223      467752.043
v1                  771.695         59.252         -3.384         -2170.481
v2                  59.252          313.865        41.829         -3204.050
v3                  -3.384          41.829         163.151        -1020.560
VendorExperience    -2170.481       -3204.050      -1020.560      401319.356

 

 

 

 

 

 

 

Table 40: Approval covariance matrix for input, process, outcome, and automation
variables

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Correlation         AutomationPCT   LogAngmount    SC_to_AA       TotalVendorCount
AutomationPCT       1.000           -0.276         0.174          0.466
LogAngmount         -0.276          1.000          -0.198         -0.187
SC_to_AA            0.174           -0.198         1.000          0.078
TotalVendorCount    0.466           -0.187         0.078          1.000
v1                  -0.198          0.014          0.182          -0.147
v2                  -0.026          0.072          0.164          -0.278
v3                  0.107           -0.104         0.177          -0.133
VendorExperience    0.413           -0.161         0.069          0.989

Correlation         v1              v2             v3             VendorExperience
AutomationPCT       -0.198          -0.026         0.107          0.413
LogAngmount         0.014           0.072          -0.104         -0.161
SC_to_AA            0.182           0.164          0.177          0.069
TotalVendorCount    -0.147          -0.278         -0.133         0.989
v1                  1.000           0.120          -0.010         -0.123
v2                  0.120           1.000          0.185          -0.285
v3                  -0.010          0.185          1.000          -0.126
VendorExperience    -0.123          -0.285         -0.126         1.000

 

 

 

 

 

Table 41: Approval correlation matrix for input, process, outcome, and automation variables


 

References

Abbott, A. (1983). Sequences of social events: Concepts and methods for the analysis of
order in social processes. Historical Methods, 16(4), 129.

Abbott, A. (1990a). Conceptions of time and events in social science methods: Causal
and narrative approaches. Historical Methods, 23(4), 140.

Abbott, A. (1990b). A primer on sequence methods. Organization Science, 1(4), 375-392.

Abbott, A. (1995). Sequence analysis: New methods for old ideas. Annual Review of
Sociology, 21, 93-113.

Abbott, A., & Hrycak, A. (1990). Measuring resemblance in sequence data: An optimal
matching analysis of musicians' careers. The American Journal of Sociology,
96(1), 144-185.

Abbott, A., & Tsay, A. (2000). Sequence analysis and optimal matching methods in
sociology: Review and prospect. Sociological Methods & Research, 29(1), 3-33.

Agrawal, R., Gunopulos, D., & Leymann, F. (1998). Mining process models from
workﬂow logs: Springer.

Agrawal, R., & Srikant, R. (1995). Mining sequential patterns. Paper presented at
the Eleventh International Conference on Data Engineering.

Alles, M. G., Brennan, G., Kogan, A., & Vasarhelyi, M. A. (2006). Continuous
monitoring of business process controls: A pilot implementation of a continuous
auditing system at Siemens (Vol. 7, pp. 137-161): Elsevier.

Anderson, T. W., & Goodman, L. A. (1957). Statistical inference about Markov chains.
The Annals of Mathematical Statistics, 28(1), 89-110.

Ashby, W. R. (1956). Self-regulation and requisite variety. Systems Thinking, Penguin
Books, Harmondsworth.

Ashby, W. R. (1958). Requisite variety and its implications for the control of complex
systems. Cybernetica, 1(2), 83-99.

Ashby, W. R. (1968). Variety, constraint, and the law of requisite variety. Modern
Systems Research for the Behavioural Scientist, 129-136.

Ashby, W. R. (1976). An introduction to cybernetics: Harper & Row.

Baird, D., & Weisberg, R. (1982). Rules, standards, and the battle of the forms (Vol. 68,
pp. 1217-1262).


Barley, S. R. (1986). Technology as an occasion for structuring: Evidence from the
observation of CT scanners and the social order of radiology departments.
Administrative Science Quarterly, 31, 78-108.

Barley, S. R. (1990). Images of imaging: Notes on doing longitudinal ﬁeldwork.
Organization Science, 1(3), 220-247.

Basu, A., & Kumar, A. (2002). Research commentary: Workﬂow management issues in
e-business. Information Systems Research, 13(1), 1-14.

Becker, M. C. (2004). Organizational routines: A review of the literature. Industrial and
Corporate Change, 13(4), 643-678.

Benders, J., Batenburg, R., & van der Blonk, H. (2006). Sticking to standards; technical
and other isomorphic pressures in deploying ERP-systems. Information &
Management, 43(2), 194-203.

Bishop, Y. M. M., Fienberg, S. E., & Holland, P. W. (1975). Discrete multivariate
analysis: Theory and practice.

Board, I. S. (2002). Continuous auditing: Is it fantasy or reality? Information Systems
Control Journal, 5.

Bollen, K. A. (1990). Structural equations with latent variables. New York: Wiley.

Brown, S. L., & Eisenhardt, K. M. (1997). The art of continuous change: Linking
complexity theory and time-paced evolution in relentlessly shifting organizations.
Administrative Science Quarterly, 42, 1-34.

Brynjolfsson, E., & Hitt, L. M. (1995). Information technology as a factor of production:
The role of differences among firms. Economics of Innovation and New Technology,
3(3), 183-200.

Brynjolfsson, E., & Hitt, L. M. (2000). Beyond computation: Information technology,
organizational transformation and business performance. The Journal of
Economic Perspectives, 14(4), 23-48.

Buchner, A. G., Baumgarten, M., Anand, S. S., Mulvenna, M. D., & Hughes, J. G.
(1999). Navigation pattern discovery from internet data. Paper presented at the
WEBKDD’99. from http://www.inﬁ.ulst.ac.uk.:’~cbg\124/'PDF/WEBKDD99.pdf.

Buckley, W. F. (1967). Sociology and modern systems theory. Upper Saddle River, NJ:
Prentice Hall.

Carlsen, S. (1997). Conceptual modeling and composition of ﬂexible workﬂow models.
Norwegian University of Science and Technology.


Chen, M., Chen, A. N. K., & Shao, B. B. M. (2003). The implications and impacts of web
services to electronic commerce research and practices. Journal of Electronic
Commerce Research, 4(4), 128-139.

Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple
regression/correlation analysis for the behavioral sciences. Mahwah, NJ: L.
Erlbaum Associates.

Cohen, M. D. (2007). Reading Dewey: Reflections on the study of routine (Vol. 28, pp.
773): EGOS.

Cohen, M. D., & Bacdayan, P. (1994). Organizational routines are stored as procedural
memory: Evidence from a laboratory study. Organization Science, 5(4), 554-568.

Cohen, M. D., Burkhart, R., Dosi, G., Egidi, M., Marengo, L., Warglien, M., & Winter, S.
(1996). Routines and other recurring action patterns of organizations:
Contemporary research issues. Industrial and Corporate Change, 5, 653-698.

Compello Software, A. (2007). Retrieved 4/27/07, from http://www.compello.com/

 

Cook, J. E., & Wolf, A. L. (1998). Discovering models of software processes from event-
based data. ACM Transactions on Software Engineering and Methodology, 7(3),
215-249.

Cooley, R. (2000). Web usage mining: Discovery and application of interesting patterns
from web data. University of Minnesota.

Cooley, R., Mobasher, B., & Srivastava, J. (1999). Data preparation for mining world
wide web browsing patterns (Vol. 1, pp. 5-32).

Cooper, A. C., & Smith, C. G. (1992). How established ﬁrms respond to threatening
technologies. Academy of Management Executive, 6(2), 55-70.

Crowston, K. (1997). A coordination theory approach to organizational process design.
Organization Science, 8(2), 157-175.

Culnan, M. J. (1992). Processing unstructured organizational transactions: Mail handling
in the U.S. Senate. Organization Science, 3(1), 117-137.

Cyert, R. M., & March, J. G. (1963). A behavioral theory of the ﬁrm. Englewood Cliffs,
NJ: Prentice-Hall.

Davenport, T. H., & Short, J. E. (1990). The new industrial engineering: Information
technology and business process redesign.


David, H. A., & Nagaraja, H. N. (2004). Order statistics: Wiley-Interscience.

Davis, S. (1989). From future perfect: Mass customization. Planning Review, 2, 22.

Devaraj, S., & Kohli, R. (2003). Performance impacts of information technology: Is
actual usage the missing link? Management Science, 49(3), 273-289.

Dewan, S., & Min, C.-k. (1997). The substitution of information technology for other
factors of production: A ﬁrm level analysis. Management Science, 43(12), 1660-
1675.

Diamantopoulos, A., & Winklhofer, H. M. (2001). Index construction with formative
indicators. Journal of Marketing Research, XXXVIII, 269-277.

Dijkstra, W. l. L. (2001). How to measure the agreement between sequences: A
comment. Sociological Methods & Research, 29(4), 532-535.

Dijkstra, W. l. L., & Taris, T. (1995). Measuring the agreement between sequences.
Sociological Methods & Research, 24(2), 214-231.

Dunn, C. L., Cherrington, J. O., & Hollander, A. S. (2005). Enterprise information
systems: A pattern-based approach: McGraw-Hill/Irwin.

El-Ramly, M., Stroulia, E., & Sorenson, P. (2002). From run-time behavior to usage
scenarios: An interaction-pattern mining approach. Paper presented at the Eighth
ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining, Edmonton, Alberta, Canada.

Feldman, M. S. (2000). Organizational routines as a source of continuous change.
Organization Science, 11(6), 611-629.

Feldman, M. S., & Pentland, B. (2003). Reconceptualizing organizational routines as a
source of ﬂexibility and change. Administrative Science Quarterly, 48(1), 94-118.

Georgakopoulos, D., Hornick, M., & Sheth, A. (1995). An overview of workflow
management: From process modeling to workﬂow automation infrastructure.
Distributed and Parallel Databases, 3(2), 119-153.

Gosain, S. (2004). Enterprise information systems as objects and carriers of institutional
forces: The new iron cage. Journal of the Association for Information Systems,
5(4), 151-182.

Gottman, J. M., & Roy, A. K. (1990). Sequential analysis: A guide for behavioral
researchers. Cambridge: Cambridge University Press.


Grant, R. M. (1996). Toward a knowledge-based theory of the ﬁrm. Strategic
Management Journal, 17, 109-122.

Hammer, M. (1990). Reengineering work: Don’t automate, obliterate. Harvard Business
Review, 68(4), 104-112.

Howard-Grenville, J. A. (2005). The persistence of ﬂexible organizational routines: The
role of agency and organizational context. Organization Science, 16(6), 618.

Jarvis, C. B., Mackenzie, S. B., & Podsakoff, P. M. (2003). A critical review of construct
indicators and measurement model misspecification in marketing and consumer
research. Journal of Consumer Research, 30.

Kenny, D. A. (1979). Correlation and causality. New York: Wiley.

Khandwalla, P. N. (1974). Mass output orientation of operations technology and
organizational structure. Administrative Science Quarterly, 19(1), 74-97.

Klatell, J. M. (2006, September 23). Is mail safer since anthrax attacks? Questions
remain about post office security 5 years after 5 died. Retrieved 5/30/2009,
from
http://www.cbsnews.com/stories/2006/09/23/eveningnews/main2036244.shtml

 

Koberg, C. (1988). Dissimilar structural and control proﬁles of educational and technical
organizations. Journal of Management Studies, 25(2), 121.

Koestler, A. (1967). The ghost in the machine. London: Hutchinson.

Kohli, R., & Hoadley, E. (2006). Towards developing a framework for measuring
organizational impact of IT-enabled BPR: Case studies of three firms. ACM SIGMIS
Database, 37(1), 40-58.

Kosala, R., & Blockeel, H. (2000). Web mining research: A survey. SIGKDD Explorations
Newsletter, 2(1), 1-15.

Kruskal, J. B., & Carroll, J. D. (1969). Geometric models and badness-of-fit functions. In
P. R. Krishnaiah (Ed.), Multivariate analysis (Vol. 2, pp. 639-670). New York:
Academic Press.

Kruskal, J. B., & Wish, M. (1978). Multidimensional scaling.

Langley, A. (1999). Strategies for theorizing from process data. The Academy of
Management Review, 24(4), 691-710.

Lee, R. G., & Dale, B. G. (1998). Business process management: A review and
evaluation. Business Process Management Journal, 4(3), 214-225.


Leidner, R. (1993). Fast food, fast talk: Service work and the routinization of everyday
life: University of California Press.

Lynn, M. L. (2005). Organizational buffering: Managing boundaries and cores.
Organization Studies, 26(1), 37.

Malone, T. W., Crowston, K., Lee, J., Pentland, B., Dellarocas, C., Wyner, G., et al.
(1999). Tools for inventing organizations: Toward a handbook of organizational
processes. Management Science, 45(3), 425-443.

March, J. G. (1991). Exploration and exploitation in organizational learning.
Organization Science, 2(1), 71-87.

March, J. G., & Simon, H. A. (1958). Organizations. New York: John Wiley and Sons.

Markus, M. L., & Robey, D. (1988). Information technology and organizational change:
Causal structure in theory and research. Management Science, 34(5), 583-598.

Melao, N., & Pidd, M. (2000). A conceptual framework for understanding business
processes and business process modelling. Information Systems Journal, 10(2),
105-129.

Melao, N., & Pidd, M. (2008). Business processes: Four perspectives. In M. L. Markus &
V. Grover (Eds.), Business process transformation (advances in management
information systems). M.E. Sharpe.

Merton, R. (1936). The unintended consequences of purposive social action. American
Sociological Review, 1, 894-904.

Meznar, M. B., & Nigh, D. (1995). Buffer or bridge? Environmental and organizational
determinants of public affairs activities in American firms. The Academy of
Management Journal, 38(4), 975-996.

Miller, K. D., Zhao, M., & Calantone, R. J. (2006). Adding interpersonal learning and
tacit knowledge to March's exploration-exploitation model. Academy of
Management Journal, 49(4), 709-722.

Monge, P. R. (1990). Theoretical and analytical issues in studying organizational
processes. Organization Science, 1(4), 406-430.

Mooney, J. G., Gurbaxani, V., & Kraemer, K. L. (1996). A process oriented framework
for assessing the business value of information technology. ACM SIGMIS
Database, 27(2), 68-81.

Mukhopadhyay, T., Rajiv, S., & Srinivasan, K. (1997). Information technology impact on
process output and quality. Management Science, 43(12), 1645-1659.


Narendra, N. C. (2004). Flexible support and management of adaptive workﬂow
processes. Information Systems Frontiers, 6(3), 247-262.

Nelson, R. R., & Winter, S. G. (1982). An evolutionary theory of economic change.
Cambridge, MA: Harvard University Press.

O'Neill, P., & Sohal, A. S. (1999). Business process reengineering: A review of recent
literature. Technovation, 19(9), 571-581.

Oakland, J. S. (1999). Statistical process control. Boston: Butterworth-Heinemann.

Oksanen, J. (2009a). Multivariate analysis of ecological communities in R: Vegan tutorial.
Retrieved 5/11/2009, from
http://cc.oulu.fi/~jarioksa/opetus/metodi/vegantutor.pdf

 

Oksanen, J. (2009b). Package ‘vegan’: Reference manual. Retrieved 5/11/2009,
from http://cran.r-project.org/web/packages/vegan/vegan.pdf

Orlikowski, W. J. (1992). The duality of technology: Rethinking the concept of
technology in organizations. Organization Science, 3(3), 398-427.

Orlikowski, W. J. (1995). Improvising organizational transformation over time: A
situated change perspective: Sloan School of Management, Massachusetts
Institute of Technology.

Overby, E. (2008). Process virtualization theory and the impact of information
technology. Organization Science, Articles in Advance, 1—14.

Pentland, B. (1992). Organizing moves in software support hot lines. Administrative
Science Quarterly, 37(4), 527-548.

Pentland, B. (1999). Building process theory with narrative: From description to
explanation. The Academy of Management Review, 24(4), 711-724.

Pentland, B. (2003a). Conceptualizing and measuring variety in organizational work
processes. Management Science, 49(7), 857-870.

Pentland, B. (2003b). Sequential variety in work processes. Organization Science, 14(5),
528-540.

Pentland, B., Haerem, T., & Hillison, D. (2007). Using workflow data to explore the
structure of an organizational routine. Paper presented at the 3rd International
Conference on Organizational Routines: Empirical Research and Conceptual
Foundations.


Pentland, B., Haerem, T., & Hillison, D. (2009a). Comparing organizational routines as
recurrent patterns of action. Unpublished Working Paper.

Pentland, B., Haerem, T., & Hillison, D. (2009b). Longitudinal endogenous changes in
the performance of organizational routines. Unpublished Working Paper.

Pentland, B., Haerem, T., & Hillison, D. (2009c). The (n)ever changing world: Stability
and change in organizational routines. Unpublished Working Paper.

Pentland, B., Haerem, T., & Hillison, D. (2009d). Using workflow data to explore the
structure of an organizational routine. In M. Becker & N. Lazaric (Eds.),
Organizational routines: Advancing empirical research (pp. 47-67). Cheltenham:
Edward Elgar.

Pentland, B., & Rueter, H. H. (1994). Organizational routines as grammars of action.
Administrative Science Quarterly, 39(3), 484-510.

Peters, L., & Saidin, H. (2000). IT and the mass customization of services: The challenge
of implementation. International Journal of Information Management, 20(2),
103-119.

Petter, S., Straub, D., & Rai, A. (2007). Specifying formative constructs in information
systems research. MIS Quarterly, 31(4), 623-656.

Poole, M. S., & DeSanctis, G. (1990). Understanding the use of group decision support
systems. In C. Steinfield & M. L. Markus (Eds.), Organizations and
communication technology. Newbury Park, CA: Sage.

Rényi, A. (1953). On the theory of order statistics. Acta Mathematica Hungarica, 4(3),
191-231.

Sabherwal, R., & Robey, D. (1993). An empirical taxonomy of implementation processes
based on sequences of events in information system development. Organization
Science, 4(4), 548-576.

Sabherwal, R., & Robey, D. (1995). Reconciling variance and process strategies for
studying information systems development. Information Systems Research, 6(4),
303-327.

Sankoff, D., & Kruskal, J. B. (1983). Time warps, string edits, and macromolecules: The
theory and practice of sequence comparison. Reading, MA: Addison-Wesley.

Schulz, M. (2008). Staying on track: A voyage to the internal mechanisms of routine
reproduction. In M. Becker (Ed.), Handbook of organizational routines.
Cheltenham: Edward Elgar.


Scott, W. R., & Davis, G. F. (2007). Organizations and organizing: Rational, natural,
and open system perspectives. Upper Saddle River, NJ: Pearson Prentice Hall.

Shannon, C. E., & Weaver, W. (1949). The mathematical theory of communication.
Urbana, IL: University of Illinois Press.

Simon, H. A. (1973). Applying information technology to organization design. Public
Administration Review, 33(3), 268-278.

Simon, H. A. (1996). The sciences of the artificial. Cambridge, MA: MIT
Press.

Singh, H., Pentland, B., Yakura, E., & Hillison, D. (2009). Business process
management: A review and new directions.

Sorenson, O. (2003). Interdependence and adaptability: Organizational learning and the
long-term effect of integration. Management Science, 49(4), 446-463.

Spender, J. C., & Kessler, E. H. (1995). Managing the uncertainties of innovation:
Extending Thompson (1967). Human Relations, 48(1), 35.

Thompson, J. D. (1967). Organizations in action. New York: McGraw-Hill.

van de Ven, A. H., Angle, H. L., & Poole, M. S. (Eds.). (1989). Research on the
management of innovation: The Minnesota studies. New York: Ballinger/Harper
and Row.

van de Ven, A. H., & Poole, M. S. (1990). Methods for studying innovation development
in the Minnesota innovation research program. Organization Science, 1(3), 313-
335.

van der Aalst, W. (2003). Business process management: Past, present and future.

van der Aalst, W., Desel, J., & Oberweis, A. (2000). Business process management:
Models, techniques, and empirical studies. London: Springer-Verlag.

van der Aalst, W., ter Hofstede, A. H. M., & Dumas, M. (2005). Patterns of process
modeling. In M. Dumas, W. van der Aalst & A. H. M. ter Hofstede (Eds.),
Process-aware information systems: Bridging people and software through
process technology (pp. 179-203). Wiley & Sons.

van der Aalst, W., & van Dongen, B. F. (2002). Discovering workflow performance
models from timed logs. International Conference on Engineering and
Deployment of Cooperative Information Systems (EDCIS 2002), 2480, 45-63.


van der Aalst, W., van Dongen, B. F., Herbst, J., Maruster, L., Schimm, G., & Weijters,
A. J. M. M. (2003). Workflow mining: A survey of issues and approaches. Data
& Knowledge Engineering, 47(2), 237-267.

van der Aalst, W., & Weijters, A. J. M. M. (2004). Process mining: A research agenda.
Computers in Industry, 53(3), 231-244.

van der Aalst, W., Weijters, A. J. M. M., & Maruster, L. (2004). Workflow mining:
Discovering process models from event logs. IEEE Transactions on Knowledge
and Data Engineering, 16(9), 1128-1142.

van Driel, K., & Oosterveld, P. (2001). Nonoptimal alignment: A comment on
"Measuring the agreement between sequences" by Dijkstra and Taris. Sociological
Methods & Research, 29(4), 524-531.

Vasarhelyi, M. A., & Halper, F. B. (1989). The continuous audit of online systems.
Artificial Intelligence in Accounting and Auditing: Knowledge Representation,
Accounting Applications and the Future, 175.

Venkatesh, V., Morris, M. G., Davis, G. B., & Davis, F. D. (2003). User acceptance of
information technology: Toward a unified view. MIS Quarterly, 27(3), 425-478.

Wagner, H., Beimborn, D., Franke, J., & Weitzel, T. (2006). IT business alignment and IT
usage in operational processes: A retail banking case. In Proceedings of the 39th
Annual Hawaii International Conference on System Sciences (HICSS'06) (Vol. 8).

Weick, K. E. (1979). The social psychology of organizing. McGraw-Hill Publishing Co.

Weick, K. E. (1998). Introductory essay: Improvisation as a mindset for organizational
analysis. Organization Science, 9(5), 543-555.

Weske, M., van der Aalst, W., & Verbeek, H. M. W. (2004). Advances in business
process management (Vol. 50, pp. 1-8). Elsevier.

Winter, S. (1964). Economic "natural selection" and the theory of the firm. Yale
Economic Essays, 4, 225-272.

Yan, A., & Louis, M. R. (1999). The migration of organizational functions to the work
unit level: Buffering, spanning, and bringing up boundaries. Human Relations,
52(1), 25-47.
