This is to certify that the thesis entitled

THE EFFECTS OF PERSON AND MACHINE CHARACTERISTICS ON OPERATOR USE AND MONITORING OF AN AUTOMATED SYSTEM

presented by Stephanie M. Drzakowski has been accepted towards fulfillment of the requirements for the M.A. degree in Psychology.

Major Professor's Signature

Date

THE EFFECTS OF PERSON AND MACHINE CHARACTERISTICS ON OPERATOR USE AND MONITORING OF AN AUTOMATED SYSTEM

By

Stephanie M. Drzakowski

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

MASTER OF ARTS

Department of Psychology

2005

ABSTRACT

THE EFFECTS OF PERSON AND MACHINE CHARACTERISTICS ON OPERATOR USE AND MONITORING OF AN AUTOMATED SYSTEM

By

Stephanie M. Drzakowski

Organizations are increasingly automating tasks that were once performed solely by humans. However, in many cases automation has failed to increase organizational productivity. This study posits that human operators of automated technology make errors at two decision points: the decision of whether to use the automated system and the decision of whether to monitor the automated system for errors. The present research investigates the factors influencing these two decisions. A sample of 253 undergraduates participated in a computer-based simulation of an airport security screening task. While visually searching for guns and knives in X-ray images of luggage, participants had the option of using an automatic screener. Results indicated positive and significant effects of trust and affect toward the automated system on use and monitoring of this automatic screener. The effects of self-efficacy, conscientiousness, extraversion, desire to decrease cognitive load, and the interaction between trust and self-efficacy were non-significant.

Copyright by
STEPHANIE M. DRZAKOWSKI
2005

ACKNOWLEDGMENTS

First and foremost, Dan, this project never would have happened without your tremendous support. Thank you for encouraging me to pursue this topic despite the fact that I was originally unsure about its feasibility and place in the I/O literature. I'm so glad that you convinced me to choose a topic I was truly interested in. In addition, I am incredibly grateful for your assistance in developing the task simulation. Thank you for giving so willingly of your resources. I hope that the program will continue to be useful for you and others in the future. And most of all, thank you for the enormous amount of time and effort you dedicated to reading and providing comments on my numerous drafts! Your helpful feedback not only infinitely improved the project, but also helped me grow as a researcher. You have been an excellent model of careful thought and clear articulation of ideas. I have learned so much from working with you. Neal and Fred, thank you so much for agreeing to serve on my committee.
You were there when I had questions or concerns, and your insightful feedback on the project has been tremendously helpfiil. Thank you so much again. I’d also like to thank the Transportation Security Administration. The TSA provided X-ray images without which the experimental task used in this study would not have been possible. Much thanks to the TSA for its help and support. Dan N., thank you so much for helping me out with the slides. All of those hours you spent cutting and pasting X-ray images were much appreciated! Mike V., I’d also like to thank you for your help in testing the program, running sessions, and checking all iv of the data in that Excel file. Thanks to both of you for doing such a great job — good luck with your future endeavors! Mom and Dad, thanks for supporting me through the rough times — especially through the first year I was here. Your encouragement meant so much, and now look— your daughter has finally finished her master’s thesis! Thank you so much for listening to me whine and for always supporting me through all of this process. Also, thanks to Alyssa, Smriti, and Dan W. for volunteering your time to help me test my program. Your input was much appreciated! Anna, thanks for all those great TV / thesis nights last year. . .and thanks so much for lending an ear and making me laugh when I needed it. And a big thanks to all of the grad students — it was great to have the support of such a great group! Finally, I’d like to thank my husband. Kevin, thanks for being there for me through the good times and the bad. Thanks for comforting me when l was upset and celebrating with me when I reached a milestone in the process (who could forget the Thesis Peep?). Most of all, thank you for agreeing to move to Michigan for three years! That’s quite a sacrifice, and I hope you know how much I appreciate it. But now the race is on! You may have beaten me by a year on the M.A., but I’ve finally caught up to your educational level. Who will finish the Ph.D. first? Only time will tell! TABLE OF CONTENTS LIST OF FIGURES ........................................................................................................... ix LIST OF TABLES.................; ............................................................................................ x Introduction ......................................................................................................................... 1 Literature Review ............................................................................................................ 4 Attitudes Toward the Automated System ................................................................... 5 Trust ........................................................................................................................ 5 Complacency Potential. ........................................................................................ l9 Affect Toward the Automation ............................................................................. 20 Cognitive load ....................................................................................................... 2O Dispositions ............................................................................................................... 22 Self—efficacy .......................................................................................................... 22 Personality ............................................................................................................. 
24
Hypotheses ........................................ 26
Task Overview ..................................... 29
Method ............................................ 34
  Participants and Design ......................... 34
  Procedure ....................................... 34
    Experimental Manipulations .................... 36
  Measures ........................................ 38
  Manipulation Checks ............................. 39
Results ........................................... 41
  Organization of the Results ..................... 41
  Descriptive Statistics .......................... 45
  Hypothesis Testing .............................. 47
    Hypotheses 1a through 1c ...................... 47
    Hypothesis 2 .................................. 50
    Hypothesis 3 .................................. 52
    Hypotheses 4a and 4b .......................... 55
    Hypothesis 5 .................................. 56
    Hypotheses 6a and 6b .......................... 57
Discussion ........................................ 59
  Complacency Potential Results ................... 59
  Trust ........................................... 60
  Trust X Self-Efficacy Interaction ............... 60
  Desire to Decrease Cognitive Load ............... 62
  Affect .......................................... 64
  Conscientiousness and Extraversion .............. 64
  Limitations ..................................... 66
  Future Research Suggestions ..................... 67
Conclusion ........................................ 68
References ........................................ 72
Appendix A: User Training Script .................. 76
Appendix B: Machine Training Script ............... 77
Appendix C: Automation-Induced Complacency Potential Measure ........ 78
Appendix D: Post-Task Predictability, Dependability, & Faith Items ........ 80
Appendix E: Post-task Competence and Responsibility Items ........ 82
Appendix F: Conscientiousness Scale Items ......... 83
Appendix G: Extraversion Scale Items .............. 84
Appendix H: Pre-task Trust Items .................. 85
Appendix I: Desire to Decrease Cognitive Load Scale Items ........ 86
Appendix J: Self-Efficacy Scale Items ............. 87
Appendix K: Affect Toward the AWD Scale Items ..... 88
Appendix L: Perceived Cognitive Load Measure ...... 89
Appendix M: Covariance Matrices for Automation Use and Monitoring ........ 92
Appendix N: Automation-Induced Complacency Potential Rating Scale Analyses ........ 93
Appendix O: Post-hoc Analyses of the Effects of Slide Difficulty ........ 96
Appendix P: Post-hoc Analyses of the Effects of Weapon Presence or Absence ........ 99
Appendix Q: Post-hoc Analyses of Point Performance ........ 101
Appendix R: Affect Manipulation Replication ....... 107
Appendix S: Trust Measure Analysis ................ 108

LIST OF FIGURES

Figure 1: Experimental Model ...................... 29
Figure 2: Example X-ray Slide #1 .................. 30
Figure 3: Example X-ray Slide #2 .................. 31
Figure 4: Example X-ray Slide #3 .................. 32
Figure 5: Screen Capture from X-ray Task .......... 33
Figure 6: Original distribution of AWD use ........ 43
Figure 7: Transformed distribution of AWD use ..... 44
Figure 8: Graph of the Interaction Between Trust and Self-Efficacy ........ 53
Figure 9: Histogram of the Self-Efficacy .......... 55
Figure 10: Example TLX Scale ...................... 90
Figure 11: Scree Plot - Exploratory Factor Analysis for Automation-Induced Complacency Potential ........ 94
Figure 12: Distribution of Total Points ........... 101
Figure 13: Plot of the Interaction of Trust Condition and AWD Use onto Total Points ........ 104
Figure 14: Plot of the Interaction of Trust Condition and Correct Disagreements onto Total Points ........ 105
Figure 15: Scree Plot - Exploratory Factor Analysis of Trust Scale Items ........ 108

LIST OF TABLES

Table 1: Selected Definitions of Trust ............ 6
Table 2: Top Five Words for Trust in General, Trust Between Humans, and Trust in a Machine ........ 8
Table 3: Basis of Trust Expectations at Different Levels of Experience ........ 12
Table 4: X-Ray Screening Task Scoring System ...... 36
Table 5: T-tests for Significant Differences Between Session Types A and B ........ 42
Table 6: Scale Ranges, Means, and Standard Deviations ........ 46
Table 7: Scale Intercorrelations .................. 46
Table 8: Trust Manipulation Check ................. 47
Table 9: Self-Efficacy Manipulation Check ......... 48
Table 10: Affect Manipulation Check ............... 48
Table 11: Regression Analyses of Trust, Self-Efficacy, and their Interaction on Use, Total Disagreements, and Correct Disagreements ........ 50
Table 12: T-tests for Differences in Trust, Self-Efficacy, and their Interaction Between Low Use and High Use Participants ........ 51
Table 13: Correlations of Desire to Decrease Cognitive Load with Use and Monitoring ........ 56
Table 14: Correlations of Conscientiousness and Extraversion with Use and Monitoring ........ 58
Table 15: Perceived Cognitive Load Minimums, Maximums, Means, and Standard Deviations ........ 63
Table 16: Dimensions of the NASA TLX .............. 89
Table 17: Sample Covariance Matrix for Use ........ 92
Table 18: Sample Covariance Matrix for Monitoring ........ 92
Table 19: Reliabilities of the Automation-Induced Complacency Potential Subscales ........ 93
Table 20: Revised Automation-Induced Complacency Potential Scale Items and Statistics ........ 95
95 Descriptive Statistics for Total Points ............................................................ 102 Regression Analysis of Trust Condition and AWD Use on Total Points ....... 103 Regression Analysis of Trust Condition and Correct Disagreements on Total Points .............................................................................................................. 105 xi Introduction The past half-century has borne witness to an unprecedented explosion of autonomous technology development and implementation, particularly in workplace settings. Many jobs once performed by humans are now being automated. Automation is defined as “the allocation of functions to machines that would otherwise be allocated to humans, or the machines that perform those functions” (Funk, et al., 1999). For example, people once made bank deposits and withdrew cash with the aid of a human bank teller. Now, the tasks of withdrawing and depositing money are often conducted by automatic teller machines (ATMs). Telephone operators have, in many organizations, been replaced by automated directories. Physically dangerous assembly work, once undertaken at great risk, is now regularly performed by robotic systems. Expert systems, a type of automated decision aid, have been implemented in many work environments. Expert systems are “computer programs that apply substantial knowledge of specific areas of expertise to the problem solving process” (Bobrow, Mittal, & Stefik, 1986, pp. 880). These systems are provided an expansive knowledge base in a very specific area. One of the most famous expert systems is Deep Blue (IBM), the expert system that defeated chess grandmaster Gary Kasperov (IBM, n.d.). Deep Blue is a highly-developed prototype system with power surpassing most machines. However, Waterman (1986) states that less powerful automated decision aid systems have been implemented in almost every type of workplace (as cited in Will, 1991). For example, they have been widely implemented in the field of behavioral accounting, where they produce recommendations for whether applicants should receive loan renewals. In addition, expert systems have been used since the 19705 in medical diagnoses, where doctors can enter symptom information and the system produces a likely diagnosis (Yu, et al., 1979). As more tasks have become automated, a unique situation has arisen: the situation where humans and machines must work jointly on the same task. For example, airplane cockpits are now partially automated, such that the craft have autopilot and automonitoring systems. However, we still have human pilots and co-pilots on every flight. Ofien, both the human pilots and autopilot system perform the same task — piloting the plane. Control can be allocated to either human or machine, but both must function correctly and work together to make flights safer and smoother. This type of joint human-machine responsibility for performance has become increasingly common (Parasurarnan & Riley, 1997) and requires research attention. Organizations spend a large amount of revenue purchasing and maintaining automated systems and training employees to use them. In return for their investments, they expect the benefits that automated systems promise. These systems are designed to reduce labor costs and to free humans from mundane and/or dangerous tasks (i.e., by reducing fatigue and boredom and by removing humans from dangerous working conditions). In many cases, they are effective. 
Research has established that in many decision situations, even relatively simple algorithms produce consistently more accurate results than do human judgments (Dawes, Faust, & Meehl, 1989; Leli & Filskov, 1984). However, sometimes the implementation of automated technology does not tender the degree of benefit for which it was designed. The importance of empirical attention to human-machine interaction is highlighted by the “productivity paradox” (e. g., Lee & Perry, 1999; Stratopoulos & Dehning, 2000). The productivity paradox is that increased implementation of advanced technology in the workplace has not consistently been significantly correlated with increased organizational performance. In fact, research has suggested that in some industries, disappointment with the degree of improvement after automation implementation tends to be common (Arnold & Sutton, 1998). It has been suggested that the disappointing degrees of improvement are not due to mechanical malfunctions or to poor design. Instead, it has been proposed that automation implementation failures are due to faulty human-automation interactions (e.g., Muir, 1994; de Vries, Midden, & Bouwhuis, 2003; Will, 1991). Numerous fatal disasters have implicated poor human-automation interaction as a cause. For example, Sparaco (1994) describes a fatal airplane crash that could have been prevented had the pilots taken action to override the autopilot only four seconds sooner. However, overconfident in the plane’s automated technology, they were too late to realize the error. Poor human-computer interaction has also been implicated in medical patient overdoses (Neuman, 1986; Sollins, 1986) and the nuclear incident at Three Mile Island (Connors, Harrison, & Summit, 1994). In order for automated technology to be beneficial, users of the automated systems must make the correct choices about how to interact with the technology. It is argued here that there are two points at which a decision regarding the technology must be made. An incorrect decision at either of these points can severely decrease total system effectiveness. The first decision point for the human is whether or not to use an automated system. The major error at this decision point is that the user may fail to use the automated system when doing so would improve performance. For example, a doctor may fail to use an expert system for advice on a diagnosis even when the system has a much higher likelihood of making a successful diagnosis than does the doctor. This second decision point is only encountered after one chooses to use the automated system. At this point, users are required to determine what actions they will take after engaging the system. They decide whether to monitor the system for errors so that they can override the system if an error occurs. Automated systems, like humans, sometimes make mistakes or non-optimal decisions. Often the algorithms on which decision aids are based are probabilistic. Thus, part of the role of the human in shared decision situations is to monitor and evaluate the performance of the automated system. The purpose of the present study was to examine the factors that might affect the decisions of humans working in conjunction with an automated system at these two decision points. The vehicle for doing so was the task of airport security screening. 
A simulation of the baggage screening task was produced in which participants encountered the two decision points: they were able to choose whether or not to use an "automated weapons detector" for assistance, and if they did, they had to choose whether to monitor its decisions for errors.

Literature Review

Many outcomes of interest in psychology are affected by two classes of variables, person variables and situation variables. The present study focuses on building an integrative model of some person factors that may influence automation use and monitoring. Previous research has established that person factors such as trust toward the automation, self-efficacy, and personality may be important determinants of operator choices when using automation (e.g., Dzindolet, et al., 2001; Dzindolet, et al., 2003; Lee & Moray, 1994; Parasuraman & Byrne, 2003). The person factors relevant to the decisions operators make at the two decision points of use and monitoring fall into two categories: attitudes toward the automated system and dispositions. Attitudes toward the automated system are heavily influenced by the characteristics of the automated system, while dispositions are more stable propensities that exert effects independent of the machine's characteristics. Each of these categories will be discussed in turn.

Attitudes Toward the Automated System

Trust. The first factor upon which I focus is trust in the automated system. If the operator does not trust a system, he or she will probably resist using it. In regard to monitoring, excessive trust in the system might cause the operator to become less vigilant or reluctant to monitor or override the system. Will (1991) proposes that an implicit trust forms when a user decides to act upon the suggestions of an automated system. Furthermore, Muir (1987) states that when operators are forced to use a system they do not trust, they often go to extremes to avoid using the system; however, when users trust automation to an excessive level, they may allocate tasks to automated systems inappropriately. In order to understand the role of trust in human-computer performance, one must explore the nature of trust. I begin with an examination of theories of trust between humans. Then, a theory of trust between a human and an automated system is described. Finally, the conceptualization of trust used in the present study will be presented.

Definitions of Trust. Some studies that examine trust do not provide a definition of the construct but instead leave the reader to utilize his or her own personal and colloquial definition of the term (e.g., Dzindolet, et al., 2003; Lee & Moray, 1994; Lewandowski, Mundy, & Tan, 2000). Among those researchers who do provide explicit definitions of trust, little consensus is found. Table 1 presents a few of the numerous disparate definitions of trust that can be found in the literature.

Table 1
Selected Definitions of Trust

Source | Definition
Barber (1983) | "socially learned and socially confirmed expectations that people have of each other, of the organizations and institutions in which they live, and of the natural and moral social orders that set the fundamental understandings for their lives"
Hosmer (1995) | "the expectation . . . of ethically justifiable behavior — that is, morally correct decisions and actions based upon ethical principles of analysis"
Lewis & Weigert (1985) | "undertaking of a risky course of action on the confident expectation that all persons involved in the action will act competently and dutifully"
Robinson (1996) | "expectations, assumptions, or beliefs about the likelihood that another's future actions will be beneficial, favorable, or at least not detrimental to one's interests"
Not all of these definitions appear to be relevant to the study of human-computer trust. In order to develop an appropriate definition of trust in the context of the current study, the literature on social trust between humans and human-computer trust was reviewed. This review resulted in the identification of the components of trust that appear most applicable to human-machine trust relationships.

Researchers studying human-computer trust often apply theories of social trust between humans to their research (e.g., Lee & Moray, 1992; Muir, 1987; Muir & Moray, 1996). Evidence suggests that elements of social trust and human-computer trust are often similar, and thus research and theory in one area can reasonably be applied to research in the other. Jian, Bisantz, and Drury (2000) specifically suggest that findings on interpersonal trust are applicable to human-machine trust. Their study asked participants to perform a sorting task in which they identified and ranked words as being associated with trust between people, trust between a human and an automated system, and trust in general. They compared the sets of the top five words that participants judged to be related to trust between two humans and trust between a human and a machine. They found the union set to be small — seven words (whereas five words would indicate absolute agreement). This suggests that people perceive human-human trust and human-machine trust as being very similar. Additionally, of the top five words in each set, three (trustworthy, loyalty, and reliability) were identical (see Table 2).

One theory of trust that is directly focused on automated systems has been proposed (Muir, 1987). This theory was based heavily on two of the most popular theories of social trust between humans: Rempel, Holmes, and Zanna's (1985) theory and Barber's (1983) theory. In the following paragraphs, I discuss these two theories in depth. Following the discussion of the theories, I turn to Muir's theory of trust in expert systems and the research supporting it.

Rempel, Holmes, and Zanna's Theory. Perhaps the most popular theory of trust was proposed by Rempel, Holmes, and Zanna (1985). This theory, which was developed in order to describe close interpersonal relationships, does not suggest one specific definition of trust but instead conceptualizes trust as a hierarchy of attributions one may make about one's partner. In this hierarchy, each level becomes more abstract than the last. The first level, predictability, concerns specific patterns of behavior. When one's partner is highly predictable, one can forecast how one's partner will react in a given set of circumstances. At the predictability stage, little effort is made to determine why any specific behavior is exhibited; predictability simply entails a consistent response in a given context.
Table 2
Top Five Words for Trust in General, Trust Between Humans, and Trust in a Machine

Trust in General | Trust Between Humans | Trust in a Machine
Trustworthy | Trustworthy | Trustworthy
Honesty | Honesty | Loyalty
Loyalty | Loyalty | Reliability
Reliability | Reliability | Honor
Honor | Integrity | Familiarity

In the second stage of the trust hierarchy, dependability, one makes attributions about the partner's stable disposition. The focus is no longer on specific behaviors in specific situations; the focus rests on the partner him- or herself. One may say, "My partner is a dependable person. I can trust him or her not to hurt me." Dependability is distinct from predictability in that it is a general attribution about the referent person, not an attribution about a context-specific behavior.

Faith, the third and final stage in the trust hierarchy, is the most abstract. It involves confidence that, in the face of uncertain future events, the partner will behave in appropriate ways. It can be considered a state of emotional security where one believes that his or her partner's motives and intentions are righteous, such that in any situation the partner will behave appropriately. Like dependability, faith is an attribution about the person. However, unlike dependability, faith extends into the future and to situations that are completely novel; thus, it is not necessarily based on any specific evidence from past experience.

It should be noted that Rempel, et al.'s theory specifies that trust develops over time. As a relationship matures, trust usually evolves from predictability to dependability to faith. According to Rempel, et al., it is rare to find trust at the faith level in the early stages of a relationship. Additionally, while the stages are distinct, they are not independent. Although predictable behavior is not sufficient for the development of dependability, it serves as evidence of a dependable disposition. Therefore, it is unlikely that someone considered unpredictable would be considered dependable, just as it is unlikely that someone considered undependable would be trusted at the faith level.

In summary, Rempel, et al.'s theory suggests that trust develops over time and progresses through stages of increasing abstraction, from predictability of specific behaviors to attributions of a dependable character to faith in future behavior regardless of context. Because they are based on actions occurring in known contexts, the concepts of predictability and dependability can easily be applied to the concept of trust in automation. An automated system can function consistently in a known situation — so much so that the user might attribute dependability to the machine. The concept of faith, however, is more difficult to conceptualize as related to machines because it implies that the user will generalize the machine's dependability to new, future situations.

Barber's Theory. In 1983, Barber proposed his influential theory of trust. This theory, which is sociological in nature, posits that trust exists in order to provide social ordering and control. He views trust as a function of three types of expectations that social actors have of one another as they make choices about which behaviors are rationally effective and morally appropriate. The first expectation is that the natural and moral orders will persist. In this, Barber means that we expect order, regularity, and stability to some degree.
We expect that social norms will remain relatively unchanged from day to day, as will the laws of physics. These expectations, according to the theory, prevent us from being paralyzed by fear of every imaginable threat, and they allow us to develop trust in others. Secondly, trust is based on the expectation of technically competent role performance. In order for a person to trust a target other, the target must possess sufficient knowledge and technical facility to meet performance standards. In other words, the expectation reflects the simple notion of consistently producing the correct answer or product. For example, a technically competent expert system would produce the correct answer to the problem time after time. Finally, trust is based on the expectation that target others will fulfill their fiduciary responsibility, or the responsibility to place others' interests above their own personal gain. This expectation is inherent in situations where one partner makes use of a special knowledge or skill that other members of the social system do not possess. According to Barber's theory, this type of trust functions to prevent abuse of power, and he primarily applies the concept to politicians and doctors.

Trust in Automation: Muir's Theory of Trust in Expert Systems. Muir (1987) provides a theory designed specifically to describe human trust in an expert system. Because the present study employs a simulation of an expert system task, Muir's (1987) theory appears particularly relevant. Muir integrates both Barber (1983) and Rempel et al. (1985) to create her theory of human trust in expert systems. As discussed previously, expert systems are automated decision aids that produce recommendations for action in an area of expertise. Muir uses Barber's facets of trust as a base point. She then applies Rempel et al.'s concepts of predictability, dependability, and faith to discuss how trust may evolve over time in a human-computer relationship. The synthesis of Barber and Rempel et al.'s theories results in a three by three matrix into which various types of trust can be mapped (see Table 3). In her theory, Muir also discusses how each of the six theory components, which were developed with social trust in mind, can be applied to human-computer trust relationships.

Muir begins with Barber's theory. The first component in Barber's theory is persistence, the expectation that the existing natural and social orders will persist into the future. This persistence provides some sense of environmental constancy, allowing trust to develop. Without the expectation of persistence, we would not be able to assume that the laws of gravity will persist day after day, that the word "hello" will remain a greeting tomorrow and thereafter, or that someone who we trust today will be trustworthy tomorrow. This expectation of persistence is assumed to be present in all individuals, but the degree to which individuals expect persistence is thought to vary. Some authors have conceptualized persistence as an individual difference characteristic reflecting a disposition to trust machines (Muir & Moray, 1996). This conceptualization is adopted here and will be discussed further in the following sections.
Table 3
Basis of Trust Expectations at Different Levels of Experience

Expectation | Predictability (of acts) | Dependability (of dispositions) | Faith (in motives)
Persistence: natural physical | Events conform to natural laws | Nature is lawful | Natural laws are consistent
Persistence: natural biological | Human life has survived | Human survival is lawful | Human life will survive
Persistence: moral social | Humans and computers act 'decently' | Humans and computers are 'good' and 'decent' by nature | Humans and computers will continue to be 'good' and 'decent' in the future
Technical competence | X's behavior is predictable | X has a dependable nature | X will continue to be dependable in the future
Fiduciary responsibility | X's behavior is consistently responsible | X has a responsible nature | X will continue to be responsible in the future

Note: the three columns give the basis of each expectation at different levels of experience.

Barber's second component, competence, refers to the ability to meet or exceed performance standards. This same concept can be easily extended to include automated systems. Automation that is perceived as competent will consistently produce desirable performance results (Lee & Moray, 1992).

Fiduciary responsibility, Barber's final component, refers to one's obligations to place the well-being of others above one's own personal gain. The opportunity to do so occurs when one member of a group possesses knowledge, skills, or expertise exceeding that possessed by other group members (Barber, 1983). Barber discusses trust at a macro level, and his conceptualization of responsibility was termed "fiduciary" because it was framed in terms of entire professions, such as politicians' responsibility to place the welfare of their constituents above their own personal welfare. Today, although human operators sometimes act based almost entirely upon the recommendations of machines, the ultimate responsibility for those actions remains with the human operator. Machines are not blamed for accidents or for financial loss resulting from their use; instead, the blame is attributed to the operator. Therefore, the concept of fiduciary responsibility does not fit machines at the present time, and the conceptualization of responsibility must be slightly amended. Muir amends the meaning of "fiduciary responsibility" to avoid financial connotations, but she retains the adjective "fiduciary."

In order to adapt fiduciary responsibility to the human-computer context, Muir asserts that because expert systems are usually designed to provide the expert knowledge not possessed by the human operator, the human typically has little ability to evaluate whether the expert system has produced the correct recommendation. This situation sets the stage for the influence of responsibility. Responsibility has been conceptualized two ways in the human-computer trust literature thus far. First, responsibility has been discussed in terms of the programmer's responsibility to provide the best possible product (Muir & Moray, 1996). According to this conceptualization, a highly responsible machine would be impeccably designed and programmed, whereas an irresponsible machine would be designed with a less effective program. Second, responsibility has been conceptualized as the extent to which the machine gives clear explanations of how it is working and why it is requesting particular information (Muir, 1987). This second conceptualization reflects responsibility in that the machine reduces the knowledge gap between itself and the operator by making its processes and intentions explicit.
13 Muir next moves to Rempel, Holmes, and Zanna’s (1985) components of trust, the first of which is predictability. Muir sees predictability as both a construct in itself and also as a stage of trust. As a construct, predictability receives little modification from Muir — it refers to the extent to which the machine behaves consistently in a given situation. In the predictability phase, trust is based on the consistency of one’s behavior. In the case of human-computer interaction, trust will be based on the consistency of the machine’s output. At the dependability phase, the focus is on an attribution about the machine’s character. In order for a machine to be considered dependable, it needs to have exhibited predictable behavior over time. For a machine, a predictable response might be to process input and produce output. When the machine converts input to output over time, the machine could be considered to be dependable. Any situation in which the machine does not consistently perform (e.g., the machine “freezes up” or produces error messages) may lead to perceptions that the machine is not dependable. Muir also finds a way to apply faith to the human-machine relationship. In the faith stage, the emphasis is on the operator’s expectation that the machine will continue to be predictable and dependable in the future. In this phase of social trust, Muir proposes that weight is given to events that indicate the person’s intrinsic motivation to maintain the positive social relationship. While Muir balks at the attribution of motivation to today’s machines, she does propose that human operators have faith in machines. She believes that the fact that people continually use expert systems despite an awareness that the systems operate largely beyond their understanding and can easily malfunction indicates a sense of faith in those systems. Faith, then, can be 14 conceptualized as the extent to which operators are insensitive to cues indicating that the machine is unpredictable or undependable. Muir’s theory has received some empirical support. Muir and Moray (1996) tested the theory in two studies that simulated the operation of a milk pasteurization plant. In Study 1, participants were responsible for the supervision of three pump functions, one of which they were able to toggle between manual and automatic control. As previously discussed, persistence is seen as an individual difference characteristic in human-computer trust relationships. Muir and Moray (1996) excluded persistence from their study due to their desire to examine situational effects only. Of Barber’s two remaining trust factors, competence emerged as the greatest predictor of subjective trust, accounting for 81% of the variance in trust. Adding responsibility to the regression equation did not result in a significant improvement, although this may have been the result of a measurement artifact; the items assessing competence and responsibility were phrased almost identically. Therefore, the conclusion that competence best reflects how people define “trust in a machine” requires replication by future researchers. Because participants had strongly preferred manual control over automatic control in Study 1, Study 2 imposed a small penalty for the use of manual control. This penalty was designed to be large enough to encourage the participants to select automatic control when the automation was functioning correctly, but small enough so that they would not hesitate to override faulty automation. 
Results indicated a positive correlation between subjective trust in the automation and time spent in automatic mode (r = .71). This 15 finding supports the assertion that operators will be more willing to use automation that they trust. Furthermore, the results showed that the more error-prone the automation, the more the operators will monitor it. Operators were significantly more likely to check the status of the automatic pump when that pump had a higher fault rate. The results of this study therefore supported predictions made by Sheridan and Hennessy (1984) that an inverse relationship would exist between trust and monitoring of automation. In summary, Muir and Moray (1996)’s study found that high trust in the automated system increases system use and decreases system monitoring. It was also found that automation competence is the factor that most influences human-machine trust, while responsibility did not significantly add to the prediction of trust. Lee and Moray (1992) conducted a study designed to replicate and extend Muir and Moray’s (1996) findings. They used a nearly identical simulation of a juice pasteurization plant. The difference was that in this study, operators could allocate all three sub-systems to any combination of manual or automatic control. Surprisingly, the results of the study failed to support predictions by Muir (1987) and Zuboff (1988) that trust will be positively associated with increased use of the automated system. Instead, they found the opposite relationship; as trust decreased, use of the automatic fimction increased. The authors conclude that trust is not solely responsible for influencing the allocation of automatic and manual functions. They speculate that because many participants were operating under manual control when errors were introduced into the system, the errors might have affected the allocation strategy the operator was using at the time. If the operator was using manual control of the pump when the errors occurred 16 (as the majority were), the operator might have been motivated to explore an alternative strategy - use of the automation. Lee and Moray suggest that automation use might be governed by a pairing of trust in the automation and operator task self-efficacy. This proposition will be discussed further later in this paper. The present study will attempt to further illuminate the relationships between trust and automation use and between trust and automation monitoring. Lee and Moray (1992) suggest that moderators such as self-efficacy exist in the relationship between trust and use; however, the possibility that moderator variables might exist in the trust / monitoring relationship has yet to be examined. Trust summary. In this study, trust will be conceptualized based on the suggestions presented by Muir (1987). Therefore, in our study, the components of trust will be conceptualized as follows: Persistence: The belief that the natural and moral orders will persist, as defined by Barber (1983), cannot be manipulated for study. However, Muir and Moray (1996) suggest that persistence implies a dispositional characteristic and can be investigated as such. Therefore, persistence is conceptualized as a dispositional tendency to trust, specifically to trust automation. This construct will be further discussed as “automation- induced complacency” in the following sections. Competence: Competence will be conceptualized as the extent to which the automation produces correct and appropriate recommendations or actions. 
Responsibility: As conceptualized by Muir and Moray (1996), responsibility refers to the appropriateness of the system's design in comparison to alternative possible designs. However, because they found that the variance accounted for by responsibility over and above competence was non-significant, we will adopt the conceptualization offered by Muir (1987). Responsibility will be conceptualized as the degree to which the automation provides the user with information about its motives and processes.

Predictability: Predictability will be conceptualized here as the extent to which, given the same input, the automated system will produce the same output.

Dependability: Dependability refers to an attribution that one makes that another person can be consistently relied upon (Rempel, et al., 1985). In the case of a machine, dependability will be conceptualized as the extent to which the machine is able to perform its intended function. Therefore, a machine that was not dependable would have a tendency to break down or to consistently produce error messages.

Faith: Muir (1994) suggests that faith is evident when operators use automation under conditions where there is not absolute certainty that the automation will function correctly. In situations of social trust between humans, faith is proposed to develop only after an extended period of time and is expected to emerge only after predictability and dependability have been established (Rempel, et al., 1985). However, Muir and Moray (1996) found that faith instead was the first component to emerge in the human-machine trust relationship. This finding is not counter-intuitive if one considers that when the human operator engages the automated system for the first time, he or she has no information regarding predictability or dependability on which to base trust. Instead, he or she must take a "leap of faith" that the automation will function correctly. In addition, automated systems that never produce errors or incorrect recommendations are extremely rare. Operators are, over time, bound to experience some evidence that the automated system may not be predictable and dependable, and operators may differ in the extent to which they acknowledge this evidence. We conceptualize faith as the extent to which operators are insensitive to cues that the automation lacks predictability and dependability.

Complacency Potential. Some authors have suggested that people have a general propensity to either trust or not trust automated systems in general. As previously discussed, Muir and Moray (1996) conceptualize persistence as an individual difference variable related to a general propensity to trust. Parasuraman, Molloy and Singh (1993) examine a construct which similarly reflects a propensity to trust — specifically, a propensity to trust automated systems. They propose that individual differences exist in "automation-induced complacency potential," where complacency is defined as "a psychological state characterized by a low index of suspicion" (Wiener, 1981, p. 117). Parasuraman, et al. (1993) believe that some people are more likely than others to trust automated systems to the extent that they cease to monitor the system for faults. Automation-induced complacency has been implicated in numerous aircraft incidents in which pilots failed to recognize or respond to a fault in one or more automated flight systems (e.g., Danaher, 1980; Wiener, 1985). However, until the past decade there have been few efforts to empirically measure complacency or complacency potential. Singh, Molloy and Parasuraman (1993) developed the Automation-Induced Complacency Potential Rating Scale (AICPR) in order to assess complacency potential. This scale asks respondents to rate whether they feel more comfortable using, for example, an ATM versus a human bank teller or a computer-aided library search versus a card catalog search. Because it examines generalized trust in all automation, this scale is likely to tap the construct of past experience with automated systems. While not
However, until the past decade there have been few efforts to empirically measure complacency or complacency potential. Singh, Molloy and Parasuraman (1993) developed the Automation-Induced Complacency Potential Rating Scale (AICPR) in order to assess complacency potential. This scale asks respondents to rate whether they feel more comfortable using, for example, an ATM versus a human bank teller or a computer-aided library search versus a card catalog search. Because it examines generalized trust in all automation, this scale is likely to tap the construct of past experience with automated systems. While not 19 explicitly stated by the authors, they seem to implicitly assume that positive past experiences with automated systems in general is associated with trust in a particular automated system, which may or may not be trustworthy. It is worth noting that the concept of complacency potential somewhat contradicts the finding of Lee and Moray (1994) that trust in one automated system does not directly affect trust in another automated system. Research is needed to examine the extent to which trust generalizes from one automated system to another. Aflect Toward the Automation. A second individual difference of interest is affect, or liking, for the specific automated system. The implementation of an automated system in the workplace can at times invoke strong affective reactions, which may affect user behaviors. For example, research on automation implementation in the workplace suggests that when workers feel that their job security may be threatened by the introduction of an automated system, they can feel negatively toward the system (Chao & Kozlowski, 1986). Research suggests that when a user has negative affect toward an automated system, he or she may avoid using it or take actions to undermine it (Sullivan, 1982). Therefore, affect toward the automation in question may influence decisions regarding automation use. Cognitive load. People have a tendency to minimize cognitive effort (Sharit, 2003). This tendency is due to the fact that humans have limited information processing capacity. Because we cannot function well when this capacity is expended, we attempt to decrease the amount of effort we need to expend on any single task. This tendency may affect the extent to which people are motivated to rely on automated systems. 20 While automated systems are usually designed to decrease the amount of cognitive effort one must spend on a task, the use of automation always incurs cognitive costs (Sharit, 2003). We must learn to operate the system and continuously re-learn to keep pace with upgrades. We must then expend cognitive effort on deciding whether or not to use the automation, on going through the process of automation use, and on monitoring the system for errors. If any errors arise, we must then determine how we can correct the error. It is therefore possible for the cognitive costs of using an automated system to outweigh the cognitive benefits of using the system (Parasuraman & Riley, 1997). These cognitive load considerations may affect the likelihood that one will use an automated system. If use of the system increases cognitive load, the system is less likely to be engaged. Alternatively, if use of the system decreases cognitive load, the operator should be more likely to use it when he/she desires to decrease cognitive load. 
Although cognitive load considerations may affect automation use, cognitive load has been predominantly investigated in relation to automation monitoring. Specifically, attention has focused on attempts by users to reduce their experienced cognitive load by reducing the amount of cognitive attention focused on monitoring the automated system. Research has supported the suggestion that users may fail to monitor the automation if doing so reduces cognitive load. For example, Dijkstra (1999) found that participants who consistently agreed with incorrect system advice reported using less cognitive effort than did participants who disagreed with the system at least once. Parasuraman, et al. (1993) found that when participants worked under low workload conditions, they monitored system performance accurately and identified all system errors. However, when participants were placed under conditions of high cognitive workload, they reduced 21 cognitive load by ceasing to monitor the automated system for errors, and in this condition they missed many system errors. Dzindolet, et al. (2001) found that participants expended less cognitive effort as automated aid reliability increased. These findings suggest that users attempting to minimize cognitive effort may not sufficiently monitor faulty automation, leading to unconditional acceptance and negative consequences. Dispositions I next turn to constructs related to the human user’s stable dispositions. Self-efficacy. To say that trust alone predicts behavioral choices in automation use represents a view too simplistic to accurately describe complex human behavior. As previously discussed, studies have shown that user reliance on automation does not consistently correspond to changes in the user’s trust in the automation (Lee & Moray, 1992). Lee and Moray propose that task self-efficacy might be an intermediate variable in the relationship between trust and use of automation. A widely-cited study by Lee and Moray (1994) examined the interacting effects of trust and self-confidence in operators’ decisions to either place a task under automatic or manual control. Those authors defined the term “self-confidence” as “anticipated performance during manual control” (p. 154). This definition seems to overlap strongly with the psychological construct of “self-efficacy,” which is defined by Bandura as “beliefs in one’s capabilities to organize and execute the courses of action required to produce given attainments” (Bandura, 1997, pp. 3). Both self-confidence and self- effrcacy reflect a person’s beliefs about his or her ability to perform a task at a desired level. Therefore, I will hereafter use the term “self-efficacy” as synonymous with “self- 22 confidence.” Both terms are used in reference to the user’s perceptions of his or her ability to perform the task manually; they do not refer to perceptions of ability to correctly operate the automated system. Lee and Moray (1994) used the PASTEURIZER task from Lee and Moray (1992), in which participants operated various functions in a partially-automated pasteurizing plant. Operators were responsible for maintaining acceptable performance on three functions — the feedstock pump, the steam pump, and the steam heater. The participants could choose to place each of these pumps under either automatic or manual control at any time during each trial. After participants became accustomed to operating the plant systems, two types of errors were introduced, both of which affected the feedstock pump. 
One type of error affected manual control of the pump, while the other affected automatic control of the pump. Therefore, during the trials in which the error affected manual control, participants were expected to allocate control to the automatic system, and vice versa. The authors did not use existing scales of trust and self-confidence in this study, nor did they specify precise definitions of the terms. They allowed the participants to use their individual definitions of trust and self-confidence. The two constructs were measured on subjective scales and were framed to refer directly to the feedstock pump, the steam pump, and the steam heater individually. Results on the subjective rating scales indicated that when errors occurred in the manual control of the feedstock pump, operator self-efficacy decreased and trust in the automatic system increased. Accordingly, after an error occurred while the pump was under manual control, participants tended to rely more heavily on the automation than 23 they had before the error. In addition, while allocation strategies varied somewhat between participants, those who chose to allocate tasks mostly to the automation generally had more trust in the automation than self-efficacy and those who mostly retained manual control generally had more self-efficacy than trust. Lee and Moray also correlated the average difference between trust and self- efficacy for each pump with allocation to the automated system for that pump. They found significant relationships between the trust and self-efficacy difference (T — SE) for each function and automation use for that same function. The results present a strong case for the balance of trust and self-efficacy in the determination of the choice to use automation. However, participants’ choices regarding automation use did not always correspond with the difference between trust and self-efficacy. At times, participants opted to use manual control even when their trust exceeded their self-efficacy. This tendency was more pronounced in the earlier part of the study, with instances in 25% of the early trials and 13% of the later trials. The authors suggest that these deviations may be due to exploratory behavior — that is, behavior designed to “see what will happen” if the participant pursues various courses of action. While this is a possibility, it is also plausible that the deviations could be the results of unaccounted-for variables. Perhaps some of the deviations can be explained by personality variables. Personality. Parasuraman and Byme (2003) cite research by Parasuraman (1976) and Davies and Parasuraman (1982) that suggests that some people are more diligent monitors of automation than are others. These studies have shown that general intelligence, aptitude, reasoning skills, and memory are generally poor predictors of 24 monitoring performance. Personality may be a more fi'uitful avenue for monitoring research. Much psychological research has recently incorporated the Big Five personality traits. Two of these traits, extraversion and conscientiousness, seem particularly relevant to automation monitoring. Conscientiousness. Conscientiousness is comprised of the tendencies to be dutifiil, scrupulous, hardworking, and ambitious (McCrae & Costa, 1987). These are qualities that are likely to be important in the process of monitoring automation. 
Individuals high in conscientiousness are likely to be very concerned that the task is performed correctly and should be more likely to monitor the automation in order to ensure correct performance. Extraversion. Those who are high scorers on the extraversion factor tend to be sociable, cheerful, and assertive. Introverts, on the other hand, tend to be more withdrawn and to draw energy from being alone (McCrae & Costa, 1987). Most relevant, however, is the tendency for extraverts to exhibit more sensation-seeking behavior than do introverts. Parasuraman and Byrne (2003) state that some studies have found moderate correlations between introversion and monitoring performance and suggest that this correlation may be due to introverts’ lesser need for stimulation. Introversion, therefore, may be positively related to automation monitoring. 25 Hypotheses See Figure 1 for an integrated model of the hypotheses. As discussed previously, persistence will be conceptualized in the human-computer trust relationship as Parasuraman, et a1. (1993)’s “automation induced complacency potential.” Complacency potential represents the degree to which people trust automated systems in general. Therefore, participants with a high complacency potential — and therefore high generalized trust in automation — are hypothesized to trust the specific system used in this experiment more than will those with low complacency potential. In addition, because people high in complacency potential have a dispositional tendency to trust machines across all situations, even without experience with those machines, complacency potential is hypothesized to relate positively to faith. Finally, people high in complacency potential are hypothesized to be less likely to monitor the automation’s functioning than are those low in complacency potential. HIa: Complacency potential will be positively related to subjective ratings of trust. H] b: Complacency potential will be positively related to faith. HIc: Complacency potential will be negatively related to automation monitoring. As discussed by Lee and Moray (1994), the decision to use automation is thought to depend on both trust in the automation and on one’s own task self-efficacy. If the user trusts the automation to perform the task correctly, s/he will be more likely to use the automation than if s/he has little trust in the automation. However, trust in the automation is balanced by the user’s beliefs about how well he or she can perform the 26 tasks unassisted. When trust is high and self-efficacy is low, the operator will have a stronger tendency to engage the automated system, and vice versa. H2: Self-eflicacy will moderate the relationship between trust and automation use. Furthermore, the trust — self-efficacy relationship should also affect the likelihood that a user will monitor the automation for errors. When trust is high and self-efficacy is low, operators are likely to feel that the automated system “knows best,” and they will be less likely to monitor the system. However, when trust is low and self-efficacy is high, operators will be more likely to monitor the automated system because they see themselves as being equals or superiors to the automation in terms of task performance. H3: Self-eflicacy will moderate the relationship between trust and automation monitoring. The user’s desire to decrease his or her cognitive load may also affect both use and monitoring of the automated system. 
The degree to which use of the automation increases or decreases cognitive load may depend on the design of the system itself. When an automated system is difficult to use, the operator might experience a higher level of cognitive load when using the system than when not using the system. However, the automated system designed for the present study is relatively easy to use, and therefore, desire to decrease cognitive load should be positively related to automation engagement.

H4a: Desire to decrease cognitive load will be positively related to automation use.

Regardless of system design, monitoring the automation should always require more cognitive effort than not monitoring the automation. Therefore, it is predicted that

H4b: Desire to decrease cognitive load will be negatively related to automation monitoring.

Hypothesis 5 concerns affect toward the automated system. When affect toward the system is positive, users should be more likely to use it. Those users who experience negative emotion toward the automated system will be more likely to avoid contact with it. Therefore,

H5: Positive affect toward the automation will be positively related to automation use.

As discussed previously, it has been suggested by Parasuraman and Byrne (2003) that extraversion may be negatively related to automation monitoring. In addition, because it reflects a tendency to be diligent and careful, conscientiousness may exhibit a positive relationship with automation monitoring.

H6a: Conscientiousness will be positively related to automation monitoring.

H6b: Extraversion will be negatively related to automation monitoring.

Figure 1
Experimental Model. [Path diagram of the hypothesized model: affect (H5), trust, self-efficacy and the trust x self-efficacy interaction (H2, H3), and desire to decrease cognitive load (H4a) predicting automation use; desire to decrease cognitive load (H4b), complacency potential (H1c), conscientiousness (H6a), and extraversion (H6b) predicting monitoring; complacency potential also predicting trust (H1a) and faith (H1b); trust conditions defined by competence, dependability, predictability, etc.]

Task Overview

In order to address the hypotheses, a task was designed that was somewhat familiar to our participants and that would be seen as important: the task of inspecting luggage. One hundred and twenty X-ray images of suitcases were developed. Some of these images contained weapons; most did not. Participants were to inspect each image as accurately and quickly as possible. After inspecting each image, participants indicated whether they would "search" the bag (they believed it might contain a weapon) or "clear" the bag (they believed it contained no weapon). In addition, they were told that an automated machine with visual capabilities was available to assist them in the inspection. This machine was termed the "Automatic Weapons Detector," or AWD. Example images of suitcases are displayed in Figures 2, 3, and 4, and an example screenshot from the task is displayed in Figure 5. The primary focus of this study was on participants' decisions regarding this machine: their choices to use the automated system and their willingness to monitor its decisions.

Figure 2
Example X-ray Slide #1. [X-ray image of a suitcase.]

Figure 3
Example X-ray Slide #2. [X-ray image of a suitcase.]

Figure 4
Example X-ray Slide #3. [X-ray image of a suitcase.]

Figure 5
Screen Capture from X-ray Task. [Screenshot of the task interface showing the X-ray image, the time remaining, and the points accumulated.]

Method

Participants and Design

A sample of 253 undergraduate students from a large Midwestern university volunteered to participate in this study in return for course credit. Participants' mean age was 19.25 years.
The sample was 84.9% White, 4.7% Black, 4.7% Asian, and 3.1% other ethnicities; 2.4% declined to report ethnicity. In order to increase motivation, participants were told that the top scorer in each condition would receive a $50 prize; after the conclusion of data collection, the prize was awarded to the highest-scoring participant in each condition. Hypotheses were tested using a 2 (high or low trust) x 2 (high or low self-efficacy) x 2 (neutral or positive affect) between-participants design.

Procedure

The experimental task was entirely computer-based. After providing informed consent, participants completed an online questionnaire assessing demographic characteristics, automation-induced complacency potential, extraversion, and conscientiousness. Next, participants completed a computer-based training session for the task of baggage screening. A transcript of this training can be found in Appendix A. The session included instructions regarding what the participants should look for (guns and knives), how to "search" or "clear" a bag, and the details of the scoring system for correct and incorrect decisions. After receiving these instructions, participants completed a one-minute practice trial during which they did not have access to the AWD. During the trial, participants received false feedback about their performance. Each participant was randomly assigned to receive high performance feedback or low performance feedback in order to manipulate self-efficacy. This manipulation is discussed more fully in the Experimental Manipulations section below. Following this practice trial, participants completed measures of self-efficacy and desire to decrease cognitive load.

In Training Session 2, participants received instructions on how to operate the Automatic Weapon Detector (AWD). The transcript of this session can be found in Appendix B. The training information included descriptions of the predictability, dependability, and competence of the AWD's performance. Following this training session, participants viewed a one-minute demonstration of the AWD. During this trial, participants witnessed the machine's tendency to make mistakes, to break down, or to function correctly. Following Training Session 2, participants completed measures of pre-task trust and faith.

After the two training sessions, the full task began. In this task, participants were given a time period of 20 minutes in which they were required to "screen" as many bags as possible with the fewest errors (see Figures 2, 3, and 4 for example X-ray slides). The base rate for weapon presence in this study was 30%. Airline security personnel often face time pressure in screening bags, as long lines cause customer dissatisfaction. In order to simulate these conditions, a point penalty was imposed for each minute that passed during the screening task (see Table 4). The point system was pilot tested prior to data collection in order to ensure that variance was obtained in user behaviors. In sum, the point system devised for this screening task rewarded both speed and accuracy. Following the task period, participants completed measures of post-task trust, affect, and perceived cognitive load. Finally, participants experienced a debriefing period before being dismissed from the experimental session.

Table 4
X-Ray Screening Task Scoring System
Action                                 Points
Correctly Identify a Gun or Knife      +100 points
Miss a Gun or Knife                    -100 points
Correctly Clear a Bag                  +25 points
Open a Bag and Find Nothing            -25 points
Each Minute Elapsed                    -10 points

Experimental Manipulations

Trust conditions. Conditions were manipulated in order to encourage trust in the automated system to be either high or low. Based on the literature review of trust presented previously, trust conditions were manipulated by providing participants with explicit information about the predictability, competence, and dependability of the AWD. The descriptions below provide specific values for the variables manipulated.

Predictability. In the high trust condition, participants were told that the AWD was 100% predictable; that is, when given the same bag to screen multiple times, the AWD would produce the same recommendation every time. In contrast, participants in the low trust condition were told that the AWD may produce different recommendations if the same bag were scanned multiple times.

Competence. Participants in the high trust condition were also told that the machine was highly competent in that it would make correct recommendations 85% of the time. In the low trust condition, participants were told that the AWD produced a correct recommendation only 65% of the time. These percentages were pilot tested for effectiveness and were reflected in the AWD's actual functioning throughout the trials.

Dependability. In the high trust condition, participants were told that the AWD does not break down. Participants in the low trust condition were informed that the AWD is not highly dependable and may fail to operate at times. During the task trial, the AWD produced error messages on an average of 25% of bags screened.

Responsibility. Responsibility was not manipulated verbally, but was instead reflected solely in the AWD's operating procedure. In the high trust condition, the AWD provided increased levels of information about what it was doing during the 5 seconds required for it to operate. During the first 1/3 of the time, a message indicated that the AWD was "acquiring." During the second 1/3 of the time, the message indicated that the AWD was "scanning," and during the final 1/3 of the time, the message indicated that the AWD was "processing." In the low trust condition, only a single message, "scanning," was displayed for the entire 5 seconds. Therefore, in the high responsibility condition, participants were provided with more information about what the AWD was doing than were participants in the low responsibility condition.

Self-efficacy. Self-efficacy was manipulated using the feedback provided to the participants during their practice session. Following the initial set of slides, participants in the high self-efficacy condition were told that they had been correct on 90% of their practice trials, while low self-efficacy participants were told that they had been correct on only 45% of trials.

Affect. Affect toward the AWD was manipulated using word choice in the training session. Reeves and Nass (1996) found that when a computer was labeled as a "teammate," participants reported significantly higher liking and affect toward it than when it was simply labeled "a computer." Therefore, in the high affect condition, the training protocol referred to the AWD as "your automated assistant," whereas in the neutral affect condition, the AWD was referred to as "the Automatic Weapon Detector (AWD)."

Measures

Complacency potential.
Complacency potential was assessed using the Automation-Induced Complacency Potential Rating Scale (Singh, Molloy, & Parasuraman, 1993). All items can be found in Appendix C. Automation use. The extent to which participants made the decision to use the AWD was assessed by the proportion of total bags that were screened using the AWD. A bag was considered screened using the AWD if the participant activated the AWD prior to making a decision about whether to search or clear a bag. Monitoring. The extent to which the participant monitored the AWD’s performance was assessed using the quotient of the number of times in which the participant correctly disagreed with the AWD’s recommendation by the total number of trials during which the AWD was engaged and produced an incorrect recommendation. The total number of times the participant disagreed with the AWD was also examined. Predictability, dependability, and faith. These three trust components were measured using an adaptation of the scale developed by Rempel, Holmes, and Zanna (1985). All items can be found in Appendix D. 38 Competence and responsibility. Participants’ perceptions of the AWD’s competence and responsibility were each assessed using items created by the author for this study. See Appendix E for items. C onscientiousness and extraversion. These dispositional characteristics were measured using the appropriate scales of the IPIP measure (Goldberg, 2001). All items can be found in Appendix F and G. Trust. Participants’ subjective trust in the AWD was measured with items created for the purposes of this study. Scale items can be found in Appendix H. Desire to decrease cognitive load. Participants’ desire to decrease cognitive load was measured immediately after the training session using items created for this study. To view the items, see Appendix I. Manipulation Checks Self-efficacy. Self-efficacy was assessed immediately following the training session using an adaptation of Toney and Kozlowski’s (1999) self-efficacy scale. See Appendix J for items. Affect. Participants’ affect toward the AWD was assessed using self-report measures developed by the author for use in the present study. See Appendix K for items. Perceived cognitive load. The AWD was designed with the intention of making the screening task easier. In other words, the AWD was intended to decrease participants’ cognitive load on the screening task. However, it is possible for an automated system to increase the cognitive load required on a task (Sharit, 2003). 39 Therefore, perceived cognitive load was measured using the NASA Task Load Index (Hart & Staveland, 1988). TLX procedures can be found in Appendix L. 40 Results Organization of the Results The results begin with a discussion of the manipulation checks, followed by a presentation of scale descriptive statistics, including scale means, standard deviations, reliabilities, and intercorrelations. Next, the results of all hypotheses will be presented as well as the results of structural equation modeling tests of model fit. Before turning to the results, I will present four notes concerning the analyses. First, it is important to note that in spite of indications to the contrary from pilot testing, the initial data collection yielded low base rates for use and disagreements with the automated system. Therefore, a mandatory use condition was added. One-half of the participants completed the revised procedure. 
In the revised procedure, the AWD activated automatically as each slide was presented, forcing these participants to use the AWD on every slide viewed. Independent samples t-tests revealed no significant differences between participants in the two procedure types on demographic characteristics or predictor variables. A significant difference in affect was found between groups; the mandatory AWD group had higher affect toward the AWD (see Table 5). This difference in affect (which was measured after participants had completed the full task) was likely due to the different levels of AWD use between the two groups rather than to dispositional differences. Based on the results of these t-tests, Hypotheses 2, 4a, and 5 were analyzed using participants in session Type A (AWD optional) only. Since there was little or no variance in monitoring behavior under the AWD optional condition, Hypotheses 3, 4b, 6a, and 6b were analyzed using participants in session Type B (AWD mandatory) only. The remaining hypotheses were analyzed using all participants.

Table 5
T-tests for Significant Differences Between Session Types A and B
                                             t        p
Automation-Induced Complacency Potential    -.64     .53
Conscientiousness                            .38     .70
Extraversion                                -.74     .46
Desire to Decrease Cognitive Load          -1.57     .12
Self-Efficacy                                .47     .64
Pre-task Trust                               .20     .84
Faith                                       -.20     .84
Trust X Self-Efficacy                        .03     .97
Affect                                     -2.51     .01
Post-task Trust                              .90     .37
Automation Use                            -48.62    <.01
Automation Monitoring (disagreements)      -6.49    <.01

Second, the data for AWD use exhibited a significant positive skew (see Figure 6). To correct this problem, a log transformation was performed on the data to normalize the distribution. In this transformation, eighteen participants who did not use the AWD at all on the task were removed. The transformed distribution is displayed in Figure 7. Hypotheses tested before and after the logarithmic transformation on the data did not significantly differ. Therefore, the analyses using uncorrected data are reported here.

Third, monitoring of the AWD was operationalized in multiple ways. The first operationalization was the proportion of slides on which the participant disagreed with the recommendation of the AWD:

Monitoring (total disagreements) = disagreements / slides viewed

Figure 6
The Original Distribution of AWD Use. [Histogram of AWD use, ranging from 0.00 to 0.80.]

Figure 7
Transformed Distribution of AWD Use. [Histogram of log-transformed AWD use, ranging from approximately -2.00 to -0.50; participants who did not use the AWD at least once were removed for the transformation.]

The second operationalization of automation monitoring focused not on total disagreements, but on correct disagreements only. This second operationalization assessed not monitoring behavior in general, but only correct monitoring behavior. This construct was created by dividing the number of disagreements made with the AWD by the number of slides on which the AWD made an incorrect recommendation:

Monitoring (correct disagreements) = disagreements / slides on which the AWD was incorrect

Throughout this section, the results concerning both total disagreements and correct disagreements are presented.

Finally, subjective trust in the AWD was measured both before and after the major task was performed. Because it was expected that levels of trust would change over time as participants gained more experience with the task, hypotheses regarding subjective trust were tested using both the pre-task and post-task ratings of subjective trust (refer to Appendix H for the pre-task trust items and Appendices D and E for the post-task trust items).
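To make these operational definitions concrete, the sketch below shows one way the outcome variables just described (AWD use, the two monitoring indices, and the log-transformed use score) could be computed from a per-slide log. This is an illustrative reconstruction rather than the study's analysis code: the DataFrame layout and column names (awd_activated, awd_correct, disagreed) are assumptions, and the base of the logarithm is not specified in the text.

import numpy as np
import pandas as pd

def outcome_variables(trials: pd.DataFrame) -> pd.Series:
    """Compute the outcome variables for one participant.

    Assumed per-slide log (hypothetical column names):
        awd_activated - True if the AWD was engaged before the search/clear decision
        awd_correct   - True if the AWD's recommendation was correct
        disagreed     - True if the participant's decision differed from the AWD
    """
    n_slides = len(trials)
    awd_trials = trials[trials["awd_activated"]]
    awd_wrong = awd_trials[~awd_trials["awd_correct"]]

    # Proportion of bags screened using the AWD
    use = len(awd_trials) / n_slides
    # Total disagreements divided by slides viewed
    monitoring_total = awd_trials["disagreed"].sum() / n_slides
    # Correct disagreements divided by slides on which the AWD erred
    if len(awd_wrong):
        monitoring_correct = awd_wrong["disagreed"].sum() / len(awd_wrong)
    else:
        monitoring_correct = np.nan
    # Skew correction; zero-use participants are dropped from the transformed variable
    log_use = np.log(use) if use > 0 else np.nan

    return pd.Series({"use": use,
                      "monitoring_total": monitoring_total,
                      "monitoring_correct": monitoring_correct,
                      "log_use": log_use})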
For the AWD-optional participants, the correlation between pre-task trust and post-task trust was .52, while for AWD-mandatory participants the correlation was .35. These moderate correlations indicate that trust levels did change over the course of the task.

Descriptive Statistics

Table 6 presents the means, standard deviations, minimums, and maximums for each scale used in the present study. Internal consistency reliabilities (coefficient alphas) and scale intercorrelations are found in Table 7. All scales achieved acceptable levels of reliability with the exception of the automation-induced complacency potential scale.

Table 6
Scale Ranges, Means, and Standard Deviations
                            N     Min     Max     Mean    Std Dev
Complacency                 253   2.50    4.83    3.50    0.37
Conscientiousness           252   2.20    5.00    3.66    0.50
Extraversion                253   2.30    5.00    3.75    0.51
Desire to Decrease Load     253   1.00    5.00    3.28    0.72
Self-Efficacy               253   2.00    5.00    3.35    0.53
Trust (pre)                 253   1.33    5.00    2.89    0.63
Faith                       253   1.29    4.43    2.97    0.53
Trust X SE                  253   3.19    25.00   9.75    2.92
Affect                      253   1.00    5.00    2.96    0.93
Trust (post)                252   1.17    4.28    2.57    0.61

Manipulation Checks

Tables 8 through 10 display the results of the manipulation checks for trust, self-efficacy, and affect. The checks were performed by correlating the experimental condition (high or low for each construct) with the individual item scores for trust (Table 8) and with the measured construct scale scores for self-efficacy (Table 9) and affect (Table 10). Results indicate that the manipulations for trust and self-efficacy were linked with significant differences in measured trust and measured self-efficacy. However, the manipulation for affect was non-significant.

Table 7
Scale Intercorrelations
                             1      2      3      4      5      6      7      8      9
1 Complacency Potential     .65
2 Conscientiousness         .12    .83
3 Extraversion              .21*   .20*   .85
4 Pre-task Trust            .23*   .04    .08    .88
5 Self-Efficacy             .24*   .19*   .17*   .17*   .82
6 Affect                    .06   -.04   -.04    .31*  -.06    .90
7 Post-task Trust           .07    .00   -.04    .26*  -.09    .27*   .96
8 Desire to Decrease Load  -.08   -.05   -.09   -.06   -.44*   .05    .08    .76
9 Faith                     .36*   .05    .08    .61*   .16*   .34*   .46*  -.05    .70
* significant at p < .01. N = 253. Note: Scale reliabilities (coefficient alphas) are displayed on the diagonal of the matrix.

Hypothesis Testing

Hypotheses 1a through 1c. Hypotheses 1a and 1b proposed that automation-induced complacency potential would be positively related to subjective ratings of trust and faith. As expected, complacency potential exhibited a significant and positive correlation with subjective pre-task trust (r = .23, p < .01), indicating that complacency potential, which is conceptualized as a generalized propensity to trust machines, significantly related to trust in this specific machine (the AWD) prior to the task. Therefore, Hypothesis 1a was supported. However, it is interesting to note that complacency potential failed to relate significantly to post-task trust (r = -.02, p = .71). Therefore, automation-induced complacency potential seemed to affect trust in an automated system only before participants had actual experiences with the automated system.
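The manipulation checks reported in Tables 8 through 10 below reduce to point-biserial correlations between a binary condition code and the corresponding measured score, along with a Cohen's d for the same contrast. The following sketch illustrates that computation; it is not the original analysis script, and the simulated scores are placeholders that merely borrow the cell sizes, means, and standard deviations reported in Table 9.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated placeholder data shaped like the self-efficacy check in Table 9
# (0 = low condition, 1 = high condition); not the study's actual responses.
condition = np.repeat([0, 1], [131, 122])
measured = np.concatenate([rng.normal(3.20, 0.49, 131),
                           rng.normal(3.50, 0.54, 122)])

# Point-biserial correlation of condition with the measured construct
r_pb, p = stats.pointbiserialr(condition, measured)

# Cohen's d using the pooled standard deviation, matching the d column in the tables
low, high = measured[condition == 0], measured[condition == 1]
pooled_sd = np.sqrt(((len(low) - 1) * low.var(ddof=1) +
                     (len(high) - 1) * high.var(ddof=1)) /
                    (len(low) + len(high) - 2))
d = (high.mean() - low.mean()) / pooled_sd
print(f"r_pb = {r_pb:.2f}, p = {p:.3f}, d = {d:.2f}")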
Table 8
Trust Manipulation Check: Means and Standard Deviations of Faith and Pre-task Trust by Trust Condition and Correlations of Trust Condition with Faith and Pre-task Trust
                   Faith    Pre-Task Trust
High Trust
  Mean             2.89     2.73
  SD               0.55     0.62
  N                133      133
Low Trust
  Mean             3.06     3.08
  SD               0.48     0.59
  N                120      120
Correlation†       .17*     .27**
d                  .35      .59
† Note: Point-biserial correlations of trust condition with faith and pre-task trust.
* p = .01. ** p < .01.

Table 9
Self-Efficacy Manipulation Check: Means and Standard Deviations of Reported Self-Efficacy by Self-Efficacy Condition and Correlation of Self-Efficacy Condition with Reported Self-Efficacy
Self-Efficacy Condition
Low Self-Efficacy
  Mean             3.20
  SD               0.49
  N                131
High Self-Efficacy
  Mean             3.50
  SD               0.54
  N                122
Correlation†       .28*
d                  .56
† Note: Point-biserial correlation. * p < .01.

Table 10
Affect Manipulation Check: Means and Standard Deviations of Reported Affect by Affect Condition and Correlation of Affect Condition with Reported Affect
Affect Condition
Neutral Affect
  Mean             2.87
  SD               0.91
  N                117
Positive Affect
  Mean             3.04
  SD               0.94
  N                136
Correlation†       .09
d                  .18
† Note: Point-biserial correlation.

Hypothesis 1b suggested that complacency potential would be significantly related to faith (measured prior to the task). This hypothesis was supported (r = .26, p < .01). The support for Hypothesis 1b suggests that those with higher automation-induced complacency potential were more likely to be trusting of an automated system in conditions of uncertainty than were those low in automation-induced complacency potential.

Hypothesis 1c proposed that automation-induced complacency potential would be negatively related to the extent to which participants monitored the automated system. Neither measure of monitoring supported Hypothesis 1c. Total disagreements (r = .00, p = .97) and correct disagreements (r = -.06, p = .52) with the AWD were unrelated to the measure of automation-induced complacency potential. Hypothesis 1c was not supported.

One possible explanation for the lack of support found for Hypothesis 1c is that the low reliability of the complacency potential measure may have led to decreased correlations with monitoring behavior. Another possible explanation for these findings is that people, despite their initial predispositions to trust machines, were able to quickly adjust their attitudes when they gained actual experience with a specific automated system. This idea is supported by the results of Hypothesis 1a, in which complacency potential related significantly to pre-task trust, but not to post-task trust. To further examine this effect in relation to monitoring behavior, the correlations between complacency potential and monitoring were contrasted for the first half of slides and the second half of slides viewed. Results showing a stronger relationship in the first half than in the second half would support the hypothesis that experience with an automated system negates the effects of complacency potential. However, the opposite result was found. The correlation between complacency potential and total disagreements with the AWD increased in the second half when compared to the first half. Therefore, participants either immediately adjusted their perceptions of the AWD (perhaps during the training session prior to the task), or automation-induced complacency potential might exhibit no true relationship with automation monitoring.
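The split-half contrast described above can be expressed compactly. The sketch below is illustrative only; the per-slide log and its column names (participant, half, disagreed) are assumptions rather than the study's actual data structures.

import pandas as pd
from scipy import stats

def split_half_contrast(trials: pd.DataFrame, complacency: pd.Series) -> dict:
    """Correlate complacency potential with the disagreement rate separately
    for the first and second halves of slides.

    trials: one row per slide, with columns participant, half ('first'/'second'),
            and disagreed (bool); complacency is indexed by participant.
    """
    results = {}
    for label, half in trials.groupby("half"):
        # Per-participant proportion of slides with a disagreement in this half
        disagree_rate = half.groupby("participant")["disagreed"].mean()
        r, p = stats.pearsonr(complacency.loc[disagree_rate.index], disagree_rate)
        results[label] = (r, p)
    return results  # e.g., {'first': (r1, p1), 'second': (r2, p2)}

A formal comparison of the two halves would additionally require a test for the difference between dependent correlations, rather than a simple descriptive contrast of the two coefficients.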
Hypothesis 2

Hypotheses 2 and 3 predicted that self-efficacy would moderate the relationships between trust and use and between trust and monitoring, respectively. Information related to these hypotheses can be found in Table 11.

Table 11
Regression Analyses of Trust, Self-Efficacy, and Their Interaction on Use, Total Disagreements, and Correct Disagreements
                              AWD Use              Total Disagreements     Correct Disagreements
                              n     β      p       n      β       p        n      β       p
Pre-task Trust                129   .32    <.01    124   -.12     .20      124   -.15     .11
Post-task Trust               129   .39    <.01    124   -.54     <.01     124   -.36     <.01
Self-efficacy*                129  -.11    .20     124   -.04     .66      124    .01     .88
Self-efficacy**               129  -.02    .80     124   -.00     .99      124    .03     .76
Pre-trust X Self-efficacy     129  -.61    .33     124  -1.44     .06      124   -.46     .55
Post-trust X Self-efficacy    129  -.86    .12     124  -1.12     .05      124    .04     .95
* = self-efficacy in the pre-task trust model
** = self-efficacy in the post-task trust model

Hypothesis 2 proposed that self-efficacy would moderate the relationship between subjective trust and AWD use. Entered in the first step of the regression equation were pre-task trust and self-efficacy. The trust by self-efficacy interaction was entered in the second step of the regression analysis. The main effect for pre-task trust was statistically significant (t = 3.73, β = .32, p < .01). However, neither the main effect for self-efficacy (t = -1.28, β = -.11, p = .20) nor the interaction between trust and self-efficacy (t = -.99, β = -.61, p = .33) was found to be a significant predictor of AWD use.

Hypothesis 2 was also investigated using post-task trust. In the analysis involving post-task trust, similar results were found. Post-task trust exhibited a significant main effect on AWD use (t = 4.65, β = .39, p < .01), but self-efficacy (t = -.26, β = -.02, p = .80) and the interaction between post-task trust and self-efficacy (t = -1.57, β = -.86, p = .12) were non-significant.

In order to more fully explore Hypothesis 2, the data were split into three groups. Group 1 was composed of the 30 participants with the lowest AWD use. Group 3 was composed of the 30 participants with the highest AWD use, and Group 2 was composed of the middle-range participants. Groups 1 and 3 were contrasted on trust, self-efficacy, and the interaction term using independent samples t-tests to compare means (see Table 12).

Table 12
T-tests for Differences in Trust, Self-Efficacy, and Their Interaction Between Low Use and High Use Participants
                    Use     N    Mean    SD      t        p
Pre-task Trust      Low     30   2.57    .46
                    High    30   3.18    .58    -4.51    <.01
Post-task Trust     Low     30   2.56    .37
                    High    30   3.00    .62    -3.33*   <.01
Self-Efficacy       Low     30   3.40    .53
                    High    30   3.54    .56     .63      .53
T x SE              Low     30   8.77    2.26
                    High    30   10.57   2.77   -2.75*    .01
* Equal variances not assumed

Significant mean differences between Groups 1 and 3 were found on both pre-task trust (t = -4.51, p < .01) and post-task trust (t = -3.33, p < .01), with the high use group reporting higher mean trust. This finding supports the assertion that participants with high trust are more likely to use automation than are participants with low trust. There was no significant difference between the high use and low use groups on self-efficacy (t = .63, p = .53). The interaction between trust and self-efficacy did, however, show a significant mean difference, with the high use group reporting higher means (t = -2.75, p = .01). These results suggest that the interaction between trust and self-efficacy may have predictive validity for the extreme levels of AWD use. The results of Hypothesis 2 are revisited following the discussion of Hypothesis 3.
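The two-step moderated regression just described can be sketched as follows. This is an illustrative reconstruction rather than the original analysis script: the column names (awd_use, trust_pre, self_efficacy) are assumptions, and the mean-centering shown is a common convention for forming interaction terms that the thesis does not explicitly report.

import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

def moderated_regression(df: pd.DataFrame):
    """Step 1 enters trust and self-efficacy; step 2 adds their product term."""
    d = df.copy()
    # Centering before forming the product term is an added convention, not from the thesis
    d["trust_c"] = d["trust_pre"] - d["trust_pre"].mean()
    d["se_c"] = d["self_efficacy"] - d["self_efficacy"].mean()

    step1 = smf.ols("awd_use ~ trust_c + se_c", data=d).fit()
    step2 = smf.ols("awd_use ~ trust_c + se_c + trust_c:se_c", data=d).fit()

    delta_r2 = step2.rsquared - step1.rsquared   # incremental variance from the interaction
    step_test = anova_lm(step1, step2)           # F test for the step 2 increment
    return step1, step2, delta_r2, step_test

Under this setup, the t statistic for the trust_c:se_c coefficient in step 2 corresponds to the interaction tests reported in Table 11.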
Hypothesis 3

Similarly, Hypothesis 3 proposed that self-efficacy would moderate the relationship between subjective trust and monitoring. Like the relationship of self-efficacy and use, the relationship of self-efficacy and total proportion of disagreements with the machine was not statistically significant (t = -.45, β = -.04, p = .66). However, the relationship of trust with total proportion of disagreements with the automated machine also did not reach levels of statistical significance (t = -1.30, β = -.12, p = .20). The interaction between pre-task trust and self-efficacy produced a marginally significant relationship with total proportion of disagreements (t = -1.91, β = -1.44, p = .06). A graph of this interaction revealed that participants with high trust and high self-efficacy disagreed with the machine far fewer times than did participants in the other three groups (see Figure 8).

Hypothesis 3 was additionally tested using the proportion of correct disagreements with the machine as the monitoring form of interest. These tests yielded a somewhat different pattern of results. The direct effects of pre-task trust (t = -1.60, β = -.147, p = .11) and self-efficacy (t = .16, β = .01, p = .88) were not statistically significant. The interaction between trust and self-efficacy was also non-significant (t = -.61, β = -.46, p = .55). However, the direct effect of post-task trust was significant (t = -4.15, β = -.36, p < .01). Self-efficacy (t = .31, β = .03, p = .76) and the interaction between post-task trust and self-efficacy (t = .07, β = .04, p = .95) were non-significant. Overall, Hypotheses 2 and 3 were not supported.

Figure 8
Graph of the Interaction Between Trust and Self-Efficacy. [Line graph of total disagreements (y-axis, approximately 25.5 to 30) by trust (x-axis, 0 to 4), with separate lines for low and high self-efficacy; p = .06.]

There are at least two possible causes for the lack of significant effects for self-efficacy in Hypotheses 2 and 3. First, the self-efficacy manipulation may not have been sufficiently strong. The manipulation check for self-efficacy showed a significant and positive correlation between self-efficacy condition and measured self-efficacy. This result indicated that those in the high self-efficacy condition reported higher levels of self-efficacy than did those in the low self-efficacy condition. However, the magnitude of this correlation was only around r = .25. Further support for a weak manipulation is provided by the histogram of the self-efficacy distribution. A strong manipulation is expected to yield a multimodal distribution in which two means are evident: one for the high self-efficacy condition and another for the low self-efficacy condition. However, the histogram for the data in the present study resembles a normal distribution. A normal distribution does not support the presence of a strong self-efficacy manipulation (see Figure 9). It is possible that a stronger manipulation may have yielded significant results for self-efficacy on automation use. To more closely examine the potential effects of this possibility, the participants with the highest and lowest self-efficacy were contrasted. The results showed no significant differences between the participants highest and lowest in self-efficacy for either automation use or automation monitoring. Therefore, the strength of the manipulation did not likely have a large impact on the lack of significant findings for self-efficacy.
A second possible explanation for the lack of significant results for self-efficacy on automation use concerns the duration the manipulation can be expected to last. The self-efficacy manipulation occurred early in the experimental procedure in the form of false feedback during the task training. However, participants received true feedback as they progressed through the experimental task. Therefore, the effect of the self-efficacy manipulation may have decreased steadily throughout the experimental task as participants gained a sense of their true task ability via task feedback. In order to explore this possibility, the effects of self-efficacy were tested by contrasting the halves of the experimental task. If the strength of the self-efficacy effect were decreasing throughout the experimental task, we would expect that self-efficacy would have a stronger effect on automation use on the first half of slides than on the second half of slides. However, the correlations between self-efficacy and automation use on the first half and automation use on the second half were both non-significant and small (r = -.09, p = .30; r = -.05, p = .58, respectively). These correlations would seem to indicate that the lack of significant findings was not likely to be due to a decrease of manipulation effectiveness over time. These analyses suggest that some unknown mechanism might moderate whether self-efficacy will have a significant effect on automation use and/or monitoring.

Figure 9
Histogram of the Self-Efficacy Distribution. [Histogram of self-efficacy scores (y-axis: count); the distribution is approximately normal.]

Hypotheses 4a and 4b

Hypotheses 4a and 4b proposed that desire to decrease cognitive load would be significantly related to automation use (positively) and to automation monitoring (negatively), respectively. Table 13 indicates that the relationships between desire to decrease cognitive load and automation use and automation monitoring were non-significant. Therefore, Hypotheses 4a and 4b were not supported.

Table 13
Correlations of Desire to Decrease Cognitive Load with Use and Monitoring
                                       N     r      p
AWD Use                                129   .11    .24
Proportion of Disagreements            124  -.06    .51
Proportion of Correct Disagreements    124   .04    .64

Hypothesis 5

Positive affect toward the automated system was proposed to predict automation use in Hypothesis 5. This hypothesis was supported (r = .61, p < .01). However, because affect was measured following the experimental task, the causal direction of this relationship is difficult to confirm from these findings. It is also possible that those who used the AWD more liked it more, or that those who noticed fewer errors liked the machine more. To more closely examine this possibility, a hierarchical regression analysis was conducted testing whether the interaction between trust condition and AWD use significantly predicted affect. The reasoning here is that participants' affect toward the automated system will be based primarily on the AWD's functioning, and because the AWD functioned much more effectively in the high trust condition than in the low trust condition, AWD use should predict positive affect more strongly in the high trust condition than in the low trust condition. Results indicated that this was not the case. While the direct effects of trust condition (t = 3.33, β = .44, p < .01) and AWD use (t = 7.01, β = 2.41, p < .01) on affect were significant, the interaction term did not account for any significant portion of variance in affect beyond the direct effects
These findings indicate no significant differences in affect based on the trust condition and amount of AWD use. These results lend support to the causal direction proposed in Hypothesis 5 —— that affect predicts AWD use and not vice versa. An additional problem regarding affect concerns the possibility of multicollinearity between subjective trust and affect. Affect exhibits significant and positive correlations with faith (r = .34, p < .01), pre-task trust (r = .31, p < .01), and post- task trust (r = .74, p < .01). Although the item contents of the scales are reasonably different, the constructs nevertheless appear to be highly related. In fact, when both trust and affect (which individually show significant correlations with automation use) are entered into block 1 of the regression equation, the relationship of trust with AWD use becomes non-significant (t = 1.00, p = .32). These results indicate that multicollinearity issues may arise when research attempts to examine the simultaneous effects of both trust and affect on automation use. Hypotheses 6a and 6b Hypothesis 6a posits that conscientiousness will be positively related to automation monitoring. Table 14 demonstrates that the relationships between conscientiousness and total disagreements (r = .08, p = .37) and conscientiousness and correct disagreements (r = .03, p = .78) were both non-significant. In addition, contrary to Hypothesis 6b, extraversion was not significantly related to either total disagreements (r = -.03, p = .76) or correct disagreements (r = .04, p = .70). 57 Table 14 Correlations of C onscientiousness and Extraversion with Use and Monitoring Conscientiousness Extraversion N r p N r p AWD Use 129 .03 .72 129 .02 .83 Proportion of Disagreements 124 .08 .37 124 -.03 .76 Proportion of Correct Disagreements 124 .03 .78 124 .04 .70 58 Discussion Complacency Potential Results Hypothesis 1a proposed that automation-induced complacency potential would be positively related to trust. In partial support of Hypothesis 1a, automation-induced complacency potential was significantly related to trust before the task but not significantly related to trust after the task. This finding suggests that people did have preconceived notions about the “goodness” of automated systems, but those preconceived notions dissipated with actual experience with the automated system. This finding is a positive one in that it indicates that people can adjust their impressions of an automated system based on experience with that system. Hypothesis lb proposed that automation-induced complacency potential would relate positively to faith. The situation in which the participants found themselves was highly uncertain — they entered the experiment with little or no baggage screening experience and with no knowledge of the extent to which the AWD would be helpful or unhelpful on the task. In support of Hypothesis 1b, participants’ level of faith (the extent to which they were willing to trust the AWD even in these conditions of uncertainty) was found to be positively related to automation-induced complacency potential. Because this finding indicates that those high in automation-induced complacency potential are more likely to trust an unfamiliar machine, it supports the conceptualization of automation-induced complacency potential as a generalized propensity to trust machines across situations. Hypothesis 1c, which proposed significant relationships between complacency potential and monitoring behavior, was not supported. 
The relationship failed to reach 59 levels of statistical significance for both total disagreements and correct disagreements with the automated system. This finding is inconsistent with the proposition by Parasuraman and colleagues (1993) that automation-induced complacency potential is related to increased operator mistakes when working with automated systems; however, it is consistent with their qualification that significant effects will only be found on tasks on which a single operator is responsible for multiple functions. Because this was a single- task environment, the effects of automation-induced complacency potential on correct disagreements may have been suppressed. Trust Trust was proposed to be positively related to automated system use. This hypothesis was supported, both for pre-task trust and for post-task trust. Results indicated that the more a person trusted the automated system, the more often he or she used the automated system. This finding is consistent with the predictions of Muir (1987) and Zuboff (1988) and with the findings of Muir and Moray (1996). In contrast, the direct effect of task self-efficacy on automated system use was non-significant. This was surprising given the evidence from prior research that those who were more confident in their ability to perform well on the screening task on their own would be less likely to use the automated system (Lee & Moray, 1994). The reason for this lack of significant effect is unclear. Supplementary analyses performed suggested that the lack of significant effect was not due to a lack of variance on self- efficacy or to a decrease of manipulation effectiveness over time. Trust X Self-Eflicacy Interaction 60 The interaction between trust and self-efficacy did not significantly predict either automation use or automation monitoring. In regard to automation use, supplemental analyses indicated that individuals with high self-efficacy used the AWD slightly less often than did those with low self-efficacy; however, this difference appears only at high levels of trust, suggesting that self-efficacy may only significantly interact with trust to influence automation use when the operator trusts the automated system. However, further research is needed to validate this finding. In regard to monitoring, participants with high self-efficacy disagreed with the AWD more often than those with low self-efficacy at low levels of trust. However, that relationship flipped when trust was high, such that at high levels of trust, those with low self-efficacy are much more likely to disagree with the AWD than those with high self- efficacy. The reason for this counterintuitive finding is unclear. However, as expected, participants with high self-efficacy and low trust disagreed with the automated system more often than did participants with low self-efficacy and high trust. When only correct disagreements with the machine were considered, the expected pattern emerged more clearly. Participants with low trust correctly disagreed with the AWD more often than did participants with high trust. Participants with high self- efficacy and low trust were most likely to correctly disagree with machine errors, while participants with high self-efficacy and high trust were least likely to correctly disagree with machine errors. Participants with high self-efficacy may have demonstrated both of the more extreme levels of correct disagreements because they had more confidence in their abilities to determine whether the AWD was correct or incorrect. 
However, because this interaction did not achieve statistical significance, further research is needed in order to more closely examine the relationship between trust and self-efficacy.

Desire to Decrease Cognitive Load

Hypotheses 4a and 4b held that desire to decrease cognitive load would relate positively to automation use and negatively to automation monitoring, respectively. Neither hypothesis was supported in the present study. However, desire to decrease cognitive load was significantly negatively related to self-efficacy (r = -.44, p < .01). This significant correlation indicates that participants who believed that they were weaker performers on the task held a stronger desire to decrease their cognitive load on the task. This finding suggests that the desire to decrease cognitive load scale may serve as a proxy for perceived task difficulty.

Based on the research of Dijkstra (1999), it was expected that individuals who used the automated system more and disagreed with it less would experience lower levels of cognitive load. However, this was not the case in the present study. Perceived cognitive load was assessed using the NASA Task Load Index, or NASA TLX (Hart & Staveland, 1988). See Table 15 for TLX score information. Results indicated no significant relationship between AWD use and cognitive load or AWD monitoring and cognitive load. An independent samples t-test also indicated that there were no significant differences in perceived cognitive load between the two session types (optional AWD use and mandatory AWD use). These results reveal that use and monitoring of the automated system had no significant effect on cognitive load. This lack of effect could explain the lack of significant findings between desire to decrease cognitive load and automation use and monitoring. If use of the AWD did not result in a change in perceived cognitive load, then desire to decrease cognitive load would not be expected to affect participants' AWD use decisions. Likewise, if AWD monitoring did not increase cognitive load, then desire to decrease cognitive load would not be expected to significantly predict AWD monitoring.

Table 15
Perceived Cognitive Load Minimums, Maximums, Means, and Standard Deviations
                       N     Minimum   Maximum   Mean    Std. Deviation
Type A Participants
  Mental               128   2         45        22.89   9.02
  Physical             128   0         12        .38     1.56
  Temporal             128   1         36        12.20   8.24
  Effort               128   2         45        12.56   8.47
  Performance          128   0         36        14.79   8.57
  Frustration          128   0         45        10.23   6.64
  Total Score          128   25        105       73.02   15.77
All Participants
  Mental               252   2         45        23.75   9.39
  Physical             251   0         25        .43     2.01
  Temporal             252   0         36        11.40   7.87
  Effort               252   0         45        13.08   8.55
  Performance          252   0         36        14.69   8.33
  Frustration          252   0         45        9.94    6.63
  Total Score          251   25        115       73.25   16.84

It is also possible that the task design did not allow cognitive load effects to surface. Because the study took place in a controlled lab setting, participants were encouraged to give their full attention to the task. In fact, there was little else that participants could give attention to in the laboratory environment. In addition, even when a participant used the AWD, he or she still had to make the ultimate decision about whether to clear or search each bag. These conditions might reasonably reflect the nature of an airport security screening job, but they do not provide many opportunities for the participant to reduce his or her cognitive load.
While the AWD scanned a bag, the image of the bag was still present on the screen, and participants had little else to do but stare at the bag and formulate their own hypotheses about whether or not a weapon was present in the image. In that sense, participants did about the same amount of cognitive work (visually searching the bag and formulating a plan to either search or clear the bag) regardless of whether the AWD was activated. Therefore, it seems that task and automation design influence the extent to which cognitive load concerns will predict behavior. In a task with few other distractions, cognitive load may not play a large role in determining automated system use and/or monitoring behaviors. A fleet Affect toward the AWD exhibited a strong positive relationship with AWD use. This finding provides empirical support for the propositions by authors such as Chao and Kozlowski (1986) that operators with negative feelings for an automated system will avoid using it. In fact, affect exhibited the strongest effect on automation use in the present study. In addition, in exploratory analysis, a negative relationship was found between affect and automation monitoring. This result suggests that when people are forced to use an automated system but feel negatively toward it, they may avoid monitoring it rather than expending extra effort on catching machine errors. Conscientiousness and Extraversion Past meta-analytic research has linked conscientiousness with job performance (e.g., Barrick & Mount, 1991; Hogan & Holland, 2003). Because correct disagreements with the AWD should be related to task performance in the current study, a positive relationship was expected between conscientiousness and correct disagreements with the 64 AWD. Contrary to expectations, conscientiousness failed to exhibit significant relationships with automation monitoring, as did extraversion. It is possible that the duration of the study was not sufficiently long to bring out differences in monitoring behavior across levels of conscientiousness. In addition, participants may have been subject to strong motivational effects. One source of motivation for the participants may have resulted from a cross-over of real- world concerns into the experimental task. Airline safety is currently a major concern in the world outside of the study, and thus participants may have felt a strong desire not to allow any weapons to go through undetected. The students therefore may have experienced a tendency to over-search the bags. In addition, participants were told prior to the task that the top performer would receive a $50 prize, an amount that may have provided significant motivation for undergraduate students. These motivational factors may have provided a motivational situation of sufficient strength to overwhelm the effects of conscientiousness on monitoring behavior. Thus, the combination of a relatively short task and strong motivational factors may have masked any effect of conscientiousness on monitoring performance. With respect to extraversion, the duration of the experimental task may have again played an important role in the strength of the relationships found. The studies that provided the basis for the hypothesized relationship between extraversion and automation monitoring were pulled primarily from the vigilance stream of research. Research on automation use and vigilance often consists of experimental tasks lasting for several hours (e.g., Lee & Moray, 1994; Muir & Moray, 1996). 
A desire for sensory stimulation may be manifested only after a longer amount of time than the 20-minute task time used 65 here. The experimental task in the present study lasted for a relatively brief amount of time, whereas the “vigilance decrement” — a marked decrease in performance in error detection — tends to occur over a length of time spent on task (e.g., Grier, et. a1, 2003). Perhaps a longer task duration is necessary to detect any relationship between extraversion and monitoring performance. A second alternative is that the experimental task used in the present study was inherently more stimulating than were the tasks used in the vigilance studies in which significant relationships were found. A more stimulating task would decrease the sensory depravation that was hypothesized to lead to decreased monitoring performance for extraverts. Limitations The extent to which a sample of undergraduate college students can generalize to a broader population may be sometimes questionable. In this case, the student sample used may have had effects in two ways. First, it was assumed that all of the sample participants entered the experiment with equivalent, and negligible, X-ray screening experience. By extension, we assumed that all participants would have approximately equal, and low, self-efficacy on the task. However, this distinction between a college and organizational population was likely to have reduced impact due to the fact that we were able to significantly manipulate self-efficacy during the task training. A graph of post- manipulation self-efficacy scores revealed an approximately normal distribution of self- efficacy. This distribution suggests that while the manipulation was not extremely strong, variance on self-efficacy was achieved. A second and more significant issue, however, is the difference in motivation between college students and professional samples. A sample of actual X-ray screeners 66 might be more motivated to achieve high scores on the X-ray task than might a sample of undergraduate students because actual X-ray screeners might identify with the task more strongly. In the present study, we attempted to increase motivation to perform by offering a $50 prize to the top scorer in each experimental condition. According to comments made by the students, this $50 prize seemed to provide a significant motivational incentive. While a one-time prize cannot match the level of motivation experienced on an actual job, we believe that it provided some encouragement for the students to achieve high performance. An additional concern is that the task design may not have allowed certain hypothesized relationships to materialize. For example, the effects of conscientiousness and extraversion may require longer periods of time to develop, and cognitive load effects may only surface when people are required to balance multiple tasks. Further research is required to more fully determine the populations and tasks for which these effects are significant. Future Research Suggestions The current study focused primarily on person factors impacting automation use and monitoring, such as trust, self-efficacy, and personality factors. Structural equations analyses suggested that trust may mediate the relationships of individual differences with automation use and automation monitoring. In addition, the results suggested that the psychological processes leading to automation use and monitoring decisions may be highly related. 
However, these propositions should be subjected to future testing in light of the fact that the model modifications made here capitalized on chance.

In addition, the current study found a surprising lack of relationship between self-efficacy and automation use and monitoring decisions. The reasons for this lack of relationship are unclear but might be related to some element of task design or sample characteristics. Future research may shed light on the situations in which self-efficacy is and is not significantly related to automation use and monitoring.

While self-efficacy did not show any significant relationships with use and monitoring, affect toward the automated system exhibited strong effects on both. The results of this study suggest that affect may play a very large role in the decisions that operators make about automation use and monitoring. Antecedents of affect toward automation might be explored, as might potential mediators of the relationships between affect and the decisions made about use and monitoring.

Finally, while the present study was designed to investigate person factors affecting automation use and monitoring, the situation in which the operator finds him or herself may also impact these two decisions. Future research should identify situational characteristics that may interact with person factors to affect automation use and monitoring decisions.

Conclusion

The present study contributed to the literature on human-automation interaction through an examination of the ways in which individual differences affect automation use and monitoring. Individual differences in trust and affect exhibited effects on both automation use and automation monitoring, indicating that those high in trust or affect were more likely to use and less likely to monitor the automated system. These results for trust and affect toward the system suggest that individual differences in attitudes toward automated machines exist and do have effects on automated system use. It is important to note that levels of trust and affect are most likely also influenced by machine characteristics such as competence and dependability. Future research might further investigate the relative influence of the four machine characteristics discussed in the present study (competence, dependability, responsibility, and predictability) and individual differences on levels of trust and affect.

In the present study, no significant effects were shown for self-efficacy, desire to decrease cognitive load, conscientiousness, or extraversion. The non-significant finding for self-efficacy was particularly unexpected because the influence of self-efficacy on automation use had received previous support in the literature (i.e., Lee & Moray, 1994). The reason for this unexpected result is unclear. The manipulation check for self-efficacy indicated a successful manipulation with a moderate effect size (d = .56). In addition, contrasts of the correlations between self-efficacy and automation use for the first half of the task and the second half of the task indicated that the relationship did not decay over time; instead, it seems that the relationship was non-significant even on the initial slides. Because the lack of significant effect did not seem to be caused by methodological factors, some aspect of the task, situation, or sample may be relevant to the emergence of a significant self-efficacy effect. Therefore, future research should investigate the potential moderating effects of task design
(e.g., length, complexity, etc.) and sample characteristics (e.g., motivation, task experience, etc.) on the relationship between self-efficacy and automation use. Such analyses may also shed light on the potential impact of other individual differences (such as conscientiousness or desire to decrease cognitive load) that did not exhibit significant effects in the present study.

In future studies, researchers may also wish to investigate more specifically the subtypes of use and monitoring errors that operators make. It is possible that false positive errors and false negative errors have different causes. Individual characteristics such as risk aversion may differentially affect operators' willingness to make false positive and false negative errors. In addition, if the consequences of false positives and false negatives are asymmetrical, operator behavior is likely to be affected.

Overall, the present research supports the assertion that individual differences have a role to play in the decisions users make about using and monitoring automated systems. In addition, the results suggest that it may be difficult to identify the effects of these individual difference characteristics independent of a consideration of situational and task characteristics. While this study focused primarily on the interplay of the user's individual differences and the machine's characteristics (e.g., competence, dependability, etc.), future research should also more strongly consider task characteristics (e.g., complexity, consequences for false positives versus false negatives) and situational characteristics (e.g., environmental distractions). Accurate prediction of automation use and automation monitoring decisions may depend on the interactions of all four types of influences: machine, person, task, and situation.

Further research is warranted on the effects of individual differences on automation use and monitoring decisions. Although non-significant results were found for self-efficacy, extraversion, and conscientiousness in the present study, the reasons for these findings are unclear. Because previous research has suggested that relationships among these individual differences and automation use and monitoring may exist, researchers should investigate the potential moderating effects of task characteristics on these relationships. Relevant task characteristics may include the length of the task, the complexity of the task, and the consequences for success and failure inherent in the task. Task length may be important in that as the length of the task increases, the effects of traits related to motivation may increase. For example, near the beginning of the task, all participants may be equally motivated to perform well. However, as the task wears on, the effects of conscientiousness and extraversion on use and monitoring may become increasingly evident. To investigate these effects, researchers should vary task length or contrast the relationships between conscientiousness or extraversion and automation use or monitoring in the early stages of the task versus the late stages. Note that to see significant effects, the task may need to be at least 30-60 minutes in duration. Situational characteristics may be most likely to affect use and monitoring decisions via their effects on cognitive capacity.
In situations where operators are required to divide attention among multiple tasks, or in environments that are distracting, cognitive load effects are more likely to be found than in cognitively "easy" situations.

Finally, research in the area of social cognition may inform future research on operator use and monitoring. As technology becomes more intelligent and more human-like, operators are more likely to respond to these technologies as they would respond to another human (Reeves & Nass, 1996). Therefore, future researchers may wish to consult the social cognition literature on topics such as impression formation and person perception in order to determine the extent to which these processes generalize to human-machine relationships.

References

Arnold, V., & Sutton, S. G. (1998). The theory of technology dominance: Understanding the impact of intelligent decision aids on decision makers' judgments. In J. E. Hutton (Ed.), Advances in Accounting Behavioral Research, 1, 175-194.

Bandura, A. (1997). Self-efficacy: The exercise of control. New York: W. H. Freeman.

Barber, B. (1983). The logic and limits of trust. New Brunswick, NJ: Rutgers University Press.

Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44(1), 1-26.

Bobrow, D. G., Mittal, S., & Stefik, M. J. (1986). Expert systems: Perils and promise. Communications of the ACM, 29, 880-894.

Chao, G. T., & Kozlowski, S. W. J. (1986). Employee perceptions on the implementation of robotic manufacturing technology. Journal of Applied Psychology, 71(1), 70-76.

Connors, M. M., Harrison, A. A., & Summit, J. S. (1994). Crew systems: Integrating human and technical subsystems for the exploration of space. Behavioral Science, 39, 183-212.

Danaher, J. W. (1980). Human error in ATC system operations. Human Factors, 22(5), 535-545.

Dawes, R. M., Faust, D., & Meehl, P. E. (1989). Clinical versus actuarial judgment. Science, 243, 1668-1674.

De Vries, P., Midden, C., & Bouwhuis, D. (2003). The effects of errors on system trust, self-confidence, and the allocation of control in route planning. International Journal of Human-Computer Studies, 58, 719-735.

Dijkstra, J. J. (1998). User agreement with incorrect expert system advice. Behavior and Information Technology, 18(6), 399-411.

Dzindolet, M. T., Peterson, S. A., Pomranky, R. A., Pierce, L. G., & Beck, H. P. (2003). The role of trust in automation reliance. International Journal of Human-Computer Studies, 58, 697-718.

Dzindolet, M. T., Pierce, L. G., Beck, H. P., Dawe, L. A., & Anderson, B. W. (2001). Predicting misuse and disuse of combat identification systems. Military Psychology, 13(3), 147-164.

Funk, K., Lyall, B., Wilson, J., Vint, R., Niemczyk, M., Suroteguh, C., & Owen, G. (1999). Flight deck automation issues. The International Journal of Aviation Psychology, 9(2), 109-123.

Goldberg, L. R. (2001). International Personality Item Pool: A scientific collaboratory for the development of advanced measures of personality traits and other individual differences. Retrieved from http://ipip.ori.org/

Hart, S. G., & Staveland, L. E. (1988). Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In P. A. Hancock & N. Meshkati (Eds.), Human mental workload. Oxford, England: North-Holland.

Hogan, J., & Holland, B. (2003). Using theory to evaluate personality and job-performance relations: A socioanalytic perspective. Journal of Applied Psychology, 88(1), 100-112.
Hosmer, L. T. (1995). Trust: The connecting link between organizational theory and ethics. Academy of Management Review, 20, 379-400.

IBM. (n.d.). Deep Blue wins 3.5 to 2.5. Retrieved November 14, 2003, from http://www.research.ibm.com/deepblue/home/html/b.html

Jian, J., Bisantz, A. M., & Drury, C. G. (2000). Foundations for an empirically determined scale of trust in automated systems. International Journal of Cognitive Ergonomics, 4(1), 53-71.

Lee, G., & Perry, J. L. (1999). Are computers boosting productivity? A test of the paradox in state governments. Center for Research on Information Technology and Organizations, I.T. in Business, Paper 121. http://repositories.cdlib.org/crito/business/121

Lee, J. D., & Moray, N. (1992). Trust, control strategies, and allocation of function in human-machine systems. Ergonomics, 35, 1243-1270.

Lee, J. D., & Moray, N. (1994). Trust, self-confidence, and operators' adaptation to automation. International Journal of Human-Computer Studies, 40, 153-184.

Leli, D. A., & Filskov, S. B. (1984). Clinical detection of intellectual deterioration associated with brain damage. Journal of Clinical Psychology, 40(6), 1435-1441.

Lewandowski, S., Mundy, M., & Tan, G. P. A. (2000). The dynamics of trust: Comparing humans to automation. Journal of Experimental Psychology: Applied, 6(2), 104-123.

Lewis, J. D., & Weigert, A. (1985). Trust as a social reality. Social Forces, 63, 967-985.

McCrae, R. R., & Costa, P. T. (1987). Validation of the five-factor model of personality across instruments and observers. Journal of Personality and Social Psychology, 52(1), 81-90.

Muir, B. M. (1987). Trust between humans and machines and the design of decision aids. International Journal of Man-Machine Studies, 27, 527-539.

Muir, B. M. (1994). Trust in automation: Part I. Theoretical issues in the study of trust and human intervention in automated systems. Ergonomics, 37, 1905-1922.

Muir, B. M., & Moray, N. (1996). Trust in automation: Part II. Experimental studies of trust and human intervention in a process control simulation. Ergonomics, 39, 429-460.

Neuman, P. G. (1986). AI in medical diagnosis and aviation. ACM SIGSOFT Software Engineering Notes, 11(2), 5.

Parasuraman, R., & Byrne, E. A. (2003). Automation and human performance in aviation. In P. S. Tsang & M. A. Vidulich (Eds.), Principles and practice of aviation psychology (pp. 311-356). Mahwah, NJ: Lawrence Erlbaum Associates.

Parasuraman, R., Molloy, R., & Singh, I. L. (1993). Performance consequences of automation-induced "complacency." The International Journal of Aviation Psychology, 3(1), 1-23.

Parasuraman, R., & Riley, V. (1997). Humans and automation: Use, misuse, disuse, and abuse. Human Factors, 39(2), 230-253.

Reeves, B., & Nass, C. I. (1996). The media equation: How people treat computers, television, and new media like real people and places. Stanford, CA: CSLI Publications.

Rempel, J. K., Holmes, J. G., & Zanna, M. P. (1985). Trust in close relationships. Journal of Personality and Social Psychology, 49, 95-112.

Robinson, S. L. (1996). Trust and breach of the psychological contract. Administrative Science Quarterly, 41, 574-599.

Sharit, J. (2003). Perspectives on computer aiding in cognitive work domains: Toward predictions of effectiveness and use. Ergonomics, 46(1-3), 126-140.

Sheridan, T. B., & Hennessy, R. T. (Eds.). (1984). Research and modeling of supervisory control behavior. Washington, DC: National Academy Press.

Singh, I. L., Molloy, R., & Parasuraman, R. (1993).
Automation-induced "complacency": Development of the complacency-potential rating scale. The International Journal of Aviation Psychology, 3(2), 111-122.

Sollins, K. R. (1986). A medical risk of computers. ACM SIGSOFT Software Engineering Notes, 11(3), 11.

Sparaco, P. (1994). A330 crash to spur changes at Airbus. Aviation Week and Space Technology, 141(6), 20-22.

Stratopoulos, T., & Dehning, B. (2000). Does successful investment in information technology solve the productivity paradox? Information and Management, 38(2), 103-117.

Sullivan, M. J. (1982). Managing to mismanage robot productivity programs. Dearborn, MI: SME Technical Paper.

Toney, R. J., & Kozlowski, S. W. J. (1999, April). Effects of evaluative feedback and task difficulty on learning and training performance. In S. Gully (Chair), Learning to fail or failing to learn? The role of errors, failure, and feedback in learning environments. Symposium presented at the 14th Annual Conference of the Society for Industrial and Organizational Psychology, Atlanta, GA.

Waterman, D. A. (1986). A guide to expert systems. Reading, MA: Addison-Wesley Publishing Company.

Wiener, E. L. (1981). Complacency: Is the term useful for air safety? In Proceedings of the 26th Corporate Aviation Safety Seminar (pp. 116-125). Denver, CO: Flight Safety Foundation.

Wiener, E. L. (1985). Beyond the sterile cockpit. Human Factors, 27, 75-90.

Will, R. (1991). True and false dependence on technology: Evaluation with an expert system. Computers in Human Behavior, 7, 171-183.

Yu, V. L., Fagan, L. M., Wraith, S. M., Clancey, W. J., Scott, A. C., Hannigan, J., Blum, R. L., Buchanan, B. G., & Cohen, S. N. (1979). Antimicrobial selection by a computer: A blinded evaluation by infectious diseases experts. Journal of the American Medical Association, 242(12).

Zuboff, S. (1988). In the age of the smart machine: The future of work and power. New York: Basic Books.

Appendix A
User Training Script

This is a simulation of the job of airport security screener. You'll be looking at X-ray images of luggage and deciding whether or not to search each bag. If you think you see a GUN or a KNIFE in the X-ray image, you should SEARCH the bag. If not, you should CLEAR the bag. To SEARCH a bag, click on the button labeled "search bag." To CLEAR a bag, click on the button labeled "clear bag."

After you search or clear the bag, feedback will be presented on the screen that tells you whether your decision was correct. This feedback will remain on the screen for a few seconds, and then you'll be presented with the next X-ray to examine.

You will gain points for correct decisions and lose points for incorrect decisions. The point breakdown is as follows:

Find a weapon: +100 points
Miss a weapon: -100 points
Search a bag with no weapon inside: -25 points
Clear a bag with no weapon inside: +25 points
Each minute passed: -10 points

*****

Take a minute to make sure you know how points are gained and lost. Notice that you'll lose 10 points for every minute passed, so try to work as quickly as you can without making mistakes.

Now you'll have a chance to practice screening X-rays before the main simulation begins. A small clock will tell you how much time you have left in this practice session. When time is up, we'll let you know how you did on the practice session and ask you to fill out another short survey. When you are ready to begin, click the button below.
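The point system in the training script above can be summarized as a simple scoring rule. The sketch below is illustrative only; the function and variable names are hypothetical and are not taken from the actual task program, but the payoffs mirror the breakdown given in the script.

# Minimal sketch of the point system described in the training script above.
# Names are illustrative; only the point values come from the script.
def score_decision(weapon_present: bool, searched: bool) -> int:
    """Points for a single bag decision."""
    if weapon_present:
        return 100 if searched else -100   # find a weapon / miss a weapon
    return -25 if searched else 25         # unnecessary search / correct clear

def total_score(decisions, minutes_elapsed: float) -> float:
    """Sum of per-bag points minus the 10-point-per-minute time penalty."""
    return sum(score_decision(w, s) for w, s in decisions) - 10 * minutes_elapsed

# Example: found one weapon, cleared one safe bag, over 2 minutes -> 105 points.
print(total_score([(True, True), (False, False)], minutes_elapsed=2))

Under this rule, speed trades off directly against accuracy: each minute costs 10 points, so slow but careful screening is only worthwhile when it prevents a larger miss or unnecessary-search penalty.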
Appendix B
Machine Training Script (AWD Optional)

Low trust and neutral affect conditions

When you are completing the main simulation, you can use the AWD to help you out. The AWD scans each X-ray image and alerts you if it detects anything suspicious. For each X-ray, you will have the choice whether or not you will use the AWD. To turn it on, click on the button that says "AWD on." When it is activated, a small light comes on, indicating that the AWD is operating. If the AWD detects something it thinks is suspicious, a red light will begin to flash, and the message "ALERT!" will appear on the screen. If the AWD thinks the bag is safe, the message "SAFE" will appear in blue on the screen.

Even when the AWD is activated, you still have to decide whether to SEARCH or CLEAR the bag. That means that even if the AWD produces a recommendation, it is up to you--the operator--to make the final decision about whether to search or clear the bag.

There are two things about the AWD to keep in mind:
1.) The AWD is a complex technology and takes a few seconds to analyze each X-ray image.
2.) The AWD isn't 100% accurate. In fact, it is accurate 65% of the time. In addition, it sometimes will not be able to scan an image and will produce an error message. If that happens, you won't be able to use the AWD on that X-ray. Also, if the AWD scans the same bag multiple times, it will not produce the same recommendation each time.

PLEASE REVIEW THE INFORMATION ABOUT HOW TO USE THE AWD AND BE SURE YOU UNDERSTAND IT BEFORE PROCEEDING.

Next, you will view a DEMONSTRATION of the AWD scanning bags by itself. This demonstration will allow you to see how the AWD works before you use it yourself. In this demonstration, you don't have to click "search bag" or "clear bag." However, feedback will appear on the screen telling you whether the AWD was right or wrong on each X-ray. Now, watch the AWD.

Appendix C
Automation-Induced Complacency Potential Measure (Singh, Molloy, & Parasuraman, 1993)

1. I think that automated devices used in medicine, such as CT scans and ultrasound, provide very reliable medical diagnosis.
2. Automated devices in medicine save time and money in the diagnosis and treatment of disease.
3. If I need to have a tumor in my body removed, I would choose to undergo computer-aided surgery using laser technology because it is more reliable and safer than manual surgery.
4. Automated systems used in modern aircraft, such as the automatic landing system, have made air journeys safer.
5. ATMs provide a safeguard against the inappropriate use of an individual's bank account by dishonest people.
6. Automated devices used in aviation and banking have made work easier for both employees and customers.
7. Even though the automatic cruise control in my car is set at a speed below the speed limit, I worry when I pass a police radar speed trap in case the automatic control is not working properly.
8. Manually sorting through card catalogues is more reliable than computer-aided searches for finding items in a library.
9. I would rather purchase an item using a computer than have to deal with a sales representative on the phone because my order is more likely to be correct using the computer.
10. Bank transactions have become safer with the introduction of computer technology for the transfer of funds.
11. I feel safer depositing my money at an ATM than with a human teller.
12. I have to tape an important TV program for a class assignment.
To ensure that the correct program is recorded, I would use the automatic programming facility on my VCR rather than manual taping.

Appendix D
Post-task Predictability, Dependability, and Faith Items (Adapted from Rempel, Holmes, & Zanna, 1985)

1. When I encounter an unfamiliar object in a bag, I would not feel worried letting the AWD screen the bag (faith)
2. I can count on the AWD to be there to help me out when I need it (dependability)
3. In general, the AWD does the same thing each time I use it (predictability)
4. The AWD has proven to be trustworthy, and I am willing to depend on it (dependability)
5. I am familiar with the functioning of the AWD, and I can rely on it to behave in certain ways (dependability)
6. Even when I don't know how the AWD will answer, I feel comfortable letting it screen the bags (faith)
7. Though times may change and terrorists may try new plans, I know that the AWD will always be ready to help me stop them (faith)
8. I am never certain that the AWD will give me the right answer (predictability)
9. The AWD is very unpredictable. I never know if it is going to give me the correct advice (predictability)
10. I feel very uncomfortable when the AWD makes decisions that might affect how well I do my job (predictability)
11. I have found that the AWD is usually dependable, especially on important decisions (dependability)
12. The AWD functions in a very consistent manner (predictability)
13. When I use the AWD, my future screening performance is an unknown which I worry about (faith)
14. Whenever I have to make a decision about a strange looking bag, the AWD will do its best to help me make the right choice (faith)
15. I can rely on the AWD to help me make a correct decision when I can't make the decision myself (faith)
16. The AWD can be counted on (dependability)
17. I have to keep alert, or the AWD might convince me of the wrong decision (dependability)
18. I am certain that the AWD will not break down (dependability)
19. I sometimes avoid using the AWD because it is unpredictable and I fear that it will give me the wrong advice (predictability)
20. I can rely on the AWD to function at its best every time I use it (dependability)
21. Even if I were to continue working as a screener, I would never guarantee that I would still use the AWD 1 year from now (faith)
22. When I am using the AWD, I feel secure in facing unknown new situations (faith)
23. Even when the AWD gives advice that seems like it might be wrong, I am confident that it is still operating correctly (dependability)
24. I am willing to let the AWD make decisions for me (dependability)

Appendix E
Competence Scale Items

1. The AWD usually gave me the correct recommendation.
2. The AWD was generally right.
3. The AWD was very competent.

Responsibility Scale Items

1. The AWD kept me informed of what it was doing.
2. The way the AWD works is a mystery to me.
3. I understand the AWD's process.

Appendix F
Conscientiousness Scale Items (Goldberg, 2001)

1. Am always prepared.
2. Pay attention to details.
3. Get chores done right away.
4. Carry out my plans.
5. Make plans and stick to them.
6. Waste my time.
7. Find it difficult to get down to work.
8. Do just enough work to get by.
9. Don't see things through.
10. Shirk my duties.

Appendix G
Extraversion Scale Items (Goldberg, 2001)

1. Feel comfortable around people.
2. Make friends easily.
3. Am skilled in handling social situations.
4. Am the life of the party.
5. Know how to captivate people.
6. Have little to say.
7. Keep in the background.
8. Would describe my experiences as somewhat dull.
9. Don't like to draw attention to myself.
10. Don't talk a lot.

Appendix H
Pre-task Trust Items

1. Overall, I think the AWD is trustworthy.
2. The AWD is a competent performer.
3. I can depend on the AWD.
4. I find the AWD very predictable.
5. I have faith that the AWD will perform well.
6. The AWD is very responsible.

Appendix I
Desire to Decrease Cognitive Load Scale Items

1. This task seemed a lot harder than it should have been.
2. I wish this task had been easier.
3. I had to pay more attention to this task than I really wanted to.
4. I felt too much time pressure while screening the bags.
5. I felt myself getting more frustrated than I would have liked.

Appendix J
Self-Efficacy Scale Items (Adapted from Toney & Kozlowski, 1999)

1. I can meet the challenges of this baggage screening task.
2. I am confident in my understanding of how information cues are related to decisions.
3. I can deal with decisions about bags under ambiguous conditions.
4. I am certain that I can manage the requirements of this screening task.
5. I believe I will fare well in this task if the required speed is increased.
6. I am confident that I can cope with this screening simulation if it becomes more complex.
7. I believe I can develop methods to handle challenging aspects of this task.
8. I am certain that I can cope with any distractions that may arise.

Appendix K
Affect Toward the AWD Scale Items

1. I liked using the AWD.
2. I was glad I had the option of using the AWD to assist me.
3. Overall, I feel positively toward the AWD.
4. The AWD made me feel uncomfortable.
5. I wish the AWD had never been created.

Appendix L
Perceived Cognitive Load Measure (NASA Task Load Index)

The TLX assesses six dimensions of cognitive load, as presented in Table 16 below.

Table 16
Dimensions of the NASA TLX Scale

Mental Demand: How much mental and perceptual activity was required (e.g., thinking, deciding, calculating, remembering, looking, searching, etc.)? Was the task easy or demanding, simple or complex, exacting or forgiving?
Physical Demand: How much physical activity was required (e.g., pushing, pulling, turning, controlling, activating, etc.)? Was the task easy or demanding, slow or brisk, slack or strenuous, restful or laborious?
Temporal Demand: How much time pressure did you feel due to the rate or pace at which the tasks or task elements occurred? Was the pace slow and leisurely or rapid and frantic?
Effort: How hard did you have to work (mentally and physically) to accomplish your level of performance?
Performance: How successful do you think you were in accomplishing the goals of the task set by the experimenter (or yourself)? How satisfied were you with your performance in accomplishing these goals?
Frustration Level: How insecure, discouraged, irritated, stressed, and annoyed versus secure, gratified, content, relaxed, and complacent did you feel during the task?

These definitions were presented to the participants, and then participants rated the extent to which the task was high or low on each dimension on Likert-type scales ranging from 1 to 9.

Figure 10
Example TLX Scale
[Example item: "TEMPORAL DEMAND: How much time pressure did you feel due to the rate or pace at which the tasks or task elements occurred?" rated on anchors ranging from "It had almost no effect" to "It had an extreme effect."]

After participants provided ratings for each of the six dimensions, the dimensions were weighted according to their subjective importance.
Importance was assessed using a series of 15 pairwise comparisons. Participants were presented with two of the six dimensions and rated which dimension was the more important one in establishing their overall workload. For example:

Mental Demand vs. Physical Demand
Effort vs. Temporal Demand
Mental Demand vs. Frustration

The number of times each dimension was selected was calculated and used as a weight. The participant's Likert scale rating for each dimension was multiplied by its weight. The total cognitive load score is the sum of the weighted scale ratings divided by 15 (the number of comparisons).

Appendix M
Correlation Matrices for Automation Use and Monitoring

Table 17
Sample Correlation Matrix for Use

             Use    Compl.  Consc  Extrav  S.E.   T.xS.E.  Faith  Trust  Affect  Desire
Use          (.19)
Complacency  .12    (.36)
Consc        .03    .15     (.51)
Extrav       .02    .34*    .24*   (.52)
S.E.         -.07   .25*    .18*   .29*    (.53)
T.xS.E.      .18*   .33*    .17    .29*    .66*   (3.00)
Faith        .28*   .30*    .08    .20*    .13    .58*     (.53)
Trust        .30*   .24*    .05    .15     .13    .82*     .69*   (.64)
Affect       .61*   .13     -.06   .05     -.15   .19*     .44*   .39*   (.90)
Desire       .11    -.08    -.08   -.22*   -.45*  -.28*    -.00   -.03   .10     (.76)

N = 129. * p < .05. Note: Scale standard deviations are presented in the diagonal.

Table 18
Sample Correlation Matrix for Monitoring

             Monitor  Compl.  Consc  Extrav  S.E.   T.xS.E.  Faith  Trust  Affect  Desire
Monitoring†  (17.25)
Complacency  -.06     (.38)
Consc        .03      .09     (.49)
Extrav       .04      .08     .16    (.50)
S.E.         -.02     .23*    .20*   .04     (.53)
T.xS.E.      -.12     .30*    .13    .05     .69*   (2.84)
Faith        -.08     .21*    .02    -.06    .20*   .50*     (.53)
Trust        -.14     .22*    .03    .01     .21*   .85*     .53*   (.62)
Affect       -.38*    -.01    -.01   -.16    .05    .19*     .26*   .23*   (.95)
Desire       .04      -.08    -.00   .06     -.44*  -.28*    -.11   -.10   -.04    (.68)

N = 123. † Correct Disagreements. * p < .05. Note: Standard deviations presented in parentheses.

Appendix N
Validation of the Automation-Induced Complacency Potential Rating Scale

This study included an effort to validate Singh, Molloy and Parasuraman's (1993) automation-induced complacency potential rating scale. This rating scale was designed to assess an individual's stable propensity to become complacent in regard to automated systems. The subscales of the complacency potential rating scale are confidence, reliance, trust, and safety. To my knowledge, the psychometric properties of these subscales had yet to be validated by experimenters other than the authors. None of the four subscales proposed by Singh and colleagues achieved acceptable levels of reliability in the present sample (see Table 19). An exploratory factor analysis was performed which did not support the four-factor structure. The scree plot from this analysis is displayed in Figure 11.

Table 19
Reliabilities of the Automation-Induced Complacency Potential Subscales

Subscale     N    # of Items  Alpha
Confidence   251  4           .55
Reliance     249  3           .29
Trust        251  3           .30
Safety       253  2           .08

An examination of the item content provides a possible explanation. These subscales were developed empirically (Singh et al., 1993), and the correspondence between some items' content and their scale labels is unclear. For example, the item "ATMs provide a safeguard against the inappropriate use of an individual's bank account by dishonest people" was assigned to the "reliance" subscale as opposed to the "safety" subscale, and the item "I have to tape an important TV program for a class assignment; to ensure that the correct program is recorded, I would use the automatic programming facility on my VCR rather than manual taping" was assigned to the "safety" subscale as opposed to the "reliance" subscale.
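The reliability figures reported here can be reproduced from standard formulas. The sketch below is a minimal illustration, not the analysis code used in the thesis; it assumes a hypothetical participants-by-items array of Likert responses with no missing data and computes Cronbach's alpha along with corrected item-total correlations of the kind used in the scale-refinement step.

# Minimal sketch (hypothetical data): Cronbach's alpha and corrected
# item-total correlations for one subscale.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def corrected_item_total(items: np.ndarray) -> np.ndarray:
    """Correlation of each item with the sum of the remaining items."""
    items = np.asarray(items, dtype=float)
    out = np.empty(items.shape[1])
    for j in range(items.shape[1]):
        rest = np.delete(items, j, axis=1).sum(axis=1)
        out[j] = np.corrcoef(items[:, j], rest)[0, 1]
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake = rng.integers(1, 6, size=(253, 4))   # hypothetical 4-item subscale
    print(round(cronbach_alpha(fake), 2))
    print(np.round(corrected_item_total(fake), 2))

Dropping items with low corrected item-total correlations and recomputing alpha is the usual route to the kind of revised scale reported in Table 20.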
Figure 11
Scree Plot: Exploratory Factor Analysis for Automation-Induced Complacency Potential
[Plot of eigenvalues by component number for the complacency potential items.]

Scale refinement was attempted in order to improve internal consistency. Revision based on item-total correlations resulted in a one-factor scale with seven items and a maximum coefficient alpha of α = .64, still below the acceptable level of .70 (see Table 20).

In conclusion, the present study was unable to find support for the psychometric properties of the automation-induced complacency potential rating scale, either as a whole or in terms of its individual subscales. No subscale achieved an acceptable level of internal consistency, as subscale alpha coefficients were .55, .29, .30, and .08, respectively. The alpha coefficient for the entire scale was .60, and the revised scale alpha coefficient was .64, slightly better but still below the conventionally acceptable level of .70.

Table 20
Revised Automation-Induced Complacency Potential Scale Items and Statistics

Item / Mean / Std Dev / Item-Total Correlation
I think that automated devices used in medicine, such as CT scans and ultrasound, provide very reliable medical diagnosis (4.17, .58, .30)
Automated devices in medicine save time and money in the diagnosis and treatment of disease
If I need to have a tumor in my body removed, I would choose to undergo computer-aided surgery using laser technology because it is more reliable and safer than manual surgery
Automated systems used in modern aircraft, such as the automatic landing system, have made air journeys safer (3.81, .72, .38)
ATMs provide a safeguard against the inappropriate use of an individual's bank account by dishonest people (3.15, .95, .36)
Automated devices used in aviation and banking have made work easier for both employees and customers
Bank transactions have become safer with the introduction of computer technology for the transfer of funds (3.25, .92, .28)
4.01, .67, .43
3.45, .87, .36
4.12, .56, .45
Overall alpha = .64

Appendix O
Post-hoc Analyses of the Effects of Slide Difficulty

The X-ray slides were designed to fit into a classification of easy, medium, and challenging levels of difficulty. Easy slides contained between two and four items, and any weapons contained in those slides were easily visible. Medium difficulty slides contained four to six items, and weapons were less easily visible, either turned at a slight angle and/or partially obscured by other items. Challenging slides contained between five and nine items, and weapons were obscured and/or angled such that they were not easily discernible as weapons.

Some participants indicated that they were most likely to use the AWD on slides for which the correct answer was not immediately obvious. Therefore, it is likely that slide difficulty will be positively related to AWD use; in other words, participants will be more likely to use the AWD on more difficult slides than on easier slides. In contrast, because the presence or absence of weapons should be more obvious in easier slides than in more difficult slides, it is expected that participants will be more likely to correctly disagree with the AWD on easier slides rather than more difficult ones.

In order to test these post-hoc hypotheses, the proportions of AWD use and correct disagreements were calculated for easy, medium, and challenging slides. Dependent-samples t-tests were employed to determine whether those proportions significantly differed from one another.
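The comparison just described is a within-person contrast: each participant contributes a use proportion at each difficulty level. The sketch below illustrates the dependent-samples (paired) t-test on hypothetical data; the proportions are simulated and the variable names are illustrative, not taken from the study's data files.

# Minimal sketch (hypothetical data): paired t-tests comparing each
# participant's proportion of AWD use across slide difficulty levels.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 129                                     # participants in the AWD-optional sessions
use_easy = rng.beta(2, 3, n)                # hypothetical per-person use proportions
use_medium = np.clip(use_easy + rng.normal(0.05, 0.10, n), 0, 1)
use_hard = np.clip(use_easy + rng.normal(0.10, 0.10, n), 0, 1)

for label, a, b in [("medium vs. easy", use_medium, use_easy),
                    ("hard vs. easy", use_hard, use_easy),
                    ("hard vs. medium", use_hard, use_medium)]:
    t, p = stats.ttest_rel(a, b)            # dependent-samples (paired) t-test
    print(f"{label}: t = {t:.2f}, p = {p:.3f}")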
In regard to automation use, the post-hoc hypotheses were supported. Results of the t-tests indicated that participants were significantly more likely to use the AWD on medium (t = 2.78, p = .01) and challenging (t = 5.08, p < .01) slides than on easy slides. In addition, participants were significantly more likely to use the AWD on challenging slides than on medium slides (t = 2.65, p = .01).

The hypotheses regarding the relationship of slide difficulty to automation monitoring were also supported. Dependent-samples t-tests indicated that participants were significantly more likely to correctly disagree with the AWD on easy (t = 15.48, p < .01) and medium (t = 6.87, p < .01) slides than on difficult slides. Participants were also significantly more likely to disagree with the AWD on easy slides than on medium slides (t = 9.34, p < .01).

These post-hoc analyses revealed that slide difficulty played an important role in both automation use and automation monitoring. Participants were more likely to use the automated system on more difficult slides than on easier slides, and participants were more likely to correctly identify automation errors on easier slides than on more difficult slides.

In regard to automated system use specifically, the effect of the trust and self-efficacy interaction decreased as the slide difficulty level increased. The relationships of affect and post-task trust showed the opposite pattern: as slide difficulty increased, the correlations of affect and trust with automation use also increased. The effects of faith and pre-task trust (both measured prior to the task) remained essentially constant across slide difficulty. It is also interesting to note that sex exhibited significant and increasingly negative relationships with automation use as slide difficulty increased, such that males were less likely to use the automated system on more difficult slides than on easier slides.

Slide difficulty also exhibited interesting, although less consistent, relationships with automation monitoring. For both pre-task trust and the interaction of trust and self-efficacy, the relationship with automation errors identified was near zero for easy and medium difficulty slides but became significant for difficult slides only. For both affect and post-task trust, the relationships exhibited a U-shape such that the correlations were significant for easy slides, decreased in magnitude (but were still significant) for medium slides, and reached their highest levels on difficult slides. The relationships found in these post-hoc slide difficulty analyses suggest that further research on interactions between slide difficulty and other constructs may yield interesting and valuable results.

Appendix P
Post-hoc Analyses of the Effects of Weapon Presence or Absence

The point system applied to the experimental task was designed such that participants could not achieve maximum scores by searching every bag or nearly every bag. In real life, however, the consequences of letting a weapon slip by undetected are much more severe than the consequences of taking the time to search many bags. The effect of the point system may not have been strong enough to overwhelm these powerfully engrained ideas about the relative consequences of under- and over-searching. It is possible that participants would have had a tendency to search whenever they believed that a weapon could possibly be present.
Therefore, in regard to monitoring behavior, it was hypothesized that participants would be more effective in disagreeing with the automated system when it incorrectly suggested that a bag contained no weapon than when the machine incorrectly suggested that a weapon was present. Furthermore, it was hypothesized that this tendency would grow stronger with increasing slide difficulty, as the presence or absence of weapons becomes less obvious.

To examine this hypothesis, the proportion of correct disagreements was calculated separately for bags containing weapons and bags containing no weapons. Each of these proportions was calculated for each of the three slide difficulties, yielding six proportions. First, dependent-samples t-tests were employed to test for significant mean differences in correct disagreements between slides with no weapon and slides containing weapons within each difficulty level. A significant difference was found between slides containing weapons and slides not containing weapons for the easy difficulty slides (t = -12.54, p < .01). The direction of the effect was as expected: participants were significantly more likely to disagree with an incorrect AWD recommendation when a weapon was present than when a weapon was not present. However, the effects for slides of medium difficulty (t = -.61, p = .54) and challenging difficulty (t = .99, p = .32) were both non-significant.

Next, t-tests were used to contrast the proportion of correct disagreements with AWD clears among the slide difficulty levels. In other words, differences were tested for slide difficulty on the likelihood that participants would search for a weapon when the AWD incorrectly suggested that none were present. Results indicated that participants were significantly more likely to correctly disagree with a suggested clear on easy slides than on medium slides (t = -15.22, p < .01) and on challenging slides (t = -19.75, p < .01). In addition, participants were more likely to correctly disagree with clears on medium slides than on challenging slides (t = -6.74, p < .01).

The contrasts using correct disagreements with alerts yielded the same pattern. Participants were more likely to correctly disagree with incorrect alerts on easy slides than on medium slides (t = -3.09, p < .01) and on challenging slides (t = -9.91, p < .01). Participants were also more likely to correctly disagree on medium slides than on challenging slides (t = -6.57, p < .01). This structure of effects lends support to the slide difficulty categorizations and suggests that slide difficulty had similar effects regardless of weapon presence or absence.

Appendix Q
Post-hoc Analyses of Point Performance

Task performance is a pervasive and important outcome in work-related studies. Therefore, several analyses were performed in relation to task performance, as measured by total points earned. A graph of the point totals reveals a relatively normal distribution (see Figure 12). However, two outliers were present. The following analyses exclude these two participants, who were removed due to their extremely low point totals. For information on point means and standard deviations, see Table 21.

Figure 12
Distribution of Total Points
[Histogram of simulation points before penalty: Mean = 3356.7, Std. Dev. = 980.18, N = 253.]

First, the effects of the three manipulations on total points were examined.
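Because each manipulation is a dichotomous condition, its relationship with total points amounts to a point-biserial correlation. The sketch below illustrates that computation on hypothetical data; the variable names and simulated values are illustrative only and do not come from the study.

# Minimal sketch (hypothetical data): correlating a binary condition
# (0 = low trust, 1 = high trust) with total points.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
trust_condition = rng.integers(0, 2, 251)                 # 0/1 condition codes
points = 3200 + 550 * trust_condition + rng.normal(0, 950, 251)

r, p = stats.pointbiserialr(trust_condition, points)      # equivalent to Pearson r here
print(f"r = {r:.2f}, p = {p:.3f}")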
A significant positive relationship was found between trust condition and total points such that those in the high trust condition (those with higher-functioning AWDs) scored more points than did participants in the low trust condition (r = .29, p < .01). However, no significant relationships were found for self-efficacy condition (r = .02, p = .73) or affect condition (r = .03, p = .66).

Table 21
Descriptive Statistics for Total Points

Variable        N    Min     Max      Mean     Std Dev
Points          253  250.00  6325.00  3356.72  980.18
AWD Use         253  0.00    1.00     .57      .44
Total Disagree  235  0.00    1.00     .18      .16

Next, correlations were examined between total points and the scales included in the study. No hypotheses were made regarding these relationships; they were examined from an exploratory standpoint. A significant negative relationship was found between automation-induced complacency potential and total points (r = -.13, p = .04). This relationship is interesting in that it does not seem to operate through automation use or automation monitoring (as those correlations were non-significant). Instead, automation-induced complacency potential seems to operate on task performance through another mechanism not assessed in this study. Because of the attenuation resulting from the low internal consistency of this scale, it is likely that the true relationship between complacency potential and total points is actually stronger than the one found here. In addition, significant relationships with total points were found for post-task trust (r = .23, p < .01) and conscientiousness (r = -.15, p = .02). The relationships for extraversion (r = -.07, p = .25), desire to decrease cognitive load (r = -.07, p = .28), self-efficacy (r = -.03, p = .66), faith (r = -.04, p = .52), pre-task trust (r = .05, p = .47), and affect (r = .05, p = .39) were non-significant.

It was also hypothesized that total points could be predicted by the interaction of the trust condition and the extent to which the automated machine was used. It was predicted that participants in the high trust condition who used the machine more would have the most points and those in the low trust condition who used the machine more would have the least points.

Table 22
Regression Analysis of Trust Condition and AWD Use on Total Points

Step 1 (R² = .09):
  Trust Condition (N = 129), β = 432.17*
  AWD Use (N = 129), β = -1563.29*
Step 2 (R² = .14, ΔR² = .05*):
  Trust Condition (N = 129), β = 94.06
  AWD Use (N = 129), β = -6541.56*
  Trust Condition x AWD Use (N = 129), β = 2795.20*
* p < .05

As seen in Table 22, the interaction of trust condition and AWD use added significantly to the prediction of total points beyond the direct effects of those variables (R² change = .05, p = .01). As the graph in Figure 13 demonstrates, when AWD use was zero, trust condition had no effect. However, as AWD use increased, the participants in the low trust condition performed increasingly poorly relative to the participants in the high trust condition.

A similar proposition was tested using automation monitoring and trust condition (see Table 23). The interaction between trust condition and the proportion of correct disagreements produced a significant increase in prediction over and above the direct effects (R² change = .12, p < .01). A graph of this interaction reveals an interesting effect (see Figure 14). As expected, for participants in the low trust condition, points increased sharply as correct disagreements increased. However, for participants in the high trust condition, points actually decreased somewhat as correct disagreements increased.
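The analyses in Tables 22 and 23 are two-step moderated regressions: main effects are entered first, the product term second, and the increment in R² indexes the interaction. The sketch below illustrates that procedure on simulated data; it is not the thesis's analysis script, and the generated values are hypothetical.

# Minimal sketch (hypothetical data): two-step moderated regression predicting
# total points from trust condition, AWD use, and their interaction.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 129
trust_cond = rng.integers(0, 2, n).astype(float)     # 0 = low trust, 1 = high trust
awd_use = rng.uniform(0, 1, n)                        # proportion of slides with AWD on
points = (3000 + 400 * trust_cond - 1500 * awd_use
          + 2500 * trust_cond * awd_use + rng.normal(0, 800, n))

step1 = sm.OLS(points, sm.add_constant(
    np.column_stack([trust_cond, awd_use]))).fit()
step2 = sm.OLS(points, sm.add_constant(
    np.column_stack([trust_cond, awd_use, trust_cond * awd_use]))).fit()

print(f"Step 1 R2 = {step1.rsquared:.2f}")
print(f"Step 2 R2 = {step2.rsquared:.2f}, delta R2 = {step2.rsquared - step1.rsquared:.2f}")

Plotting predicted points against AWD use separately for the two condition codes reproduces the kind of crossing lines shown in Figures 13 and 14.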
This pattern, in which points declined for high-trust participants as correct disagreements increased, might be explained by the functioning of the AWD program. In the high trust condition, the AWD was programmed to provide a correct recommendation 85% of the time. However, the exact percentage only approaches 85% over time; the machine is not correct exactly 85 out of every 100 times. Perhaps the relatively brief length of this experimental task allowed for variance in the actual percentage of slides on which the AWD was correct. Therefore, some participants in the high trust condition may have had AWD machines with a greater likelihood of making mistakes. Because trust condition (and therefore AWD percentage correct) is significantly related to points, perhaps the decline in points as disagreements increase reflects an AWD that makes more mistakes and thus requires more disagreements.

Figure 13
Plot of the Interaction of Trust Condition and AWD Use onto Total Points
[Total points plotted against AWD use, with separate lines for the low trust and high trust conditions.]

Table 23
Regression Analysis of Trust Condition and Correct Disagreements on Total Points

Step 1 (R² = .37):
  Trust Condition (N = 124), β = 1049.21*
  Correct Disagreements (N = 124), β = 7.78*
Step 2 (R² = .51, ΔR² = .14*):
  Trust Condition (N = 124), β = 3515.12*
  Correct Disagreements (N = 124), β = 74.97*
  Trust Condition x Correct Disagreements (N = 124), β = -39.71*
* p < .05

Figure 14
Plot of the Interaction of Trust Condition and Correct Disagreements onto Total Points
[Total points plotted against correct disagreements, with separate lines for the low trust and high trust conditions.]

Finally, the effects of automation use and automation monitoring on total points were calculated. It is important to note that these results are sample specific and highly dependent on the level of functioning of the particular automated system in use here. The relationship between automation use and total points was tested using the data from session type A only. A significant and negative correlation was found between automated system use and total points (r = -.23, p = .01). This correlation indicates that, overall on this task, the more participants used the automated system, the lower their total point score.

The relationships between monitoring and total points were tested using session type B only. It was interesting to find that the proportion of disagreements with the AWD related negatively to total points (r = -.48, p < .01), as did correct disagreements with the AWD on easy slides (r = -.34, p < .01). However, the relationships of total correct disagreements (r = -.07, p = .45), correct disagreements on medium slides (r = .10, p = .28), and correct disagreements on difficult slides (r = -.13, p = .14) with total points were non-significant. These negative correlations may reflect the trust condition: participants in the high trust condition needed fewer disagreements in order to get a higher score. This interpretation is supported by the strong correlation between trust condition and total points (r = .59, p < .01). In light of this fact, it is difficult to interpret these findings regarding the relationships of automation use and automation monitoring with total points.

Appendix R
Affect Manipulation Replication

The affect manipulation in the current study was included in an attempt to replicate the effect found by Reeves and Nass (1996). This effect was not replicated in the current study.
There are at least two possible causes for this failure. First, the effect of machine name on affect toward the machine may be very small in size. The sample size of 253 in the current study may have been insufficiently large to detect a small effect. Second, the manipulation in the present study was presented in a one-page set of instructions. Although the varied automated system name (either "the AWD" or "your automated assistant") appeared multiple times during the one-page set of instructions, it is possible that one page was unable to saturate the participants with a manipulation strong enough for the effect to manifest. This hypothesis is supported by a non-significant correlation between affect condition and affect scores. In sum, it appears that either a very strong manipulation and/or a very large sample size is required in order to detect this particular effect.

Appendix S
Trust Measure Analysis

Muir (1987) proposed six trust factors that would affect trust in an automated system. The 36-item post-task trust measure was composed of items representing five of these trust factors: predictability, dependability, faith, competence, and responsibility (recall that the sixth factor was measured as automation-induced complacency potential). A reliability analysis revealed that the 36 items had an alpha coefficient of .96, indicating a high degree of internal consistency. Furthermore, an exploratory factor analysis indicated a strong initial factor on which 34 of the 36 items loaded. Figure 15 presents the scree plot for this factor analysis.

Figure 15
Scree Plot: Exploratory Factor Analysis of Trust Scale Items
[Plot of eigenvalues by factor number for the 36 trust scale items.]

Because competence, predictability, responsibility, and dependence were all manipulated together, firm conclusions regarding the divergent validity of these four constructs cannot be drawn from these results. On the other hand, faith and persistence (as automation-induced complacency potential) were not manipulated but measured. We should therefore be able to discriminate between these two scales. An exploratory factor analysis on the items of the faith and complacency potential scales reveals the two expected factors: one factor on which all of the faith items load and one factor on which the majority of the complacency potential items load.
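The eigenvalues behind scree plots such as Figures 11 and 15 can be obtained directly from the item correlation matrix. The sketch below is a minimal, simulated illustration of that step (a principal-components-style decomposition); the data and the single-factor structure are hypothetical and are not the study's data.

# Minimal sketch (simulated data): eigenvalues of an item correlation matrix,
# the values that would be plotted in a scree plot.
import numpy as np

rng = np.random.default_rng(4)
n_people, n_items = 253, 36
latent = rng.normal(size=(n_people, 1))                    # one dominant latent factor
items = 0.7 * latent + 0.5 * rng.normal(size=(n_people, n_items))

corr = np.corrcoef(items, rowvar=False)                    # 36 x 36 item correlation matrix
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]      # sorted for the scree plot

print(np.round(eigenvalues[:5], 2))   # a large first eigenvalue suggests one strong factor

A sharp drop after the first eigenvalue, as in the simulated output here, is the pattern consistent with the single strong factor reported for the trust items.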