This is to certify that the

dissertation entitled

Modeling the Impact of Extralegal Bias and
Defined Standards of Proof on the Decisions of Mock
Jurors and Juries
presented by
Robert J. MacCoun

has been accepted towards fulfillment
of the requirements for

Ph.D. degree in Psychology

Major professor

Norbert L. Kerr
Date August 15, 1984

MSU is an Affirmative Action/Equal Opportunity Institution

MODELING THE IMPACT OF EXTRALEGAL BIAS AND DEFINED
STANDARDS OF PROOF ON THE DECISIONS OF MOCK JURORS AND JURIES

By

Robert J. MacCoun

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Psychology

1984

ABSTRACT

MODELING THE IMPACT OF EXTRALEGAL BIAS AND DEFINED

STANDARDS OF PROOF ON THE DECISIONS OF MOCK JURORS AND JURIES

By

Robert J. MacCoun

Research on the psychology of the courtroom has documented
extralegal bias -- bias due to inadmissible or non-evidentiary factors
-- in the verdicts and related judgments of mock jurors. This
dissertation describes a criterion-setting model proposing that
extralegal factors influence the juror’s standard of proof, the
threshold of evidence that must be crossed to render a guilty verdict.
The relationship between bias and this criterion is hypothesized to be
mediated by the perceived costs of convicting an innocent defendant or
acquitting a guilty one.

A 2 (Defined Standard of Proof) x 2 (Victim Attractiveness) x 2
(Defendant Attractiveness) x 2 (Subject Sex) factorial experiment was
conducted. Three hundred and twenty-one subjects participated in
a simulated auto theft trial. Subjects were shown photographs that
varied in physical attractiveness and allegedly portrayed the victim
and the defendant. Subjects received either "beyond a reasonable
doubt" or "mere preponderance of evidence" standard of proof
instructions.

As predicted, the instructional manipulation resulted in a higher
conviction rate for the preponderance of evidence standard than for
the reasonable doubt standard. Although the case was close and the
attractiveness manipulations were strong, this study was not
able to detect attractiveness effects on either pre-deliberation
verdicts or recommended sentences. This failure to replicate previous
research might have resulted from the addition of auditory trial
information.

Each of several criterion estimates was more accurate at
predicting verdicts than expected by chance; however, the decision
theory and rank-order procedures were each significantly more accurate
than three self-reported probability formats, which may have been
inflated by social desirability.

Despite the absence of bias prior to deliberation, groups were
significantly less likely to convict the defendant when he was
attractive. This pattern is in clear contradiction to Kaplan and
Miller’s (1978) hypothesis that deliberation should attenuate
extralegal biases. Criterion modeling revealed more stringent
criterion estimates in the attractive defendant condition. There was also
an unexpected opposite trend for the perceived weight of evidence.

Social Decision Scheme analyses demonstrated an asymmetry effect
for group verdicts; however, the hypothesis that this asymmetry
results from the reasonable doubt standard was not supported.

Dedicated
to the Memory of

Barbara A. MacCoun

ACKNOWLEDGEMENTS

I owe all that is best in me to my father, Malcolm MacCoun. Dad,
your humor, patience, wisdom, and warmth seem infinite, and I love
you.

I have had the very good fortune of having not one, but two wise
and nurturant mentors during my graduate training. Norb Kerr and Larry
Messé, you have each demonstrated all the best characteristics of a
good scientist and teacher: ceaseless persistence and enthusiasm, a
quick wit, a healthy dose of skepticism, and a strong sense of
diplomacy and fairness. I hope I can make you feel proud of me.
Thanks also to Bill Crano, Jack Hunter, and Gerry Miller for the
expertise, time, and direction they provided -- phew, what an ace team
of consultants!

Love and gratitude to Tassia Riordan, Kit Faulkner, Ann Kantner,
Tom MacCoun, Ralph "Bond" Duman, Mike "Bond" Malinowski, and Renée "Bond"
Rutz -- each of whom made a big deal out of the Ph.D. and wouldn’t
tolerate any cynicism, pessimism, or mock humility. "Dr. Rob" -- I
love it!

Thanks to Lonnie Supnick, Berne Jacobs, Juliet Vogel, Pat Ponto,
and Xarifa ("It’s gonna be okay, isn’t it?") Greenquist of Kalamazoo
College, and to the staff, past and present, of Northwest Illinois
Human Resources Development Center, for believing in me.

While writing this beast, I kept my sanity and humor through a
weekly rotation of great meals and great company: Jan Hymes’ gourmet
creations; El Azteco slow burns con Los Dos Guys de Lansing
(featuring the inimitable Tape Man); bagelling with Dan (Pillar of
Sanity) Stults; Sunday dinners at the Pantry with Bim (my oldest
friend), Rich, Doug, and Martha; Peanut Barrel lunches with Jazz Man
Gorenflo; and nightcaps with Isidore Flores and Ray Kamalay at the
Varsity Inn.

Thanks to LePro for his lessons in controlled folly and to
Curious George for tickling my soul.

Finally, I never would have made it if it weren’t for my
colleague, friend, and co-bozo, Rob ("The Hymo") Hymes, Ph.D. Dr.
Bob, you have made me the best man but you are still the best Bob.
Together, we have raised panic to the level of great art, as captured
in our mantra "Tomorrow, it starts." I hope we will continue our
pursuit of progressive music, demented humor, and collaborative
research for years to come.


TABLE OF CONTENTS

LIST OF TABLES

CHAPTER 1: INTRODUCTION
  Juror Decision-Making and the Hypothesis-Testing Metaphor
  The Thomas and Hogue Model
  An Overview
  The Extralegal Effects of Victim and Defendant Attractiveness
  Modeling the Evaluation of Evidence
  Information Integration Model
  Bayesian Model
  Modeling the Decision Criterion
  The Blackstone approach
  The Self-Report approach
  The Rank-Order approach
  The Statistical Decision Theory approach
  The comparative accuracy of the approaches
  Experimental Research on Judicial Instructions
  Comprehensibility
  Motivation to comply with judicial instructions
  "Rationalization"
  Modeling the impact of the judge’s instructions
  From Juror Verdicts to Jury Verdicts
  Kaplan’s Evidentiary Polarization Hypothesis
  The Asymmetry Effect

CHAPTER 2: METHOD
  Subjects and Design
  Stimulus Materials and Pilot Studies
  Attractiveness manipulations
  Trial Materials
  Procedure
  Dependent Measures

CHAPTER 3: RESULTS
  Manipulation Checks for the Attractiveness Factors
  Pre-Deliberation Verdicts and Guilt-Related Judgments
  Evaluations of the Victim and Defendant
  The Victim
  The Defendant
  Subjective Probability of Guilt and Criterion Estimates
  Self-reported p(G) and p* estimates
  Indirect estimates of p*
  Measuring the accuracy of the criterion estimates
  Thomas and Hogue Estimates
  Criterion instruction manipulation checks
  Group Verdicts
  Effects of Deliberation on Individual Verdicts
  Modeling Analyses
  Examination of the Asymmetry Effect

CHAPTER 4: DISCUSSION
  Victim and Defendant Attractiveness and Juror Judgments
  Estimates of Perceived Probability of Guilt
    and the Decision Criterion
  Self-Report Estimates
  Indirect Estimates
  Compliance with Standard of Proof Instructions
  Extralegal Defendant Attractiveness Bias
    Following Group Deliberation
  Standards of Proof and the Asymmetry Effect
  The Mock Jury Technique: Is it Externally Valid?

REFERENCES

FOOTNOTES

APPENDIX A: Experimental Materials
  Departmental Research Consent Form
  Pre-Deliberation Individual Juror Questionnaire
  Foreperson’s Questionnaire
  Post-Deliberation Individual Juror Questionnaire

APPENDIX B: Analysis of Variance and Log-Linear Tables
LIST OF TABLES

Cell Sizes for the Experimental Design
Pilot Study Scale Ratings for Victim and Defendant Photographs
Individual Pre-Deliberation Verdicts by Instructions
Instruction x Defendant Attractiveness Interaction on Guilt Ratings
  for Subjects with Extreme Attractiveness Ratings
Correlations Between Evaluative Ratings and Guilt Scores
Multi-Trait/Multi-Method Matrix of Self-Reported p(G) and p* Estimates
Intercorrelations Among p* Estimates
Mean p* and Accuracy Rates
Z-Tests of the Relative Accuracy of p* Estimates
Instructional Manipulation Checks for Each p* Estimate
Group Verdicts by Size
Group Verdicts by Defendant Attractiveness
Social Decision Scheme Matrix for Each Defendant Attractiveness
  Condition
Time x Defendant Attractiveness Interaction on Individual Pre- and
  Post-Deliberation Guilt Scores
Social Decision Scheme Matrix for All Four-Person Groups
Social Decision Scheme Matrix for Each Instructional Condition

B-1. Analysis of Variance: Victim Attractiveness Manipulation Check
  by Subject Sex, Instructions, Victim and Defendant Attractiveness
B-2. Analysis of Variance: Defendant Attractiveness Manipulation Check
  by Subject Sex, Instructions, Victim and Defendant Attractiveness
B-3. Log-Linear Analysis: Individual Pre-Deliberation Verdicts
  by Subject Sex, Instructions, Victim and Defendant Attractiveness
B-4. Analysis of Variance: Pre-Deliberation Guilt Scores
  by Subject Sex, Instructions, Victim and Defendant Attractiveness
B-5. Analysis of Variance: Pre-Deliberation Guilt Score Internal Analysis
  by Subject Sex, Instructions, Victim and Defendant Attractiveness
B-6. Analysis of Variance: Recommended Sentences
  by Subject Sex, Instructions, Victim and Defendant Attractiveness
B-7. Analysis of Variance: Victim Believability
  by Subject Sex, Instructions, Victim and Defendant Attractiveness
B-8. Analysis of Variance: Victim Likeability
  by Subject Sex, Instructions, Victim and Defendant Attractiveness
B-9. Analysis of Variance: Victim Intelligence
  by Subject Sex, Instructions, Victim and Defendant Attractiveness
B-10. Analysis of Variance: Sympathy for Victim
  by Subject Sex, Instructions, Victim and Defendant Attractiveness
B-11. Analysis of Variance: Defendant Believability
  by Subject Sex, Instructions, Victim and Defendant Attractiveness
B-12. Analysis of Variance: Defendant Likeability
  by Subject Sex, Instructions, Victim and Defendant Attractiveness
B-13. Analysis of Variance: Defendant Intelligence
  by Subject Sex, Instructions, Victim and Defendant Attractiveness
B-14. Analysis of Variance: Sympathy for Defendant
  by Subject Sex, Instructions, Victim and Defendant Attractiveness
B-15. Log-Linear Analysis: Group Verdicts
  by Subject Sex, Instructions, Victim and Defendant Attractiveness
B-16. Repeated Measures ANOVA: Guilt Scores
  by Time, Size, Instructions, Victim and Defendant Attractiveness
B-17. Repeated Measures ANOVA: Mean p(G) Estimates
  by Subject Sex, Instructions, Victim and Defendant Attractiveness

CHAPTER 1

INTRODUCTION

In the psychological study of the legal system, it is the squeaky
wheels that receive the grease. As in other areas of psychology,
legal psychologists tend to focus on pathology. Although this focus
on flaws and problems at times may strike the layman as a morose
outlook on the world, for the psychologist, it is often the simplest
way of gleaning insights into the way things work when they
work well. For instance, our system of common law is a system of
fact-finding and fact-weighing. Its personification, Themis, balances
facts in her scales and shields her eyes from all appearances which
threaten to seduce and mislead. But for the psychologist, the obvious
task is to stand at her feet and try to catch her peeking. Thus, up
to this point in its relatively brief history, the psychology of the
law has been predominantly the study of extralegal bias, whether in
eyewitness testimony, pre-trial publicity, parole decisions, jury
composition, or the impact of physical and personal characteristics of
the plaintiff and the defendant on the administration of justice.

Characteristics of the actors in the system may, in some
instances, be legally relevant (e.g., credibility), but psychologists
have tended to focus on extralegal characteristics, i.e.,
characteristics that should have no legal bearing on the decision to
convict or acquit the defendant. For example, Sigall and Ostrove
(1975) have demonstrated that jurors are influenced by the physical
attractiveness of the defendant. Kerr (1978a) found that mock jurors
were more likely to vote for conviction when the victim of a crime was
both "beautiful and blameless" (i.e., when she took precautions to
prevent the crime). Other variables whose potential effects have been
explored include the race, religion, occupation, and physical stigmata
of defendants and victims. (For a recent review of this literature,
see Dane & Wrightsman, 1982.)

The tacit assumption behind the extralegal status of these
characteristics seems to be that they should have no objective logical
bearing on the weight of evidence against the defendant. For
legal theorists, the question is: Does bias miscalibrate the balance
or come to rest on its scales? Some scholars (e.g., Kaplan & Miller,
1978; Shaffer, Case, & Brannen, 1979) argue that bias is weighed along
with the evidence -- a disturbing prospect but for the hope that the
evidence can come to weigh increasingly more, relative to the biasing
information, as the judgment process proceeds. Others (e.g., Kerr et
al., 1984; Thomas & Hogue, 1976) have argued that when bias influences
verdicts, it often does so through its impact on the judge’s or
juror’s standard of proof, the criterion for the amount of evidence
necessary to conclude guilt beyond a reasonable doubt. This
dissertation examined the potential extralegal impact of victim
and defendant attractiveness in a mock criminal trial, and in
addition, it explored the role of two legal procedures -- the judge’s
charge to the jury and the jury deliberation -- in shaping jurors’
judgments and possibly moderating extralegal bias.

Juror Decision-Making and the Hypothesis-Testing Metaphor
The Thomas and Hogue Model. In 1976, Thomas and Hogue presented
a formal mathematical model of juror decision-making that is roughly
analogous to formal models in signal detection theory. But since the
"true" state can never be known in legal hypothesis-testing, the
Thomas and Hogue model includes only one distribution.

Thomas and Hogue have postulated two relevant parameters: the
perceived weight of evidence against a defendant, and a judgmental
criterion for "reasonable doubt." The perceived weight of evidence is
conceived of as a random variable, X, with probability density
function f(x), and expected value, m. The decision criterion, c,
divides f(x) into regions "for" and "against" the defendant. Thus,
the ith juror will compare his/her estimate of the defendant’s guilt,
X(i) against the criterion, and will convict if X(i) > c, or acquit if
X(i) < c.
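The decision rule just described can be illustrated with a short simulation. The sketch below is illustrative only: the exponential distribution chosen for X, its mean, the two criterion values, and all function names are assumptions made for the example, not Thomas and Hogue's estimates or their estimation procedure.

```python
import random


def simulate_weights(n_jurors, mean_weight, seed=0):
    """Draw one perceived weight of evidence, X(i), per juror.

    Assumption for illustration: X follows an exponential distribution
    with expected value mean_weight (m in the text).
    """
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / mean_weight) for _ in range(n_jurors)]


def verdict(x_i, c):
    """Apply the decision rule: convict if X(i) > c, otherwise acquit.

    The text leaves X(i) == c undefined; this sketch treats it as acquit.
    """
    return "convict" if x_i > c else "acquit"


def conviction_rate(weights, c):
    """Proportion of jurors whose perceived weight exceeds the criterion c."""
    votes = [verdict(x, c) for x in weights]
    return votes.count("convict") / len(votes)


# Hypothetical values: same perceived evidence, two different criteria.
weights = simulate_weights(n_jurors=1000, mean_weight=1.0)
lenient = conviction_rate(weights, c=0.5)    # low criterion
stringent = conviction_rate(weights, c=1.5)  # high criterion
print(f"criterion 0.5: {lenient:.2f}, criterion 1.5: {stringent:.2f}")
```

With the perceived evidence held fixed, raising c can lower the simulated conviction rate but never raise it, which is the sense in which a shift in the criterion alone can change verdicts.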

In order to estimate m and c, Thomas and Hogue make the assumption
that a juror’s confidence in his/her verdict is a monotonically
increasing function, g, of |X - c|, the discrepancy between the
perceived weight of evidence and the decision criterion. This allows
them to estimate m and c by collecting jurors’ verdicts and ratings of
confidence-in-verdict. As a matter of mathematical convenience,
Thomas and Hogue further assume that f(x) is characterized by an
exponential and asymmetric distribution. Through a rather complex
bootstrapping operation which is beyond the scope of this paper (and,
at the moment, beyond the mathematical prowess of its author), they
compare and evaluate three such distributions (exponential,
generalized gamma, and generalized Laplace) and demonstrate that these
assumptions are reasonably valid.

Using Thomas and Hogue’s model, Kerr (1978a) has demonstrated that
the impact of the attractiveness and precautiousness of the victim on
mock jurors’ verdicts was mediated by shifts in their reasonable doubt
criterion. When the victim was "beautiful and blameless," jurors
required less evidence to convict the defendant than when she was not.
In a second application, Kerr (1978b) demonstrated that the conviction
rate for mock jurors was inversely related to the severity of the
prescribed penalty; again, the Thomas and Hogue model indicated that
this relationship was mediated by jurors’ requirements of proof.

Kerr, Bull, MacCoun, and Rathborn (1984) found that a victim’s
attractiveness, precautiousness, and degree of facial disfigurement
influenced mock jurors’ verdict decisions, and that this effect
appeared to be mediated by the reasonable doubt criterion.

In order to get an intuitive grasp of the Thomas and Hogue model,
it is useful to consider a metaphor that is familiar to most
psychologists. Feinberg (1971) has pointed out that the juror’s task
is very similar to that of the inferential statistician. Both attempt
to infer "truth" based upon the available evidence. The common law
notion that the defendant is "innocent until proven guilty" provides
the juror with a "null hypothesis," and the "reasonable doubt"
criterion provides the juror with an "alpha level." Thus, Feinberg
points out that the juror faces two possible errors: the "Type I
error" of convicting an innocent defendant, and the "Type II error" of
acquitting a guilty defendant. Although Feinberg originally conceived
of this metaphor as a tool for teaching statistics to college
undergraduates, many psychologists have found it to be a useful tool
for understanding and modeling the juror’s task.
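The mapping between verdicts and statistical errors can be written out as a small sketch. The function names and the six hypothetical cases below are assumptions introduced for illustration; in a real trial the true state is never observed, so these error rates can only be computed in a simulation or thought experiment.

```python
def classify_outcome(verdict, truly_guilty):
    """Label a verdict the way the outcome of a significance test is labeled.

    verdict: "convict" (reject the presumption of innocence) or "acquit".
    truly_guilty: the true state, which a real jury never observes.
    """
    if verdict not in ("convict", "acquit"):
        raise ValueError("verdict must be 'convict' or 'acquit'")
    if verdict == "convict":
        return "correct conviction" if truly_guilty else "Type I error"
    return "Type II error" if truly_guilty else "correct acquittal"


def error_rates(cases):
    """Return (Type I rate among the innocent, Type II rate among the guilty).

    cases: list of (verdict, truly_guilty) pairs.
    """
    innocent = [v for v, guilty in cases if not guilty]
    guilty = [v for v, is_guilty in cases if is_guilty]
    type_1 = innocent.count("convict") / len(innocent) if innocent else 0.0
    type_2 = guilty.count("acquit") / len(guilty) if guilty else 0.0
    return type_1, type_2


# Hypothetical cases, invented purely to exercise the functions.
hypothetical_cases = [
    ("convict", True), ("convict", False), ("acquit", True),
    ("acquit", False), ("acquit", False), ("convict", True),
]
type_1_rate, type_2_rate = error_rates(hypothetical_cases)
print(classify_outcome("convict", False), type_1_rate, type_2_rate)
```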

Following Feinberg’s metaphor, Kerr et al. (1984) argued for an
additional mediational link in the juror’s decision process. They
suggested that extralegal victim and defendant characteristics may
influence the reasonable doubt criterion by affecting the perceived
costs of acquitting a guilty person or convicting an innocent person,
respectively. This conceptual framework is illustrated in Figure 1.
Factors that lead jurors to sympathize with a defendant might heighten
their concern over avoiding a false conviction. As an extreme
example, consider a juror who is a personal friend of the defendant.
We would argue that this juror would require a great deal more
evidence to vote to convict than would a juror who was a complete
stranger. However, we would not expect these jurors to differ in
their reaction to seeing the defendant set free, if the weight of
evidence clearly suggests that he is guilty.

Factors which lead jurors to sympathize with the victim of a
crime, on the other hand, should lead to an increase in their desire
to avoid acquitting a guilty person. Consider a second example
(again, a rather extreme one) in which a juror is a friend of the
victim of a rape. We would predict that this juror will be much more
concerned about the possibility of acquitting the defendant if he is
guilty, especially since he may retaliate against the victim, than
would a juror who was a complete stranger. However, we would not
expect these jurors to differ in their desire to avoid convicting the
defendant if he is clearly innocent.

We are suggesting that these perceived costs may be reflected in
the level at which jurors set their reasonable doubt criterion. In
extreme cases, it is possible that jurors might either lower their
criterion so low as to convict the defendant no matter what the
evidence, or raise it so high as to refuse to convict an obviously
guilty defendant. However, the likelihood of such extreme cases may
be minimized by the voir dire procedure.
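One way to make the link between perceived costs and the criterion concrete is a simple expected-cost rule. The sketch below is a standard decision-theoretic simplification, not a formula stated in this section: it assumes that correct verdicts carry no cost, that the juror convicts when the expected cost of convicting, (1 - p) x (cost of a Type I error), is lower than the expected cost of acquitting, p x (cost of a Type II error), and that the numeric costs are hypothetical. Under those assumptions the criterion on the perceived probability of guilt p is cost_type_1 / (cost_type_1 + cost_type_2).

```python
def criterion_from_costs(cost_type_1, cost_type_2):
    """Probability-of-guilt threshold implied by the two perceived costs.

    Assumption: convict only when (1 - p) * cost_type_1 < p * cost_type_2,
    which rearranges to p > cost_type_1 / (cost_type_1 + cost_type_2).
    """
    if cost_type_1 <= 0 or cost_type_2 <= 0:
        raise ValueError("costs must be positive")
    return cost_type_1 / (cost_type_1 + cost_type_2)


def should_convict(p_guilt, cost_type_1, cost_type_2):
    """True when the perceived probability of guilt exceeds the criterion."""
    return p_guilt > criterion_from_costs(cost_type_1, cost_type_2)


# Hypothetical cost ratios. Equal costs put the criterion at 0.5; weighting
# a false conviction ten times as heavily raises it to 10/11.
equal_costs = criterion_from_costs(1.0, 1.0)
defendant_sympathy = criterion_from_costs(10.0, 1.0)
victim_sympathy = criterion_from_costs(1.0, 3.0)
print(equal_costs, defendant_sympathy, victim_sympathy)
```

In this sketch, anything that raises the perceived cost of a Type I error (for example, sympathy for the defendant) raises the criterion, and anything that raises the perceived cost of a Type II error (for example, sympathy for the victim) lowers it, which mirrors the direction of the effects proposed above.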

 

Figure 1. [Path diagram not reproduced. Its elements are: extra-legal
defendant characteristics; extra-legal victim characteristics; other
factors; perceived cost of juridic Type I error; perceived cost of
juridic Type II error; definition of decision criterion in judge's
charge; setting of decision criterion; evidence; perceived probability
of guilt; and verdicts.]

It seems more likely that extralegal victim and defendant
characteristics will manifest themselves more subtly, and will only
affect verdicts when the evidence is rather equivocal, i.e., near the
range of most jurors’ reasonable doubt thresholds. Informally, we can
suggest a number of extralegal factors that might plausibly influence
the perceived costs of these juridic errors in actual practice. For
example:

PERCEIVED COST OF TYPE I ERROR:
    Attraction to/sympathy for defendant
    Penalty severity
    Demand for improvement in police inquiry or conduct
    Belief in efficacy/morality of penal system

PERCEIVED COST OF TYPE II ERROR:
    Attraction to victim
    Belief in deterrent effect
    Desire to avenge victim
    Desire to punish defendant

We would not expect all extralegal factors to influence verdicts
through the mediation of the reasonable doubt criterion, however. For
example, jurors’ verdicts could be influenced by evidence (e.g., prior
criminal record) that has been ruled inadmissible during the trial
(cf. Doob & Kirshenbaum, 1972; Hans & Doob, 1976; Sue et al., 1973,
1974) or has been publicized prior to the trial and then excluded from
the trial (cf. Kerr & MacCoun, 1983). Such factors could plausibly
affect the perceived weight of evidence without influencing the
decision criterion.

Figure 1 also suggests that in addition to these juridic costs,
the judge’s reasonable doubt instructions to the jury could also serve
as an input in establishing the stringency of the decision criterion.
By instructing the jurors to set a very stringent criterion level, the
judge can create the same result that a high concern over Type I
errors would have. And indeed, there is a good reason for doing so.
As Loftus (1983) has pointed out, the Type I error may indeed be more
costly, for it is easy to overlook the fact that when we convict an
innocent defendant, we also neglect to convict the true culprit.
Ironically, Champagne and Nagel (1982) have reviewed a number of
political reasons why judges may tacitly prefer a policy that reduces
the decision criterion, despite the risk of Type I errors that result
in the conviction of an innocent defendant while the true culprit
remains at large.

It is important to consider an additional effect that judicial
instructions may have, however. In addition to simply raising or
lowering the criterion level, these instructions may also change
jurors’ perceived costs of Type I and Type II errors, or reduce the
weights that jurors place on these subjective factors. Thus, the
instructions could actually reduce the impact of extralegal victim and
defendant characteristics, regardless of the direction of their
effects.

An Overview

The present chapter reviews research relevant to the
conceptual model presented in Figure 1, including models of the
evaluation of evidence, attempts to quantify the reasonable doubt
criterion, manipulations of the judicial definition of reasonable
doubt, and studies of factors influencing jurors’ ability and
motivation to comply with judicial instructions. Then, subsequent
chapters will describe a 2 (Victim attractiveness) X 2 (Defendant
attractiveness) X 2 (Standard of Proof instructions) factorial
experiment that was designed to provide a direct test of various
components of the model. Although any number of extralegal victim and
defendant characteristics might be useful for validating the model,
physical attractiveness has the benefit of being unambiguously
extralegal in an auto theft case such as that used here.

In addition to assessing individual verdicts and related
judgments, the present study also examined the verdicts of
deliberating jurors. Assessments of group verdicts must employ the
group as the unit of analysis; unfortunately, practical constraints
prohibit the use of traditional twelve-person juries if the analyses
are to have adequate statistical power. Therefore, in the present
study, subjects deliberated in groups of two to four after completing
the individual questionnaires. The use of deliberating groups allowed
tests of several hypotheses regarding (a) the impact of deliberation
on extralegal biases (cf. Kaplan & Miller, 1978), and (b) the asymmetry
typically found in social decision scheme matrices (cf. Stasser, Kerr,
& Bray, 1982). These hypotheses are described in greater detail
below.

The Extralegal Effects of Victim and Defendant Attractiveness
Preliminary experimental evidence for the long-standing hunch that
good-looking defendants can "get off easy" came from a study reported
by Efran (1974). Efran provided students with a photograph of an
attractive defendant, an unattractive defendant, or no photograph at
all, and a fact sheet describing an incident in which a student was
allegedly caught in the act of cheating. He found that the attractive
defendant was less likely to be found guilty, and received a lighter
punishment than the less attractive defendant. Unfortunately, Efran
confounded the sex of the subject with the sex of the defendant by
providing only male photos for females, and vice-versa. This
confounding is unfortunate because post-hoc contrasts suggest that the
effects for guilt and punishment were only significant for males
judging females, a pattern which is therefore difficult to interpret.

Kaplan and Kemmerick (1974) used trait adjectives to manipulate
the social attractiveness of defendants. Defendant characterization
and the amount of evidence were both varied in a within-subject design
employing a series of traffic felony trials. In addition, one third
of the subjects were told that the nonevidentiary defendant
characterizations might be useful for their judgments, one third were
told that such information was often misleading and inaccurate, and
the remaining third were given no special instructions. Kaplan and
Kemmerick report that both the evidentiary and nonevidentiary factors
were integrated in an additive fashion consistent with the predictions
of a weighted-average model described in a later section of this
chapter. The instructional manipulation had no effect, however.

A number of studies of defendant attractiveness (e.g., Izzett &
Fishman, 1976; Izzett & Leginski, 1974; Michelini & Snodgrass, 1980;
Sigall & Ostrove, 1975) have examined possible moderating variables.
Sigall and Ostrove (1975) examined the possible moderating influence
of type of crime. Subjects read a brief transcript of a trial in
which a woman was accused of either (a) burglarizing an apartment for
$2,200 in cash and merchandise, or (b) swindling a middle-aged
bachelor into investing $2,200 in a non-existent corporation.
Subjects received a photograph of an attractive defendant, a
photograph of an unattractive defendant, or no photograph at all. The
investigators solicited an attractiveness manipulation check and
subjects’ recommended prison sentences, but regrettably neglected or
chose not to obtain guilt judgments. Thus, jurors predisposed toward
acquittal had a limited range of response options for reacting to the
trial stimuli. Since the trials were constructed to imply the
defendant’s guilt, however, this might not have been a serious problem.

An Attractiveness by Offense interaction indicates that subjects
were significantly more lenient with the attractive defendant if the
crime was burglary, but more lenient with the unattractive defendant
if the crime was a swindle. Simple effects tests indicated that the
latter comparison was not significant, however. Comparisons to the
control condition suggested that the defendant received almost
identical treatment in both the unattractive and no photograph
conditions; apparently, unattractive defendants did not receive
discriminatory treatment in either case. Sigall and Ostrove argue
that the swindle was a crime at which attractive defendants are
more likely to succeed and which they are more likely to pursue in the future.
Conversely, for a more conventional type of crime, attractive
defendants receive the benefit of the doubt because they presumably
have socially desirable traits which would promote rehabilitation and
successful adjustment to the community.

Sigall and Ostrove interpret their pattern of results as
supporting a cognitive rather than a reinforcement-affect
interpretation of attractiveness-leniency effects. In this study, it
is difficult to determine whether attractiveness influenced subjects’
estimates of the probability of guilt or their judgments of the

expediency of rehabilitation. This difficulty is compounded by (a)
the reliance on sentencing guidelines as a primary dependent measure,
and (b) the apparent confounding of specific photographs with levels
of attractiveness, so that a specific facial cue may have triggered
inferences of intelligence, expressivity, or some other trait assumed
to be relevant to the burglary/swindle distinction.

Michelini and Snodgrass (1980) follow a similar line of
reasoning. Subjects in their study received descriptions of
defendants which were either positive or negative and either relevant
or irrelevant for a traffic felony. Michelini and Snodgrass found
that attractive traits only reduced perceptions of guilt when those
traits had relevance to the crime (e.g., "careful and deliberate").

However, these results are also problematic. Since different
positive or negative traits were used depending on whether they were
relevant or irrelevant, these two factors were not truly crossed. The
manipulation check for relevancy solicited from a pilot group of
subjects indicated a reliable trait attractiveness by relevancy
interaction. Decomposition revealed a simple main effect for
attractiveness for relevant traits, in which attractively described
people were expected to be less likely to act in the described
criminal manner than unattractively described people. No main or
simple main effects for relevancy on its own manipulation check are
reported, and the subjects in the main study did not complete
manipulation checks.

Less experimental attention has been given to the effects of
victim attractiveness. While Thornton (1977) found no effects for
guilt ratings, the experiment involved a rape trial, in which victim
attractiveness may have had implications for probability of guilt
which might counteract any tendency to help an attractive victim find
justice. As described above, Kerr (1978) found that conviction was
more likely in an auto theft case when the victim was attractive, but
only if she took necessary precautions to avoid the crime.¹ Kerr,
Bull, MacCoun, and Rathborn (1984) found that a complex interaction of
victim attractiveness, precautiousness, and facial disfigurement
influenced guilt ratings, but found no main effect for attractiveness.

To summarize, although numerous studies report defendant
attractiveness effects, these studies are plagued by a myriad of
methodological flaws, and they are limited in some cases to sentencing
effects rather than guilt effects. Moreover, several of these studies
manipulated social, rather than physical, attractiveness. Relatively
few studies have examined victim attractiveness, and those that have
do not present a simple pattern. Therefore, a first objective of the
present study was to examine whether victim and defendant
attractiveness have reliable effects upon judgments of guilt.

Modeling the Evaluation of Evidence

Thomas and Hogue (1976) do not articulate the cognitive processes
involved in evaluating the evidence presented in a trial in order to
establish a perceived weight of evidence. They do rationalize the use
of an exponential pdf f(x) by assuming a Poisson process in which
apparent weight of evidence increases by amount kΔ every time
interval of length Δ until some critical evidentiary datum appears in
the interval from t to t + Δ, at which point apparent weight "freezes"
at X = kt. Thomas and Hogue seem to suggest that this assumption was
adopted as a mathematical convenience rather than a psychological
postulate. In their review of juror decision-making models,
Pennington and Hastie (1981; cf. Penrod & Hastie, 1979) describe several
other, more specific, cognitive models of juror decision-making. I
will review two such models briefly.
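
As a brief aside on the Thomas and Hogue assumption discussed at the
start of this section, the link between a Poisson process and an
exponential pdf can be sketched as follows. This is only an
illustration under stated assumptions: the rate parameter λ is a
symbol introduced here and is not taken from Thomas and Hogue (1976),
and the sketch treats the limiting case in which the interval length
becomes very small, so that the waiting time T for the critical datum
is continuous.

```latex
% Assumption: the critical datum arrives as a Poisson event with rate \lambda
% (a hypothetical symbol). T is the waiting time until that event, and
% X = kT is the apparent weight of evidence at the moment it "freezes".
\begin{align*}
P(T > t) &= e^{-\lambda t}, \qquad t \ge 0 \\
P(X > x) &= P(kT > x) = P(T > x/k) = e^{-\lambda x / k} \\
f(x) &= -\frac{d}{dx}\, P(X > x) = \frac{\lambda}{k}\, e^{-\lambda x / k}, \qquad x \ge 0
\end{align*}
```

Under these assumptions the frozen weight X is exponentially
distributed with mean k/λ, which is the sense in which the Poisson
assumption rationalizes an exponential f(x).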

Information Integration model. Based on the work of Norman
Anderson (e.g., 1981), the Information Integration model suggests that
jurors combine their initial estimate of guilt with information
presented during the trial in a process of valuation, the assignment
of scale values ($s_i$) to each piece of information, and integration, in
which each piece of information is given a weight ($w_i$) and averaged.
For example:

$$ J = \frac{\sum_i w_i s_i}{\sum_i w_i} \qquad [1] $$

where J is the subjective likelihood of guilt. Ostrom, Werner, and
Saks (1978) have used this approach to analyze mock jurors’
presumption of innocence. They distinguish between four possible juror
strategies of "fair mindedness": (1) the juror can set $s_0 = .50$,
deciding that guilt or innocence are equally likely; (2) the juror can
attempt to be objective, and since more persons brought to trial are
found guilty than not guilty, the juror can set $s_0 > .50$ (see the
Bayesian model, described below, for an analysis of this type of
reasoning); (3) the juror can actually "presume innocence," i.e., set
$s_0 = 0$; or (4) the juror can decide to completely ignore his or her
predispositions, whatever they may be, by setting $w_0 = 0$. Ostrom et
al. report that their mock jurors did apparently presume innocence
(strategy 3), and that $s_0$ was averaged with the trial evidence to
produce a judgment of guilt. They also classified subjects as either
pro- or anti-defendant, and found that while anti-defendant subjects
actually set a lower level of $s_0$, they were also quicker to abandon
the presumption of innocence in the face of evidence than were
pro-defendant subjects. The latter result may indicate that the
anti-defendant subjects had a lower reasonable doubt criterion; however, a
drawback of the information integration model is that it does not
account for the reasonable doubt criterion (cf. Pennington & Hastie,
1981), making it an incomplete portrayal of the juror decision
process.
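
To make the four "fair mindedness" strategies concrete, the following
is a minimal sketch that assumes the averaging rule takes the usual
weighted-average form, with the juror's initial impression entering as
one more (scale value, weight) pair. The evidence values, the weights,
and the function names are hypothetical and are not taken from Ostrom
et al. (1978).

```python
def integrate(scale_values, weights):
    """Return the weighted average of scale values (an averaging rule)."""
    if len(scale_values) != len(weights):
        raise ValueError("each scale value needs exactly one weight")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one weight must be nonzero")
    return sum(w * s for w, s in zip(weights, scale_values)) / total_weight


# Hypothetical trial evidence: scale values run from 0 (innocent) to 1 (guilty).
EVIDENCE_S = [0.8, 0.6, 0.9]
EVIDENCE_W = [1.0, 1.0, 2.0]


def judged_guilt(s0, w0):
    """Combine an initial impression (s0, w0) with the trial evidence."""
    return integrate([s0] + EVIDENCE_S, [w0] + EVIDENCE_W)


# The four strategies, expressed as settings of s0 and w0 (illustrative values).
strategies = {
    "(1) equal odds, s0 = .50": judged_guilt(0.5, 1.0),
    "(2) base rate, s0 > .50": judged_guilt(0.7, 1.0),
    "(3) presume innocence, s0 = 0": judged_guilt(0.0, 1.0),
    "(4) ignore predisposition, w0 = 0": judged_guilt(0.5, 0.0),
}

for label, j in strategies.items():
    print(f"{label}: J = {j:.2f}")
```

With identical evidence, presuming innocence pulls J below the
evidence-only average, which is why strategies (3) and (4) are
distinguishable in principle.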

Pennington and Hastie (1981) also review a sequential weighing
model that is a precursor to the information integration
model. Its primary distinction is that it assumes that the averaging
process takes place sequentially, as each new item is encountered,
rather than at the completion of the trial. This assumption makes the
sequential weighing model more consistent with the Poisson process
assumed by the Thomas and Hogue model.

Martin Kaplan (e.g., Kaplan & Miller, 1978) has been one of the
foremost proponents of the information integration model. He has
provided a conceptualization of the deliberation process which yields
strong predictions regarding the impact of extralegal factors on
verdicts. Kaplan’s work is detailed in a later section of this
chapter.

Bayesian Model. The Bayesian model is a normative model
suggesting the correct approach to integrating evidentiary information
consistent with probability theory. As such, it may not be a good
model of how jurors actually do reach decisions. Such models are often
used both as theories of decision-making (when they fit the subjects’
data) and as tools for discovering cognitive biases (when they don’t).
One such model (Marshall & Wise, 1975) is:

$$ R_n = \frac{P(G \mid E_n)}{P(NG \mid E_n)} \qquad [2] $$

where $R_n$ is the posterior odds for guilt, the ratio of the probability
of guilt given all the evidence to the probability of not guilty given
all the evidence; this model is algebraically equivalent to the
odds of guilt prior to the evidence, $R_0$, multiplied by the
product of the likelihood ratios for each item of evidence, which are
measures of the diagnosticity of each item of information for
assessing the probability of guilt. As with other applications of
Bayesian analyses (cf. Nisbett & Ross, 1980), this model has not
described the decisions of mock jurors very accurately (Pennington &
Hastie, 1981).
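
The odds form of the Bayesian model can be illustrated with a short
sketch. It assumes the items of evidence are conditionally
independent, so that their likelihood ratios simply multiply; the
prior odds and the likelihood ratios below are hypothetical values
chosen for illustration, not data from Marshall and Wise (1975).

```python
import math


def posterior_odds(prior_odds, likelihood_ratios):
    """Posterior odds of guilt: prior odds times the product of likelihood ratios."""
    if prior_odds < 0 or any(lr < 0 for lr in likelihood_ratios):
        raise ValueError("odds and likelihood ratios must be non-negative")
    return prior_odds * math.prod(likelihood_ratios)


def odds_to_probability(odds):
    """Convert odds of guilt to a probability of guilt."""
    return odds / (1.0 + odds)


# Hypothetical juror: even prior odds, then three items of evidence.
# A ratio above 1 favors guilt; a ratio below 1 favors innocence.
prior = 1.0
ratios = [4.0, 0.5, 3.0]

r_n = posterior_odds(prior, ratios)
p_guilt = odds_to_probability(r_n)
print(f"posterior odds = {r_n:.1f}, p(G) = {p_guilt:.3f}")
```

Comparing a mock juror's reported p(G) with the value computed this
way is one way such a model can serve as a tool for detecting bias.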

Modeling the Decision Criterion

The American judicial system has adopted the common-law tradition
of protecting the defendant from false conviction by placing the
"burden of proof" in a criminal trial upon the prosecution. The
defendant is to be "presumed innocent until proven guilty." During the
"fact-finding" process of the trial, the prosecution presents evidence
against the defendant in an attempt to build a case establishing
guilt. At the conclusion of the trial, jurors (or, in a bench trial,
the judge) must review the evidence and decide to convict the
defendant if, and only if, the evidence indicates "beyond a reasonable
doubt" that the defendant committed the crime.

Unfortunately, the reasonable doubt criterion, although noble in
spirit, is extremely vague and difficult to define unambiguously in
practice. A variety of "stock" definitions have been created in
various American court systems, and in some courts the judge is given
discretion to define the criterion as he or she sees fit in a specific
trial. As Simon (1970) has suggested, judges often attempt to define
the phrase "beyond a reasonable doubt" by providing jurors with
paraphrases or apparently synonymous terms. For example:

Reasonable doubt is one a reasonable person has after carefully
weighing all the testimony and is one a reasonable person would
act or decline to act upon. It is not a capricious doubt or a
fanciful doubt or a doubt arising in anyone’s mind because of any
sympathy for the defendant. It is in essence what the words
obviously mean - a reasonable doubt. A reasonable doubt may arise
not only from the evidence produced but also from a lack of
evidence.

Numerous researchers have attempted to translate the reasonable
doubt criterion into a more concrete, quantifiable definition. One
approach, the Thomas and Hogue (1976) model, has already been
described above. Several other approaches are reviewed below.

The Blackstone approach. Following Blackstone’s assertion that
"it is better to let ten guilty men go free than to allow one innocent
man to be convicted" (cited in Kaplan, 1982), several authors have
suggested that the reasonable doubt criterion can be expressed as such
a ratio. For example, Grofman (1977) argues that in order to minimize
the expected disappointment in a verdict, jurors should rationally
apply the formula

$$ p_t = \frac{r}{r + 1} \qquad [3] $$

where $p_t$ is the threshold probability above which the juror is able to
convict beyond a reasonable doubt, and r is the number of guilty
defendants the juror is willing to set free in order to avoid
convicting one innocent defendant. Kaplan (1982) points out that
Blackstone’s assertion, often considered representative of the
viewpoint of the American judicial system, therefore sets the
criterion at .91. Jurors can then compare their subjective
probability of guilt estimate against this criterion and convict if
and only if p(G) exceeds the criterion (cf. Cullison, 1977).
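
The ratio rule can be sketched in a few lines. The form r / (r + 1) is
assumed here because it reproduces the .91 criterion that follows from
Blackstone's ten-to-one ratio; the function names are hypothetical.

```python
def blackstone_criterion(r):
    """Threshold probability implied by trading r guilty acquittals for one false conviction."""
    if r < 0:
        raise ValueError("r must be non-negative")
    return r / (r + 1.0)


def expected_verdict(p_guilt, criterion):
    """Convict if and only if the subjective probability of guilt exceeds the criterion."""
    return "guilty" if p_guilt > criterion else "not guilty"


ten_to_one = blackstone_criterion(10)
print(f"criterion for r = 10: {ten_to_one:.2f}")
print(expected_verdict(0.95, ten_to_one))
print(expected_verdict(0.85, ten_to_one))
```

A juror who regards the two errors as equally bad (r = 1) would, by
the same rule, convict whenever guilt seems more likely than not.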

The Self-Report approach. Simon and Mahan (1971) operationalized
the decision criterion as the minimum probability of guilt required
for a given judge or juror to vote guilty. Respondents were asked the
following question:

What would the likelihood or probability have to be that a
defendant committed the act for you to decide that he is guilty?
(FILL IN THE BLANK)

I would have to believe that it was a ____ out of ten chance
that the defendant committed the act.

Simon and Mahan solicited self-reported criteria from judges, members
of the jury pool, and college students and found mean probabilities
of .89, .79, and .89, respectively. The modal criterion reported by
each group was 1.0, a requirement of absolute certainty. The fact
that 31% of the judges required absolute certainty and 69% did not may
demonstrate the ambiguity of the "beyond a reasonable doubt" concept;
alternatively, it may indicate that respondents had a difficult time
using Simon and Mahan’s response format.

Iversen (1971) has criticized Simon and Mahan’s operationalization
of the concept of probability, arguing that it is meaningless to use
probability in the sense of "relative frequency," since the defendant
cannot be tried ten times for the crime (nor, for that matter, can we
try ten different individuals with identical evidence in the same
trial). Instead, Iversen advocates the use of an "uncertainty"
conceptualization of probability, in which numbers between zero and
one signify the degree to which the juror is certain as to the
defendant’s guilt. (See Kerr et al., 1976, and Dane, 1979, for
applications using the zero-to-one scale.)

The Rank-Order approach. Simon (1967) had half of her mock
jurors indicate their verdict after reading a trial transcript; the
other half were instructed to indicate the probability that the
defendant committed the act with which he was charged, using a 21-point
scale ranging from "0 out of 10 chance" to "10 out of 10 chance."
Simon then rank-ordered the probabilities from highest to lowest.
Assuming that the subjects in both groups were randomly distributed,
Simon obtained an estimate of the reasonable doubt criterion for the
sample by counting down the probabilities until the number of guilty
votes in the other group was reached. Using this technique, Simon
found estimates of .70 to .74. Unfortunately, as Dane (1979) has
pointed out, the accuracy of this technique could not be assessed since
Simon did not obtain both measures from each subject.
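
The counting-down procedure can be sketched as follows. This is a
simplified illustration that assumes the two groups are the same
size, so that the number of guilty votes in one group can be counted
off directly against the ranked probabilities of the other; the data
are invented and the function name is hypothetical.

```python
def rank_order_criterion(probabilities, n_guilty_votes):
    """Count down the ranked probabilities until the number of guilty votes is reached.

    The probability found at that rank serves as the sample's criterion estimate.
    Assumes the probability group and the verdict group are equal in size.
    """
    if not 1 <= n_guilty_votes <= len(probabilities):
        raise ValueError("guilty votes must be between 1 and the group size")
    ranked = sorted(probabilities, reverse=True)  # highest to lowest
    return ranked[n_guilty_votes - 1]


# Hypothetical data: one group reports p(G), the other group reports verdicts.
probability_group = [1.0, 0.9, 0.85, 0.75, 0.7, 0.6, 0.5, 0.4, 0.3, 0.1]
verdict_group = ["G", "G", "NG", "G", "NG", "NG", "G", "NG", "NG", "NG"]

n_guilty = verdict_group.count("G")
criterion = rank_order_criterion(probability_group, n_guilty)
print(f"{n_guilty} guilty votes -> estimated criterion = {criterion}")
```

With unequal group sizes, one would presumably count down to the
matching proportion rather than the raw number of guilty votes.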

The Statistical Decision Theory approach. This approach (e.g.,
Fried, Kaplan, & Klein, 1975; J. Kaplan, 1968; Nagel, 1979, 1982) is
more theoretical than the previous, methodological approaches to
modeling the decision criterion. Furthermore, it is similar to the
conceptual model of Kerr et al. (1984), because it includes jurors’
perceived costs of Type I and Type II errors as components of the
decision criterion. In addition, it also considers the perceived
utilities of correct verdicts. One such model was offered by Fried,
Kaplan, and Klein (1975), who consider the following matrix of
subjective expected utilities (U’s):

                          State of the World
                        Guilty        Innocent
    Decision  Convict    U_CG           U_CI
              Acquit     U_AG           U_AI

Note that $U_{CI}$ and $U_{AG}$ correspond to what Feinberg (1971) and Kerr
et al. (1984) refer to as Type I and Type II errors, and are
conceptualized as "disutilities" with a value less than zero. Fried
et al. suggest that the expected utility (EU) of convicting the
defendant is

$$ EU(C) = p\,U_{CG} + (1-p)\,U_{CI} \qquad [4] $$

where p is the juror’s subjective probability of guilt estimate.
Similarly, the expected utility of acquitting the defendant is

$$ EU(A) = p\,U_{AG} + (1-p)\,U_{AI} \qquad [5] $$

Fried et al. argue that a juror should convict if and only if

$$ EU(C) > EU(A) \qquad [6] $$

or

$$ p\,U_{CG} + (1-p)\,U_{CI} > p\,U_{AG} + (1-p)\,U_{AI} \qquad [7] $$

Algebraically, Fried et al. then proceed to derive the juror’s
decision rule given the above assumptions:

$$ p\,(U_{CG} - U_{AG}) + (1-p)(U_{CI} - U_{AI}) > 0 \qquad [8] $$

$$ p\,(U_{CG} - U_{AG}) + (U_{CI} - U_{AI}) + p\,(U_{AI} - U_{CI}) > 0 \qquad [9] $$

$$ p\,(U_{CG} - U_{AG} + U_{AI} - U_{CI}) > U_{AI} - U_{CI} \qquad [10] $$

$$ p > \frac{U_{AI} - U_{CI}}{U_{CG} - U_{AG} + U_{AI} - U_{CI}} \qquad [11] $$

Thus, the right half of equation [11] represents the decision
criterion, which Fried et al. denote as $p_t$ in equation [12]:

$$ p_t = \frac{U_{AI} - U_{CI}}{U_{CG} - U_{AG} + U_{AI} - U_{CI}} \qquad [12] $$
 

Fried et al. (1975) provide two hypothetical examples of how this
formula might model a juror’s reasonable doubt criterion. First, they
consider a juror who believes that the penalties for a crime like
possession of marijuana are overly severe (cf. Kerr, 1978b). Such a
juror might have the following utility matrix:

                          State of the World
                        Guilty        Innocent
    Decision  Convict      10          -5000
              Acquit        0            100

Applying equation [12] for this juror, we find an extremely stringent
decision criterion:

$$ p_t = \frac{100 - (-5000)}{10 - 0 + 100 - (-5000)} = \frac{5100}{5110} = .998 $$

Next, Fried et al. consider a juror in a rape trial in a community in
which there has been a recent wave of rapes. Such a juror might have
the following utility matrix:

                          State of the World
                        Guilty        Innocent
    Decision  Convict     100          -1000
              Acquit     -200            300

Applying equation [12], Fried et al. estimate this juror’s criterion
as $p_t$ = .82, a much more lax standard.
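
The decision rule can be checked numerically with a short sketch. It
assumes the criterion is the ratio (U_AI - U_CI) / (U_CG - U_AG + U_AI
- U_CI) implied by the expected utility comparison, with a positive
denominator; the function names are hypothetical. Applied to the two
utility matrices above, this ratio works out to 5100/5110 (about .998)
for the first juror and 1300/1600 (about .81) for the second, the
latter being close to the .82 attributed to Fried et al.

```python
def expected_utility(p, u_if_guilty, u_if_innocent):
    """Expected utility of a decision, given subjective probability of guilt p."""
    return p * u_if_guilty + (1 - p) * u_if_innocent


def sdt_criterion(u_cg, u_ci, u_ag, u_ai):
    """Threshold probability above which convicting has the higher expected utility."""
    denominator = u_cg - u_ag + u_ai - u_ci
    if denominator <= 0:
        raise ValueError("the derivation assumes U_CG - U_AG + U_AI - U_CI > 0")
    return (u_ai - u_ci) / denominator


def should_convict(p, u_cg, u_ci, u_ag, u_ai):
    """Convict if and only if EU(convict) exceeds EU(acquit)."""
    return expected_utility(p, u_cg, u_ci) > expected_utility(p, u_ag, u_ai)


# The two hypothetical jurors from the utility matrices in the text.
marijuana_juror = sdt_criterion(u_cg=10, u_ci=-5000, u_ag=0, u_ai=100)
rape_trial_juror = sdt_criterion(u_cg=100, u_ci=-1000, u_ag=-200, u_ai=300)

print(f"marijuana case criterion: {marijuana_juror:.3f}")
print(f"rape case criterion:      {rape_trial_juror:.4f}")
```

Note that nothing in the ratio itself confines the result to the
zero-to-one range; utilities with unexpected signs can yield values
outside it.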

Note that Fried et al. (1975) do not provide an explicit link
between the judge’s defined standard of proof and the juror’s
functional criterion. One possibility is that the judge’s
instructions have an indirect influence upon the functional criterion
by influencing jurors’ utility estimates for the four possible trial
outcomes. This issue is addressed in more detail in a later section of
this chapter.

The comparative accuracy of the approaches. Except for the
Thomas and Hogue (1976) model, all of these approaches to quantifying
the reasonable doubt criterion place it on a zero-to-unity metric.
This is convenient because it allows the researcher to compare the
criterion to each mock juror’s subjective probability of guilt
estimate in order to create an "expected verdict." This expected
verdict may then be compared to the mock juror’s actual verdict in
order to assess the accuracy of the operationalization of "reasonable
doubt"; i.e., the method will either "hit" or "miss."
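
The hit-or-miss logic just described can be sketched as follows. The
juror records are invented for illustration, and the function names
are hypothetical; the sketch assumes that a criterion estimate and a
p(G) estimate are available for every juror.

```python
def expected_verdict(p_guilt, criterion):
    """Predict a guilty vote when the probability of guilt exceeds the criterion."""
    return p_guilt > criterion


def hit_rate(jurors):
    """Proportion of jurors whose expected verdict matches their actual verdict.

    Each juror is a tuple: (p_guilt, criterion_estimate, voted_guilty).
    """
    if not jurors:
        raise ValueError("need at least one juror")
    hits = sum(
        expected_verdict(p_guilt, criterion) == voted_guilty
        for p_guilt, criterion, voted_guilty in jurors
    )
    return hits / len(jurors)


# Hypothetical mock jurors: (p(G), estimated criterion, actual guilty vote).
mock_jurors = [
    (0.95, 0.90, True),   # hit
    (0.80, 0.90, False),  # hit
    (0.70, 0.60, True),   # hit
    (0.65, 0.75, True),   # miss: voted guilty below the estimated criterion
    (0.40, 0.50, False),  # hit
]

print(f"hit rate = {hit_rate(mock_jurors):.2f}")
```

Running the same p(G) and verdict data against criterion estimates
from different approaches gives directly comparable hit rates.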

Dane (1979) utilized this technique for comparing several
alternative estimates of his mock jurors’ reasonable doubt criteria
following a trial simulation. The mean criterion estimates for the
Statistical Decision Theory (SDT), self-report, and rank-order
approaches were .52, .66, and .73, respectively. Dane found that the
rank-order estimates were approximately 88% accurate, the SDT
estimates were approximately 82% accurate, and the self-report
estimates were approximately 77% accurate. All the estimates were
significantly more accurate than expected by chance. It is not
surprising that the rank-order estimates achieved such a high level of
accuracy, since the rank-order approach and the hit rate procedure are
both premised upon the positive monotonic relationship between the
conviction rate and p(G). The rank-order procedure has an
advantage over the SDT procedure in that it may be more generally
applicable; Dane reports that for 11 of his 168 mock jurors, the SDT
estimate fell outside the zero-to-unity range, as a result of jurors
assigning positive utilities to CI or AG, negative utilities to CG or
AI, or both. To the extent that its assumptions are valid, the Thomas
and Hogue model has the advantage that it does not require the use of
subjective probability and expected utility estimates, which are
suspected to be very difficult for mock jurors to make (cf. Kerr et
al., 1984).

Dane also examined the correlations between the criterion
estimates and mock jurors’ confidence-in-verdict ratings, and he found
mixed support for Thomas and Hogue’s assumption of a positive,
monotonic relationship. While direct support would be encouraging,
mixed or non-support can only be inconclusive, since we do not know
whether the Thomas and Hogue model is invalid, or whether the
alternative approach used to derive the criterion is invalid. Without
independent evidence for Thomas and Hogue’s assumed g(|X - c|)
function, such bootstrapping remains problematic.

The present study solicited subjects’ verdicts, confidence-in-verdict
ratings, subjective probability of guilt estimates, self-reported
criterion estimates, and perceived outcome utilities and costs. This
allowed a comparison of the Blackstone, SDT, Thomas and Hogue,
Rank-Order, and Self-Report estimates of the decision criterion for
accuracy, stringency, and validation of the model presented in Figure
1. Individual estimates of the criterion, as provided by the
Blackstone, SDT, and Self-Report approaches, permitted standard
parametric tests of the hypotheses. On the other hand, research
described below suggests some potential methodological artifacts that
can result from reliance on self-report estimates. Therefore, Thomas
and Hogue and SDT estimates are useful as an independent check
on such problems. Finally, it seems safe to anticipate that subjects
may have some difficulty providing reliable subjective probability and
expected utility estimates. Therefore, it seemed prudent to (a)
collect several different estimates and seek some convergence, and (b)
use large cell sample sizes, in order to increase the sensitivity of
the analyses.

Experimental Research on Judicial Instructions

Kerr, Atkin, Stasser, Meek, Holt and Davis (1976) manipulated the
judge’s charge to the jury in a trial simulation. They provided
subjects with either no definition of reasonable doubt, a lax
definition ("...a reasonable doubt must be a substantial one, ... one
for which reasons can be given ... you need not be absolutely sure
that the defendant is guilty to find him guilty"), or a stringent
definition ("...if you feel that the facts of this case are compatible
with any other theory of this case besides the one in which the
defendant is guilty, then you have a reasonable doubt..."). Kerr, et
al. demonstrated that the variations in criterion definition had a
significant impact on both pre- and post-deliberation verdicts, with
the largest proportion of guilty verdicts obtained in the lax
condition (60% and 62%, pre- and post-deliberation), followed by the
no definition condition (51% and 57%), and finally, the fewest
convictions in the stringent condition (46% and 35%). Using the
self-report approach, Kerr et al. found mean criterion estimates of .87, .82,
and .82 for the stringent, lax, and no definition conditions. The mean
for the stringent definition was significantly greater than for the
other two definitions (p < .005). Furthermore, Thomas and Hogue
(1976) estimated c for each condition in the Kerr et al. (1976)
study, and found a similar pattern of decision criteria.

Nagel (1979; see also Nagel, Lamm, & Neef, 1981; Nagel & Neef,
1979) also varied the content of the judge’s definition of the
decision criterion in a simulated rape trial. He compared a
no-definition control condition with a "beyond a reasonable doubt"
definition, a ".90 probability" definition, and Blackstone’s "10:1
Tradeoff" definition. Nagel estimated each subject’s criterion using a
variation of Fried et al.’s SDT approach. First, subjects were asked
which of the four possible trial outcomes (i.e., AI, AG, CI, CG) they
considered to be desirable and which they considered to be
undesirable. Nagel reports that most subjects considered both AG and
CI undesirable, and felt that CI (the Type I juridic error) is more
undesirable than AG. Subjects were then asked to place the most
undesirable outcome at -100 on a 0 to -100 scale, and then to place
the second most undesirable outcome on the scale between -100 and 0,
at a value Nagel denotes as "X." By making the simplifying assumptions
that $|U_{AI}| = |U_{CI}|$ and that $|U_{AG}| = |U_{CG}|$, Nagel then calculated the
criterion by using the following formula:²

$$ p_t = \frac{100}{100 - X} \qquad [13] $$

For example, Nagel (1979) argues that Blackstone would presumably
consider AG one tenth as bad as CI, so that X = -10, yielding .91, as
in Grofman’s formula [3].
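
Nagel's simplified estimate can be sketched under the assumption that
the formula is 100 / (100 - X), the form that yields .91 when X = -10.
The sketch also assumes that CI is the outcome placed at -100 and AG
the outcome placed at X; with the symmetry assumptions, this is the
same value the expected utility ratio gives when U_CI = -100, U_AI =
100, U_AG = X, and U_CG = -X. The function names are hypothetical.

```python
def nagel_criterion(x):
    """Criterion from the scale position x of the second most undesirable outcome (AG).

    Assumes CI was placed at -100 and that x lies between -100 and 0.
    """
    if not -100 <= x <= 0:
        raise ValueError("x must lie between -100 and 0")
    return 100.0 / (100.0 - x)


def sdt_ratio(u_cg, u_ci, u_ag, u_ai):
    """General expected utility ratio, used here only as a cross-check."""
    return (u_ai - u_ci) / (u_cg - u_ag + u_ai - u_ci)


x = -10  # AG judged one tenth as bad as CI
simplified = nagel_criterion(x)
full = sdt_ratio(u_cg=-x, u_ci=-100, u_ag=x, u_ai=100)

print(f"simplified formula: {simplified:.2f}")
print(f"full utility ratio: {full:.2f}")
```

At the extremes, X = 0 (acquitting the guilty is costless) demands
certainty, while X = -100 (both errors equally bad) gives a criterion
of one half.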
Using this approach, Nagel (1979) reports the following mean
estimates:

                                   Males    Females
    No Instructions                 .70       .50
    "Beyond a Reasonable Doubt"     .75       .60
    .90 Probability                 .80       .75
    10:1 Tradeoff                   .90       .90

Note that females were generally more lax than males. Although the sex
differences were not statistically significant, they suggest that
females may be more predisposed to convict the defendant in a
rape trial, as we would expect for females given the use of
expected cost estimates. It would be interesting to assess whether or
not the self-report, rank-order, and Thomas and Hogue approaches would
also reflect such differences. If so, it might suggest that the SDT
approach is simply an explicit model of what the other approaches
model implicitly.

Curiously, the .90 probability and 10:1 tradeoff instructions
define almost identical ideal criteria (.90 and .91, respectively),
and yet differ considerably in the estimates they yield. Not only
were the estimates in the 10:1 condition closer to the defined ideal,
they also did not reflect the sex differences apparent in the other
conditions. This pattern of mean criterion estimates may result from
differences in the comprehensibility of the different judicial
instructions, or differences in mock jurors’ motivation to comply with
the instructions. Each of these possibilities will be addressed below.

Comprehensibility. Research by Charrow and Charrow (1979) and by
Sales, Elwork, and Alfini (1977) has demonstrated that most jurors
only understand a small percentage (under 50%) of the instructions
that are read to them. Typical judicial instructions are often legally
precise but semantically vague, archaic, or redundant, and, as such, often
can be reworded so as to greatly improve their comprehensibility.
Consider, for example, instruction 3.71 from the Book of Approved Jury
Instructions:

If you should find that John Smith, who, at the time of the
accident in question, was driving the vehicle in which plaintiff
was riding, was negligent and that his negligence contributed as
a proximate cause of plaintiff’s injury, then you must determine
whether said driver was then the agent of the plaintiff and
acting within the scope of his employment.

If the driver was plaintiff’s agent and acting within the scope
of his employment, his negligence, if any, must be imputed to the
plaintiff, with the same effect as if the plaintiff himself were
contributorily negligent.

But if said driver was not then the agent of plaintiff or was not
acting within the scope of his employment, his negligence, if
any, may not be imputed to the plaintiff.

Compare those instructions to the modified version constructed by
Charrow and Charrow (1979):

As you recall, John Smith was driving the truck at the time of
the accident, and the plaintiff was a passenger in that truck.
Ordinarily, in deciding whether the plaintiff was contributorily
negligent, you would only look at the plaintiff’s conduct.

However, there is one situation where John Smith’s conduct
affects the plaintiff’s ability to recover money. That situation
is where, at the time of the accident, John Smith was the
plaintiff’s agent, and was performing duties he was hired to do.

If you find that at the time of the accident, John Smith was the
plaintiff’s agent, and was performing duties that he was hired to
do, then any negligence on John Smith’s part would transfer to
the plaintiff. It would be as though the plaintiff himself were
negligent.

On the other hand, if you find that John Smith was not the
plaintiff’s agent, or that he was not performing duties that he
was hired by the plaintiff to do, then any negligence on John
Smith’s part would not transfer to the plaintiff (p. 1351).

It is conceivable that Nagel’s (1979) instructions varied in the
degree to which subjects were able to comprehend them, with the 10:1
tradeoff definition being the simplest to comprehend. Alternatively,
comprehensibility might have interacted with the specific techniques
Nagel employed. For example, given the 10:1 tradeoff definition, many
subjects might have considered CI the most undesirable outcome, and
then, as in Nagel’s example, placed AG at -10, thereby maintaining a
10:1 ratio. In fact, quantitative definitions (".90 probability," 10:1
tradeoff, etc.) may lead to much greater discrepancies between the
different quantification approaches reviewed above than the
traditional, qualitative definitions do. For example, suppose Nagel
(1979) had employed the self-report approach. Subjects who might find
it very difficult to translate the ".90 probability" definition into
expected utilities, thereby creating a great deal of variance in SDT
estimates, might find it relatively easy to simply mark ".90" on a
zero-to-one self-report probability scale, leading to fewer individual
differences based on perceived costs like the sex differences Nagel
found for the rape case. A close match between quantified
definitions and estimates may result from an artifact of the
measurement process that has little to do with how real jurors form
verdicts.

On the other hand, recent evidence (Anonymous, 1984) suggests
that in some cases, quantified instructions may be more likely than
qualitative instructions to have their intended effect on verdicts,
suggesting legitimate effects upon decision criteria. This issue is
worthy of more systematic attention, since its policy implications are
enormous. For example, in McCullough v. State, the Nevada Supreme
Court recently ruled that the use of quantified criterion instructions
by a district court judge constituted prejudicial error (Trial,
September, 1983, p. 10). This decision may be counterproductive if,
indeed, quantified instructions function better.

Motivation to comply with judicial instructions. There is some
evidence that (a) jurors may not always obey judicial instructions,
and that (b) some judicial instructions will induce more compliance
than others. Research by Doob and Kirshenbaum (1972), Hans and Doob
(1976), Sue, Smith and Caldwell (1973), and Sue, Smith and Gilbert
(1974) has demonstrated that evidence ruled as inadmissible by the
judge (e.g., prior criminal record) can influence mock jurors’
verdicts, despite the admonishments of the judge to the contrary
(although see Cornish & Sealy, 1973, for evidence of compliance). Wolf
and Montgomery (1977) have demonstrated that a strong admonishment by
the judge ("...you have no choice but to disregard [the inadmissible
evidence]") can actually induce reactance (cf. Brehm, 1966) in
subjects, leading to increased disobedience. Further evidence for this
possibility is reported by Broeder (1959).

It is possible that instructions such as Nagel’s "10:1 tradeoff"
definition motivate greater compliance in subjects than other
instructions. The ".90 probability" definition may make the fact that
a guilty defendant can be acquitted based on insufficient evidence
especially salient for subjects, leading to resentment and reduced
compliance, especially by females in a rape trial. On the other hand,
the "10:1 tradeoff" definition may remind subjects why our courts are
willing to risk such an acquittal: we wish to avoid the even greater
tragedy of convicting an innocent person.

"Rationalization." The model of juror decision-making presented
in Figure 1 hypothesizes that jurors (a) form an estimate of the
probability that the defendant is guilty, and (b) form a decision
criterion, ideally based upon the judge’s "reasonable doubt"
instructions, but to the extent that these are incomprehensible or
jurors are not motivated to comply with them, also based upon the
jurors’ own perceived costs of juridic errors. At the culmination of a
trial, (a) and (b) are combined to form a verdict.

Nagel (1979) reports some evidence suggesting that this normative
model may not always correctly describe the decision process. Nagel
classified subjects as either conviction- or defendant-prone based
upon their subjective expected utility estimates. Some subjects
received a ".75 probability" definition of reasonable doubt, and
others received a ".90 probability" definition. Nagel reports that
while conviction-prone subjects tended to estimate p(G) as greater
than .90 in the .90 condition, some estimated p(G) as greater than .75
but less than .90 in the .75 condition. Conversely, while defendant-
prone subjects tended to estimate p(G) as less than .75 in the .75
condition, some estimated p(G) as less than .90 but greater than .75
in the .90 condition. Thus, Nagel suggests an alternative process, in
which jurors (a) form a tentative verdict, (b) receive a reasonable
doubt criterion from the bench, and then (c) adjust their estimate of
p(G) so that it is consistent with (a) and (b).

Nagel informed other subjects that a hypothetical defendant had
either a .60 or a .80 probability of guilt. He then solicited self-
report estimates of the law’s reasonable doubt standard. Conviction-
prone subjects in the .80 condition tended to provide estimates of the
criterion that were greater than .60 but less than .80, while some
conviction-prone subjects in the .60 condition provided estimates less
than .60. There was also some evidence of a converse pattern for
defendant-prone subjects. Thus, Nagel also suggests a third possible
process, in which jurors may (a) form a tentative verdict, (b)
estimate p(G), and then (c) report a criterion that is consistent with
(a) and (b).

Nagel (1979) cautions that "the findings concerning the
rationalization phenomenon were not as clearcut as [they are described
above]. That description represents a simplification designed to
clarify the general tendencies (p. 194)." Note that while Nagel
treats these findings as evidence of actual discrepancies between the
criterion-setting model and the manner in which actual jurors form

verdicts, it is possible that his alternative processes are

32

methodological artifacts. Nagel uses the decision theory estimate of
the criterion, formula [13], to classify his mock jurors as
conviction- or acquittal-prone, and a second estimate, the self-report
approach, to represent their functional decision criteria. Yet, both
are presumably estimates of the same construct, and Nagel does not
explain why they should be used differently. As Dane (1979) reports,
the decision theory method provides more accurate estimates than the
self-report method. One plausible explanation for Nagel’s apparent
findings is that the use of the zero-to-one scale in both the
instructions Nagel provided and the scales he employed created an
artificial decision process that would not have taken place otherwise.

Modeling the impact of the judge’s instructions. None of the
aforementioned attempts to model the decision criterion has
explicitly dealt with the role of the judge’s criterion definition.
Conceivably, the judge’s definition could influence each juror’s
criterion level in one of three ways:

A. The judge’s instructions might influence the juror’s
utility estimates, thereby influencing the juror’s
criterion level;

B. The juror might set a personal criterion level,
based on utility estimates (i.e., the SDT model),
and then adjust the criterion if it is clearly
discrepant from the judge’s definition as the
juror understands it; for example:

p_t = p_t'' + \Delta p_t, where \Delta p_t = c (p_t' - p_t'')    [14]

in which p_t' represents the juror’s perception of
the judge’s ideal criterion, p_t'' represents the
juror’s personal criterion, and \Delta p_t represents the
perceived discrepancy between p_t' and p_t'' multiplied by a
scaling constant, c, between zero and one; or

C. The juror’s functional decision criterion may be a
weighted average of the juror’s perception of the
judge’s criterion definition and the juror’s personal
utility-based criterion level; for example:

p_t = p_t' \phi + p_t'' (1 - \phi)    [15]

in which p_t' represents the judge’s ideal criterion,
p_t'' represents the juror’s criterion based upon SDT
formula [12], and \phi represents a weighting parameter on a
zero-to-unity metric. This parameter might be a multiplicative
function of the juror’s ability to comprehend the judge’s
instructions, and his/her motivation to comply.
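
As a purely illustrative sketch, the adjustment formula [14] of Model B and the weighted-average formula [15] of Model C can be written as two small Python functions. The function names and the numeric inputs used with them are hypothetical and are not estimates from the present study; the sketch only shows how each formula moves a personal criterion toward the judge's criterion.

```python
def model_b_criterion(p_personal, p_judge, c):
    """Formula [14]: personal criterion plus a partial adjustment toward the judge.

    c is the scaling constant (0 = no adjustment, 1 = complete adjustment).
    """
    if not 0.0 <= c <= 1.0:
        raise ValueError("scaling constant c must lie between zero and one")
    delta = c * (p_judge - p_personal)
    return p_personal + delta


def model_c_criterion(p_personal, p_judge, phi):
    """Formula [15]: weighted average of the judge's and the juror's criteria.

    phi is the weight given to the juror's perception of the judge's criterion.
    """
    if not 0.0 <= phi <= 1.0:
        raise ValueError("weight phi must lie between zero and one")
    return p_judge * phi + p_personal * (1.0 - phi)


# Hypothetical juror: lax personal criterion (.60), stringent judicial criterion (.90).
adjusted = model_b_criterion(0.60, 0.90, c=0.5)
averaged = model_c_criterion(0.60, 0.90, phi=0.5)
```

Taken purely as algebra, the two expressions give the same value whenever c equals the weighting parameter; the models differ in what that parameter is taken to represent (an incomplete correction in Model B, comprehension and motivation to comply in Model C).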

Model A would appear to be the model implicitly subscribed to by
the judicial system. In this model, the judge’s charge to the jury
ideally educates them and eliminates their personal biases. This
model predicts that judicial instructions will affect both jurors’
utility estimates and their criterion estimates. Ideally, to the
extent that the judge is able to convey the court’s standards for the
relative costs of Type I and II errors, extralegal victim and
defendant characteristics should have no effect on the criterion or
the verdicts.


Model B predicts that judicial instructions have a homeostatic
effect -- they define a reference level that each juror will
presumably attempt to match. Thus, if a juror’s utility estimates
yield an extremely lax criterion and the juror then receives a more
stringent definition from the bench, the juror may raise the criterion
enough to bring it in line with the judge’s level. However, the
scaling constant, c, indicates that this adjustment might not be
complete. This model predicts that the judicial definition will have
no effect on jurors’ utility estimates and some effect on jurors’
criterion estimates, while victim and defendant characteristics will
have an effect upon utility estimates and their effect upon criterion
estimates will be independent of the judicial definition.

Model C suggests that the clarity of the judge’s definition of
reasonable doubt (for a given juror), and the juror’s motivation to
comply with the judge’s definition, both serve as important moderators
of the effects that judicial instructions will have on each juror’s
functional criterion level. Thus, if both clarity and compliance are
high, the juror will set approximately the same decision criterion
that the judge would set. However, it appears unlikely that this close
match will happen consistently. First of all, to the extent that
judicial definitions of reasonable doubt are vague (as described
above), jurors will have a great deal of discretion to set a criterion
level as they see fit. Not only should clarity affect the degree to
which jurors rely on the judicial definition, it should also affect
their estimate of pt’ when they do attempt to rely on that definition.
Second, many jurors may choose not to comply with the court’s
admonishments if they perceive that doing so will prevent them from
maximizing their own personal utilities. Thus, for jurors with low
levels of pt and p(G), Model C would predict that the judicial
definition will have little effect on either utility or criterion
estimates, while victim and defendant characteristics will affect
both.

From Juror Verdicts to Jury Verdicts

By and large, the vast majority of jury simulation studies have
examined only individual verdicts. Bray and Kerr (1982) surveyed 72
such studies and found that only 52% obtained data from groups and
only 29% used the group as the unit of analysis. The relative lack of
data from juries is understandable given the exorbitant costs of
obtaining sufficiently powerful samples of groups. However, there are
a number of reasons why individual verdicts are extremely informative
by themselves. First of all, they are the single best predictor of
group verdicts (cf. Stasser, Kerr, & Bray, 1982). According to
Grofman (1977, p. 192), "it appears certain that the size of the
predeliberation majority largely determines the verdict outcome." Or,
as Kalven and Zeisel put it in their landmark book, The American Jury
(1966, p. 489): "The deliberation process might well be likened to
what the developer does for an exposed film: it brings out the
picture, but the outcome is pre-determined."

Nevertheless, there are a number of reasons why deliberating
groups were of special interest in the present study. First of all,
the reasonable doubt criterion may have important implications for the
establishment of consensus (e.g., Kerr et al., 1976; Stasser, Kerr, &
Bray, 1982, p. 251). Second, group verdicts will permit an
examination of two hypotheses described in the sections that follow.

Kaplan’s Evidentiary Polarization hypothesis. The use
of deliberating groups in the present study also allows for a
conceptual replication of research by Kaplan and Miller (1978)
which suggests that the process of deliberation increases the weight
(in information-integration theory terminology) jurors place upon
evidentiary information, thereby attenuating the effects of non-
evidentiary information (e.g., attorney obnoxiousness).

Note that this research may appear to be in direct
contradiction to the sizable literature on the group polarization
effect (e.g., Myers & Lamm, 1979; Stasser, Kerr, & Davis, 1981), which
suggests that group deliberation typically polarizes individual
predispositions, as reflected in the predeliberation distribution of
opinions. For example, Bray and Noble (1978) composed six-person mock
juries of either high or low authoritarian subjects and had them
deliberate a murder trial. Prior to deliberation, low authoritarians
recommended significantly lower sentences (M = 38.07 years) than did
high authoritarians (M = 56.36). Deliberation had the effect of
polarizing this difference (M = 28.58, 67.70, respectively). Myers
and Kaplan (1976) had mock juries reach judgments for four high-guilt
and four low-guilt traffic felony cases; each jury discussed two of
each and decided the other four privately. Myers and Kaplan report
polarization effects for both judgments of guilt and for recommended
sentences, but only for cases that were discussed in group
deliberation.

However, Kaplan (Kaplan, 1977; Kaplan & C. Miller, 1977; Kaplan &
L. Miller, 1978) and others (e.g., Anderson, 1981, pp. 386-388) have
interpreted the polarization phenomenon in terms of information-
integration theory (described above). Each juror’s extralegal bias
is conceived of as a piece of information, with a scale value and
weight, which is integrated with evidentiary information in forming a
judgment. Kaplan argues, however, that the latter information will
predominate during deliberation. Each juror’s post-deliberation
judgment, then, will be a weighted average of the non-evidentiary bias
with all the information valued and weighed during the deliberation
process. Consider a juror with a relatively neutral pre-existing
bias, with a scale value of 1, and with an evidentiary fact having a
scale value of 6. This juror will have a pre-deliberation judgment
falling between 1 and 6, depending on the relative weights applied to
the two components. Now assume that this juror’s judgment is
representative of the jury as a whole, although the evidentiary
information that other jurors bring to discussion may not be
redundant. If she is exposed to new arguments having the same scale
value of 6, her post-deliberation weighted average will approach 6.
Thus, adding information of the same scale value can have the
seemingly paradoxical effect of polarizing judgment, a phenomenon
which Anderson (1981) refers to as a Set-Size Effect. The juror’s
judgment in such a situation will only remain unchanged if she did not
weigh her pre-deliberation bias at all.
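
The arithmetic of this example can be sketched in a few lines of Python. The sketch assumes a simple weighted-averaging rule and, purely for illustration, gives the bias and every evidentiary item the same weight of 1; neither the weights nor the function name comes from Kaplan or Anderson.

```python
def integrated_judgment(items):
    """Weighted average of (scale_value, weight) pairs."""
    total_weight = sum(weight for _, weight in items)
    if total_weight <= 0:
        raise ValueError("at least one item must carry positive weight")
    return sum(value * weight for value, weight in items) / total_weight


bias = (1.0, 1.0)      # pre-existing bias: scale value 1, assumed weight 1
evidence = (6.0, 1.0)  # one evidentiary item: scale value 6, assumed weight 1

# Judgment after integrating 1, 2, 4, and 8 evidentiary items of scale value 6.
judgments = [integrated_judgment([bias] + [evidence] * n) for n in (1, 2, 4, 8)]

# If the bias carries no weight, the judgment already equals 6 and cannot move.
unbiased = integrated_judgment([(1.0, 0.0), evidence])
```

Under these assumed weights the judgment rises with each added item of the same scale value but never reaches 6, which is the polarizing pattern described above.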

Kaplan (1977) provides evidence for this line of reasoning using
a bogus note-passing procedure that allowed him to control the content
of deliberation. Trial transcripts were constructed to have either an
exonerating or an incriminating appearance, and bogus notes were
constructed to have either the same or the opposite proportion of pro-
conviction to pro-acquittal arguments as the notes each actual juror
provided. As predicted, when subjects received notes with the same
value that they themselves provided, their judgments polarized in the
direction of their initial predisposition.

Kaplan and Miller (1978, Exp. 3) report a study in which
extralegal biases were induced by manipulating the degree of
obnoxiousness of various trial participants (the prosecutor, defense
attorney, judge, or experimenter) as well as the appearance of guilt
in order to test the hypothesis that only evidentiary information
polarizes. Pre-deliberation judgments supported both the extralegal
bias and trial appearance manipulations. However, post-deliberation
judgments revealed significant polarization shifts for the appearance
of guilt but no significant differences due to extralegal biases.

From a legal and social standpoint, this pattern is encouraging.
The polarizing effects of the evidentiary factor are robust and
dramatic. However, the magnitude of these effects suggests a possible
artifactual interpretation of the lack of bias in post-deliberation
judgments. On a 0-21 point scale, the post-deliberation judgments
cluster around 15 for high- and 6.5 for low-appearance of guilt.
While these ratings are not at the actual ceiling and floor of the
scale, they may approach its effective limits given
subjects’ general reluctance to use scale extremes. If this is the
case, then the lack of biasing effects could be the result of a restriction
in range. This possibility is made more plausible by the fact that
subjects have been explicitly discouraged from using the extremes of
the guilt scale in previous research using this general paradigm
(Kaplan & Kemmerick, 1974, p. 496). The present study used a case
constructed to fall as close to the midpoint as possible. Thus, it
was possible to examine whether any biasing effects due to victim
and/or defendant attractiveness were polarized or attenuated by group
deliberation. If the criterion-setting model is accurate, extralegal
bias should exert its influence independently of the weight of
evidence, and the set-size effect would not apply. In fact,
any extralegal bias should be free to polarize independently of
evidentiary influences.

The asymmetry effect. The present study provided an
opportunity to examine the asymmetry effect often found in jury
research (cf. Stasser, Kerr, & Bray, 1982). Researchers have detected
a consistent "leniency shift," in which the rate of conviction tends
to be lower among juries than among jurors. When Social Decision
Scheme matrices, which illustrate the probability of a jury of every
given pre-deliberation split reaching a given verdict, are plotted,
many studies have found an asymmetry effect, in which jurors who are
initially at a deadlock are more likely to move toward acquittal than
guilt. Factions favoring acquittal are also more successful than
factions of the same size favoring conviction at winning converts and
ultimately prevailing. Of course, group polarization studies like
those discussed above provide occasional exceptions to this pattern;
nevertheless, it appears frequently in mock jury research.
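
To make the notion of a decision scheme matrix concrete, the following Python sketch combines a matrix with the distribution of pre-deliberation splits. Every number in it is hypothetical: the 4-person matrix is invented to display an asymmetry, and the binomial distribution of splits assumes that jurors are sampled independently with a common probability of favoring conviction. It is not fitted to data from this or any other study.

```python
from math import comb


def split_probabilities(p_guilty, jury_size):
    """Binomial probability of each pre-deliberation split (0..jury_size guilty votes)."""
    return [
        comb(jury_size, k) * p_guilty**k * (1 - p_guilty) ** (jury_size - k)
        for k in range(jury_size + 1)
    ]


def jury_outcomes(p_guilty, scheme):
    """Return (P(convict), P(acquit), P(hang)) for a jury under a decision scheme.

    scheme[k] holds the three outcome probabilities for a jury that begins
    deliberation with k guilty votes.
    """
    jury_size = len(scheme) - 1
    totals = [0.0, 0.0, 0.0]
    for split_prob, row in zip(split_probabilities(p_guilty, jury_size), scheme):
        for i, outcome_prob in enumerate(row):
            totals[i] += split_prob * outcome_prob
    return tuple(totals)


# Hypothetical 4-person scheme with an asymmetry favoring acquittal.
ASYMMETRIC_SCHEME = [
    (0.00, 1.00, 0.00),  # 0 guilty votes
    (0.00, 0.90, 0.10),  # 1 guilty vote: the 3-person acquittal faction usually prevails
    (0.20, 0.50, 0.30),  # 2 guilty votes: an initial deadlock drifts toward acquittal
    (0.75, 0.05, 0.20),  # 3 guilty votes: the conviction faction prevails less often
    (1.00, 0.00, 0.00),  # 4 guilty votes
]

convict, acquit, hang = jury_outcomes(0.5, ASYMMETRIC_SCHEME)
```

With half of the individual jurors favoring conviction, this invented scheme yields a jury conviction rate below one half, which is the form a leniency shift would take.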

One possible explanation for this effect is that the judicial
norms of "presumption of innocence," "burden of proof," and
"reasonable doubt" make it easier to argue for acquittal than for
conviction during deliberation (cf. Nemeth, 1977). If this is the
case, this shift should be eliminated when jurors receive a "mere
preponderance of evidence" instruction from the bench. The present
study also tested this prediction.


CHAPTER 2
METHOD

Subjects and Design

Four hundred and fifty-two volunteers, 139 males and 313 females,
were recruited from Michigan State University Introductory Psychology
courses. In compliance with Departmental and University standards and
procedures, subjects provided informed consent and received extra
course credit for their participation. Although every effort was made
to recruit equal numbers of males and females in each condition, past
experience has shown that males are considerably more difficult to
recruit, and the present study was no exception.

Early in the duration of the experiment, an unfortunate but
serious typographical error was discovered on the first page of the
individual pre-deliberation questionnaire. Subjects were accidentally
informed that the minimum, rather than maximum, sentence for auto
theft, the crime in question, was 20 years imprisonment. This
statement is clearly erroneous, if not outlandish, and probably
elicited a variety of reactions from subjects. Because its extreme
implications are irrelevant to the purposes of the present study, it
was not incorporated into the design as an additional factor.

Instead, the typographical error was corrected, and those subjects who
encountered it were omitted from the analyses presented here. Data
from the 321 subjects, 93 males and 228 females, who received the
corrected questionnaire are presented.

A 2 (Victim Attractiveness) X 2 (Defendant Attractiveness) X 2
(Judicial Reasonable Doubt Definition: Mere Preponderance of Evidence
vs. Reasonable Doubt) factorial design was employed. In order to
ensure that the attractiveness factors were not in any way confounded
with the specific stimuli employed, two additional control factors,
Specific Victim Photo and Specific Defendant Photo, were also included
in the design. These factors are nested within the Victim
Attractiveness and Defendant Attractiveness factors, respectively.

Cell sizes for the main design are displayed in Table 1:

Table 1: Cell Sizes for the Experimental Design

                                           Instructions
                               Reasonable Doubt    Preponderance of Evidence
Victim         Defendant       Male    Female      Male    Female
Attractive     Attractive       10       25         13       32
Attractive     Unattractive      8       27         13       29
Unattractive   Attractive       12       30         11       31
Unattractive   Unattractive     13       23         13       31

The hypotheses of the present study also required data at the
group level of analysis. For this reason, an attempt was made to
schedule subjects in groups of four. Ultimately, 236 subjects
participated in 59 4-person juries, 33 subjects participated in 11 3-
person juries, 34 subjects participated in 17 dyads, and 18 subjects
were not able to participate in groups and were not included in
analyses at the group level. Thus, at the group level of analysis, the
design included a Size factor. Subjects were nested within groups, and
groups were nested within the experimental conditions.
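
The sample sizes reported above can be cross-checked with a short Python tally. The dictionary below is a transcription of the Table 1 cell sizes and of the jury counts given in the text; the variable and function names are hypothetical and serve only this check.

```python
# Cell sizes transcribed from Table 1: (victim, defendant, instructions, sex) -> n.
CELL_SIZES = {
    ("attractive", "attractive", "reasonable doubt", "male"): 10,
    ("attractive", "attractive", "reasonable doubt", "female"): 25,
    ("attractive", "attractive", "preponderance", "male"): 13,
    ("attractive", "attractive", "preponderance", "female"): 32,
    ("attractive", "unattractive", "reasonable doubt", "male"): 8,
    ("attractive", "unattractive", "reasonable doubt", "female"): 27,
    ("attractive", "unattractive", "preponderance", "male"): 13,
    ("attractive", "unattractive", "preponderance", "female"): 29,
    ("unattractive", "attractive", "reasonable doubt", "male"): 12,
    ("unattractive", "attractive", "reasonable doubt", "female"): 30,
    ("unattractive", "attractive", "preponderance", "male"): 11,
    ("unattractive", "attractive", "preponderance", "female"): 31,
    ("unattractive", "unattractive", "reasonable doubt", "male"): 13,
    ("unattractive", "unattractive", "reasonable doubt", "female"): 23,
    ("unattractive", "unattractive", "preponderance", "male"): 13,
    ("unattractive", "unattractive", "preponderance", "female"): 31,
}

# Group composition reported in the text: jury size -> number of juries.
JURIES_BY_SIZE = {4: 59, 3: 11, 2: 17}
UNGROUPED_SUBJECTS = 18


def total_where(position, level):
    """Sum the cell sizes whose key has `level` at index `position`."""
    return sum(n for key, n in CELL_SIZES.items() if key[position] == level)


total_subjects = sum(CELL_SIZES.values())
males = total_where(3, "male")
females = total_where(3, "female")
grouped_subjects = sum(size * count for size, count in JURIES_BY_SIZE.items())
```

Under this transcription the cells sum to the 321 retained subjects (93 males and 228 females), and the 303 subjects who deliberated in groups plus the 18 who did not give the same total.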

Stimulus Materials and Pilot Studies

Attractiveness manipulations. Eight black and white photographs
were required -- two attractive males, two unattractive males, two
attractive females, and two unattractive females. A pool of 16 male
and 10 female black-and-white photographs with good "face validity"
were selected from the collections of several departmental
researchers. These photographs were originally obtained from a number
of sources, including high school and college yearbooks; none of them
had been used in research during the year prior to the present study.
In an initial pilot study conducted during the term prior to the main
study, 12 males and 14 females from the University Psychology
Department were recruited to select suitably attractive and
unattractive photographs. Participants read the following
instructions:

THANK YOU FOR YOUR PARTICIPATION AND INTEREST IN THIS RESEARCH.
We are planning a large, comprehensive program of research on
criminals and criminality. We are especially interested in what
types of people commit felonies, and in discovering what types
of factors influence (a) the probability that they will be
convicted of a crime that they have committed, and (b) the
probability that they will successfully adjust to the community
after prison.

Today’s study is a preliminary look at the question: Can people
recognize "criminality" in facial photos? We would like you to
take a few moments to examine the booklet of 26 facial photos,
and to evaluate these faces. All 26 photos were taken from high
school and college yearbooks. Some of these photos may depict
people who were later convicted of felonies and served time in
federal prisons. Others are ordinary people who have not
committed felonies. Of course, you will not know which are which.
Please fill out the questionnaire for each photo. Do not write on
the photo sheets. Since this is a pilot study (i.e., a
preliminary one), we would find it very helpful if you added
comments and additional impressions in the margins of the
questionnaire. Let us know what you think of each photo.

The photo booklets consisted of 26 photos arranged on three
consecutive 9-1/2" x 14" sheets of paper, with an arbitrary three-
digit I.D. number under each photo. The questionnaire consisted of
26 sets of the following scales:

Photo #

Extremely                                   Extremely
attractive     :  :  :  :  :  :  :  :       unattractive

Extremely                                   Extremely
unintelligent  :  :  :  :  :  :  :  :       intelligent

Extremely                                   Extremely
trustworthy    :  :  :  :  :  :  :  :       untrustworthy

How likely is it, in your opinion, that this person has been, or will
be, convicted of a felony?

Extremely                                   Extremely
unlikely       :  :  :  :  :  :  :  :       likely


This procedure was used to select eight suitable photographs that were
perceived as attractive or unattractive but which were relatively
neutral on the remaining three dimensions. Scale ratings for the eight
photographs selected are presented in Table 2.3
Table 2
Pilot Study Scale Ratings for Victim and Defendant Photographs

Sex   Attractiveness   Intelligence   Trustworthiness   Possible Felon?
M     1.69 (.79)       3.69 (1.49)    3.46 (1.56)       3.73 (1.69)
M     2.19 (1.39)      3.65 (1.67)    3.62 (1.55)       4.35 (1.50)
F     1.46 (1.24)      5.12 (1.63)    5.15 (1.63)       1.96* (1.43)
F     2.28 (1.34)      3.72 (1.75)    3.80 (1.61)       3.44* (1.73)
M     5.20 (1.08)      4.84 (1.43)    4.92 (1.61)       3.08 (1.91)
M     5.73 (.96)       4.96 (1.34)    4.92 (1.52)       2.92 (1.70)
F     5.92 (.69)       4.62 (1.24)    5.11 (1.21)       2.15* (.83)
F     6.04 (1.10)      4.84 (1.65)    5.72 (1.06)       1.92* (1.88)

NOTE: Attractiveness and Trustworthiness scales have been recoded so
that 1 is the low anchor and 7 is the high anchor for each rating
scale. Standard deviations appear in parentheses. Means denoted by
an asterisk are not relevant for the present study, as the female
portrayed a felony victim, not a criminal defendant.

The trial simulation was a modification of a
trial transcript used previously in our lab (e.g., Kerr et al., 1982).
Although the transcript is very realistic (including, for example,
opening and closing arguments, direct- and cross-examination of
witnesses, and judge’s instructions to jurors), the case is, in fact,
a fictional one. This permitted evidence to be manipulated in a series
of minor pilot studies (approximately ten subjects each) used to
establish a close case which would avoid both floor and ceiling
effects on verdicts. This attempt was very successful, resulting in a
trial scenario with a 52.6% pre-deliberation conviction rate in the
main study.

In general, the case involves an auto theft charge that was
allegedly tried in Chicago, Illinois. Briefly, the facts of the case
are as follows: The victim’s car was stolen while she was shopping.
The car was recovered during a police raid on a garage in which a
number of stolen cars were being repainted. The defendant’s
fingerprints were found in the car, a number of checks made out to the
defendant were found in the garage, and subsequent investigation
revealed that the defendant was in a cafe near the place where the
car was stolen at about the time of the theft. The defendant claimed
that he had been an employee of the garage, that he had left his
fingerprints when he repainted the car, that the checks had been
paychecks, and that he had not left the cafe until well after the
theft had occurred.

In addition to the defendant and victim photographs, the trial
transcript also included photographs of the witnesses, attorneys, and
the presiding judge. These photographs were retained from the Kerr et
al. (1984) study. The defendant and victim photographs varied by
condition, and the additional photographs were constant across
conditions.

An audio simulation of the trial was also constructed. Graduate
students in the Department of Psychology performed the roles of the
judge, victim, defendant, attorneys and witnesses. The audio tape was
prepared primarily to keep the rate of presentation of trial materials
constant for all subjects. Participants were informed that the tape
was not an actual recording of the trial, but that it was hoped that
they would find that it made the trial simulation more involving and
life-like.

In order to manipulate the judge’s criterion instructions,
subjects read and heard either "mere preponderance of evidence"
instructions, as typically used for civil trials, or "reasonable
doubt" instructions. These instructions were obtained from sourcebooks
of patterned jury instructions (Reid, 1960a, 1960b). Both sets of
instructions came from cases tried in the State of Michigan and were
selected because they appeared to be approximately equivalent in
length and comprehensibility and were judged to be fairly
representative. All subjects received the following instructions (with
an adaptation for the "preponderance of evidence" condition in
parentheses):

Now in this phase of the proceedings the Court explains to you
what the law is that applies to this case. The Information
charges one offense, the charge is Auto Theft. The statute upon
which this charge is based is Article III.45.2 of the Revised
Illinois Penal Code and it reads as follows:

"Any person who shall knowingly take possession and operate a
motor vehicle without the knowledge and permission of the person
holding title to that vehicle shall be guilty of a felony."

Auto theft, as charged in the Information has been defined as the
possession of a motor vehicle without the permission of the
lawful owner of that vehicle. In this case, it is clear that the
owner of the vehicle in question did not grant any permission to
the person or persons who removed it from the location indicated
in the Information...

In dealing with criminal matters there are several particular
rules which apply and which do not apply to civil matters. One
of these is the doctrine of presumption of innocence. A
defendant is presumed to be innocent until his guilt is
established beyond a reasonable doubt (with a preponderance of
the evidence). In accordance with that rule of law, no inference
of guilt may be drawn from the fact that a person has been
arrested and has been placed on trial.

Juries in the "reasonable doubt" condition then received the following
instructions:

No man can be convicted of a crime in this jurisdiction until his
guilt is established beyond a reasonable doubt. A reasonable
doubt is what the words imply, a doubt founded in reason, a
doubt for which you can give a reason, a doubt growing out of the
testimony in the case or the lack of testimony, a doubt which
would cause you to hesitate in the ordinary affairs of life. It
is not a flimsy, fanciful, fictitious doubt which you could
raise about anything and everything. It means a reasonable
doubt. If, when all is said and done, you have such a doubt
about the accused, it is your duty to acquit him. (People v.
Davis, 171 Mich 241, 137 NW 61.)

Instructions for the "preponderance of evidence" condition were
adapted for use in a criminal trial. These instructions read as follows
(with the original wording in parentheses):

The burden of proof in this case is upon the prosecution (the
plaintiffs) to show by a preponderance of the evidence the
material facts which the State has (they have) alleged in its
(their) declaration. By a preponderance of the evidence we mean
simply the greater weight of evidence; in other words, the
prosecution (the plaintiffs) in this case must produce evidence
which in your minds carries greater weight than that which has
been produced against it. (Blaty v. Gray, 217 Mich 531, 187 NW
360.)

Finally, all juries were told:

Very well. Members of the Jury, the time has come to submit this
case to you for your deliberations. As I told you, you have
nothing to do in this case but to determine the guilt or
innocence of the defendant. I can tell you what the law is but
you are absolute in the realm of fact.

Procedure

A maximum of 16 subjects participated during any given session.
Subjects scheduled themselves by signing up for a given session, and
the experimenters, four male and four female undergraduates, called
them the evening before a session to confirm the appointment.

The laboratory featured a large rectangular central room with
three smaller rooms on either side. The four smaller rooms in the
corners were used as jury deliberation rooms, whereas the two remaining
middle rooms were left vacant. There were four chairs and a
rectangular table in each deliberation room; the table was flush
against the wall with a chair at each end and two chairs on the
exposed side. A microphone on a stand was placed on each table.

As subjects arrived at the laboratory, they were asked by the
experimenter to have a seat in one of the four deliberation rooms.
The experimenter alternated the room assignments between the front two
rooms until there were four subjects in each, and then alternated the
room assignments for the back two rooms for the remaining subjects.
This procedure was followed in an attempt to randomize the composition of
the juries as much as possible. An attempt was made to create as many
4-person juries as possible given the attendance at a given session.
However, if necessary, 2- or 3-person juries were formed, or subjects
were seated alone in a room. The sex composition of the groups was
allowed to vary randomly. While subjects were seated, they were asked
to read the standard departmental consent form and a brief description
of the experiment (Appendix A), and to sign the form if they wished to
participate. Subjects were allowed to talk to one another while
waiting for the session to begin. The door to each deliberation room
remained open during the early portion of each session.

When all the subjects were seated, the experimenter distributed
the trial transcripts. Deliberation rooms were randomly assigned to a
victim attractiveness/defendant attractiveness photo combination prior
to each session, and sessions were randomly designated for either the
"reasonable doubt" or the "preponderance of evidence" condition, so
that all the subjects in a given deliberation room received identical
booklets. Tucked in the back of each transcript was a large manilla
envelope containing pre- and post-deliberation questionnaires. One of
each set of four transcripts was marked with a large red "F" and also
contained a group questionnaire; the recipient of this folder was
randomly determined and became the jury’s foreperson. If there were
only two or three people in a jury, the experimenter made sure that
one of the jurors received the foreperson’s folder.

The experimenter informed subjects that all their instructions
would be provided by an audio tape-recording, and then he or she
turned on a tape recorder in the central room; the recording played
through two speakers placed on either end of the central room, so that
it could be clearly heard in each room. The tape recording played
subjects the following instructions:

Welcome to "The Jury Study." Thank you all for coming today.
Today, each of you will take on the role of a juror. You will
read the written transcript of a criminal trial and be asked to
reach a verdict and make related judgments for the case. This
case you will consider is called "The People v. William Lambeth."
William Lambeth was charged with auto theft and tried in Chicago
in 1974. We chose this actual case so that this study would be
as realistic as possible. Therefore, we’ve altered the original
transcript only slightly. Testimony has been summarized in a few
places where it could be done without altering its meaning.

Also, a few portions of the original transcript have been deleted
altogether; however, these were always clearly unimportant and
did not bear on the guilt or innocence of the defendant. For
example, some of the judge’s charge to the jury has been deleted.
In every case the deleted material was redundant with the
portions which were retained.

We were also able to obtain photographs from the court, from the
police, and from the files of a major Chicago newspaper. These
photographs are included to make the transcript as realistic as
possible, and to give you as much of the information available to
the real jurors as we could. Although we cannot provide you with
a tape recording of the actual trial, you will hear a taped
reenactment of the trial, performed by graduate students at
Michigan State University. We hope this tape will make the
transcript more involving and life-like. As you listen to the
tape recording, please read along in the transcript. Although you
are not actually a juror today, please try to put yourself in the
role of one of the actual jurors.

The trial lasted approximately 38 minutes. At the conclusion of
the trial, subjects were given additional instructions:

Now that you have read and heard the trial, please open the
manilla envelope in the back of your folder and fill out the
questionnaire. Please do not talk to anyone else while you are
filling out the questionnaire. You may find that some of the
questions seem similar to each other or difficult to answer.
Please give careful consideration to each question, and do the
best you can.

When everyone on your jury has completed the questionnaire,
please tuck them back in the folders, and close the door to your
room. Then you may deliberate the case as a jury, and attempt to
reach a unanimous group verdict. One of you will find a large
"F" on your manilla envelope. You will be the foreperson, and we
would like you to fill out the group jury questionnaire for your
jury. The experimenter will notify you when there are only 5
minutes left for deliberation. If you are not able to reach a
unanimous decision at the end of the deliberation period, the
foreperson should indicate that the jury has "hung" on the jury
questionnaire.

At the end of the deliberation period, your experimenter will
open all the doors and sign your experimental credit cards.

When all the members of a given jury completed the pre-
deliberation questionnaire and closed the door to their room, the
experimenter turned on a reel-to-reel tape recorder which recorded
that jury’s deliberation. Juries were given approximately 30 minutes
to deliberate, although this period varied widely depending on how
long it took the jury to first complete the pre-deliberation
questionnaires. The average deliberation length was 11 minutes and 47
seconds. At the completion of the session, the experimenter would
sign subjects’ experimental credit cards and give them a debriefing
sheet providing some general background on jury research, telling them
how they could contact the principal investigator should they desire
more information about the purpose and/or results of the study, and
requesting their confidentiality until all the experimental sessions
were completed.

Dependent Measures

The Pre- and Post-Deliberation and Group questionnaires appear in
Appendix A. The Pre-Deliberation juror questionnaire consisted of 18
items that assessed each juror’s pre-deliberation verdict, recommended
sentence, evaluations of the defendant, the victim, and the judge’s
instructions, and a series of items intended for use in modeling
jurors’ individual decision processes, including subjective expected
utilities and subjective probability estimates.

After providing a tentative verdict, jurors rated their
confidence in that verdict on an 11-point Likert-type scale. These two
measures are required in order to perform the Thomas and Hogue (1976)
modeling analyses. In addition, they can be combined into a 22-point
Guilt scale, with 1 representing complete confidence in a Not Guilty
verdict, and 22 representing complete confidence in a Guilty verdict.
This measure has the advantage of being more sensitive than
dichotomous verdicts, and can be a valid predictor of movement during
group deliberation (cf. Stasser, Kerr, & Davis, 1980).
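
As a minimal sketch of how the two measures could be combined, the following Python function maps a verdict and an 11-point confidence rating onto the 22-point scale. The function name and the coding of confidence from 1 (least confident) to 11 (completely confident) are assumptions made for illustration; only the two endpoints of the scale are specified in the text.

```python
def guilt_score(verdict: str, confidence: int) -> int:
    """Combine a verdict and an 11-point confidence rating into a 1-22 score.

    Assumed coding (hypothetical): confidence runs from 1 to 11, with 11
    meaning complete confidence in the verdict given.
    """
    if not 1 <= confidence <= 11:
        raise ValueError("confidence must be between 1 and 11")
    if verdict == "guilty":
        return 11 + confidence   # 12 (least sure) .. 22 (completely sure)
    if verdict == "not guilty":
        return 12 - confidence   # 11 (least sure) .. 1 (completely sure)
    raise ValueError("verdict must be 'guilty' or 'not guilty'")
```

Under this coding the two verdicts occupy adjacent halves of the scale, so a juror who barely favors acquittal (11) sits next to one who barely favors conviction (12).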

Subjects were asked to imagine that the defendant was found
guilty and they were asked what sentence they would recommend if they
were the judge presiding over the case. This type of dependent
measure has been criticized (e.g., Konecni & Ebbesen, 1982, pp. 28-29)
because real jurors don’t make such a judgment in many states. It
would clearly be foolish to argue for policy changes on the basis of
such data, but the question of how punitive subjects are in response
to varying trial conditions is a psychologically valid and meaningful
one. In addition, this measure facilitates comparison with past
studies.

Subjects rated the victim and the defendant on four 7-point
Likert-type items assessing believability, likeability, attractiveness
(the manipulation check), and intelligence. Seven-point Likert-type
scales were also used to assess how important the judge’s instructions
were for their decision, how comprehensible the judge’s instructions
were, and how much they sympathized with the victim and with the
defendant.

Subjects’ subjective probability of guilt [p(G)] estimates were
assessed using three different measurement formats. First, they rated
p(G) by placing a check mark along a 132 millimeter scale ranging from
0, complete certainty of innocence, to 1.0, complete certainty of
guilt. Next, they rated p(G) on a 10-point checklist, ranging from "0
chances in 10" to "10 chances in 10" that the defendant did indeed
steal the car. Finally, they rated p(G) using an unbounded odds
scale. Three different probability formats were used in order to (a)
obtain a more reliable estimate of p(G) than a single item can
provide, and (b) attempt to establish which format is most accurate
and easy for subjects to conceptualize and use.
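
The three formats can be compared only after they are placed on a common zero-to-one metric. The sketch below shows one way such a conversion might be done. The function names are hypothetical, and the assumption that an odds response is recorded as "a to b" in favor of guilt is illustrative rather than a description of the actual questionnaire item.

```python
SCALE_LENGTH_MM = 132.0  # length of the millimeter scale described in the text


def msr_to_probability(mark_mm: float) -> float:
    """Convert a check-mark position (in mm from the 0 end) to a probability."""
    if not 0 <= mark_mm <= SCALE_LENGTH_MM:
        raise ValueError("mark must lie on the 132 mm scale")
    return mark_mm / SCALE_LENGTH_MM


def tsr_to_probability(chances_in_10: int) -> float:
    """Convert an 'n chances in 10' response to a probability."""
    if not 0 <= chances_in_10 <= 10:
        raise ValueError("response must be between 0 and 10")
    return chances_in_10 / 10


def osr_to_probability(odds_for: float, odds_against: float) -> float:
    """Convert odds of 'odds_for to odds_against' (assumed format) to a probability."""
    if odds_for < 0 or odds_against < 0 or odds_for + odds_against == 0:
        raise ValueError("odds must be non-negative and not both zero")
    return odds_for / (odds_for + odds_against)
```

For example, a mark at 66 mm, a response of "5 chances in 10," and odds of 1 to 1 would all map to the same value of .50.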

These three probability formats were also employed to obtain
self-report estimates of p*, the minimum probability of guilt the
juror requires to render a Guilty verdict. Thus, the accuracy of each
format can be assessed by examining whether or not a Guilty verdict is
given when p(G) > p*, or a Not Guilty verdict is given when p(G) <=
p*.

The Statistical Decision Theory estimate of p* requires
subjective estimates of the expected utility of convicting when
guilty, convicting when innocent, acquitting when guilty, and
acquitting when innocent. Since such estimates are abstract and
difficult to quantify, the task was broken down into three steps.
First, subjects were asked to imagine each of the four outcomes, one
at a time. Next, subjects were asked to consider each outcome and
indicate whether they regarded it as a positive outcome or a negative
outcome. Finally, subjects were asked to quantify how positive or how
negative each outcome would be, using any number ranging from negative
to positive infinity.

A final estimate of p* was adapted from Blackstone’s comment
regarding the relative efficacy of acquitting guilty defendants rather
than convicting innocent ones. Subjects were asked to complete the
following sentence:

"It is better to let ___ guilty defendant(s) go free than to
convict one innocent defendant."

Following deliberation, subjects provided a personal verdict,
confidence-in-verdict, their satisfaction with the group verdict, and
re-assessed p(G) using the three probability formats. This allowed an
assessment of how accurate the p* estimates were at predicting their
post-deliberation verdicts. The foreperson was asked to indicate the
group verdict on a separate questionnaire.

CHAPTER 3

RESULTS

Manipulation Checks for the Attractiveness Factors

Two items on the pre-deliberation questionnaire assessed
subjects’ evaluations of the attractiveness of the victim and the
defendant. These items were each subjected to a 2 (Subject Sex) x 2
(Instructions) x 2 (Victim Attractiveness) x 2 (Defendant
Attractiveness) analysis of variance (ANOVA), presented in Tables B-1
and B-2. As expected, reliable differences between the high and low
attractiveness photographs were obtained. The victim was seen as more
attractive in the High Attractiveness condition (M = 5.29 on a 7-point
scale) than in the Low Attractiveness condition (M = 3.11); F(1,301) =
219.63, p < .001. In addition, there was a significant Sex x
Instruction x Victim Attractiveness interaction; F(1,301) = 8.24,
p < .01. Post-hoc contrasts using the Tukey procedure indicated that the
main effect for Victim Attractiveness was reliable (p < .05) for both
sexes in both instructional conditions. There were no significant sex
or instructional differences for the Tukey contrasts at the alpha
= .05 level.

Similarly, analysis of the defendant attractiveness ratings
revealed a significant main effect for Defendant Attractiveness,
F(1,301) = 303.74, p < .001. As expected, the defendant was perceived as
more attractive in the High Attractiveness condition (M = 4.47) than
in the Low Attractiveness condition (M = 2.40). In addition, there
was a significant main effect for Subject Sex, F(1,301) = 4.71, p < .05,
which was qualified by a Sex x Instructions x Defendant Attractiveness
interaction, F(1,301) = 13.72, p < .001. Post-hoc contrasts revealed that
the interaction was due to a difference between ratings of the
attractive defendant by males in the Reasonable Doubt (M = 4.00) and
Preponderance of Evidence (M = 5.12) conditions (p < .01). More
importantly, the Defendant Attractiveness simple main effects were
reliable (p < .01) for both sexes in both instructional conditions.

Recall that not one but two photographs were used to represent
the victim and the defendant in each Attractiveness condition. In
order to establish that the attractiveness manipulations were not
limited to any specific photographs, the victim and defendant
attractiveness ratings were each subjected to a 2 (Subject Sex) x 2
(Instructions) x 4 (Victim Photograph) x 4 (Defendant Photograph)
ANOVA. A main effect for Victim Attractiveness on the Victim
manipulation check was significant, F(1,260) = 89.79, p < .001. T-tests
indicated that each pair of High and Low Attractiveness photos was
reliably different, t’s > 7.64, df > 142, p’s < .001. Tukey tests
revealed that the High Attractiveness victim photos (M = 5.31, 5.27)
did not significantly differ. However, the Low Attractiveness photos
(M = 3.65, 2.69) did (p < .01). Additional t-tests were computed to
establish that each victim photograph significantly differed from the
mid-point (4) of the scale. These one-tailed tests were significant
for both attractive photos, t = 9.95, df = 74, p < .001, and t =
10.16, df = 81, p < .001, and both unattractive photos, t = -2.00, df
= 69, p < .05, and t = -9.65, df = 92, p < .001.

A similar pattern was found for the defendant photos.
Decomposition of a main effect for Defendant Attractiveness, F(1,260) =
100.14, p < .001, indicated reliable differences for each combination
of High and Low Attractiveness photos, t’s > 11.33, df > 110,
p < .001. As with the victim photos, Tukey tests indicated that the
High Attractiveness photos (M = 4.52, 4.42) were not statistically
different, but the Low Attractiveness photos (M = 2.09, 2.56) were (p
< .05). Each of the defendant photos was also tested against the mid-
point of the attractiveness scale. These one-tailed tests were
significant for both attractive photos, t = 5.05, df = 58, p < .001,
and t = 4.03, df = 103, p < .001, and both unattractive photos, t =
-12.62, df = 52, p < .001, and t = -12.28, df = 102, p < .001. The
complete pattern of these analyses suggests that the manipulations of
victim and defendant attractiveness were effective.

Establishing the effectiveness of the Instructional manipulation
was more difficult. A number of different estimates of subjects’
criterion for conviction were solicited; however, the validity of
these estimates must be established before they can serve as a
reasonable check on the Instructional manipulation. This is a
particularly thorny issue and will be addressed in more detail later.

Pre-Deliberation Verdicts and Guilt-Related Judgments

Immediately after the completion of the trial simulation and
prior to deliberation, subjects were asked to provide their personal
verdicts. As intended, the case was remarkably close, with an overall
conviction rate of 52.6%. Log-linear modeling techniques (e.g.,
Fienberg, 1970; Knoke & Burke, 1980) were used to assess the impact
of the independent variables upon these dichotomous dependent
measures; the results are presented in Table B-3.⁴ The Verdict x
Instructions effect, shown in Table 3, was significant, G² = 7.10,
df = 1, p < .01. As predicted, jurors were more likely to convict
the defendant in the Preponderance of Evidence condition than in the
Reasonable Doubt condition. Contrary to predictions, however, there
were no reliable effects for either Victim Attractiveness, G² < 1.00,
df = 1, or Defendant Attractiveness, G² < 1.00, df = 1. There were no
significant higher-order effects.

Table 3. Individual Pre-Deliberation Verdicts by Instructions

                              Verdict
                      Guilty      Not Guilty    Row Total

Reasonable              66            82           148
Doubt                 (44.6%)       (55.4%)

Preponderance          103            70           173
of Evidence           (59.5%)       (40.5%)

Column                 169           152           321
Total                 (52.6%)       (47.4%)

NOTE: Row percentages in parentheses.
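
The association in Table 3 can be checked with a likelihood-ratio chi-square computed directly from the four cell counts. The sketch below is a simple two-way test of independence, not the full log-linear model reported in Table B-3, so its value need not match the reported statistic exactly. The function name is hypothetical.

```python
import math


def g_squared(table):
    """Likelihood-ratio chi-square for independence in a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if observed > 0:  # empty cells contribute nothing to the sum
                total += observed * math.log(observed / expected)
    return 2.0 * total


# Cell counts from Table 3: rows are Reasonable Doubt and Preponderance of
# Evidence; columns are Guilty and Not Guilty.
table_3 = [[66, 82], [103, 70]]
g2 = g_squared(table_3)
print(round(g2, 2))
```

For the Table 3 counts this yields a value of roughly 7.2 on 1 df, close to the reported Verdict x Instructions effect.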

Since verdicts are dichotomous, they may be insensitive to subtle
but real influences upon jurors’ judgments. For this reason, a
22-point "guilt" scale (cf. Kerr et al., 1982) was created by combining
verdicts with the 11-point confidence-in-verdict scores. Thus, a
guilt score of 1 would represent complete confidence in a Not Guilty
verdict, while a score of 22 would represent complete confidence in a
Guilty verdict. This type of pre-deliberation measure is clearly
related to the verdict and may even add some predictive validity;
Stasser, Kerr, and Davis (1980) report that such confidence scores are
related to verdict changes during deliberation. Therefore, the guilt
scores were analyzed in a 2 (Subject Sex) x 2 (Instructions) x 2
(Victim Attractiveness) x 2 (Defendant Attractiveness) ANOVA,
presented in Table B-4. This analysis replicated the log-linear
analysis: The only reliable effect was a main effect for Instructions,
F(1, 305) = 7.27, p < .01. As expected, subjects in the Reasonable
Doubt condition were less conviction-prone (M = 11.09) than subjects
in the Preponderance of Evidence (M = 13.48) condition. However, the
victim and defendant attractiveness manipulations did not influence
pre-deliberation verdicts or verdict-related guilt judgments.

Although the attractiveness manipulations were successful at an
aggregate level, not every subject perceived the photographs to be as
attractive or unattractive as intended. Furthermore, there were
differences in the perceived attractiveness of the photos nested
within both the victim and defendant attractiveness conditions. Thus,
it was judged reasonable to conduct a number of internal analyses to
determine whether there was indeed any extralegal impact of these
manipulations for certain photographs or for certain subjects.

First, a 2 (Subject Sex) x 2 (Instructions) x 2 (Victim
Attractiveness) x 2 (Defendant Attractiveness) ANOVA was conducted on
the guilt scores of subjects who received the most extreme victim and
defendant photographs, as described above. This analysis was not
successful. There were no reliable attractiveness effects, and the
main effect for Instructions was attenuated, F(1,47) = 3.40, p < .075,
presumably due to a loss of statistical power.

Second, a similar 2 x 2 x 2 x 2 ANOVA was conducted on the guilt
scores of subjects who provided an extreme rating (either a 1, 2, 6,
or 7) on at least one of the 7-point manipulation checks for
attractiveness. This ANOVA is presented in Table B-5. The main effect
for Instructions was again attenuated, F(1,177) = 2.83, p < .10.
However, there were two reliable two-way interactions. There was a
significant Sex x Victim Attractiveness interaction, F(1,177) = 4.86,
p < .03. However, post-hoc Tukey contrasts revealed no significant
simple effects.

An Instruction x Defendant Attractiveness interaction, F(1,177) =
4.24, p < .05, is displayed in Table 4. Post-hoc Tukey contrasts
indicated that the only reliable effect was the simple main effect for
Instructions when the defendant is attractive (p < .01). It appears
that for subjects for whom the attractiveness manipulations were
strong, the fact that the defendant was good-looking enabled him to
receive the benefit of reasonable doubt where he might otherwise have
been convicted. Of course, these internal analyses surrender the
advantages of random assignment and are thus only suggestive at best.

Table 4. Instruction x Defendant Attractiveness Interaction
on Guilt Scores for Subjects with Extreme Attractiveness Ratings

                              Defendant
                      Attractive    Unattractive

Reasonable Doubt         9.19          12.92
                         (37)          (52)

Preponderance           14.39          12.81
of Evidence              (41)          (63)

A 2 (Subject Sex) x 2 (Instruction) x 2 (Victim Attractiveness) x
2 (Defendant Attractiveness) ANOVA was performed on subjects’
recommended prison sentences given conviction, and is
presented in Table B-6. There was a marginal main effect for
Instructions, F(1,286) = 3.85, p < .052. Curiously, subjects were
somewhat more punitive in the reasonable doubt condition (M = 85.25
months) than in the preponderance of evidence condition (M = 72.89).
This may be a logical byproduct of a stricter decision criterion:
jurors who require less evidence to convict may anticipate more post-
decisional regret and subsequently recommend a more lenient
punishment. There were no other significant effects. All in all,
there is little evidence of extralegal bias in these pre-deliberation
judgments.

Evaluations of the Victim and Defendant

In addition to the attractiveness manipulation checks, subjects
also rated the believability, likeability, and intelligence of the
victim and the defendant (where 1 equals the positive anchor and 7
equals the negative anchor) and also indicated their amount of
sympathy for each (where 7 equals maximum sympathy). Each of these
measures was analyzed in a 2 (Subject Sex) x 2 (Instructions) x 2
(Victim Attractiveness) x 2 (Defendant Attractiveness) ANOVA. These
ANOVAs are presented in Tables B-7 to B-14. All interactions were
decomposed using the Tukey procedure.

The Victim. A Sex x Instruction interaction, F(1,301) = 5.28, p
< .05, suggests that males found Helen Bednard more credible in the
reasonable doubt condition (M = 1.72) than in the preponderance
condition (M = 2.49), (p < .01). Ms. Bednard was liked more when she
was physically attractive (M = 2.56) than when she was not (M = 3.12),
F(1, 301) = 16.16, p < .001. Similarly, she was perceived as more
intelligent when she was attractive (M = 2.18) than when she was not
(M = 2.44), F(1,301) = 4.24, p < .04. Decomposition of a Sex x
Instruction x Victim Attractiveness interaction, F(1,301) = 4.34,
p < .04, revealed no reliable comparisons at the Tukey .05 level. An
examination of the grand means for believability, likeability, and
intelligence (M = 2.20, 2.85, and 2.32, respectively) suggests that in
general, the victim was regarded favorably by most subjects. Although
an analysis of the sympathy item indicated a Sex x Instruction x
Victim Attractiveness x Defendant Attractiveness interaction, F(1,299)
= 8.74, p < .003, no post-hoc contrasts were significant.

The Defendant. Subjects were more likely to believe the
defendant’s testimony after receiving the reasonable doubt
instructions (M = 4.14) than after the preponderance of evidence
instructions (M = 4.55), F(1,302) = 5.44, p < .02, although overall (M =
4.33), subjects were apparently ambivalent about Lambeth’s
credibility. Like Helen Bednard, William Lambeth was found more
likeable (M = 3.58), F(1,302) = 4.62, p < .04, and more intelligent (M
= 5.25), F(1,302) = 9.19, p < .003, when he was physically attractive
than when he was not (M = 3.87, 5.63, respectively). Thus, ratings of
both the victim and the defendant replicate the well-established
finding that for most people, "what is beautiful is good" (cf.
Berscheid & Walster, 1974). Overall, subjects responded considerably
less favorably to the defendant. Decomposition of a Sex x Victim
Attractiveness interaction, F(1,299) = 5.12, p < .03, revealed no
reliable differences in sympathy for Lambeth, although a Sex x
Instructions x Victim Attractiveness x Defendant Attractiveness
interaction, F(1,299) = 10.40, p < .001, indicated that males in the
reasonable doubt condition sympathized with him more when the victim
was attractive (M = 4.20) than when she was unattractive (M = 2.50),
(p < .05). This interaction is unanticipated, and since it does not
have any obvious implications for the hypotheses of the present study,
will remain uninterpreted.

Pearson product-moment correlations between these evaluative
ratings and the predeliberation guilt scores are presented in Table 5.
Note that subjects’ guilt judgments were more strongly related to
their evaluative reactions to the defendant than to their reactions to
the victim. Ratings of Lambeth’s credibility alone account for about
45% of the variance in guilt ratings. Since Ms. Bednard was not able
to positively identify Lambeth as the culprit, his testimony plays a
much greater role in the case than her testimony does.

Table 5. Correlations between Evaluative Ratings and Guilt Scores

                          Correlations
Evaluative
Rating              Victim        Defendant

Believability       -.16 **        .67 ***
Likeability         -.12 *         .29 ***
Intelligence        -.13 **        .07
Sympathy             .13 *        -.43 ***

  * p < .05
 ** p < .01
*** p < .001

Subjective Probability of Guilt and Criterion Estimates

Self-reported p(G) and p* estimates. Recall that self-reported
estimates of p(G) and p* were solicited using three different
probability formats, which shall be referred to as the millimeter
(MSR), 0 to 10 (TSR), and odds ratio (OSR) self-report methods. These
estimates were all converted to decimal fractions in order to
facilitate comparison on a zero-to-unity metric. Researchers have
speculated (e.g., Kerr et al., 1984) that such estimates might be
unreliable and difficult for subjects to provide. For this reason, a
multi-trait/multi-method matrix for the two traits (viz., p(G), p*)
and three methods (viz., MSR, TSR, OSR) was constructed to examine
their convergent and discriminant validity (cf. Campbell & Fiske,
1959). This matrix is presented in Table 6.

Table 6. Multi-Trait/Multi-Method Matrix of
Self-Reported p(G) and p* estimates

                    MSR               TSR               OSR
               p(G)     p*       p(G)     p*       p(G)     p*

MSR   p(G)     ---
      p*       .15 *    ---
TSR   p(G)     [row not legible in source]
      p*      -.13 *    .56 **  -.10      ---
OSR   p(G)     .50 **   .11 *    .49 **  -.02      ---
      p*       .08      .25 **   .07      .34 **   .56 **   ---

 * p < .05
** p < .001

NOTE: Convergent validities appear in boldface type.

Note that the convergent validities are (a) all statistically
significant, (b) higher than the hetero-trait/hetero-method indices,
and (c) higher than the hetero-trait/mono-method indices, thus meeting
Campbell and Fiske’s criteria for establishing convergent and
discriminant validity. However, the OSR estimates appear to be
contaminated by a great deal of method bias, and as a result, do not
converge well with the MSR and TSR estimates. Subjects apparently
found it more difficult to express uncertainty using the odds format.

Indirect estimates of p*. A rank-ordered (RO) aggregate estimate
of p* was calculated in the following fashion. Subjects’ mean p(G)
estimates were ranked from smallest to largest. Since 152 subjects
voted to acquit the defendant prior to deliberation, the 153rd p(G)
estimate from the top of the list was found. This value, .55, is the
rank-order estimate of the criterion for these jurors, above which a
conviction should be obtained.
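
The rank-order procedure can be sketched as follows. The function name and the small data set are hypothetical; the logic follows the description above: rank the p(G) estimates from smallest to largest and take the value immediately after the first k, where k is the number of acquittals.

```python
def rank_order_criterion(p_guilt, verdicts):
    """Aggregate criterion estimate: the (k + 1)th smallest p(G), k = acquittals."""
    if len(p_guilt) != len(verdicts):
        raise ValueError("p_guilt and verdicts must have the same length")
    n_acquit = sum(1 for v in verdicts if v == "not guilty")
    ranked = sorted(p_guilt)
    if n_acquit >= len(ranked):
        raise ValueError("no convictions: the criterion is not defined")
    return ranked[n_acquit]  # zero-based index k is the (k + 1)th value


# Hypothetical data: five jurors, two of whom voted to acquit.
p_guilt = [0.2, 0.9, 0.4, 0.6, 0.7]
verdicts = ["not guilty", "guilty", "not guilty", "guilty", "guilty"]
print(rank_order_criterion(p_guilt, verdicts))
```

Because the estimate uses only the ranked p(G) values and the overall acquittal count, it is a single number for the whole sample rather than one value per juror.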

The Statistical Decision Theory (SDT) estimate of p* was
calculated for each subject using formula 12 described above.
However, 25 subjects (7.8%) provided "incorrect" valences for at least
one subjective utility estimate (i.e., a positive number for UCI or
UAG, or a negative number for UCG or UAI). This is consistent with a
similar finding by Dane (1979) for 7% (11/168) of his subjects. These
discrepancies may reflect genuine values, misunderstanding of the
response scale or instructions, or perhaps a lack of sincerity in
filling out the questionnaire. If these valences resulted in division
by zero or a p* greater than 1.00, no SDT estimate was calculated.
All in all, SDT estimates could not be calculated for 45 subjects who
either failed to provide one or more utility estimates, provided
non-codeable (e.g., verbal) responses, or provided incorrect valences.
While the utility estimates theoretically ranged from negative to
positive infinity, responses with an absolute value greater than
9,999,999 were coded as +/- 9,999,999. Any resulting inaccuracies
were judged to be of negligible importance after rounding off
fractions.
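
Formula 12 is not reproduced in this chapter. The sketch below therefore assumes the standard expected-utility form of the criterion, in which a juror convicts when the expected utility of conviction exceeds that of acquittal, giving p* = (UAI - UCI) / [(UAI - UCI) + (UCG - UAG)]. The function names are hypothetical, and treating a negative p* as uncomputable is an added assumption; the text mentions only division by zero and values above 1.00.

```python
CAP = 9_999_999  # extreme responses are coded as +/- 9,999,999, as in the text


def clamp(utility):
    """Limit a utility response to the range [-CAP, +CAP]."""
    return max(-CAP, min(CAP, utility))


def sdt_criterion(u_cg, u_ci, u_ag, u_ai):
    """Return p* under the assumed expected-utility formula, or None if undefined.

    u_cg: convict when guilty      u_ci: convict when innocent
    u_ag: acquit when guilty       u_ai: acquit when innocent
    """
    u_cg, u_ci, u_ag, u_ai = (clamp(u) for u in (u_cg, u_ci, u_ag, u_ai))
    regret_false_conviction = u_ai - u_ci
    regret_false_acquittal = u_cg - u_ag
    denominator = regret_false_conviction + regret_false_acquittal
    if denominator == 0:
        return None  # division by zero: no estimate
    p_star = regret_false_conviction / denominator
    if p_star > 1 or p_star < 0:  # the lower bound is an added assumption
        return None
    return p_star


# Hypothetical juror who finds a false conviction twice as costly as a
# false acquittal.
print(sdt_criterion(u_cg=10, u_ci=-30, u_ag=-10, u_ai=10))
```

A juror who weights the two errors equally would obtain p* = .50 under this form, and larger penalties for convicting the innocent push the criterion upward.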

"Blackstone" (BLK) p* estimates were obtained by solving for
each subject’s response, r, to the statement "it is better to let _r_
guilty defendants go free than to convict one innocent defendant,"
using the formula:

     p* = r / (r + 1)

As with the SDT utility estimates, r-values exceeding 9,999,999 were
coded as 9,999,999. While subjects were neither encouraged nor
discouraged from providing negative or non-integer values of r, no
subjects did so. Note that the lowest positive value this formula
will yield is .50, provided that subjects use a positive integer for
r. Thus, the BLK p* estimate should be interpreted cautiously, since
subjects who are relatively unconcerned about UCI may have found it
difficult to respond to this item in its present format.
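
A small sketch of the Blackstone conversion follows, assuming the form p* = r / (r + 1), which is consistent with the stated minimum of .50 when r is a positive integer. The function name is hypothetical, and rejecting negative responses is an illustrative choice, since no subject gave one.

```python
R_CAP = 9_999_999  # r-values above this are coded as 9,999,999, as in the text


def blackstone_criterion(r):
    """Convert a Blackstone ratio r (guilty freed per innocent convicted) to p*."""
    if r < 0:
        raise ValueError("r must be non-negative")
    r = min(r, R_CAP)
    return r / (r + 1)


for r in (1, 9, 10):
    print(r, blackstone_criterion(r))
```

Under this form a response of 1 gives .50 and Blackstone's own ratio of 10 gives a criterion of about .91, which illustrates why the item cannot register a criterion between zero and .50 for integer responses.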

Table 7. Intercorrelations Among p* Estimates

          MSR       TSR       OSR       SDT       BLK

MSR      1.00
TSR       .56 **   1.00
OSR       .25 **    .34 **   1.00
SDT      -.08       .19 **    .14 *    1.00
BLK       .09       .14       .04       .16 *    1.00

 * p < .05
** p < .001
Intercorrelations between the MSR, TSR, OSR, SDT, and BLK
criterion estimates appear in Table 7. Since the RO estimate is an
aggregate one, it cannot be included in the correlation matrix.
Unfortunately, attempts to construct a reliable composite index of p*
were unsuccessful; all coefficient alphas were below .70.

Assessing the accuracy of the criterion estimates. "Hit rates"
for each p* estimate were computed in the following manner: If the
p(G) estimate for a given subject was greater than the respective p*
estimate, a "hit" was tallied if the subject’s pre-deliberation
verdict was "Guilty" and a "miss" was tallied if the verdict was "Not
Guilty." If the p(G) estimate was less than or equal to the p*
estimate, a "hit" was tallied if the verdict was "Not Guilty" and a
"miss" was tallied if the verdict was "Guilty." Self-report p*
estimates were matched against their respective p(G) estimates, while
SDT and BLK estimates were matched against the mean self-reported p(G)
(coefficient alpha = .84). The RO hit rate was obtained by matching
the aggregate p* estimate of .55 against each subject’s mean p(G)
score. Each hit rate was tested against the null hypothesis of 50%.
The average p* estimate, percentage of hits, and z statistic for each
method are presented in Table 8.

Table 8. Mean p* and Accuracy Rates

Estimate     Mean p*        Hits           n            z      prob.

MSR       [illegible]   [illegible]   [illegible]     6.03     <.001
TSR           .69           72%           315         7.81     <.001
OSR           .50           70%           307         7.01     <.001
SDT           .54           84%           276        11.30     <.001
BLK           .49           68%           305         6.29     <.001
RO            .55           87%           311        13.09     <.001
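
The hit-rate tally and the test against chance can be sketched as below. The function names are hypothetical, and the z statistic is assumed to be the usual normal approximation for a single proportion tested against .50. The count of 232 hits is an illustrative figure consistent with a hit rate of about 84% among 276 subjects.

```python
import math


def is_hit(p_guilt, p_star, verdict):
    """A hit occurs when the verdict matches the one implied by p(G) versus p*."""
    predicted = "guilty" if p_guilt > p_star else "not guilty"
    return predicted == verdict


def hit_rate_z(hits, n, null_rate=0.5):
    """z statistic for an observed hit rate against a null rate (normal approx.)."""
    observed = hits / n
    standard_error = math.sqrt(null_rate * (1 - null_rate) / n)
    return (observed - null_rate) / standard_error


print(is_hit(0.70, 0.55, "guilty"))   # p(G) above the criterion, Guilty verdict
print(round(hit_rate_z(232, 276), 2))
```

Note that a juror whose p(G) exactly equals the criterion is predicted to acquit, following the "less than or equal to" rule stated in the text.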

 

The p* estimates are all rather low, ranging from .49 to .69.
However, every estimation method was significantly more accurate than
expected by chance. The RO and SDT methods, although less direct than
the self-report methods, are nevertheless considerably more accurate.
It is conceivable, however, that the high accuracy rate for the SDT
method might be an artifactual result of the fact that the cases which
led to computational errors were deleted. One might argue that the
cases with incorrect utility valences should be tallied as additional
"misses," thereby yielding a hit rate of 77%. Therefore, this
corrected hit rate was tested against the others. Z-tests indicate
that the corrected SDT hit rate is less accurate than the RO hit rate
(z = 3.21, p < .001), more accurate than the MSR (z = 2.77, p < .003),
OSR (z = 1.93, p < .03), and BLK (z = 2.48, p < .005) hit rates, but
not different from the TSR hit rate (z = 0.90, n.s.). This correction
seems unreasonably stringent, however. The SDT method did not
incorrectly predict those 25 pre-deliberation verdicts; instead, it
made no prediction at all.

Alternatively, perhaps the SDT estimate is more accurate because
it was only obtained from the most alert, diligent, or intelligent
subjects. Z-tests comparing the mean hit rates for each method,
presented in Table 9, suggest that this is not the case. These
z-tests were only computed for cases in which both the SDT estimate and
the other estimate in question were available. Nevertheless, the SDT
estimate was as accurate as the RO estimate, and was significantly more
accurate than any of the others (p’s < .001). The mean MSR, TSR, OSR,
BLK and RO hit rates for these selected cases are 67%, 73%, 72%, 67%,
and 87% respectively.

Table 9. Z-tests of the Relative Accuracy of p* Estimates

                               z Statistic

           SDT          MSR          TSR          OSR          BLK

MSR      -4.85 *
         (275)
TSR      -3.35 *       1.33
         (274)        (314)
OSR    [illegible]  [illegible]  [illegible]
         (270)        (306)        (302)
BLK      -4.72 *       0.43        -0.89        -0.35
         (270)        (304)        (303)        (298)
RO        0.85         6.09 *       4.70 *       5.15 *       5.64 *
         (276)        (310)        (309)        (303)        (305)

* p < .001   NOTE: Number of subjects per comparison in parentheses.

This pattern of accuracy is consistent with the pattern reported by
Dane (1979). Dane also found that the SDT and RO methods of
estimating the decision criterion were more accurate than
self-reported estimates, with SDT hit rates of 82-85% and RO hit rates of
86-88%.

Z-tests were computed to test for differences in the relative
accuracy of each p* estimate at predicting Guilty and Not Guilty
verdicts. These analyses only revealed significant effects for the MSR
estimate, z = -3.91, p < .001, and the TSR estimate, z = -4.94,
p < .001. These estimates were 57% and 60% accurate, respectively, at
predicting Guilty verdicts, and 78% and 85% accurate, respectively, at
predicting Not Guilty verdicts. Apparently, subjects rendering Guilty
verdicts were prone to overestimating their actual decision criterion
using either probability format. These two mean p* estimates are
considerably higher than the other, less direct estimates, suggesting
that self-presentational concerns may have inflated their criterion
beyond its actual level.

Although Iversen (1971) has criticized the use of a zero-to-ten
probability format on conceptual grounds, as discussed above, subjects
in the present study were nevertheless more accurate with such a
format than with the millimeter zero-to-one format. The gain in
clarity and accuracy may justify a degree of conceptual murkiness.

Thomas and Hogue Estimates. The Thomas and Hogue model assumes
a positive linear relationship between confidence-in-verdict ratings
and |p(G) - p*|, the absolute difference between jurors’ perceived
probability of guilt and their decision criterion. This assumption
was tested by correlating confidence ratings with absolute difference
scores using the MSR, TSR, OSR, BLK, and SDT p* and p(G) estimates.
Self-report p* estimates were subtracted from their respective
self-report p(G) estimates, and SDT and BLK estimates were subtracted from
the mean p(G) index. The self-report estimates provided correlations
of -.06, .12, and .09, respectively; only the TSR index was significant
(p < .02). The BLK, SDT, and RO estimates yielded correlations
of .18, .36, and .41, respectively; all were highly significant
(p < .001).
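
The test of this assumption amounts to correlating each juror's confidence rating with the absolute distance between p(G) and the criterion. A minimal sketch is given below; the function names and the four-juror data set are hypothetical and serve only to show the computation.

```python
import math


def distance_from_criterion(p_guilt, p_star):
    """Absolute difference |p(G) - p*| for each juror."""
    return [abs(p - c) for p, c in zip(p_guilt, p_star)]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical jurors: p(G), criterion, and 11-point confidence ratings.
p_guilt = [0.90, 0.60, 0.20, 0.55]
p_star = [0.50, 0.50, 0.50, 0.50]
confidence = [10, 3, 7, 2]

distances = distance_from_criterion(p_guilt, p_star)
print(round(pearson_r(distances, confidence), 2))
```

A positive correlation, as in this invented example, is what the model predicts: jurors whose p(G) lies far from the criterion in either direction should be more confident in their verdicts.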

Thus, Thomas and Hogue’s (1976) fundamental assumption receives
reasonable support from four of the six estimates. As discussed in
Chapter 1, these tests of that assumption can only provide independent
support for the model to the extent that the p* and p(G) estimates are
themselves valid; in this regard, it is encouraging to note that the
greatest support was provided by the SDT and RO procedures, the most
accurate of the six.

The Thomas and Hogue c and m parameter estimates were calculated
for all subjects using a FORTRAN program which requires verdicts,
confidence ratings, and appropriate contrasts between conditions as
input. As discussed in Chapter 1, these estimates are aggregate, they
have an arbitrary metric, and they have no satisfactory error term.
As a result, they cannot be subjected to inferential statistics and
are therefore only descriptive. In this case, c and m were estimated
for subjects in the Reasonable Doubt and Preponderance of Evidence
conditions. Since subjects in the Reasonable Doubt condition were
less likely to convict the defendant, we would expect the criterion
estimate, c, to be greater for those subjects than for subjects in the
more lax Preponderance of Evidence condition. However, these
estimates were 1.16 and 1.17, respectively. The respective probability
of guilt parameters, m, were 1.13 and 1.27. The differences in these
estimates are slight; nevertheless, the complete pattern of Thomas and
Hogue estimates suggests that the instructional manipulation may have
influenced verdicts and guilt scores by shifting subjects’ perceived
probability of guilt rather than their decision criteria. This
possibility is explored below.

Criterion instruction manipulation checks. One-tailed t-tests
were conducted on the MSR, TSR, OSR, SDT, and BLK p* estimates as
planned comparisons of the effectiveness of the instructional
manipulation. Results of these tests are presented in Table 10. If
the instructional manipulation were successful and the p* estimate
were adequately valid and reliable, we would expect a higher criterion
level for subjects in the Reasonable Doubt condition. Although the
pattern of means is consistent with this expectation for five of the
six estimates, the only significant difference was obtained using the
SDT estimate, and even this difference is surprisingly small. Again,
since the RO criterion is an aggregate estimate, there is no variance
to analyze.

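Table 10. Instructional Manipulation Checks for Each p* Estimate

                         Means
              ---------------------------
              Reasonable    Preponderance
p* estimate   Doubt         of Evidence      df       t
---------------------------------------------------------
MSR           .67           .68              318    -0.36
TSR           .69           .68              314     0.53
OSR           .51           .49              306     0.83
BLK           .51           .48              313     0.73
SDT           .56           .52              281     1.68 *
RO            .56           .53              ---     ----
---------------------------------------------------------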

The mean self-report p(G) index for each instructional condition
was examined in a one-way ANOVA. As suggested by the Thomas and Hogue
analysis, commission was perceived as somewhat less probable (M = .51)
for subjects in the Reasonable Doubt condition than for subjects in
the Preponderance of Evidence condition (M = .56), although this
difference was not significant, F(1,309) = 3.29. Thus, the complete
pattern of modeling estimates provides only weak support for the
predicted criterion shift.

Since the SDT estimate varied as a function of the judge’s
defined standard of proof, t-tests were conducted to determine whether
this resulted from a shift in the expected utility of a specific trial
outcome. None of these tests were significant; t values ranged from
-.97 to .57, df = 290 to 298.

Group Verdicts

Because two- and three-person groups were only formed when there
were not enough subjects in attendance to form a four-person group,
only 17 two- and 11 three-person groups were obtained. Unfortunately,
there are not enough groups at either size to allow the use of a
three-leveled Group Size factor in the analysis of group verdicts and
post-deliberation judgments. Following a suggestion by Brown (1981),
the two- and three-person groups were therefore combined and a two-
level, Small vs. Large, Group Size factor was created.

The overall trend for the group verdicts replicates the leniency
bias typically found in mock jury research (Stasser, Kerr, & Bray,
1982). While 52.6% of the individual pre-deliberation verdicts were
for conviction, only 29.9% of the groups voted for conviction, 47%
voted for acquittal, and 23% were unable to reach a unanimous group
verdict.

Log-linear analyses were conducted to examine the effects of
Size, Instructions, Victim Attractiveness, and Defendant
Attractiveness upon the group verdicts. These analyses are presented
in Table B-15. Curiously, the Verdict x Instruction effect obtained
for individual pre-deliberation verdicts was not replicated at the
group level. However, there were reliable effects for Verdict x Size,
G² = 7.88, df = 2, p < .01, and Verdict x Defendant Attractiveness,
G² = 7.25, df = 2, p < .01. As shown in Table 11, the Verdict x
Size effect indicates that larger juries were considerably less likely
to reach a unanimous group verdict. This finding conceptually
replicates a similar pattern reported by Kerr and MacCoun (1984).

Table 11. Group Verdicts by Size

          Guilty      Not Guilty    Hung        Row Total
----------------------------------------------------------
Small     8           18            2           28
          (28.6%)     (64.3%)       (7.1%)
Large     18          23            18          59
          (30.5%)     (39.0%)       (30.5%)
----------------------------------------------------------
Column    26          41            20          87
Total     (29.9%)     (47.1%)       (23.0%)

NOTE: Row percentages appear in parentheses.
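
As a rough check on the Verdict x Size association, a likelihood-ratio statistic can be computed directly from the cell counts in Table 11. The sketch below assumes a simple two-way test of independence on the marginal table. The log-linear analyses reported in Table B-15 included additional factors, so the result should land close to, though not necessarily on, the reported value. The function name `g_squared` is illustrative only.

```python
import math

def g_squared(table):
    """Likelihood-ratio statistic G^2 and df for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if observed > 0:  # empty cells contribute nothing to the sum
                g2 += 2.0 * observed * math.log(observed / expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return g2, df

# Cell counts from Table 11.
# Rows: Small, Large. Columns: Guilty, Not Guilty, Hung.
table_11 = [[8, 18, 2], [18, 23, 18]]
g2, df = g_squared(table_11)
print(f"G^2 = {g2:.2f}, df = {df}")
```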

Table 12. Group Verdicts by Defendant Attractiveness

          Guilty      Not Guilty    Hung        Row Total
----------------------------------------------------------
High      9           26            8           43
          (21.0%)     (60.5%)       (18.6%)
Low       17          15            12          44
          (38.6%)     (34.1%)       (27.3%)
----------------------------------------------------------
Column
Total     26          41            20          87

NOTE: Row percentages appear in parentheses.

The Verdict x Defendant Attractiveness effect is portrayed in
Table 12. Juries who viewed an attractive defendant were considerably
more likely to acquit him than juries who viewed an unattractive
defendant. This finding is especially noteworthy because there was no
such extralegal bias at the individual, pre-deliberation level.
Thus, this finding is completely at odds with Kaplan and Miller’s
(1978) contention that group deliberation serves to minimize such
extralegal biases by focusing jurors’ attention on evidentiary
factors.

Table 13. Social Decision Scheme Matrix
for Each Defendant Attractiveness Condition

                          Group Verdict
Initial Split   ----------------------------------    Row
(G, NG)         Guilty      Not Guilty    Hung        Total
------------------------------------------------------------
Attractive Defendant
  0, 4          0           1.00          0
                (0)         (4)           (0)         (4)
  1, 3          .125        .625          .25
                (1)         (5)           (2)         (8)
  2, 2          0           1.00          0
                (0)         (4)           (0)         (4)
  3, 1          .27         .18           .55
                (3)         (2)           (6)         (11)
  4, 0          1.00        0             0
                (2)         (0)           (0)         (2)

Unattractive Defendant
  0, 4          0           0             0
                (0)         (0)           (0)         (0)
  1, 3          .25         .75           0
                (1)         (3)           (0)         (4)
  2, 2          .30         .30           .40
                (3)         (3)           (4)         (10)
  3, 1          .47         .13           .40
                (7)         (2)           (6)         (15)
  4, 0          1.00        0             0
                (1)         (0)           (0)         (1)
------------------------------------------------------------

Social Decision Scheme matrices (Davis, 1973) were computed for
4-person juries in the attractive and unattractive defendant
conditions in order to determine whether the biasing effect was due to
differences in the deliberation process. These matrices are presented
above in Table 13. Log-linear analyses revealed a Verdict x Initial
Split effect, G² = 25.07, p < .05, but no Verdict x Initial Split x
Defendant Attractiveness effect. Similar analyses deleting the hung
and initially unanimous juries replicated this same pattern.
Nevertheless, the table indicates some interesting trends which are
discussed in Chapter 4.

Effects of Deliberation on Individual Verdicts

Following deliberation, subjects again provided private verdicts
and confidence-in-verdict ratings. Because these individual subjects
were nested within groups following deliberation, their post-
deliberation guilt ratings are experimentally dependent within groups,
and may be statistically dependent as well. Therefore, it was
necessary to create mean pre- and post-deliberation guilt scores (cf.
Anderson & Ager, 1978) for each group in order to assess the impact of
deliberation upon individual judgments. These scores were then
analyzed in a 2 (Time: Pre- vs. Post-Deliberation) x 2 (Group Size:
Small vs. Large) x 2 (Instructions) x 2 (Victim Attractiveness) x 2
(Defendant Attractiveness) repeated-measures ANOVA, presented in Table
B-16. This analysis yielded two significant effects. A main effect for
Time, F(1,71) = 14.44, p < .001, provides further evidence of a
leniency shift as a result of deliberation. Overall, subjects leaned
toward conviction prior to deliberation (M = 12.23) and leaned toward
acquittal afterwards (M = 10.66). Consistent with the group verdicts,
there was a significant Time x Defendant Attractiveness interaction,
F(1,71) = 7.48, p < .008. As can be seen in Table 14, this
interaction is consistent with the Group Verdict x Defendant
Attractiveness interaction described above. Since there was no effect
for guilt ratings prior to deliberation, this pattern deviates
somewhat from the group polarization hypothesis described in the
Introduction. For this reason, the interaction was decomposed using
the post-hoc Tukey procedure. Tukey contrasts indicate a significant
leniency shift for the attractive defendant following deliberation (p
< .01). After deliberation, significantly less guilt was attributed
to the attractive defendant than to the unattractive defendant.
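
The aggregation step described above, in which individual guilt scores are collapsed into one pre- and one post-deliberation mean per group before analysis, can be sketched as follows. The records, group labels, and the function name `group_means` are hypothetical and serve only to illustrate the idea; they are not data from the study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (group_id, pre_score, post_score) on the guilt scale.
records = [
    ("g1", 14, 9), ("g1", 12, 8), ("g1", 10, 9), ("g1", 15, 10),
    ("g2", 13, 13), ("g2", 11, 12), ("g2", 9, 12),
]

def group_means(rows):
    """Return {group_id: (mean_pre, mean_post)} so the group is the unit of analysis."""
    by_group = defaultdict(list)
    for group_id, pre, post in rows:
        by_group[group_id].append((pre, post))
    return {
        gid: (mean(pre for pre, _ in scores), mean(post for _, post in scores))
        for gid, scores in by_group.items()
    }

means = group_means(records)
for gid, (pre, post) in means.items():
    print(gid, round(pre, 2), round(post, 2))
```

Each group then contributes a single pair of scores to the repeated-measures analysis, which avoids treating dependent individual ratings as independent observations.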

Table 14. Time x Defendant Attractiveness Interaction
on Individual Pre- and Post-Deliberation Guilt Scores

                  Defendant Attractiveness
                  High          Low
---------------------------------------------
Time    Pre       12.19         12.28
        Post       9.47         11.82

Modeling Analyses. Contrary to Kaplan and Miller’s prediction,
there was a reliable extralegal bias due to defendant attractiveness
following deliberation. Because Kaplan and Miller specifically
hypothesize polarized evidentiary effects following deliberation, and
because their measurement techniques were called into question in
Chapter 1, self-reported p(G) estimates were assessed again at the
conclusion of the session. Although the criterion-setting model is a
model of individual pre-deliberation judgment, we can nevertheless
extrapolate a prediction that the defendant attractiveness
effect for post-deliberation guilt scores and group verdicts should be
mediated by the decision criterion. Time constraints precluded the
assessment of post-deliberation criterion and utility estimates,
measures that usually require more attention, concentration, and
tolerance by subjects. Nevertheless, RO and Thomas and Hogue
estimates could be computed using subjects’ final mean p(G) ratings,
verdicts, and confidence scores. Tests of these estimates must be
interpreted with a great deal of trepidation, however. Since
individuals were nested within groups, their verdicts should ideally
be aggregated by group, as they are in the analyses described in the
preceding section. Unfortunately, there is no satisfactory way to
compute an aggregate dichotomous verdict representing the verdict
choices of the members of a given group, an index that both the RO and
Thomas and Hogue procedures require. Moreover, neither procedure will
yield estimates which can be tested using inferential statistics.

The Thomas and Hogue parameters were computed for subjects in the
Attractive and Unattractive Defendant conditions. As predicted, the
criterion was more stringent for the attractive defendant (c = 1.03)
than the unattractive defendant (c = 0.90). This is consistent with
rank-order analyses, which provided estimates of .58 and .55,
respectively. Nevertheless, there also appears to be less perceived
weight of evidence against the attractive defendant (m = 0.82) than
the unattractive defendant (m = 0.92). A 2 (Time) x 2 (Instructions)
x 2 (Defendant Attractiveness) x 2 (Victim Attractiveness) x 2 (Size)
ANOVA, presented in Table B-17, was therefore conducted to see if the
Time x Defendant Attractiveness effect on guilt scores was mirrored by
a similar effect for probability of guilt. The Time x Defendant
Attractiveness interaction was marginally significant, F(1,68) = 3.16,
p < .08. Tukey contrasts revealed no significant differences between
means, although there was a trend suggesting a lower probability of
guilt for the attractive defendant (M = .48) than the unattractive
defendant (M = .52) after deliberation, a pattern which is consistent
with the Thomas and Hogue m estimates.

Examination of the Asymmetry Effect

The group and individual post-deliberation verdicts both
demonstrate the leniency bias described by Stasser, Kerr, and Bray
(1982). Deliberation had the effect of making juries more lenient
than jurors, and jurors more lenient after discussion than before
discussion. Social Decision Scheme matrices (Davis, 1973; Stasser et
al., 1982) were computed to determine (a) whether there was an
asymmetry effect, such that juries evenly split on the first ballot
would be more likely to acquit than convict the defendant, and (b)
whether this effect was moderated by the instructional manipulation.
SDS matrices require all juries to be of the same size; furthermore,
previous theory and research (e.g., Kerr & MacCoun, 1984) indicate that
group process does not follow a simple proportionality rule across
varying small group sizes. Therefore, subsequent analyses include
only four-person groups.

The complete SDS matrix for all four-person groups is presented
in Table 15. The relationship between initial distribution and final
outcome is statistically significant, χ² = 24.13, df = 8, p < .003.

Table 15. Social Decision Scheme Matrix
for All Four-Person Groups

                          Group Verdict
Initial Split   ----------------------------------    Row
(G, NG)         Guilty      Not Guilty    Hung        Total
------------------------------------------------------------
  0, 4          0           1.00          0
                (0)         (4)           (0)         (4)
  1, 3          .17         .67           .17
                (2)         (8)           (2)         (12)
  2, 2          .21         .50           .29
                (3)         (7)           (4)         (14)
  3, 1          .39         .15           .46
                (10)        (4)           (12)        (26)
  4, 0          1.00        0             0
                (3)         (0)           (0)         (3)
------------------------------------------------------------
                .305        .39           .305
Column Total    (18)        (23)          (18)        (59)

NOTE: Cell frequencies in parentheses.
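
Each row of an SDS matrix is simply the cell frequencies for one initial split divided by that row's total. The following minimal sketch recomputes the proportions in Table 15 from its parenthesized frequencies; the function name `sds_matrix` and the dictionary layout are illustrative choices, not part of the original analysis.

```python
# Cell frequencies from Table 15.
# Key: initial split (G, NG). Value: final outcome counts (Guilty, Not Guilty, Hung).
frequencies = {
    (0, 4): (0, 4, 0),
    (1, 3): (2, 8, 2),
    (2, 2): (3, 7, 4),
    (3, 1): (10, 4, 12),
    (4, 0): (3, 0, 0),
}

def sds_matrix(freqs):
    """Convert outcome counts into row-wise conditional proportions."""
    matrix = {}
    for split, counts in freqs.items():
        total = sum(counts)
        # A split that was never observed yields a row of zeros.
        matrix[split] = tuple(c / total if total else 0.0 for c in counts)
    return matrix

sds = sds_matrix(frequencies)
for split, probs in sds.items():
    print(split, [round(p, 2) for p in probs])
```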

Several patterns are readily apparent in the matrix. First, as
expected, the initial distribution of verdict preferences is a potent
predictor of final outcomes for juries reaching unanimous group
verdicts. Second, there was a rather high rate of hung juries
overall. This has the unfortunate effect of reducing statistical
power for the crucial comparisons involving close faction ratios
ultimately reaching unanimous verdicts. Nevertheless, asymmetry is
apparent in an examination of juries with an initial 2:2 split. These
juries had a 50% chance of acquitting, but only a 21% chance of
convicting the defendant. Moreover, a comparison of whether the
majority "wins" or "loses" for 1:3 and 3:1 splits, dropping hung
juries, indicates that a three-person faction was more likely to win
if it favored acquittal (80%) than if it favored conviction (71.4%),
χ² = 6.17, df = 1, p < .02. Conversely, a minority of one was more likely
to win if it favored acquittal (28.6%) than if it favored conviction
(20%).
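
The majority "win" rates and the accompanying test statistic can be recovered from the Table 15 frequencies. The sketch below assumes an uncorrected Pearson chi-square on the 2 x 2 table of initial split (1:3 vs. 3:1) by final verdict with hung juries dropped, which reproduces the reported value of 6.17. The function name `chi_square_2x2` is illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Table 15 frequencies with hung juries dropped.
# Rows: initial split 1:3, then 3:1. Columns: final verdict Guilty, Not Guilty.
chi2 = chi_square_2x2(2, 8, 10, 4)

# Proportion of verdicts won by the three-person majority in each row.
majority_wins_acquittal = 8 / (2 + 8)      # 1:3 juries, majority favors acquittal
majority_wins_conviction = 10 / (10 + 4)   # 3:1 juries, majority favors conviction

print(round(chi2, 2), majority_wins_acquittal, round(majority_wins_conviction, 3))
```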

In Chapter 1, it was suggested that this asymmetry results from
the reasonable doubt standard and should thus appear for groups in the
Reasonable Doubt condition but not the Preponderance of Evidence
condition. An SDS matrix broken down by instructional condition
is presented in Table 16.

The hypothesized Instructions x Group Verdict x Initial Split
effect was tested in a log-linear analysis. Juries that hung or were
initially unanimous were excluded from the analysis. The 3-way effect
was not significant, G² = 3.59, df = 2. An examination of Table 16
suggests that three-person majorities favoring conviction actually
fared somewhat better in the Reasonable Doubt condition. A Verdict x
Instruction chi-square test for independence suggests that this
pattern does not differ from chance expectation, χ² = 1.53, df = 1. A
comparison of juries with 2:2 initial splits reaching unanimous
verdicts shows that 83% of the Reasonable Doubt but only 50% of the
Preponderance of Evidence juries ultimately voted for acquittal.
Although certainly suggestive, a Fisher’s Exact test indicates that
this pattern may have resulted from chance, p = .30.
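
For readers who wish to retrace the exact test on the 2:2 juries, the hypergeometric probabilities can be computed by hand from the Table 16 frequencies. This is only a sketch: the text does not state which form of the test was used, so both the probability of the observed table and a one-tailed sum are shown. The function name `table_probability` is illustrative.

```python
from math import comb

def table_probability(a, b, c, d):
    """Hypergeometric probability of the 2x2 table [[a, b], [c, d]] with fixed margins."""
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)

# 2:2 juries that reached a verdict (frequencies from Table 16).
# Rows: Reasonable Doubt, Preponderance of Evidence. Columns: Guilty, Not Guilty.
observed = table_probability(1, 5, 2, 2)

# One-tailed p: the observed table plus the single more extreme table in the
# same direction (no Reasonable Doubt convictions, margins unchanged).
one_tailed = observed + table_probability(0, 6, 3, 1)

print(round(observed, 3), round(one_tailed, 3))
```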

Table 16. Social Decision Scheme Matrix
for Each Instructional Condition

                          Group Verdict
Initial Split   ----------------------------------    Row
(G, NG)         Guilty      Not Guilty    Hung        Total
------------------------------------------------------------
Reasonable Doubt
  0, 4          0           1.00          0
                (0)         (3)           (0)         (3)
  1, 3          .14         .57           .29
                (1)         (4)           (2)         (7)
  2, 2          .11         .56           .33
                (1)         (5)           (3)         (9)
  3, 1          .43         0             .57
                (3)         (0)           (4)         (7)
  4, 0          0           0             0
                (0)         (0)           (0)         (0)

Preponderance of Evidence
  0, 4          0           1.00          0
                (0)         (1)           (0)         (1)
  1, 3          .20         .80           0
                (1)         (4)           (0)         (5)
  2, 2          .40         .40           .20
                (2)         (2)           (1)         (5)
  3, 1          .37         .21           .42
                (7)         (4)           (8)         (19)
  4, 0          1.00        0             0
                (3)         (0)           (0)         (3)
------------------------------------------------------------

Thus, although the complete matrix demonstrates a leniency bias
and an asymmetry effect, the hypothesized role of the criterion
instructions in generating these effects was not borne out. In
Chapter 4, these results are discussed and interpreted.

CHAPTER 4

DISCUSSION

This dissertation had six objectives. First, it attempted to
replicate and extend previous findings suggesting that the physical
attractiveness of the victim and the defendant of a crime can exert an
extralegal influence upon mock jurors’ verdicts and/or guilt-related
judgments. Second, several different procedures for estimating
jurors’ perceived probability of guilt and decision criteria were
evaluated and compared. Third, the judge’s charge to the jury was
manipulated to assess whether the reasonable doubt and preponderance
of evidence standards, as defined in practice, have their intended
influence upon jurors’ decision making. Fourth, it sought to test a
model of extralegal bias in juror decision-making which proposed that
jurors’ standard of proof mediates the influence of many extralegal
factors, and that this relationship is in turn mediated by the costs
of Type I and II juridic errors. Fifth, the role of group
deliberation in amplifying or possibly attenuating such extralegal
biases was examined. And finally, the hypothesis that asymmetry in
Social Decision Scheme matrices of jury deliberation results from
adherence to the reasonable doubt standard was tested. Each of these
objectives is discussed below.

Victim and Defendant Attractiveness and Juror Judgments

Contrary to predictions based upon previous research, neither
victim nor defendant attractiveness biased the pre-deliberation
verdicts or recommended sentences of individual jurors in the present
study. These verdicts were reached without detectable extralegal
bias and were strongly related to perceptions of the defendant’s
credibility. Since Lambeth’s testimony played a pivotal role in the
trial, subjects apparently based their pre-deliberation verdicts
primarily on their perception of the evidence. If so, their
conscientiousness is laudable, and hopefully representative of the
performance of jurors in actual criminal trials.

Nevertheless, it is curious that this study failed to replicate
previous research. Several explanations are worth considering. Two
potential explanations can be ruled out with confidence. First, the
male and female photographs selected for use in the study were clearly
perceived as intended. The attractive and unattractive photographs
of each actor were rated significantly differently, and mean
ratings of each photograph were reliably far from the neutral point on
the scale. Second, floor or ceiling effects are not a plausible
candidate for eliminating attractiveness effects. The case was
extremely close, with a 52.6% pre-deliberation conviction rate.

Another explanation involves the method and mode of trial
presentation. Critics of research on attractiveness and juror
judgment (Horowitz & Willging, 1984; Konecni & Ebbesen, 1982) have
argued that attractiveness effects may be exaggerated by simulations
using otherwise impoverished stimuli. For example:

    In the laboratory experiment, defendant’s characteristics are
    etched in strong relief, which becomes a stark figure on a rather
    plain background of trial evidence. In the actual trial,
    defendants’ characteristics are embedded in a wide and rich
    network of evidentiary materials that vitiate the characteristics
    to a minor role in the trial’s outcome (Horowitz & Willging,
    1984, p. 79).

Each of the relevant studies reviewed in Chapter 1 used a written
transcript to simulate a trial. These transcripts ranged from brief
fact sheets (e.g., Efran, 1974) to lengthy and detailed "verbatim"
transcripts with photographs of all the major participants (Kerr,
1978). However, only the present study added an audio re-enactment
which brought the trial to life in "real time." The resultant
increase in information and realism may have been sufficient to drown
out effects due to victim and defendant attractiveness. This
explanation gains credence from the fact that Kerr (1978) obtained
victim attractiveness effects using an auto theft trial transcript
almost identical to the one used in the present study, but without an
audiotape.

A final alternative is related to the previous one, and suggests
that the lack of attractiveness effects may be an unfortunate
byproduct of the mix of audio and written modes of presentation in the
present study. Recall that subjects were instructed to read along
with the written transcript as they listened to the audio re-enactment
of the trial. Indeed, the audiotape was actually included as an
afterthought -- with the primary intention of pacing jurors so that
they would complete the trial simultaneously. The photographs were
mounted on pages of the written transcript, at the point at which each
character was introduced. Subjects who kept pace with the audiotape
would therefore only view each photograph for the duration of the
trial that was transcribed on that page. As a result, these subjects
would not have an opportunity to view the photographs at a leisurely
pace, as they might have had in previous studies. However, in a real
criminal trial, jurors do view the victim and the defendant for an
extended period. Thus, the present procedure may have artificially
restricted subjects’ attention to attractiveness; i.e., rather than
artificially augmenting the impact of attractiveness, this study may
have artificially diminished it. However, the presence of clear
effects of the photographs upon ratings of attractiveness,
likeability, intelligence, and sympathy indicates that subjects were
at least to a minor extent aware of and influenced by the photographs.

Each of these explanations is clearly ad hoc and speculative.
Nevertheless, this study’s failure to replicate previous
attractiveness effects suggests that caution is warranted in
interpreting previous research on the topic. Future research
examining the biasing effects of attractiveness should ideally use
a videotaped trial simulation.

Regrettably, the absence of extralegal bias in these pre-
deliberation verdicts does not permit tests of the relevant components
of the criterion-setting model discussed in Chapter 1.

Estimates of Perceived Probability of Guilt and the Decision Criterion

This study extended previous research (e.g., Dane, 1979; Simon,
1967; Simon & Mahan, 1971) attempting to provide quantitative
estimates of p(G) and p*, two parameters theorized to be of paramount
importance in the legal decision process. A wide variety of specific
measurement techniques were adopted, including two different
probability formats, an odds format, Blackstone’s tradeoff,
Statistical Decision Theory, and Thomas and Hogue modeling. This
breadth of procedures is noteworthy on both theoretical and pragmatic
grounds. Theoretically, several of the methods (e.g., SDT, BLK,
Thomas & Hogue) make precise assumptions about the nature of the
decision process. Pragmatically, using a wide variety of methods
enhances the likelihood that subjects can find at least one format
that allows them to access and assess their own cognitive processes.

Self-report Estimates. The three self-reported estimates of p(G)
had good convergent and discriminant validity, and consequently, their
composite index was internally consistent. This is fortunate. Unlike
previous research, which has relied upon a single p(G) estimate, the
present study can therefore provide more confident estimates of the
relative accuracy of the p* estimates that fall on a zero-to-unity
scale.

The self-reported p* estimates also demonstrated some convergent
and discriminant validity, although less so than the p(G) estimates.
These estimates were each more accurate than expected by chance, with
hit rates of 67-72%. The millimeter and zero-to-ten formats provided
the highest mean p* estimates. Since each was prone to overestimating
the frequency of acquittals, it seems likely that these p* estimates are
inflated, perhaps by a social desirability bias or good intentions that
weren’t followed.

Indirect Estimates. The BLK, SDT, rank-order, and Thomas and
Hogue estimates each share the mixed blessing of opacity; i.e., they
do not directly solicit standards of proof. This blessing is mixed
because, while they are less vulnerable to the inflationary influences
of social desirability or rationalization, they may also be less likely
to tap the actual ongoing cognitive process. In this regard, it is
encouraging to note that the rank-order and SDT methods were the most
accurate of all, with hit rates of 88% and 85%, respectively. Dane
(1979) reports almost identical accuracy rates using these procedures;
thus, these findings appear to be stable. It is not particularly
surprising that the RO estimate is so accurate; the procedures used to
compute the RO p* estimate and the RO hit rate are both based on the
positive monotonic relationship between verdicts and p(G).

The high accuracy rate for the SDT procedure supports the decision
theoretic conceptualization of the judgment process (e.g., Fried,
K. Kaplan, & Klein, 1975; Kerr, Bull, MacCoun, & Rathborn, 198x).
Mock jurors do appear to weigh the utilities of potential trial
outcomes in setting their criterion for proof. However, as a
measurement procedure, the SDT model has several shortcomings. First,
subjects seem to find it difficult and time-consuming to explicitly
quantify the necessary utilities. Second, for whatever reason,
subjects may not provide adequate data for computing p*.
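
The decision-theoretic logic can be made concrete with a short sketch. It assumes the standard expected-utility formulation, in which a juror convicts when the expected utility of conviction exceeds that of acquittal; the exact elicitation format used in this study is described in Chapter 1 and may differ in detail. The utilities below are hypothetical, and the function and parameter names are illustrative.

```python
def sdt_criterion(u_cg, u_ci, u_ag, u_ai):
    """Probability of guilt above which conviction has the higher expected utility.

    Convict when  p*U(CG) + (1-p)*U(CI)  >  p*U(AG) + (1-p)*U(AI),
    which rearranges to
    p > [U(AI) - U(CI)] / ([U(AI) - U(CI)] + [U(CG) - U(AG)]).
    """
    cost_false_conviction = u_ai - u_ci  # utility lost by convicting an innocent defendant
    cost_false_acquittal = u_cg - u_ag   # utility lost by acquitting a guilty defendant
    return cost_false_conviction / (cost_false_conviction + cost_false_acquittal)

# Hypothetical utilities on an arbitrary 0-10 scale (not values from the study).
p_star = sdt_criterion(u_cg=10, u_ci=0, u_ag=4, u_ai=10)

# A Blackstone-style tradeoff: a false conviction is ten times as costly
# as a false acquittal.
p_blackstone = sdt_criterion(u_cg=1, u_ci=-10, u_ag=0, u_ai=0)

print(round(p_star, 3), round(p_blackstone, 3))
```

Under this formulation, raising the perceived cost of convicting an innocent defendant raises the criterion, which is the mechanism the criterion-setting model attributes to extralegal factors.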

On the other hand, the RO procedure carries no clear theoretical
baggage, for better or worse. But it does have the advantage of being
extremely easy to compute. Given verdicts and p(G) estimates, which
every mock jury study can easily and quickly solicit from subjects, a
very accurate estimate of p* can be computed. Separate estimates can
be computed for subjects in each cell of an experimental design.
Unfortunately, as a single aggregate point estimate, it has no
variance and cannot be submitted to correlational or inferential
statistical procedures.
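
To illustrate why an aggregate rank-order estimate is easy to compute yet has no variance, the sketch below implements one plausible reading of such a procedure. It is an assumption, not the definition given in Chapter 1: subjects are ordered by p(G), and the criterion is placed so that the number of p(G) values at or below it equals the number of acquittals. All data are hypothetical and the function name is illustrative.

```python
def rank_order_criterion(p_guilt, verdicts):
    """Return (criterion, hit_rate); verdicts are True for guilty, False for not guilty."""
    ordered = sorted(p_guilt)
    n_acquit = sum(1 for v in verdicts if not v)
    if n_acquit == 0:
        criterion = 0.0
    elif n_acquit == len(ordered):
        criterion = 1.0
    else:
        # Midpoint between the highest "acquit" rank and the lowest "convict" rank.
        criterion = (ordered[n_acquit - 1] + ordered[n_acquit]) / 2
    # A hit is a verdict consistent with the criterion: convict if and only if
    # the subject's p(G) exceeds it.
    hits = sum((p > criterion) == v for p, v in zip(p_guilt, verdicts))
    return criterion, hits / len(verdicts)

# Hypothetical data for eight subjects.
p_guilt = [0.20, 0.35, 0.50, 0.55, 0.60, 0.70, 0.80, 0.90]
verdicts = [False, False, False, True, False, True, True, True]
criterion, hit_rate = rank_order_criterion(p_guilt, verdicts)
print(round(criterion, 3), hit_rate)
```

Because the whole sample yields a single number, there is nothing to enter into a t-test or correlation, which is the limitation noted above.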

The Thomas and Hogue c and m parameters have the same problem.
Nevertheless, it also is easy to collect the verdicts and confidence-
in-verdict ratings the model requires. The present study provided
adequate support for the presumed positive linear relationship between
confidence ratings and |p(G) - p*| (in Thomas and Hogue’s notation,
|X - c|). Using the two most accurate p* estimates, this correlation
was estimated at .36 to .41. Furthermore, the c and m parameters
approximately mirrored the SDT and RO p* and mean p(G) estimates.
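
The relationship tested here, between confidence and the distance of p(G) from the criterion, can be sketched with a plain Pearson correlation. The criterion value, the ratings, and the function name below are hypothetical and chosen only to show the computation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

p_star = 0.55                              # hypothetical criterion
p_guilt = [0.10, 0.40, 0.50, 0.60, 0.95]   # hypothetical p(G) ratings
confidence = [9, 4, 2, 3, 8]               # hypothetical confidence-in-verdict ratings

# Distance of each subject's p(G) from the criterion, |p(G) - p*|.
distance = [abs(p - p_star) for p in p_guilt]
r = pearson_r(distance, confidence)
print(round(r, 2))
```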

Compliance with Standard of Proof Instructions

At the completion of the trial, subjects received instructions
requiring them to convict the defendant if and only if they perceived
that the weight of evidence presented in the case surpassed a
recommended criterion. For some subjects, this was the "beyond a
reasonable doubt" standard, the common law convention for a criminal
trial. Other subjects received a "preponderance of the evidence"
standard, a more lax criterion typically reserved for civil disputes
in which the State is merely an arbitrator and has no inherent
interest in the outcome of the trial.

As predicted in Chapter 1, and as intended by the legal system,
jurors were less likely to convict the defendant when given the more
stringent reasonable doubt criterion. This pattern was found for both
individual verdicts and for the verdict-based guilt scale. This
result is consistent with a similar result reported by Kerr, et al.
(1976) using reasonable doubt definitions of varying stringency.

This instructional effect suggests that jurors required less
evidence to convict the defendant when they received the preponderance
of evidence instructions. However, this prediction received mixed
support. Only five of the seven estimates of the decision criterion
showed such a pattern; and of the five estimates for which inferential
statistics can be computed, only the SDT criterion showed a
significant difference. The Thomas and Hogue m estimate suggested
that subjects perceived less weight of evidence in the reasonable
doubt condition, although this pattern was not significant for the
mean p(G) estimate.

Note that the mean p* estimates were fairly low, overall, in the
range of .49 to .69. These estimates are in the range prescribed by
the preponderance of evidence standard, despite the fact that
reasonable doubt is the default standard for a criminal trial. Given
the fact that the case was an extremely close one, with a mean p(G)
of .54, slight differences in p* estimates could result in significant
differences in a discrete variable like the verdict. And given some
internal inconsistency in the mean p(G) estimates, and "miss" rates of
22-33% for the p* estimates, it is plausible that real differences
could exist and yet fail to be detected.

It is also surprising that the judge’s defined criterion
manipulation did not have a subsequent influence upon the verdicts of
deliberating juries, especially since Kerr, et al. (1976) found
differences in group verdicts using a theoretically more restricted
range of criterion definitions. However, in the present study, among
four-person juries that reached a verdict, 54.2% of the Preponderance
of Evidence juries and only 29.4% of the Reasonable Doubt juries
convicted the defendant, a strong trend in the expected direction.
Note that this is a 24.8% difference; a
difference of only 14.9% in individual pre-deliberation conviction
rates was statistically significant. Apparently, the instructional
effect failed to reach significance because of a loss in statistical
power at the group level of analysis.

The prevalence of hung juries in the present study may have
obscured real effects due to criterion instructions. Kalven and
Zeisel (1966) report a 5% hung jury rate in their large survey of
actual juries. In the present study, 30.5% of all four-person groups,
34.6% of the reasonable doubt juries and 27.2% of the preponderance of
evidence juries, failed to reach a unanimous verdict. In an actual
trial, a hung jury presumably protects the defendant and often results
in dismissal of the case. In this study, one might therefore argue
that the higher rate of hung juries for the reasonable doubt
condition, in conjunction with the lower rate of convictions for
reasonable doubt juries, 19% vs. 39%, constitutes evidence that the
reasonable doubt criterion serves to protect the defendant.

However, there is some indication that the hung jury rate is
inflated spuriously. Subjects in the study were told to deliberate
until they had either reached a unanimous group verdict or exhausted
the time available in the session. This admonishment was repeated on
the foreperson’s instruction sheet. Nevertheless, several
experimenters reported that when they would enter a deliberation room
to conclude a session, occasionally a group would report that they
had "hung a long time ago," and spent their remaining time waiting
behind the closed door and perhaps discussing matters unrelated to the
experiment. Since juries met in private, this couldn’t be prevented.
Perhaps a "dynamite charge," i.e., an admonishment to continue
deliberating, would have resulted in unanimous verdicts for many of
these groups. It isn’t clear whether such deadlocks were related to
the instructional manipulation or modal verdict preference. If these
deadlocks were premature and simply resulted from the random
distribution of some unmotivated yet influential subjects, they would
have the effect of obscuring real trends. Conversely, juries that
hung by running out of time while still deliberating may have
eventually reached unanimity. Thus, the data on hung juries in this
study are difficult to interpret.

Extralegal Defendant Attractiveness Bias Following Group Deliberation

Kaplan and Miller (1978) have argued and presented evidence that
the deliberation process may attenuate extralegal biases found in
individual pre-deliberation verdicts. They suggest that such
attenuation is inherent in the public act of deliberating. Jurors are
unlikely to raise attractiveness as an issue during deliberation, and
their colleagues are likely to discourage such a topic if it arises.
Instead, juries are hypothesized to focus predominantly upon the
facts. The net result is that the bias component in each juror’s
judgment comes to weigh less and less as deliberation proceeds.

The pattern of both group and individual post-deliberation
verdicts in the present study is in direct contradiction to this
argument, however. In this study, juries were significantly less
likely to convict the defendant of auto theft when he was physically
attractive. There was a significant leniency shift for individual
guilt ratings in the attractive defendant condition as a result of
group deliberation, and this resulted in a significant difference
between final guilt ratings in the attractive and unattractive
defendant conditions. Thus, deliberation brought out a clear
extralegal bias that was not apparent in the judgments reached
privately by individual jurors.

This is not the first study that has found extralegal group
verdict effects which weren’t manifested at the pre-deliberation
individual level. Hans and Doob (1976) conducted a mock jury study to
examine whether jurors complied with a judge’s instructions to
disregard the defendant’s prior criminal record. Subjects read a
transcript of a burglary case, and half were informed that the
defendant had been previously convicted of burglary. Prior to
deliberation, 45% of the jurors in the prior record condition and 40%
of the jurors in the no record condition voted for conviction, a
slight but non-significant trend for extralegal bias. But after
deliberating the case, 40% of the prior record juries convicted the
defendant, while none of the no record juries did. This is a
statistically significant difference (p < .01). Furthermore, Hans and
Doob (1976) recorded 71 comments regarding prior record during the
deliberation of the prior record groups; only 14 of these comments
suggested that the record should not be held against the defendant.
Contrary to Kaplan and Miller (1978), subjects apparently had few
qualms about blatantly discussing extralegal information, even though
they were instructed to ignore such information in reaching their
verdicts.

The results of the present study can be interpreted as an example
of group polarization. This phenomenon, strictly interpreted, would
suggest that slight initial differences in guilt ratings for the
defendant attractiveness manipulation would be amplified by group
discussion, shifting toward greater leniency in the attractive
defendant condition and less leniency in the unattractive defendant
condition. However, an examination of Table 13 in the last chapter
refutes this pattern. First, there was no significant shift in the
ratings for the unattractive defendant, and the trend suggests more
leniency, not less. Second, guilt ratings for the attractive
defendant are initially on the guilty side of the midpoint of the
scale, and subsequently shift across that point in the direction
favoring acquittal.

This strict interpretation of group polarization deserves some
qualification, however. Although the numerical midpoint on the 22-
point guilt scale is 11.5, the functional psychological midpoint is
almost certainly higher. The leniency bias for guilt-related judgments
suggests that a 50:50 pre-deliberation split will not result in a
50:50 post-deliberation split -- on the average, the conviction rate
will decrease significantly, as it did in the present study.
Furthermore, there is evidence that the effects of attractiveness on
verdicts are probably due to special treatment of the attractive
individual, not mistreatment of the unattractive individual (Sigall &
Ostrove, 1975). Thus, the mean pre-deliberation guilt rating for the
unattractive defendant, 12.28, is probably at or very near the
functional midpoint, and would not be expected to move toward greater
guilt. The attractive defendant’s rating starts below this point, at
12.19, and moves to 9.47, a significant polarization effect. Of
course, group polarization is a description of data; it does not
constitute an explanation or define a psychological process.

One possible explanation for the attractiveness bias is that
deliberation created a shift in the reasonable doubt criterion. In
the present study, post-deliberation pt estimates suggest that jurors
in the attractive defendant condition had more stringent standards of
proof than did subjects who saw an unattractive defendant.
Unfortunately, the R0 and Thomas and Hogue estimates do not allow a
conclusive test. Nevertheless, this is a viable hypothesis which
implies a very different judgmental process from the information
integration model, as outlined by Kaplan and his colleagues. While
their model suggests that bias is integrated with evidence in reaching
a verdict, the model advocated in this paper argues that bias is
reflected in the setting of a decision criterion. This decision
criterion is then matched against the perceived weight of evidence to
reach a verdict. Since these two components are not integrated, the
"set-size" phenomenon is not relevant, and either component may
polarize.
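
The criterion model described above can be illustrated with a minimal sketch. The function names and all numerical values below are hypothetical and chosen only for illustration; they are not estimates from the present study. The sketch assumes that extralegal bias enters only by shifting the decision criterion, which is then compared against a separately formed judgment of the weight of evidence.

```python
def biased_criterion(base_criterion, bias_shift):
    """Shift the decision criterion by an extralegal bias term.

    Both arguments are hypothetical quantities on a 0-1 probability
    scale; the result is clamped to stay within [0, 1].
    """
    return min(1.0, max(0.0, base_criterion + bias_shift))


def criterion_verdict(p_guilt, criterion):
    """Vote guilty only when perceived p(G) meets or exceeds the criterion."""
    return "guilty" if p_guilt >= criterion else "not guilty"


# Identical perceived weight of evidence, different criteria (assumed values).
p_guilt = 0.80
neutral = biased_criterion(0.75, 0.00)    # no extralegal bias
stringent = biased_criterion(0.75, 0.10)  # attractive defendant: stricter standard

print(criterion_verdict(p_guilt, neutral))
print(criterion_verdict(p_guilt, stringent))
```

Because the evidence and the criterion are kept as separate components in this sketch, either one could shift during deliberation without the other changing.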

There is also a trend suggesting that the weight of evidence may
have shifted after group discussion, but in a direction opposite
to the criterion shift. This might be interpreted as an indication
that attractiveness was averaged with the evidence, as in the
information integration weighted average model. This is possible, but
clearly at odds with Kaplan and Miller’s (1978) contention that the
evidence overwhelms the biased predisposition following deliberation.
This pattern could also result if jurors were to apply the decision
criterion, already influenced by their personal reactions to the
defendant, to each item of evidence independently, rather than to the
evidence as a whole (Cullison, 1977). Consistent with this reasoning,
Hans and Doob (1976) report that their prior record juries felt that
the evidence against the defendant was stronger, and discussed the
most incriminating facts more, than the no record juries. If jurors
do use the criterion in this manner, the model and operations
advocated here must be revised. For example, the method of creating
expected verdicts to assess accuracy would probably result in inflated
hit rates. It would be very difficult to determine whether jurors did
use the criterion in such a piecemeal fashion, however. One method
might be to get independent ratings of the evidence, either piecemeal
or as a whole, from a control group that receives no attractiveness
information at all, or receives attractiveness information at the
conclusion of the trial. Or, jurors exposed to different photographs
could be asked to rate each piece of evidence as it was received. The
information integration model would predict that any effects of
attractiveness upon p(G) should gradually diminish as the evidence
increases, even prior to deliberation. The "piecemeal" criterion
model would not predict any such attenuation; once the criterion was
set, the favored actor would continue to be perceived through rosy
lenses.
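
The attenuation predicted by the information integration model can be sketched with a simple weighted average. This is a simplified illustration, assuming equal weights for all evidence items and hypothetical scale values; it is not the specification used by Kaplan and his colleagues. The biased initial impression is averaged with the evidence, so its influence shrinks as the number of evidence items grows.

```python
def integrated_judgment(bias_value, bias_weight, evidence_values, evidence_weight=1.0):
    """Weighted average of an initial (biased) impression and the evidence items."""
    total_weight = bias_weight + evidence_weight * len(evidence_values)
    weighted_sum = bias_weight * bias_value + evidence_weight * sum(evidence_values)
    return weighted_sum / total_weight


def bias_effect(n_items, evidence_value=0.7, bias_weight=1.0):
    """Difference in p(G) between an unfavorable and a favorable initial impression.

    The impression values 0.9 and 0.3 are arbitrary illustrative numbers.
    """
    items = [evidence_value] * n_items
    unfavorable = integrated_judgment(0.9, bias_weight, items)
    favorable = integrated_judgment(0.3, bias_weight, items)
    return unfavorable - favorable


# The gap between conditions narrows as more evidence is averaged in.
for n in (1, 5, 20):
    print(n, round(bias_effect(n), 3))
```

Under the "piecemeal" criterion model no such narrowing would be expected, since the bias would be applied afresh to each item rather than averaged with the accumulating evidence.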

The attractiveness biasing effect might be explained in terms of
group process. Although log-linear analyses did not indicate a
significant relationship between defendant attractiveness, initial
verdict distribution, and group verdicts, an examination of Table 13
in the preceding chapter does reveal several suggestive trends.
First, juries in the attractive defendant condition were more likely
to begin with a unanimous preference for acquittal, which might result
from a weak pre-deliberation attractiveness effect combined with the
vagaries of random assignment. Second, although there is an asymmetry
between the 3,1 and 1,3 juries for both attractiveness conditions,
factions favoring acquittal in the 2,2 juries were more successful --
winning every time -- when they were arguing for an attractive
defendant. This pattern raises the possibility that the defendant’s
attractiveness served as a "tie-breaker." Jurors might have earnestly
attempted to discuss the facts of the case, but given such equivocal
evidence, might have ultimately resorted to extralegal cues like
physical attractiveness. For example, "I don’t know -- he just looks
too wholesome to steal cars." Corroborating evidence for this "tie-
breaker" hypothesis must await a systematic content analysis of the
deliberation tapes.

Standards of Proof and the Asymmetry Effect

In this study, as in previous research (cf. Stasser, Kerr, &
Bray, 1982), juries were more lenient than might be predicted by
jurors’ pre-deliberation verdicts, and jurors were more lenient in
their final private verdicts. Overall, this resulted in an asymmetry
in the final outcomes of four-person juries that started with even
faction sizes -- only 21% ultimately convicted the defendant, while
50% acquitted him. Furthermore, a minority of one was more likely to
convert his or her colleagues if he or she favored acquittal.

This study provided an opportunity to test one possible
explanation of this asymmetry effect, which suggests that the
reasonable doubt criterion used in criminal trials makes for an uphill
climb for factions favoring conviction. On the other hand, factions
favoring acquittal need merely create a reasonable doubt in their
opponents to convert them. In the present study, the manipulation of
standard of proof instructions should have given this advantage
to pro-acquittal factions only in the reasonable doubt condition. Results
indicated a trend suggesting that juries with an initial 2:2 split
acquitted the defendant more frequently when they received reasonable
doubt instructions, but this analysis involved only 10 groups and
was not statistically significant. The data did not support the
hypothesized Instruction x Initial Split x Verdict interaction.

However, several aspects of the present study weakened the
strength of any tests of this prediction. As discussed above, there
was an inflated, and possibly artifactual, hung jury rate which may have
obscured important patterns at the group level. Furthermore, when 59
groups are distributed across three verdict options and five initial
splits, a total of 15 cells, there is a great reduction in statistical
power for the crucial comparisons between close factions reaching
unanimous decisions. Also, the unfortunate necessity of using four-
rather than six- or twelve-person juries produces only one absolute
majority-to-minority ratio, 3:1. This is unfortunate because social
psychologists have long established that a minority faction of one has
unique psychological properties (Asch, 1956; see Kerr & MacCoun, 1984
for a recent example). In the present case, it may obscure important
differences in minority influence. For example, perhaps a minority
faction favoring acquittal only noticeably benefits from the
reasonable doubt standard when it has more than one member; the
advantage may be outweighed by the extreme disadvantage of a lack of
social support.

A more direct test of the standard of proof hypothesis for the
asymmetry effect could be provided by an experiment in which juries
receive either the reasonable doubt or the preponderance of evidence
instructions, and then are "stacked" -- i.e., explicitly constructed
to be evenly split on the first ballot. This would focus a great deal
more statistical power upon the crucial comparisons for the
hypothesis. Of course, there are still other viable explanations for
the asymmetry effect, including the possibility that the effect is
limited to or pronounced in juries composed of predominantly middle-
class college students in their late teens and early twenties. An
experiment which "stacked" mock juries composed of either college
students or members of a jury pool into even initial splits would
provide a direct test of this latter hypothesis. The issues of mock
jury composition and external validity are discussed below.

The Mock Jury Technique: Is It Externally Valid?

The use of mock juries, especially juries composed of college
undergraduates, is not without its critics (e.g., Konecni & Ebbesen,
1982). Can the decisions reached by students after exposure to a
hypothetical trial tell us anything about actual decisions reached by
actual juries? A complete review of the issues involved in this
question is beyond the scope of this paper; for a thorough review and
a persuasive defense of the mock jury simulation strategy, the reader
is referred to Bray and Kerr (1982). Nevertheless, a few points
should be addressed here.

First of all, there is no way in which the present study could
have been conducted using real, deliberating juries. In fact, it
would be illegal -- "jury tampering" is a felony. The preponderance of
evidence instructions are unacceptable for use in an actual criminal
trial. And correlational research would not provide the necessary
control gained through random assignment and the ability to hold trial
materials constant. As Bray and Kerr (1982) point out, field research
often sacrifices the potential for sound causal inferences afforded by
the experimental simulation strategy.

Second, there is no clear a priori reason why the results in the
present study would fail to generalize to real juries in real trials.
While we can generate a list of obvious differences between this
simulation and a real trial -- no voir dire, a more homogeneous jury
pool, an abbreviated trial and deliberation period, no real judge, no
real outcome at stake, etc. -- none implies an explicit a priori
reason why these simulation results should not apply. We might
generate hypotheses, but these will be empirical questions, and
require data to provide answers. As an example, Kerr, Bull, MacCoun,
and Rathborn (1984) asked whether British mock jurors undergo a
different decision process than American mock jurors -- an empirical
question. They found that the decision processes were the same in two
different nations. Are American college students more similar to
British college students than to American blue collar workers or
retirees on a real jury, or more different? Again, an empirical
question. And as the questions become concrete, theories begin to
germinate.

Note that the present study makes no policy recommendations for
the legal system. Rather, it is an exercise in theory construction.
It is the theories developed over the course of many mock jury
experiments that will make predictions about actual trial situations,
not the point estimates and test statistics obtained along the way
(cf. Mook, 1983). In the meantime, the results reported here should
best be interpreted:

...as ’demonstrations’ that may reveal that assumptions inherent
in the law do not always hold or that the legal system works in a
way other than officially prescribed (Davis, Bray, & Holt, 1977,
p. 327).

In other words, psychologists can and should point out potential cracks
in Justice’s blindfold, and flaws in her scales; to look is not to touch.


References

Asch, S. (1956) Studies of independence and conformity: I. A minority
of one against a unanimous majority. Psychological Monographs,
70, No. 9.

Anonymous. (1984) Legal vs. quantified definitions of standards of
proof. Manuscript under editorial review, Law and Human
Behavior.

Anderson, L., & Ager, J. W. (1978) Analysis of variance in small
group research. Personality and Social Psychology Bulletin, 4,
341-345.

Anderson, N. H. (1981) Integration theory applied to cognitive
responses and attitudes. In R. E. Petty, T. M. Ostrom, & T. C.
Brock (Eds.), Cognitive responses in persuasion (pp. 361-398).
Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Berscheid, E., & Walster, E. (1974) Physical attractiveness and
heterosexual attraction. In L. Berkowitz (Ed.), Advances in
Experimental Social Psychology, 7. New York: Academic Press.

Bray, R. M., & Kerr, N. L. (1982) Methodological considerations in
the study of the psychology of the courtroom. In N. L. Kerr & R.
M. Bray (Eds.), The psychology of the courtroom.
New York: Academic Press.

Bray, R. M., & Noble, A. M. (1978) Authoritarianism and decisions of
mock juries: Evidence of jury bias and group polarization.
Journal of Personality and Social Psychology, 36, 1424-1430.


Brehm, J. (1966) A theory of psychological reactance. New York:
Academic Press.

Broeder, D. (1959) The University of Chicago jury project. Nebraska
Law Review, 38, 744-760.

Brown, M. B. (1981) Two-way and multiway frequency tables -- Measures
of association and the log-linear model. In W. J. Dixon (Ed.),
BMDP statistical software (pp. 143-206).

Campbell, D. T., & Fiske, D. W. (1959) Convergent and discriminant
validation by the multitrait-multimethod matrix. Psychological
Bulletin, 56, 81-105.

Champagne, A., & Nagel, S. (1982) The psychology of judging. In N. L.
Kerr & R. M. Bray (Eds.), The psychology of the courtroom. New
York: Academic Press.

Charrow, R. P., & Charrow, V. R. (1979) Making legal language
understandable: A psycholinguistic study of jury instructions.
Columbia Law Review, 79, 1306-1374.

Cornish, W. R., & Sealy, A. P. (1973) Juries and the rules of
evidence. The Criminal Law Review, 208-223.

Cullison, A. D. (1977) The model of rules and the logic of decision.
In S. S. Nagel (Ed.), Modeling the criminal justice system (pp.
225-246). Beverly Hills, Calif.: Sage Publications, Inc.

Dane, F. C. (1979) Quantifying the reasonable doubt criterion.

Unpublished doctoral dissertation, University of Kansas.


Dane, F. C., & Wrightsman, L. S. (1982) Effects of defendants’ and
victims’ characteristics on jurors’ verdicts. In N. L. Kerr & R.
M. Bray (Eds.), The psychology of the courtroom. New
York: Academic Press.

Davis, J. H. (1973) Group decision and social interaction: A theory
of social decision schemes. Psychological Review, 80, 97-125.

Davis, J. H., Bray, R. M., & Holt, R. N. (1977) The empirical study
of decision processes in juries: A critical review. In J. L.
Tapp & F. J. Levine (Eds.), Law, justice, and the individual in
society: Psychological and legal issues. New York: Holt.

Doob, A. N., & Kirshenbaum, H. M. (1972) Some empirical evidence on
the effect of s.12 of the Canada Evidence Act upon an accused.
Criminal Law Quarterly, 15, 88-96.

Efran, M. G. (1974) The effect of physical appearance on the judgment
of guilt, interpersonal attraction, and severity of recommended
punishment in a simulated jury task. Journal of Research in
Personality, 8, 45-54.

Elwork, A., Sales, B. D., & Alfini, J. J. (1977) Juridic decisions: In
ignorance of the law or in light of it? Law and Human Behavior,
1, 163-189.

Fienberg, S. E. (1970) The analysis of multidimensional contingency
tables. Ecology, 51, 419-433.

Feinberg, N. (1971) Teaching the Type I and Type II errors: The
judicial process. The American Statistician, 25, 30-32.


Fried, M., Kaplan, K. J., & Klein, K. N. (1975) Juror selection: An
analysis of voir dire. In R. J. Simon (Ed.), The jury system in
America: A critical overview. Beverly Hills, Calif.: Sage
Publications, Inc.

Grofman, B. (1977) Jury decision-making models. In S. S. Nagel (Ed.),
Modeling the criminal justice system (pp. 191-204). Beverly
Hills, Calif.: Sage Publications, Inc.

Grofman, B. (1981) Mathematical models of juror and jury decision-
making: The state of the art. In B. D. Sales (Ed.), Perspectives
in Law and Psychology; Volume 2: The trial process. New York:
Plenum Press.

Hans, V. P., & Doob, A. N. (1976) Section 12 of the Canada Evidence
Act and the deliberations of simulated juries. Criminal Law
Quarterly, 18, 235-254.

Horowitz, I. A., & Willging, T. E. (1984) The psychology of law.

Iversen, G. R. (1971) Operationalizing the concept of probability in
legal-social science research. Law and Society Review, 5, 331-
333.

Izzett, R., & Leginski, W. (1974) Group discussion and the influence of
defendant characteristics in a simulated jury setting. Journal
of Social Psychology, 93, 271-279.


Izzett, R., & Fishman, L. (1976) Defendant attractiveness as a
function of attractiveness and justification for actions.
Journal of Social Psychology, 100, 285-290.

Kalven, H., & Zeisel, H. (1966) The American jury. Boston: Little,
Brown.

Kaplan, J. (1968) Decision theory and the factfinding process.
Stanford Law Review, 20, 1065-1092.

Kaplan, M. F. (1977) Discussion polarization effects in a modified
jury decision paradigm: Informational influences. Sociometry,
40, 252-271.

Kaplan, M. F. (1982) Cognitive processes in the individual juror. In
N. L. Kerr & R. M. Bray (Eds.), The psychology of the courtroom
(pp. 197-220). New York: Academic Press.

Kaplan, M. F., & Kemmerick, G. D. (1974) Juror judgment as
information integration: Combining evidential and nonevidential
information. Journal of Personality and Social Psychology, 30,
493-499.

Kaplan, M. F., & Miller, C. E. (1977) Judgments and group discussion:
Effect of presentation and memory factors on polarization.
Sociometry, 40, 337-343.

Kaplan, M. F., & Miller, L. E. (1978) Reducing the effects of juror
bias. Journal of Personality and Social Psychology, 36, 1443-
1455.


Kerr, N. L. (1978a) Beautiful and blameless: Effects of victim
attractiveness and responsibility on mock juror verdicts.
Personality and Social Psychology Bulletin, 4, 479-482.

Kerr, N. L. (1978b) Severity of prescribed penalty and mock jurors’
verdicts. Journal of Personality and Social Psychology, 36,
1431-1442.

Kerr, N. L., Atkin, R. S., Stasser, G., Meek, D., Holt, R. H., &
Davis, J. H. (1976) Guilt beyond a reasonable doubt: Effects of
concept definition and assigned decision rule on the judgments of
mock jurors. Journal of Personality and Social Psychology,
34, 282-294.

Kerr, N. L., Bull, R., MacCoun, R. J., & Rathborn, H. (1984) Victim,
culture, and verdict: Modeling juror decision-making. British
Journal of Social Psychology. In press.

Kerr, N. L., & MacCoun, R. J. (1983) Pretrial publicity and juror
judgment: A review of empirical research. Technical report:

Pretrial Publicity Project, American Judicature Society.

Kerr, N. L., & MacCoun, R. J. (1984) The effects of jury size and
polling method on the process and product of jury deliberation.
Under editorial review, Journal of Personality and Social
Psychology.

Knoke, D., & Burke, P. J. (1980) Log-linear models. Sage University
Paper series on Quantitative Applications in the Social Sciences,
07-001. Beverly Hills and London: Sage Publications.


Konecni, V. J., & Ebbesen, E. B. (1982) Social psychology and the
law: The choice of research problems, settings, and methodology.
In V. J. Konecni & E. B. Ebbesen (Eds.), The criminal justice
system: A social-psychological analysis. San Francisco: W. H.
Freeman and Co.

Loftus, E. (1983) Whose shadow is crooked? American Psychologist,
38, 576-577.

Marshall, C. R., & Wise, J. A. (1975) Juror decisions and the
determination of guilt in capital punishment cases: A Bayesian
perspective. In D. Wendt & C. Vlek (Eds.), Utility, probability,
and human decision-making. Dordrecht, Holland: Reidel.

Michelini, R. L., & Snodgrass, S. R. (1980) Defendant characteristics
and juridic decisions. Journal of Research in Personality,
14, 340-350.

Mitchell, H. E., & Byrne, D. (1973) The defendant’s dilemma: Effects
of juror’s attitudes and authoritarianism on judicial decisions.
Journal of Personality and Social Psychology, 25, 123-129.

Mook, D. G. (1983) In defense of external invalidity. American
Psychologist, 38, 379-387.

Myers, D. G., & Kaplan, M. F. (1976) Group-induced polarization in
simulated juries. Personality and Social Psychology Bulletin,
2, 63-66.

Myers, D. G., & Lamm, H. (1976) The group polarization phenomenon.
Psychological Bulletin, 83, 602-627.


Nagel, S. (1979) Bringing the values of jurors in line with the law.
Judicature, 63, 189-195.

Nagel, S., Lamm, D., & Neef, M. (1981) Decision theory and juror
decision-making. In B. D. Sales (Ed.), Perspectives in Law and
Psychology; Volume 2: The trial process (pp. 353-386). New York:
Plenum Press.

Nagel, S., & Neef, M. (1979) Decision theory and the legal process.
Lexington, Mass.: Lexington Books.

Nemeth, C. (1977) Interactions between jurors as a function of
majority vs. unanimity decision rules. Journal of Applied Social
Psychology, 7, 38-56.

Nisbett, R. E., & Ross, L. (1980) Human inference: Strategies and
shortcomings of social judgment. Englewood Cliffs, N. J.:
Prentice-Hall, Inc.

Ostrom, T. M., Werner, C., & Saks, M. J. (1978) An integration theory
analysis of jurors’ presumptions of guilt or innocence. Journal
of Personality and Social Psychology, 36, 436-450.

Pennington, N., & Hastie, R. (1981) Juror decision-making models: The
generalization gap. Psychological Bulletin, 89, 246-287.

Penrod, S., & Hastie, R. (1979) Models of jury decision-making: A
critical review. Psychological Bulletin, 86, 462-492.

Reid, A. H. (1960a) The law of instructions to juries, Vol. 4
(Civil), 3rd Edition. New York: Bobbs-Merrill Co., Inc.


Reid, A. H. (1960b) The law of instructions to juries, Vol. 5
(Criminal), 3rd Edition. New York: Bobbs-Merrill Co., Inc.

Shaffer, D. R., Case, T., & Brannen, L. (1979) Effects of withheld
evidence on juridic decisions: Amount of evidence withheld and
its relevance to the case. Representative Research in Social
Psychology, 10, 2-15.

Sigall, H., & Ostrove, N. (1975) Beautiful but dangerous: Effects of
offender attractiveness and nature of crime on juridic judgments.
Journal of Personality and Social Psychology, 31, 410-414.

Simon, R. J. (1970) "Beyond a reasonable doubt" - An experimental
attempt at quantification. Journal of Applied Behavioral
Science, 6, 203-209.

Simon, R. J., & Mahan, L. (1971) Quantifying burdens of proof: A view
from the bench, the jury, and the classroom. Law and Society
Review, 5, 319-330.

Stasser, G., Kerr, N. L., & Bray, R. M. (1982) The social psychology of
jury deliberations. In N. L. Kerr & R. M. Bray (Eds.), The
psychology of the courtroom (pp. 221-256). New York: Academic
Press.

Stasser, G., Kerr, N. L., & Davis, J. H. (1980) Influence processes in
decision-making groups: A modeling approach. In P. Paulus (Ed.),
The psychology of group influence. Hillsdale, New Jersey:
Erlbaum Associates, Inc.


Sue, S., Smith, R. E., & Caldwell, C. (1973) Effects of inadmissible
evidence on the decisions of simulated jurors: A moral dilemma.
Journal of Applied Social Psychology, 3, 345-353.

Sue, S., Smith, R. E., & Gilbert, R. (1974) Biasing effects of
pretrial publicity on judicial decisions. Journal of Criminal
Justice, 2, 163-171.

Thomas, E. A. C., & Hogue, A. (1976) Apparent weight of evidence,
decision criteria, and confidence ratings in juror decision-
making. Psychological Review, 83, 442-465.

Thornton, B. (1977) Effect of rape victim’s attractiveness in a jury
simulation. Personality and Social Psychology Bulletin, 3, 666-
669.

Trial. (1983) Numerical gauge for reasonable doubt ruled prejudicial
error. September, 1983, p. 10.

Wolf, S., & Montgomery, D. A. (1977) Effects of inadmissible evidence
and level of judicial admonishment to disregard on the judgments
of mock jurors. Journal of Applied Social Psychology, 7, 205-
219.

FOOTNOTES

1
Since the present study uses a variation of Kerr’s (1978a)
stimulus materials, the victim was portrayed as taking those same
precautions.
2
Nagel’s (e.g., 1979) simplifying assumptions that |UAI| = |UCI|
and that |UAG| = |UCG| were supported in the present study, with
t(299) = 1.28, p = .20, and t(300) = 0.84, p = .40, respectively. In
fact, these absolute utilities were highly correlated: r = .88,
p < .001, and r = .83, p < .001, respectively.
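
The link between outcome utilities and a standard of proof can be sketched with the standard expected-utility rule: convict when the expected utility of conviction exceeds that of acquittal. This is a textbook derivation offered as an illustration, not necessarily the exact computation used in the present study; the function and argument names are hypothetical, with subscripts following the UCG, UCI, UAG, UAI labels above (C = convict, A = acquit, G = guilty, I = innocent).

```python
def conviction_threshold(u_cg, u_ci, u_ag, u_ai):
    """Smallest p(G) at which convicting has the higher expected utility.

    Convict when  p*u_cg + (1-p)*u_ci  >  p*u_ag + (1-p)*u_ai,
    which rearranges to  p > (u_ai - u_ci) / ((u_ai - u_ci) + (u_cg - u_ag)).
    Assumes correct outcomes are valued above errors (u_ai > u_ci, u_cg > u_ag).
    """
    cost_of_false_conviction = u_ai - u_ci
    cost_of_false_acquittal = u_cg - u_ag
    return cost_of_false_conviction / (cost_of_false_conviction + cost_of_false_acquittal)


# Hypothetical juror who weighs convicting an innocent man twice as heavily
# as acquitting a guilty one, with |UAI| = |UCI| and |UAG| = |UCG|.
threshold = conviction_threshold(u_cg=5, u_ci=-10, u_ag=-5, u_ai=10)
print(round(threshold, 3))
```

Under the two simplifying assumptions, the threshold depends only on the ratio of the two absolute utilities, which is why equal weights yield a criterion of .50.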
3
The Intelligence, Trustworthiness, and Felon scales were
basically used as "tie-breakers"; i.e., when there were more than two
photographs of a given sex of the same approximate attractiveness
level, the photos most neutral on these scales were chosen. Table 2
shows that this attempt was only moderately successful, and more so
for the female photos than the male photos. The "what is beautiful is
good" stereotype (Berscheid & Walster, 1974) is so robust that it is
difficult to manipulate physical attractiveness without manipulating
general positivity.
4
Log-linear analyses were conducted using the method of partial
association (e.g., Brown, 1981). This procedure is analogous to a
hierarchical analysis of variance, in that it provides test
statistics for each 2-way, 3-way, ... N-way interaction effect. This
is accomplished by fitting a baseline to the data, and then removing
each effect of interest and observing the subsequent decline in
predictive accuracy. For example, in order to test a Verdict x
Instructions effect, a baseline of all possible 2-way effects is fit
to the data, and the likelihood ratio, G², is computed. Next, the
Verdict x Instructions effect is removed and the likelihood ratio is
re-calculated. This latter test statistic is subtracted from the
baseline likelihood ratio, and the resultant G², a test statistic
distributed as χ², is tested against the null hypothesis of
statistical independence; i.e., any differences as a result of dropping
the effect of interest are due to chance. This method is in contrast
to a goodness-of-fit strategy (e.g., Fienberg, 1970) in which the null
hypothesis is that deviations from the hypothesized model are due to
chance.
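
The logic of this difference test can be sketched for the simplest case, a two-variable Verdict x Instructions table. With only two variables, the baseline containing the 2-way effect reproduces the observed counts exactly, and removing that effect leaves the independence model. The counts below are hypothetical, and the sketch does not reproduce the multi-way fitting performed by the BMDP program.

```python
import math


def g_squared(observed, expected):
    """Likelihood-ratio statistic: G2 = 2 * sum(O * ln(O / E)) over nonzero cells."""
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def independence_expected(table):
    """Expected counts when the row x column effect is dropped."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def flatten(table):
    return [cell for row in table for cell in row]


# Hypothetical counts: rows = instruction condition, columns = verdict.
table = [[12, 18],
         [20, 9]]

observed = flatten(table)
g2_baseline = g_squared(observed, observed)  # baseline fits perfectly here
g2_reduced = g_squared(observed, flatten(independence_expected(table)))
g2_difference = g2_reduced - g2_baseline     # tested as chi-square with 1 df

print(round(g2_difference, 2), g2_difference > 3.841)  # 3.841 = .05 critical value
```

In tables with three or more variables the baseline no longer fits perfectly, so both likelihood ratios are nonzero and each model must be fit iteratively, but the subtraction step is the same.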


APPENDIX A

Experimental Materials


MICHIGAN STATE UNIVERSITY

Department of Psychology

DEPARTMENTAL RESEARCH CONSENT FORM

1. I have freely consented to take part in a scientific study being
conducted by Robert J. MacCoun under the supervision of Dr. Norbert
L. Kerr, Associate Professor of Psychology, MSU.

2. The study has been explained to me and I understand the explanation
that has been given and what my participation will involve.

3. I understand that I am free to discontinue my participation in the
study at any time without penalty.

4. I understand that the results of the study will be treated in
strict confidence and that I will remain anonymous. Within these
restrictions, results of the study will be made available to me at
my request.

5. I understand that my participation in the study does not guarantee
any beneficial results to me.

6. I understand that, at my request, I can receive additional
explanation of the study after my participation is completed.

Signed:

 

Title of Experiment: "THE JURY STUDY"

Date:

 

NOTE: You will be asked to read a brief transcript of a criminal
trial, and will be asked to respond as you would if you were an actual
juror, by completing a questionnaire and deliberating as a group. At
the conclusion of the experiment, you may have a number of questions
about the research. If so, you are invited to attend a discussion
session conducted by the experimenter on Friday, March 30th at 1 pm at
412 Baker Hall, or you may call Rob MacCoun at 353-6611.

THIS EXPERIMENT WILL LAST NO LONGER THAN ONE AND A HALF HOURS AND YOU

WILL RECEIVE THREE EXPERIMENTAL RESEARCH CREDITS FOR YOUR
PARTICIPATION.


THE JURY STUDY

Your initials:
Your sex:
Date:

In my opinion, the defendant, William Lambeth, is:

_____ Guilty of auto theft

_____ Not Guilty of auto theft

How confident are you in the above verdict?
(circle one number)

No                                      Complete
confidence   0 1 2 3 4 5 6 7 8 9 10    confidence

In my opinion, the probability that William Lambeth did commit the
charged offense is (place a check mark somewhere along the
following scale):

0 ----------------------- .5 ----------------------- 1.0
it’s certain          there’s a             it’s certain
that Lambeth          50-50 chance          that Lambeth
did NOT steal         that Lambeth          DID steal the
the car               stole the car         car

What is the smallest probability of guilt that you believe would be
necessary in order to conclude that Lambeth is GUILTY of auto
theft (i.e., if there were any less than that probability, you’d
vote NOT GUILTY)?

0 ----------------------- .5 ----------------------- 1.0
No evidence                                Complete evidence
of guilt                                   of guilt

Suppose for a minute that William Lambeth were found guilty, and that
you were the judge who had to sentence him. Assuming that the maximum
penalty is 20 years imprisonment, what prison sentence would you
recommend?

______ years, ______ months


The fact is that William Lambeth either did or did not steal Helen
Bednard’s car. Furthermore, there are two possible decisions that can
be made at the end of the trial: Lambeth can be found GUILTY, or Lambeth
can be found NOT GUILTY. Thus, there are four possible outcomes of
William Lambeth’s trial:

                          TRUE STATE OF THE WORLD

VERDICT               Lambeth DID          Lambeth DID NOT
                      steal the car        steal the car

Lambeth is            A guilty man         An innocent man
found GUILTY          is convicted         is convicted

Lambeth is            A guilty man         An innocent man
found NOT GUILTY      is set free          is set free

6. Please consider each possible outcome of the trial and indicate
whether you feel that the outcome is a POSITIVE outcome or a NEGATIVE
outcome:

                          TRUE STATE OF THE WORLD

VERDICT               Lambeth DID          Lambeth DID NOT
                      steal the car        steal the car

Lambeth is            ___ positive         ___ positive
found GUILTY          ___ negative         ___ negative

Lambeth is            ___ positive         ___ positive
found NOT GUILTY      ___ negative         ___ negative

7. Now consider each possible outcome of the trial and write a number in
each square to indicate HOW positive or HOW negative that outcome
would be if it occurred. If you think that an outcome would be
positive, use any number between zero and positive infinity. If you
think an outcome would be negative, use any number between zero and
negative infinity.

                          TRUE STATE OF THE WORLD

VERDICT               Lambeth DID          Lambeth DID NOT
                      steal the car        steal the car

Lambeth is            ________             ________
found GUILTY

Lambeth is            ________             ________
found NOT GUILTY


In my opinion, the probability that William Lambeth did commit the
charged offense is (check one):

 0 chances in 10 (i.e., it’s certain that Lambeth did NOT
                  steal the car)
 1 chance in 10
 2 chances in 10
 3 chances in 10
 4 chances in 10
 5 chances in 10 (i.e., there’s a 50-50 chance that Lambeth
                  stole the car)
 6 chances in 10
 7 chances in 10
 8 chances in 10
 9 chances in 10
10 chances in 10 (i.e., it’s certain that Lambeth DID
                  steal the car)

9. What is the smallest probability of guilt that you believe would be
necessary in order to conclude that Lambeth is GUILTY of auto theft
(i.e., if there were any less than that probability, you’d vote
NOT GUILTY)?

_____  0 chances in 10 (i.e., no evidence of guilt)
_____  1 chance in 10
_____  2 chances in 10
_____  3 chances in 10
_____  4 chances in 10
_____  5 chances in 10
_____  6 chances in 10
_____  7 chances in 10
_____  8 chances in 10
_____  9 chances in 10
_____ 10 chances in 10 (i.e., complete evidence of guilt)

10. Please rate your impressions of William Lambeth, the defendant, by
placing a check mark on each of the following scales:

Believable  :____:____:____:____:____:____:____: Unbelievable
Likeable    :____:____:____:____:____:____:____: Not Likeable
Attractive  :____:____:____:____:____:____:____: Unattractive
Intelligent :____:____:____:____:____:____:____: Unintelligent

11. Please rate your impressions of Helen Bednard, the victim, by
placing a check mark on each of the following scales:

Believable  :____:____:____:____:____:____:____: Unbelievable
Likeable    :____:____:____:____:____:____:____: Not Likeable
Attractive  :____:____:____:____:____:____:____: Unattractive
Intelligent :____:____:____:____:____:____:____: Unintelligent

12. Please indicate how important the judge’s instructions were in
helping you to determine whether or not there was sufficient
evidence in the trial to convict William Lambeth: (circle one)

1          2     3     4     5     6          7
Completely                                    Completely
unimportant                                   important

13. Please indicate how comprehensible the judge’s instructions were
for you: (circle one)

1               2     3     4     5     6     7
Completely                                    Completely
incomprehensible                              comprehensible

14. Please complete the following sentence by writing a number in the
blank:

"It is better to let _____ guilty defendant(s) go free than to
convict one innocent defendant."
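One common decision-theoretic reading of this sentence-completion item
treats the number written in the blank, n, as a ratio of error costs:
convicting one innocent defendant is judged n times as bad as freeing one
guilty defendant. Under that assumption the implied minimum probability of
guilt for conviction is n / (n + 1). The sketch below illustrates only that
conversion; the function name is hypothetical and the mapping is an
assumption, not a scoring rule taken from the questionnaire.

```python
def threshold_from_error_ratio(n_guilty_freed):
    """Convert 'better to free n guilty than convict 1 innocent'
    into an implied conviction threshold, n / (n + 1).

    Assumes the response expresses a cost ratio of n : 1 between the
    two kinds of error (an interpretive assumption).
    """
    if n_guilty_freed < 0:
        raise ValueError("the ratio must be non-negative")
    return n_guilty_freed / (n_guilty_freed + 1)


# A response of 1 treats both errors as equally bad; larger responses
# push the implied threshold toward certainty.
for n in (1, 9, 99):
    print(n, threshold_from_error_ratio(n))
```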

15. How sympathetic did you feel towards William Lambeth, the defendant
(circle one)?

1            2     3     4     5     6     7
Very                                       Very
unsympathetic                              sympathetic

16. How sympathetic did you feel towards Helen Bednard, the victim
(circle one)?

1            2     3     4     5     6     7
Very                                       Very
unsympathetic                              sympathetic

17. In my opinion, the odds that William Lambeth did commit the charged
offense are (write a number in each blank):

18. What are the smallest odds of guilt that you believe would be
necessary in order to conclude that William Lambeth is GUILTY of
auto theft (i.e., if the odds were any smaller than that, you’d
vote NOT GUILTY)?
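The questionnaire asks for the same judgments in three response formats:
chances in 10, odds, and a 0-to-1 probability line. To compare them they
must be placed on a common scale. The sketch below shows one way to do
that, assuming the odds response takes the form "a to b" in favor of
guilt (the exact layout of the blanks is not legible here); the function
names are hypothetical.

```python
def odds_to_probability(for_guilt, against_guilt):
    """Convert odds of 'for_guilt to against_guilt' into a probability."""
    if for_guilt < 0 or against_guilt < 0:
        raise ValueError("odds terms must be non-negative")
    total = for_guilt + against_guilt
    if total == 0:
        raise ValueError("at least one odds term must be positive")
    return for_guilt / total


def chances_in_10_to_probability(chances):
    """Convert a 'chances in 10' response (0 to 10) into a probability."""
    if not 0 <= chances <= 10:
        raise ValueError("chances must lie between 0 and 10")
    return chances / 10


# Odds of 3 to 1 in favor of guilt correspond to a probability of .75,
# which falls between the '7 chances in 10' and '8 chances in 10' options.
print(odds_to_probability(3, 1))
print(chances_in_10_to_probability(7))
```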


After each of the members of your group have finished filling out the
individual questionnaires, you should close your door and begin
deliberation. Please discuss the case as a group and attempt to reach
a unanimous group verdict. (Note that there is a microphone in
your room. When you close the door, the experimenter will begin
tape-recording your deliberation. You may find that once you begin
discussing the case, the presence of the microphone will be easy to
ignore). Your group may deliberate until _____ . Your experimenter will
notify you when you have only 5 minutes left to deliberate.

Foreperson’s initials: __________

Time your jury began deliberation:_

 

Time your jury completed deliberation (i.e., reached a unanimous group
verdict or "hung"):

GROUP VERDICT (at completion of deliberation):

"We find William Lambeth ______________ of the charge of auto theft"
(check one)
_____ GUILTY
_____ NOT GUILTY
_____ HUNG (i.e., we were unable to reach a unanimous group
      verdict in the time allotted)


THE JURY STUDY

----------- Your initials:
Your sex:
Date:

Jurors, please fill this short questionnaire out AFTER your jury
has deliberated and reached a unanimous group verdict.

1. What is your personal verdict?     GUILTY     NOT GUILTY

2. How confident are you in the above verdict? (circle one number)

NO CONFIDENCE 0 1 2 3 4 5 6 7 8 9 10 COMPLETE CONFIDENCE

3. How satisfied are you with your group’s verdict? (circle one number)

VERY                                 VERY
DISSATISFIED 0 1 2 3 4 5 6 7 8 9 10 SATISFIED

4. In my opinion, the probability that William Lambeth did commit the
charged offense is: (fill in the blank with a number from 0 to 10)

"_____ chances in 10"

5. In my opinion, the odds that William Lambeth did commit the charged
offense are (write a number in each blank):

6. In my opinion, the probability that William Lambeth did commit the
charged offense is (place a check mark somewhere along the following
scale):

0 -------------------- .5 -------------------- 1.0
it’s certain           there’s a               it’s certain
that Lambeth           50-50 chance            that Lambeth
did NOT steal          that Lambeth            DID steal the
the car                stole the car           car

We’d appreciate any comments you’d like to make about this study. Did
you enjoy it? If so, why? If not, why not? Was there anything you
found confusing or hard to understand?

Thank you for your interest and participation...


APPENDIX B

Analysis of Variance and

Log-Linear Modeling Tables

Table B-1

Analysis of Variance: Victim Attractiveness
Manipulation Check by Subject Sex, Instructions,
Victim and Defendant Attractiveness

Source              df    Mean Square    F-Ratio
Subject Sex          1         .41          .24
Instructions         1         .19          .11
Victim Att.          1      375.20       219.63 **
Defend. Att.         1         .59          .35
S x I                1         .2           .16
S x V                1        2.70         1.58
S x D                1         .03          .02
I x V                1        1.09          .64
I x D                1         .56          .33
V x D                1         .03          .01
S x I x V            1       14.08         8.24 *
S x I x D            1         .00          .00
S x V x D            1         .42          .25
I x V x D            1         .26          .15
S x I x V x D        1         .76          .44
Error              301        1.71

*  p < .01
** p < .001
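Each F-ratio in these analysis-of-variance tables is the effect's mean
square divided by the error mean square, so legible entries can be
cross-checked against one another. The sketch below applies that check
to three rows of the victim-attractiveness manipulation-check table
above. The helper names and the tolerances are assumptions chosen to
absorb rounding in the printed values; this is a reader's consistency
check, not part of the original analysis.

```python
import math


def f_ratio(ms_effect, ms_error):
    """Return the F-ratio for an effect: MS(effect) / MS(error)."""
    if ms_error <= 0:
        raise ValueError("error mean square must be positive")
    return ms_effect / ms_error


def agrees(ms_effect, ms_error, reported_f, rel_tol=0.01, abs_tol=0.006):
    """True if the recomputed F matches the printed F within rounding."""
    return math.isclose(f_ratio(ms_effect, ms_error), reported_f,
                        rel_tol=rel_tol, abs_tol=abs_tol)


MS_ERROR = 1.71  # printed error mean square (301 df)

# (label, printed mean square, printed F-ratio)
rows = [
    ("Victim Att.", 375.20, 219.63),
    ("S x V", 2.70, 1.58),
    ("I x V", 1.09, 0.64),
]

for label, ms, reported in rows:
    recomputed = f_ratio(ms, MS_ERROR)
    print(f"{label:12s} recomputed={recomputed:8.2f} "
          f"printed={reported:8.2f} consistent={agrees(ms, MS_ERROR, reported)}")
```

Small discrepancies are expected because the printed mean squares are
themselves rounded to two decimals.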

Table B-2

Analysis of Variance: Defendant Attractiveness
Manipulation Check by Subject Sex, Instructions,
Victim and Defendant Attractiveness

Source              df    Mean Square    F-Ratio
Subject Sex          1        5.30       [illegible]
Instructions         1         .98          .87
Victim Att.          1         .63          .56
Defend. Att.         1      341.55       303.74 **
S x I                1        1.44         1.28
S x V                1         .47          .42
S x D                1         .78          .69
I x V                1         .04          .03
I x D                1        1.14         1.02
V x D                1         .96          .85
S x I x V            1         .10          .09
S x I x D            1       15.43        13.72 **
S x V x D            1        1.03          .92
I x V x D            1         .69          .62
S x I x V x D        1        1.49         1.32
Error              302        1.12

*  p < .05
** p < .001

Table B-3

Log-Linear Analysis: Individual Pre-Deliberation
Verdicts by Subject Sex, Instructions,
Victim and Defendant Attractiveness

Effect                        df       G²
Baseline                      [illegible]
Verdict x Sex                  1      .06
Verdict x Instructions         1     7.10
Verdict x Victim Att.          1      .00
Verdict x Defendant Att.       1      .51
Baseline                       7    -9.43
Verdict x S x I                1      .33
Verdict x S x V                1     3.37
Verdict x S x D                1      .04
Verdict x I x V                1     1.56
Verdict x I x D                1     2.99
Verdict x V x D                1      .19
Baseline                       1    -5.27
Verdict x S x I x V            1      .12
Verdict x S x I x D            1      .13
Verdict x S x V x D            1     3.88
Verdict x I x V x D            1      .00

* p < .05
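The G² column in the log-linear tables is the likelihood-ratio
chi-square, G² = 2 Σ O ln(O / E), where O and E are observed and
model-expected cell counts. The sketch below computes it for the simplest
case, a two-way table tested against independence. The counts are
hypothetical (they are not data from this study), the function name is
illustrative, and the multiway hierarchical models summarized in the
tables involve more than this single comparison.

```python
import math


def g_squared_independence(table):
    """Likelihood-ratio chi-square (G^2) for independence in a two-way
    table of counts, given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0:
        raise ValueError("table must contain at least one observation")
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed == 0:
                continue  # a zero cell contributes nothing to the sum
            expected = row_totals[i] * col_totals[j] / n
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


# Hypothetical verdict (rows) by instruction condition (columns) counts.
hypothetical_counts = [[30, 20],
                       [20, 30]]
print(round(g_squared_independence(hypothetical_counts), 2))
```

With one degree of freedom, a G² above 3.84 would be significant at the
.05 level, which is the criterion flagged in the table footnote.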

Table B-4

Analysis of Variance: Pre-Deliberation
Guilt Scores by Subject Sex, Instructions,
Victim and Defendant Attractiveness

Source              df    Mean Square    F-Ratio
Subject Sex          1       13.90       [illegible]
Instructions         1      453.31         7.27 *
Victim Att.          1         .01          .00
Defend. Att.         1        9.91          .16
S x I                1       25.75          .41
S x V                1      189.91         3.05
S x D                1         .52          .01
I x V                1       36.12          .58
I x D                1      136.01         2.18
V x D                1         .93          .02
S x I x V            1        3.70          .06
S x I x D            1       10.15          .16
S x V x D            1      163.67         2.63
I x V x D            1        2.10          .03
S x I x V x D        1       37.72          .61
Error              305       62.33

* p < .01

Table B-5

Analysis of Variance: Pre-Deliberation
Guilt Score Internal Analysis by Subject Sex,
Instructions, Victim and Defendant Attractiveness

Source              df    Mean Square    F-Ratio
Subject Sex          1       33.06       [illegible]
Instructions         1      169.65         2.83
Victim Att.          1        2.81          .05
Defend. Att.         1       39.79          .66
S x I                1         .03          .00
S x V                1      292.03         4.86 *
S x D                1         .80          .01
I x V                1       10.44          .17
I x D                1      254.44         4.24 *
V x D                1       50.74          .85
S x I x V            1         .50          .01
S x I x D            1       21.92          .37
S x V x D            1       45.03          .75
I x V x D            1       22.07          .37
S x I x V x D        1       82.88         1.38
Error              177       60.06

* p < .05

Table B-6

Analysis of Variance: Recommended Sentences
by Subject Sex, Instructions,
Victim and Defendant Attractiveness

Source              df    Mean Square    F-Ratio
Subject Sex          1      412.45          .14
Instructions         1    11745.94         3.85
Victim Att.          1      270.56          .09
Defend. Att.         1       86.15          .03
S x I                1      943.36          .31
S x V                1     1962.57          .64
S x D                1      244.51          .08
I x V                1     9298.06         3.05
I x D                1      478.31          .16
V x D                1      623.63          .21
S x I x V            1       50.91          .02
S x I x D            1     1328.99          .44
S x V x D            1      139.58          .05
I x V x D            1        1.56          .00
S x I x V x D        1      536.07          .18
Error              286     3047.76

Table B-7

Analysis of Variance: Victim Believability
by Subject Sex, Instructions,
Victim and Defendant Attractiveness

Source              Mean Square    F-Ratio
Subject Sex              .64       [illegible]
Instructions            5.21         3.14
Victim Att.             2.57         1.55
Defend. Att.             .01          .01
S x I                   8.77         5.28 *
S x V                    .34          .20
S x D                    .21          .13
I x V                   2.86         1.72
I x D                   3.12         1.88
V x D                    .18          .11
S x I x V                .00          .00
S x I x D                .34          .21
S x V x D                .99          .60
I x V x D                .03          .02
S x I x V x D           3.28         1.98
Error                   1.66

* p < .05

 

Table B-8

Analysis of Variance: Victim Likeability
by Subject Sex, Instructions,
Victim and Defendant Attractiveness

Source              Mean Square    F-Ratio
Subject Sex              .84          .54
Instructions            1.29          .83
Victim Att.            25.13        16.16 *
Defend. Att.            1.77         1.14
S x I                    .91          .59
S x V                    .10          .07
S x D                   1.74         1.21
I x V                   2.79         1.79
I x D                    .04          .03
V x D                    .08          .05
S x I x V                .09          .06
S x I x D                .00          .00
S x V x D               1.00          .64
I x V x D                .49          .32
S x I x V x D            .14          .09
Error                   1.56

* p < .001

Table B-9

Analysis of Variance: Victim Intelligence
by Subject Sex, Instructions,
Victim and Defendant Attractiveness

Source              df    Mean Square    F-Ratio
Subject Sex          1         .00       [illegible]
Instructions         1         .70          .55
Victim Att.          1        5.44         4.24 *
Defend. Att.         1         .35          .27
S x I                1         .42          .33
S x V                1         .00          .00
S x D                1         .00          .00
I x V                1        1.90         1.48
I x D                1         .13          .10
V x D                1         .09          .07
S x I x V            1        5.58         4.34
S x I x D            1         .90          .70
S x V x D            1        2.48         1.93
I x V x D            1         .80          .62
S x I x V x D        1         .04          .03
Error              301        1.28

* p < .05

Table B-10

Analysis of Variance: Sympathy for Victim
by Subject Sex, Instructions,
Victim and Defendant Attractiveness

Source              df    Mean Square    F-Ratio
Subject Sex          1        6.22         2.64
Instructions         1        2.95         1.25
Victim Att.          1        1.46          .62
Defend. Att.         1         .41          .18
S x I                1         .40          .17
S x V                1         .44          .19
S x D                1         .91          .38
I x V                1        2.89         1.23
I x D                1        1.10          .47
V x D                1        1.27          .54
S x I x V            1        4.42         1.03
S x I x D            1         .00          .00
S x V x D            1         .11          .05
I x V x D            1        4.58         1.95
S x I x V x D        1       20.59         8.74 *
Error              299        2.36

* p < .01

Table B-11

Analysis of Variance: Defendant Believability
by Subject Sex, Instructions,
Victim and Defendant Attractiveness

Source              Mean Square    F-Ratio
Subject Sex              .88          .36
Instructions             .26         5.44 *
Victim Att.             2.85         1.17
Defend. Att.             .74          .30
S x I                     35         1.79
S x V                     17          .49
S x D                    .34          .14
I x V                    .21          .09
I x D                    .12          .05
V x D                    .01          .01
S x I x V                .60          .66
S x I x D                .07          .03
S x V x D                 37          .56
I x V x D                .34          .14
S x I x V x D            .03          .01
Error                   2.44

* p < .05

Table B-12

Analysis of Variance: Defendant Likeability
by Subject Sex, Instructions,
Victim and Defendant Attractiveness

Source              df    Mean Square    F-Ratio
Subject Sex          1         .21       [illegible]
Instructions         1         .23          .16
Victim Att.          1        1.31          .90
Defend. Att.         1        6.71         4.62 *
S x I                1         .61          .42
S x V                1         .06          .04
S x D                1        1.35          .93
I x V                1        2.65         1.83
I x D                1        1.14          .79
V x D                1        2.09         1.44
S x I x V            1         .02          .02
S x I x D            1        4.63         3.19
S x V x D            1        1.75         1.21
I x V x D            1        1.05          .72
S x I x V x D        1        1.35          .93
Error              302        1.45

* p < .05

Table B-13

Analysis of Variance: Defendant Intelligence
by Subject Sex, Instructions,
Victim and Defendant Attractiveness

Source              df    Mean Square    F-Ratio
Subject Sex          1        2.23       [illegible]
Instructions         1         .94          .78
Victim Att.          1         .82          .68
Defend. Att.         1       11.03         9.19 **
S x I                1        2.55         2.13
S x V                1        1.23         1.03
S x D                1         .11          .09
I x V                1         .03          .03
I x D                1         .22          .18
V x D                1        6.88         5.74 *
S x I x V            1         .42          .35
S x I x D            1         .17          .14
S x V x D            1         .17          .14
I x V x D            1         .07          .06
S x I x V x D        1        2.96         2.47
Error              302        1.20

*  p < .05
** p < .01

Table B-14

Analysis of Variance: Sympathy for Defendant
by Subject Sex, Instructions,
Victim and Defendant Attractiveness

Source              df    Mean Square    F-Ratio
Subject Sex          1         .20          .11
Instructions         1        1.28          .71
Victim Att.          1         .11          .06
Defend. Att.         1        3.18         1.76
S x I                1         .45          .25
S x V                1        9.25         5.11 *
S x D                1        5.98         3.30
I x V                1         .41          .23
I x D                1         .35          .19
V x D                1         .19          .11
S x I x V            1        1.76          .97
S x I x D            1         .88          .49
S x V x D            1         .39          .22
I x V x D            1         .27          .15
S x I x V x D        1       18.82        10.40 **
Error              299        1.81

*  p < .05
** p < .001

 

Table B-15

Log-Linear Analysis: Group Verdicts
by Subject Sex, Instructions,
Victim and Defendant Attractiveness

Effect                        df       G²
Baseline                      27    26.40
Verdict x Size                [illegible]
Verdict x Instructions         2      .57
Verdict x Victim Att.          2     3.45
Verdict x Defendant Att.      [illegible]
Baseline                      11   -17.62
Verdict x S x I               [illegible]
Verdict x S x V                2      .30
Verdict x S x D                2     2.92
Verdict x I x V                2      .71
Verdict x I x D                2     5.52
Verdict x V x D                2      .82

* p < .01

Table B-16

Repeated Measures ANOVA: Guilt Scores by
Time, Size, Instructions, and
Victim and Defendant Attractiveness

Source              df    Mean Square    F-Ratio
Size                 1      150.71       [illegible]
Instructions         1       52.74         0.99
Victim Att.          1       18.08         0.34
Defend. Att.         1       14.89         0.28
S x I                1      127.70         2.41
S x V                1        4.11         0.08
S x D                1       79.76         1.50
I x V                1       83.77         1.58
I x D                1       14.93         0.28
V x D                1        0.31         0.01
S x I x V            1       49.87         0.94
S x I x D            1        3.64         0.07
S x V x D            1        7.53         0.14
I x V x D            1        6.84         0.13
S x I x V x D        1       11.50         0.22
Error               71       53.07

 

Table B-16 (Continued)

Repeated Measures ANOVA: Guilt Scores by
Time, Size, Instructions, and
Victim and Defendant Attractiveness

Source                  df    Mean Square    F-Ratio
Time (T)                 1      106.88       [illegible]
T x Size                 1        3.94          .53
T x Instructions         1         .04          .01
T x Victim Att.          1        8.48         1.15
T x Defend. Att.         1       55.36         7.48 *
T x S x I                1         .37          .05
T x S x V                1        6.59          .89
T x S x D                1       26.70         3.61
T x I x V                1        4.20          .57
T x I x D                1        4.19          .57
T x V x D                1        8.85         1.20
T x S x I x V            1        9.29         1.25
T x S x I x D            1        4.36          .59
T x S x V x D            1        6.89          .93
T x I x V x D            1       22.28         3.01
T x S x I x V x D        1        3.26          .44
Error                   71        7.40

*  p < .01
** p < .001

Table B-17

Repeated Measures ANOVA: Mean p(G) Estimates
by Time, Size, Instructions, and
Victim and Defendant Attractiveness

Source              df    Mean Square    F-Ratio
Size                 1      445.39
Instructions         1       44.88
Victim Att.          1      145.09
Defend. Att.         1        1.91
S x I                1      324.66
S x V                1      271.68
S x D                1      497.53
I x V                1      236.76
I x D                1       42.09
V x D                1       68.66
S x I x V            1        0.02
S x I x D            1       76.18
S x V x D            1      111.17
I x V x D            1       14.63
S x I x V x D        1      438.98
Error               68      458.68

Table B-17 (Continued)

Repeated Measures ANOVA: Mean p(G) Estimates
by Time, Size, Instructions, and
Victim and Defendant Attractiveness

Source                  df
Time (T)                 1
T x Size                 1
T x Instructions         1
T x Victim Att.          1
T x Defend. Att.         1
T x S x I                1
T x S x V                1
T x S x D                1
T x I x V                1
T x I x D                1
T x V x D                1
T x S x I x V            1
T x S x I x D            1
T x S x V x D            1
T x I x V x D            1
T x S x I x V x D        1
Error                   68

Mean Square (legible entries, in printed order): 440.52, 6.11, 245.17,
12.54, 240.78, 10.49, 28.51, 111.06, 7.44, 74.84, .07, 28.17, 91.48,
93.09, 76.27

F-Ratio (legible entries, in printed order): 5.03 *, .14, .37, 1.46,
.10, .98, .00

* p < .05
