EXAMINING INTERJUDGE PUNISHMENT DISPARITIES AND JUDICIAL SENTENCING
PATTERNS WITHIN COURT COMMUNITIES
By
Michael B. Cassidy

A DISSERTATION
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
Criminal Justice – Doctor of Philosophy
2017

ABSTRACT
The current study examines individual judges’ punishment decisions and sentencing
patterns within court communities. The focal concerns perspective holds that judges consider
offender blameworthiness, community threat, and practical constraints when sentencing
offenders, and assessment of the focal concerns is likely to vary across judges. In part,
differences are due to judges’ subjective decision-making, but additional theories suggest
sentencing outcomes are also influenced by the court community in which punishment decisions
occur. Extant research generally relies on multilevel models to assess interjudge variation, but
more recent work indicates that multilevel analysis obscures variation at the judge level and
provides limited information about how individual judges within court communities consider
offender and case characteristics.
The current work offers a more comprehensive examination of interjudge disparity and
judicial sentencing patterns within court communities. The research uses seven years
(2004-2010) of data collected from the Pennsylvania Commission on Sentencing and a sample of large,
medium, and small courts. Findings from multilevel models and individual judge regression
models show that individual judge analyses provide a better understanding of variation across judges,
and of whether differences associated with key predictors of punishment are meaningful. The
current research also finds little consistency in the ways judges in the same court communities
consider extralegal factors in sentencing decisions. This work highlights the need to further
develop theories to explain why offender and case characteristics influence punishment decisions
for some judges, but not others, and the role court communities play in sentencing decisions.

ACKNOWLEDGEMENTS

I would like to thank Dr. Carole Gibbs, my graduate advisor and committee chairperson,
who has provided invaluable guidance and support since the first day I entered the doctoral
program. You offered me a number of opportunities to learn and grow as a scholar, and I would
not have been able to complete this dissertation without you. I would also like to thank my
committee members Dr. Steven Chermak and Dr. J. Kevin Ford for their insightful feedback
throughout this process, and Dr. Chris Melde for his statistical and methodological advice over
the years.
To the friends I made at Michigan State, Jason Rydberg, Rebecca Stone, Kimberly
Bender, John Hakola, Alexis Norris, Julie Yingling, Derrick Franke, Gio Circo, Ellen Jesmok,
Brianna Bermudez, Alexandria Anstett, and Alyssa Badgley, thank you for making graduate
school a truly enjoyable experience. I also want to thank Brian Durant for his friendship and
support.
Finally, I owe the greatest thanks to my family. My parents, Wayne and Roseanne,
grandmother Anna, sister Laura, brother-in-law Steve, and nieces Charlotte and Audrey have
provided constant love, inspiration, and encouragement.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

CHAPTER 1: INTRODUCTION

CHAPTER 2: REVIEW OF THE LITERATURE
    Early Theories of Sentencing
    Recent Literature
    Court Communities and Focal Concerns
    Sentencing Variation within Court Communities
    Summary
    Purpose of the Research
    Research Questions and Hypotheses

CHAPTER 3: DATA AND METHODOLOGY
    Data
    Data Reduction and Missing Data
    Sample Selection
    Dependent and Independent Variables
    Control Variables
    Analytic Strategy

CHAPTER 4: RESULTS
    Descriptive Statistics
    Multilevel Analysis
    Individual Judge Analysis
    Incarceration Models
    Sentence Length Models
    Analysis of Judges within Court Communities

CHAPTER 5: DISCUSSION
    The Current Inquiry
    Theoretical and Methodological Implications
    Multilevel Analysis of Judge Variation
    Individual Analysis of Judge Variation
    Sentencing within Court Communities
    Implications for Policy
    Limitations
    Directions for Future Research

REFERENCES

LIST OF TABLES

Table 1. Cases and Number of Judges by Court
Table 2. Descriptive Statistics for Overall Sample
Table 3. Unconditional Models of Incarceration and Sentence Length
Table 4. Random Coefficients Models of Incarceration and Sentence Length
Table 5. Number of Judges with Significant Effects for Sentencing Outcomes
Table 6. Percent of Large Court Judges with Significant Effects
Table 7. Percent of Medium Court Judges with Significant Effects
Table 8. Percent of Small Court Judges with Significant Effects

LIST OF FIGURES

Figure 1. HGLM and Logit Effects for Offense Severity
Figure 2. HGLM and Logit Effects for Prior Record
Figure 3. HGLM and Logit Effects for Age
Figure 4. HGLM and Logit Effects for Female Offenders
Figure 5. HGLM and Logit Effects for Black Offenders
Figure 6. HGLM and Logit Effects for Trial Convictions
Figure 7. LMM and OLS Effects for Offense Severity
Figure 8. LMM and OLS Effects for Prior Record
Figure 9. LMM and OLS Effects for Age
Figure 10. LMM and OLS Effects for Female Offenders
Figure 11. LMM and OLS Effects for Black Offenders
Figure 12. LMM and OLS Effects for Trial Convictions
Figure 13. Logit and OLS Effects for Age
Figure 14. Logit and OLS Effects for Female Offenders
Figure 15. Logit and OLS Effects for Black Offenders
Figure 16. Logit and OLS Effects for Trial Convictions
Figure 17. Individual Judge Effects in Large Court 2
Figure 18. Individual Judge Effects in Medium Court 3
Figure 19. Individual Judge Effects in Medium Court 6
Figure 20. Individual Judge Effects in Small Court 5
Figure 21. Individual Judge Effects in Small Court 9
Figure 22. Individual Judge Effects in Small Court 4

CHAPTER 1: INTRODUCTION
Prior to the 1970s, sentencing systems at both the state and federal level offered judges a
wide range of punishment options that could be tailored to individual offenders’ specific needs
(MacKenzie, 2001; Reitz, 1998). Providing judges with nearly unfettered discretion, however,
resulted in unwarranted race and gender disparities in criminal sanctions (Tonry, 1996). Calls for
reforms that would promote uniformity and fairness led several states and the federal
government to enact sentencing guidelines (Reitz, 1998). Guidelines are designed to limit
judicial discretion by providing a sentencing range based on the legally relevant factors of
offense severity and prior criminal history, thus ensuring that similarly situated offenders receive
similar punishment outcomes (Miethe & Moore, 1985; Tonry, 1996).
Over the last few decades, scholars have developed a substantial body of literature
examining the key predictors of sentencing under guidelines systems. Research consistently
shows that legal factors are the primary determinants of sentence severity, but disparities based
on extralegal factors (e.g., age, race/ethnicity, and gender) remain (for reviews, see Pratt, 1998;
Spohn, 2000; Ulmer, 2012; Zatz, 1987, 2000). In recent years, the dominant theoretical
framework used to explain sentencing decisions has been the focal concerns perspective (Hartley,
Maddan, & Spohn, 2007; Kramer & Ulmer, 2009). This perspective holds that judges’ sentencing
decisions are based on assessments of offender blameworthiness, protection of the community,
and practical constraints. Judges rely primarily on legal factors (e.g., offense severity, prior
record) when assessing the focal concerns, but also engage in subjective decision-making based
on attributions associated with offender age, race/ethnicity, and gender when determining the
appropriate punishment (Kramer & Ulmer, 2009).
Focal concerns theorists also recognize that the factors judges consider when assessing the
focal concerns, as well as the weight afforded to these factors, are likely to vary across judges
(Kramer & Ulmer, 2009). In part, this is due to differences in judges’ subjective decision-making,
but additional theories suggest that judges’ decisions are also influenced by the court
community in which punishment decisions occur. Court communities consist of court actors who
share a common workplace and develop unique case processing and sentencing norms
(Eisenstein, Flemming, & Nardulli, 1988; Kramer & Ulmer, 2009). Extant work also notes,
however, that court size plays an important role in shaping court communities. For example,
court actor autonomy is highest in large courts, moderate in medium courts, and lowest in small
courts (Eisenstein, Flemming, & Nardulli, 1988). As a result, the effects of key predictors of
sentencing may differ across courts (Kautt, 2002; Kramer & Ulmer, 2009), though this variation
may be conditioned by court size.
Some studies have used the focal concerns perspective to explore variation in judges’
subjective sentencing decisions, and findings from multilevel analyses show legal and extralegal
effects vary significantly across judges (Anderson & Spohn, 2010; Johnson, 2006; Wooldredge,
2010). However, Wooldredge (2010) compared results from multilevel models and individual
judge regression models and found that multilevel analyses masked extralegal effects found in
the individual judge models. As such, Wooldredge (2010) highlighted the need for additional
judge-level analyses to gain a better understanding of interjudge disparity. Additional research
suggests that differences in judges’ subjective punishment decisions can be explained, at least in
part, by the court community context in which sentencing occurs (Ulmer, 1997). Early studies
showed that the influence of court communities may vary based on court size, but this work is
limited to examination of a small number of courts, and primarily focused on court processes
(e.g., docket management, case assignment) as opposed to sentencing decisions (Eisenstein,
Flemming, & Nardulli, 1988; Eisenstein & Jacob, 1977). More recent work employing multilevel
analyses of offenders nested within courts offers support for the court community perspective,
finding that legal and extralegal effects vary significantly across courts (Britt, 2000; Kautt, 2002;
Kramer & Ulmer, 2009; Ulmer & Johnson, 2004). However, these studies provide little
information about how individual judges within court communities consider offender and case
characteristics when determining the appropriate sentence. Thus, while prior work has begun to
explore interjudge punishment disparity and the influence of court communities on sentencing
decisions, a more comprehensive examination is needed.
The current work fills this gap in the prior literature to further advance knowledge of
interjudge disparity and judicial sentencing patterns within court communities. Using seven years
(2004-2010) of data collected from the Pennsylvania Commission on Sentencing and a sample of
large, medium, and small courts, the current work employs multilevel modeling to replicate the
prior work that shows variation in the effects of offender and case characteristics across judges. Next, the
present study uses individual judge regression models to examine judges’ contributions to the
legal and extralegal effects found in the multilevel analysis. Finally, analysis of individual judges
grouped by court is used to assess whether and how judges in the same court communities
consider legal and extralegal factors. This research extends previous work (Wooldredge, 2010)
using multilevel analysis and individual judge models to assess differences in findings produced
by these methodological approaches, and is the first to examine individual judges’ sentencing
patterns within a relatively large sample of court communities.
This work is important from a theoretical and practical standpoint. Focal concerns theory
suggests that judges will vary in the specific factors considered in sentencing decisions.
However, pooling of judge estimates for legal and extralegal factors in multilevel random effects
models is contrary to the idea that judges vary in their subjective assessments of focal concerns.
Further, contemporary sentencing theories are not only concerned with whether judges vary in
legal and extralegal effects, but also whether these factors matter in punishment decisions. Yet,
multilevel random effects models may not be appropriate when predictors are expected to
produce effects for some groups, but not others (Gelman & Hill, 2016). Thus, the current work
incorporates individual judge models to more directly examine interjudge variation, and assess
whether differences between judges are meaningful. This kind of analysis is a necessary first step
in gaining a better understanding of the extent to which judges vary, which can then be followed
by further developing theories to explain this variation (see Ulmer, 2012; Wooldredge, 2010).
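The concern about pooling can be illustrated with a minimal sketch of partial pooling under a simple normal-normal model. This is a simplified stand-in, not the estimator used in the present study or in the cited work. The function name, judge labels, slopes, standard errors, grand mean, and between-judge variance are all hypothetical values chosen for illustration.

```python
# Hypothetical illustration of partial pooling ("shrinkage").
# None of these numbers come from the study's data.


def shrink(estimate, std_error, grand_mean, between_var):
    """Return a partially pooled estimate for one judge.

    The weight on the judge's own estimate is its reliability:
    between-judge variance / (between-judge variance + sampling variance).
    """
    reliability = between_var / (between_var + std_error ** 2)
    return reliability * estimate + (1.0 - reliability) * grand_mean


# (judge label, separately estimated slope for one predictor, standard error)
judges = [
    ("A", 0.90, 0.40),  # large effect, imprecisely estimated
    ("B", 0.10, 0.10),  # near-zero effect, precisely estimated
    ("C", 0.05, 0.15),
    ("D", 0.85, 0.12),  # large effect, precisely estimated
]

grand_mean = 0.30   # assumed average slope across judges
between_var = 0.04  # assumed variance of the true slopes across judges

pooled = {
    label: shrink(slope, se, grand_mean, between_var)
    for label, slope, se in judges
}

for label, slope, se in judges:
    print(f"Judge {label}: separate = {slope:.2f}, "
          f"partially pooled = {pooled[label]:.2f}")
```

Under these assumed values, judge A's separately estimated slope of 0.90 is pulled to about 0.42 because it is imprecisely estimated, while judge D's more precise slope of 0.85 stays near 0.70. Every pooled estimate moves toward the assumed grand mean, which is why a pooled model can understate how far an individual judge departs from the average.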
In addition, much of the extant research on focal concerns either ignores whether court
communities affect judges’ subjective decisions, or concludes that variation across courts is
indicative of court context influencing judges’ punishment decisions (Kramer & Ulmer, 2009).
However, findings from the latter provide limited information about what is driving this
variation. For example, differences across court communities may be the result of similar
sentencing patterns among judges within courts, similarities for judges in some courts but not
others, and/or differences between judges across courts. As such, the current work explores the
significance of court communities on judges’ subjective sentencing decisions, and whether
varying levels of court actor autonomy present in large, medium, and small courts condition this
relationship.
By testing the focal concerns and court community perspectives using the analytic
strategies outlined above, the current work offers implications for theory. Differences between
findings from the multilevel analysis and the individual judge regression models may indicate
that analysis at the judge level, as opposed to the jurisdiction or state, is necessary to gain a better
understanding of how judges consider legal and extralegal factors when assessing the focal
concerns (Wooldredge, 2010). In addition, consistency among judges within the same courts
may suggest the court community influences punishment decisions, whereas interjudge disparity
in large courts versus small courts, for example, may indicate court contextual influences are
conditioned by court size. Conversely, substantial variation within all courts could suggest that
court communities influence some processes (e.g., docket management, case assignment), but
have less of an impact in the punishment phase. Differences within courts would also indicate a
need for further theoretical development to explain why some judges rely on certain legal and
extralegal factors more than others when determining the appropriate sentence.
The current work also has potential implications for sentencing law and policy.
Sentencing guidelines were developed to reduce unwarranted disparity and increase uniformity
in punishment (Kramer & Scirica, 1986). Examining individual judges’ sentencing decisions will
provide some indication of whether guidelines have achieved these goals. To the extent
disparity continues to exist, implications may include additional training for judges on guidelines
implementation, more rigorous research to assist policy makers with identifying sources of
disparity, and stricter appellate review standards.

CHAPTER 2: REVIEW OF THE LITERATURE
Early Theories of Sentencing
Early theories on the influence of legal and extralegal factors on sentencing decisions
were based on macro-level relationships between law and society. Conflict theory states that the
social and political structures of society are the result of conflict between the ruling class and
those with little or no power (Chambliss & Seidman, 1982). Criminal law reflects the
empowered class’s attempt to continue its political and social dominance, and the courts “tend to
produce solutions in the interest of the wealthy” (Chambliss & Seidman, 1982: 237). Thus,
conflict theory predicts that extralegal factors such as race/ethnicity, gender, and socioeconomic
status play a significant role in sentencing outcomes (Lizotte, 1978). In contrast, consensus
theory posits that laws represent broadly shared societal norms and values (Dixon, 1995). These
laws govern the sentencing process and are applied uniformly to cases, irrespective of offender
class or status (Chiricos & Waldo, 1975). As a result, consensus theory suggests that legal
factors, such as offense severity and criminal history, are the primary determinants of sentencing
outcomes (Dixon, 1995).
While conflict and consensus theories were useful for framing much of the research in the
1970s and 1980s on the effects of offender and case characteristics on sentencing outcomes,
neither perspective garnered significant empirical support (Hagan, 1989). Studies testing conflict
theory often failed to properly control for legally relevant variables (e.g., offense severity and
prior criminal record) (Hagan, 1974; Kleck, 1981). When these variables were included in the
analyses, some researchers found that race and ethnicity effects either did not exist or were
inconsequential (Kleck, 1981; Kleck, 1985; Wilbanks, 1987). Yet, reviews of subsequent studies
using newer data and more sophisticated methodology raised questions about whether legal
factors fully moderated extralegal effects (Spohn, 2000; Zatz, 1987, 2000). Zatz (1987: 70)
argued that even when controlling for offense severity and criminal history, data from
determinate sentencing systems, including guidelines, “show subtle if no longer overt bias
against minority defendants.” These findings suggest that conflict and consensus theories, which
link sentencing outcomes to either extralegal or legal factors based on macro-level relationships
between law and society, are too limited to explain the complex nature of judicial
decision-making. Consequently, sentencing scholars developed theoretical perspectives that consider the
influence of both legal and extralegal factors on sentencing outcomes.
Recent Literature
Albonetti’s (1991) attribution theory of judicial decision-making states that fully rational
decision-making is only possible when the decision-maker can accurately identify all of the
potential benefits, costs, and alternatives associated with the decision. Since decision-makers
rarely have access to all of this information, they are forced to engage in a process characterized
by “bounded rationality,” where they search for a solution that will limit the uncertainty of
obtaining the desired outcome (Albonetti, 1991: 249). Albonetti (1991) theorized that judges
operate under bounded rationality because information about offenders is often incomplete and
contradictory. As a result, judges develop decision-making shortcuts or “patterned responses” to
address uncertainty (Albonetti, 1991: 17). These patterned responses are the result of judicial
attributions of offenders’ recidivism risk and rehabilitation potential. In contrast to conflict and
consensus theories, Albonetti (1991) suggested that judicial attributions are influenced by both
legally relevant variables and offender characteristics. For example, having a prior record is
likely to increase sentence severity because it triggers an attribution of a “stable and enduring
offender disposition to commit future criminal activity” (Albonetti, 1991: 257). Similarly,
judges may impose harsher punishments for certain offenders based on attributions linking race
and gender stereotypes with recidivism risk and rehabilitation potential (Albonetti, 1991, 1997,
2002).
Other developments in theories of sentencing take into account the shift by several states
and the federal government from an indeterminate system of punishment to a determinate
structure, which often includes sentencing guidelines. Drawing on the work of Max Weber,
Savelsberg (1992) argued that sentencing guidelines attempt to balance two competing interests:
formal rationality and substantively rational decision-making. Formal rationality refers to the
laws, policies, and sentencing ranges outlined in the guidelines, which limit judicial discretion
and promote uniformity in punishment outcomes (Savelsberg, 1992). However, sentencing has
traditionally been a substantively rational individualized process, where punishment decisions
are guided by judicial consideration of individual offenders’ characteristics, needs, or
circumstances (Savelsberg, 1992; Ulmer & Kramer, 1996). Under sentencing guidelines, formal
rationality provides judges with the guidelines range, but uncertainty remains over selecting the
appropriate sentence within that range. Judges engage in substantively rational decision-making
based on assessments of individual offenders when selecting the actual sentence, which may
result in disparate punishment for similarly situated offenders (i.e., those with similar prior
records convicted of similar crimes) (Kramer & Ulmer, 2009).
Sentencing scholars have drawn from Albonetti’s (1991) and Savelsberg’s (1992) work
to develop the focal concerns perspective. Similar to Albonetti (1991), focal concerns contends
that sentencing outcomes are the result of a multifaceted and complex decision-making process,
where judges make attributions based on assessments of offender blameworthiness, protection of
the community, and practical constraints (Kramer & Ulmer, 2009; Steffensmeier, Ulmer, &
Kramer, 1998). Blameworthiness stems from a retributive philosophy of punishment, and is
associated with offender culpability and the amount of harm caused. Common factors that
influence blameworthiness are offense severity, the offender’s role in the crime, and prior
victimization of the offender. Protection of the community draws from incapacitation and
deterrence philosophies of punishment. Consequently, court actors make attributions about
offenders’ future behavior based on the crime of conviction (e.g., violent versus property), prior
criminal history, and stereotypes that suggest certain offenders pose a greater threat to the
community. The third focal concern addresses practical constraints, which consist of ensuring
regular case flow, relationships among courtroom actors, and assessment of criminal justice
system resources, such as local jail capacity (Kramer & Ulmer, 2009; Ulmer & Johnson, 2004).
In line with Savelsberg’s (1992) conceptualization of formal and substantive rationality,
proponents of focal concerns recognize that formal sentencing criteria (e.g., sentencing
guidelines) are the primary determinants of sentencing outcomes, but judges also engage in
substantively rational decision-making (Kramer & Ulmer, 2009). Thus, assessments of the focal
concerns are mostly influenced by offense severity and prior criminal history, but attributions
linked to offender demographics and social status also play a role in punishment decisions.
Several studies have applied the focal concerns framework when researching sentencing
decision-making, and findings are consistent with the tenets of this perspective (Doerner &
Demuth, 2010; Kramer & Ulmer, 2009; Steffensmeier & Demuth, 2001, 2006; Steffensmeier,
Ulmer, & Kramer, 1998). Offense severity and criminal history are the primary factors judges
consider when assessing blameworthiness and protection of the community, but extralegal
factors also influence sentencing outcomes. In particular, these studies consistently show that
female offenders are less likely to be incarcerated and receive shorter sentences than male
offenders, and offenders convicted after trial are punished more harshly than those who enter a
guilty plea. Results concerning offender age and race/ethnicity are somewhat less consistent, but
when significant effects are found they generally show younger offenders and black/Hispanic
offenders are sentenced more severely than older offenders and white offenders, respectively.
However, proponents of this perspective also note that judges are likely to vary in the ways they
assess the focal concerns, as well as the factors they use in their assessments. Though judicial
“reliance on the three concerns is said to be universal, … the meaning, emphasis, and
interpretation of them is local” (Ulmer & Johnson, 2004: 142; see also Kautt, 2002; Kramer &
Ulmer, 2009). More specifically, scholars have argued that whether and how judges consider
legal and extralegal factors in sentencing outcomes is influenced in part by the court community
in which punishment decisions occur (Anderson & Spohn, 2010; Dixon, 1995; Johnson, 2006;
Kramer & Ulmer, 2009; Ulmer & Johnson, 2004).
Court Communities and Focal Concerns
Theorizing regarding variation in assessment of the focal concerns is based on work that
views courts as communities. Court communities consist of court actors who share a common
workplace and develop working relationships (Eisenstein, Flemming, & Nardulli, 1988;
Eisenstein & Jacob, 1977). The structure of status and power among participants, as well as the
characteristics and values of group members, shape these relationships. Most literature on court
communities explores courtroom actors’ working relationships and case processing norms (e.g.,
case assignment, charge bargaining) (Eisenstein, Flemming, & Nardulli, 1988; Eisenstein &
Jacob, 1977), but some studies suggest that court communities also influence sentencing (Ulmer,
1997; Ulmer & Johnson, 2004; Ulmer & Kramer, 1996). Court communities develop distinctive
case processing and sentencing norms, which suggests that both sentence severity and the effects
of key predictors of sentencing may vary across courts (Kautt, 2002; Kramer & Ulmer, 2009).
Overall, this literature suggests that court communities are unique, and the ways in which
they shape processes and outcomes are not necessarily generalizable across multiple
communities. However, Eisenstein and colleagues (1988) note that court size plays an important
role in shaping court communities. In general, court actor autonomy is lowest in small courts,
moderate in medium courts, and highest in large courts. Small courts (one to two judges) are
composed of few judges, prosecutors, and defense attorneys and lack the resources needed for
trials. Consequently, court actors work closely with one another and most cases are settled
through guilty pleas. Conversely, additional resources available in large courts (at least fifteen
judges) allow for more trials, and the greater number of court personnel creates an environment
where mutual dependence between court actors is more fractured compared to small courts
(Eisenstein, Flemming, & Nardulli, 1988). For example, Jacob’s (1997) work on the largest court
in Illinois (Cook County) showed a tight connection between court personnel regarding
courtroom assignment, case assignment, and docket management. Concerning sentencing,
however, Jacob (1997: 28) noted “[i]ndividual judges emphasize their need to exercise discretion
in order to do justice. The court setting permits them to give free reign to their individual traits
and invites them to render their own reading of the law; their rulings have slight if any impact on
other courtrooms.” Case processing and sentencing practices associated with medium-sized
courts (four to fourteen judges) tend to fall somewhere between small and large courts
(Eisenstein, Flemming, & Nardulli, 1988).
Qualitative research supports the notion that differences in case processing and
sentencing practices are associated with court size. Interviews conducted by Ulmer (1997) and
Ulmer and Kramer (1996) with judges, prosecutors, defense counsel, probation officers, and
court administrative personnel in three courts (one small, one medium, and one large) in
Pennsylvania revealed substantial differences in workgroup structure and culture, sentencing
goals, and guideline adherence among these courts. For example, assistant district attorneys had
very little discretion to negotiate plea agreements in the small court, but a great deal of discretion
in the medium-sized court (Ulmer, 1997). In the large court, discretion was related to experience
(i.e., as assistant district attorneys gained experience, their discretion increased). Concerning
sentencing goals, small court judges relied on rehabilitation, just deserts, and deterrence
punishment philosophies, while large court judges focused on rehabilitation and just deserts. In
the medium-sized court, deterrence, incapacitation, and just deserts influenced judges’
sentencing decisions (Ulmer, 1997).
However, quantitative analysis of sentencing outcomes in these courts showed mixed
results regarding extralegal effects. All three courts sentenced female offenders more leniently,
and offenders convicted after trial (as opposed to pleading guilty) more harshly (Ulmer, 1997).
Only the large and medium-sized courts sentenced black offenders more harshly than whites; no
significant differences were found in the small court (Ulmer, 1997). This research provides some
evidence of differences in legal and extralegal effects across court communities, but it is limited
to only three courts and interviews with a small sample of court actors. More recent research
testing the focal concerns and court community perspectives using multilevel modeling and
larger samples offers additional support for variation in sentencing patterns across courts.
Several studies have used multilevel models of pooled cases with offenders nested within
courts to examine variation in predictors of punishment severity across courts. In contrast to
prior work that focused on a small number of courts differing in size (Ulmer, 1997; Ulmer &
Kramer, 1996), this body of research examines cases across all courts at the state or federal level.
Findings from this work offer support for integrating the focal concerns and court community
perspectives. For example, using Pennsylvania guidelines data, Britt (2000), Ulmer and Johnson
(2004), and Kramer and Ulmer (2009) employed multilevel models with random effects that
allow the slopes of legal and extralegal predictors to vary randomly across courts. Significant
variance components indicated that effects associated with offense severity, prior record, age,
race, gender, and mode of conviction differed significantly across courts (Britt, 2000; Kramer &
Ulmer, 2009; Ulmer & Johnson, 2004). Kautt’s (2002) analysis of federal guidelines data
provided similar results for U.S. District Courts. Based on these findings, scholars have
concluded “decisionmakers in different courts differentially weight the importance of these
various individual-level case characteristics at sentencing” (Kramer & Ulmer, 2009: 129; see
also Kautt, 2002; Ulmer & Johnson, 2004).
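One way to write the kind of random coefficients model described above, for a binary incarceration outcome, is sketched below. The notation (p_ij, x_kij, gamma_k, u_kj, tau_k^2) is assumed here for illustration and is not taken from the cited studies, and covariances among the random effects are omitted for brevity.

```latex
\begin{align*}
\text{Level 1 (offender } i \text{ in court } j\text{):}\quad
  & \log\!\left(\frac{p_{ij}}{1 - p_{ij}}\right)
    = \beta_{0j} + \sum_{k=1}^{K} \beta_{kj}\, x_{kij} \\
\text{Level 2 (court } j\text{):}\quad
  & \beta_{kj} = \gamma_{k} + u_{kj}, \qquad
    u_{kj} \sim N\!\left(0, \tau_{k}^{2}\right), \quad k = 0, 1, \dots, K
\end{align*}
```

In this notation, a significant variance component corresponds to evidence that tau_k^2 is greater than zero, that is, that the effect of predictor k differs across courts. Each court's slope is still modeled as a draw from a common distribution centered on gamma_k.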
Overall, the results from multilevel analyses suggest that judicial assessments of the focal
concerns, and the legal and extralegal factors used in these assessments, are conditioned by the
court community in which sentencing occurs. However, while these studies are useful for
gaining a better understanding of sentencing disparity across courts, they provide little
information about whether and how judges consider legal and extralegal factors within these
court communities.
Sentencing Variation within Court Communities
Integration of the focal concerns and court community perspectives suggests that effects
associated with key predictors of sentencing vary across courts, but judges within these courts
may also assess legal and extralegal factors in different ways. According to Johnson and
colleagues, while “court actors use legal factors such as offense seriousness and prior record as
initial punishment benchmarks [they] then make situational attributions about defendants’
character and risk based on more subtle, subjective, decision-making schema” (Johnson, Ulmer,
& Kramer, 2008: 745). Prior work on interjudge disparity, which “occurs when judges in the
same jurisdiction sentence similarly situated offenders differently” (Kim, Spohn, & Hedberg,
2015: 5), offers support for the notion that judges engage in subjective decision-making. Johnson
(2006) analyzed offenders sentenced in Pennsylvania courts using a three-level mixed model
(i.e., offenders nested within judges nested within courts) with random effects and found that
effects associated with offense severity, criminal history, age, gender, and race/ethnicity varied
significantly across both judges and courts. Similarly, Anderson and Spohn (2010) analyzed
federal guidelines data using a model with offenders nested within judges and found significant
variation between judges in three U.S. district courts for effects associated with gender,
employment status, and pretrial status.
Wooldredge (2010) also used multilevel analysis for a sample of felony conviction cases
in a single Ohio court. Similar to Johnson (2006) and Anderson and Spohn (2010), results from
multilevel random effects models showed that with the exception of offender race, legal and
extralegal effects differed significantly across judges (Wooldredge, 2010). To further advance
understanding of these differences, Wooldredge (2010) employed individual judge logistic
regression models to examine how judges contribute to extralegal disparities found in the overall
model. The individual models revealed that significant findings from the multilevel model
provided limited information about the ways in which judges consider extralegal factors in
sentencing decisions. For example, six of the 18 judges showed significant effects for race in the
individual models (four were positive, two were negative), despite race being non-significant in
the overall model. Further, though multilevel analysis showed that males were 1.5 times more
likely to be incarcerated than females, only five of the 18 judges showed significant effects for
gender, and odds ratios for these judges ranged from 2.75 to 16.78. Wooldredge (2010)
concluded that pooling cases results in masked effects, which are likely to arise when judges
yield null or weak effects, or effects that are significant but in opposite directions, for a given
variable. Collectively, this body of literature offers support for the idea that differences among
judges are the result of subjective decision-making, and provides some evidence of legal and
extralegal effects varying not only across court communities, but also within court communities.
Summary
Extant research consistently shows that judicial assessments of the focal concerns are
primarily driven by offense severity and criminal history, but extralegal effects associated with
offender age, race/ethnicity, and gender remain. More recently, scholars have argued that effects
associated with key predictors of sentencing vary across court communities. This is because
court communities develop distinctive case processing and sentencing norms that influence
judicial consideration of legal and extralegal factors in assessing offender blameworthiness and
community threat. Research offers support for integrating the focal concerns and court
community perspectives, finding that effects associated with offense and offender characteristics
vary significantly across courts.
Yet, other work shows that legal and extralegal effects also vary significantly across
judges within courts. According to Ulmer (2012: 17), “[b]etween-actor variation is certainly
congruent with and implied by the focal concerns ….” Though variation across judges would
seem to indicate that the court community is less influential than theory suggests, “this kind of
variation between judges and prosecutors, for example, does not contradict the notion that court
community contexts shape actors’ decisions, and that between-actor variation occurs relative to
court community norms” (Ulmer, 2012: 17). Thus, variation between judges may suggest that
court communities differ in the presence of, and adherence to, shared case processing and
sentencing norms. With some research suggesting court actor autonomy differs across large,
medium, and small courts (Eisenstein, Flemming, & Nardulli, 1988), this variation may be linked
to the size of the court.
Though prior work has assessed variation across judges and courts using multilevel
modeling, this methodological approach does not allow for a detailed examination of interjudge
disparity and sentencing patterns of judges in the same court community. Only one study
(Wooldredge, 2010) analyzed individual judges’ sentencing decisions in this way, but this
research was limited to a single court. To date, no research has applied the focal concerns and
court community perspectives to the punishment decisions of individual judges in more than one
court to examine whether and how judges in the same court communities consider legal and
extralegal factors. The current study addresses this gap in the literature.
Purpose of the Research
The current research seeks to build on the prior literature to further advance knowledge of
interjudge disparity and judicial sentencing patterns within court communities. Prior work
suggests judges rely primarily on offense severity and criminal history when assessing offender
blameworthiness and community threat, but they also engage in subjective decision-making that
reflects their sentencing philosophy and attributions associated with offender extralegal
characteristics (Johnson et al., 2008). To the extent that subjective decision-making varies across
judges, the legal and extralegal factors associated with sentencing outcomes should vary as well.
Yet, the court community perspective suggests that interjudge disparity may be
conditioned by the context in which sentencing occurs. Court communities develop distinctive
case processing strategies and sentencing norms that not only affect sentencing outcomes, but
also the predictors that influence these outcomes (Kautt, 2002; Kramer & Ulmer, 2009). Despite
these assertions, extant work does not offer a comprehensive examination of judges’ sentencing
patterns within court communities.
Though prior work has examined court communities and variation in sentencing
predictors across courts and judges, these studies are limited in several ways. Qualitative studies
provide in-depth information about court communities, but are limited to a small number of
courts (typically fewer than three). Further, this work has generally focused on court structure,
workgroup relationships, and case processing strategies, with less attention paid to predictors
associated with sentencing outcomes (Eisenstein, Flemming, & Nardulli, 1988; Jacobs, 1997).
Ulmer (1997) and Ulmer and Kramer’s (1996) work is an exception, but their analysis of legal
and extralegal effects on punishment decisions used pooled case models for each of the three
courts. Thus, findings concerning sentencing disparity reflected all judges’ decisions within each
court, as opposed to each individual judge’s decisions within these courts.
Multilevel analysis of offenders nested within courts allows for large-scale comparisons
across multiple jurisdictions, and findings from these studies show that effects associated with
offender and case characteristics vary significantly across courts (Britt, 2000; Kautt, 2002;
Kramer & Ulmer, 2009; Ulmer & Johnson, 2004). Researchers conclude that these findings offer
support for the idea that court communities influence sentencing decisions, but these studies
provide no information about judges’ sentencing patterns within these courts. Additional
research using multilevel models provides some information about judges’ sentencing patterns
within courts, and this work suggests effects associated with legal and extralegal factors vary
between judges (Anderson & Spohn, 2010; Johnson, 2006). Though this research raises some
questions about the extent to which court communities influence sentencing decisions,
Wooldredge (2010) demonstrated the limitations of using multilevel modeling to assess
interjudge disparity. Comparing findings from a two-level model (offenders nested within
judges) with individual judge regression models showed that the multilevel analysis obscured
individual judges’ contributions to sentencing disparities. Thus, Wooldredge’s (2010) work
suggests that individual judge models may be more appropriate than multilevel analysis for
assessing judges’ sentencing patterns within courts. However, Wooldredge’s (2010) research was
limited to one court with 18 judges, and focused on differences in extralegal effects on judges’
decisions to incarcerate offenders. In addition, some of the variation found in his analysis may be
due to Ohio’s relatively lax sentencing guidelines, which explicitly allow judges to consider
different sentencing goals (Wooldredge, 2010).
The current research seeks to address these limitations through a more comprehensive
examination of interjudge disparity and judicial sentencing decisions within court communities.
First, multilevel analysis with offenders nested within judges nested within courts for a sample of
large, medium, and small Pennsylvania courts is used to replicate prior work that shows effects
associated with legal and extralegal factors vary significantly across judges. Next, findings from
this model will be compared with results from individual judge regression models to examine
judges’ contributions to legal and extralegal effects found in the multilevel analysis. Finally,
individual judges will be grouped by court to assess whether and how judges in the same court
communities consider offender and case characteristics.
Employing both multilevel models and individual regression models is important for
theoretical and methodological reasons. Focal concerns perspective states that differences in
judges’ subjective decision-making influence the ways in which judges consider offender and
case characteristics when assessing offender blameworthiness and community threat. In addition,
sentencing theories are not only concerned with whether judges vary in effects associated with
legal and extralegal factors, but also whether these factors significantly influence sentencing
decisions. However, extant research generally relies on fixed effects estimates from models with
cases pooled at the jurisdiction or state level (e.g., Steffensmeier & Demuth, 2006; Tillyer,
Hartley, & Ward, 2015; cf. Wooldredge, 2010), which limits exploration of judges’ variation in
their subjective assessments of the focal concerns. According to Wooldredge, when relying on
overall estimates from pooled case models, “there is a risk of conveying the impression that the
problem is pervasive across all judges” (Wooldredge, 2010: 540). Studies that examine
individual judges’ estimates from the random effects portion of multilevel analyses (e.g.,
Anderson & Spohn, 2010; Johnson, 2006) provide more information about each judge’s value
for a given variable, but limitations remain.
The random group effects in multilevel models are obtained by combining information
about the specific group effect and the overall model coefficient (Gelman & Hill, 2016; Hox,
2010; Snijders & Bosker, 2012). Less reliable group estimates are “shrunk” closer to the overall
mean for the dataset, resulting in biased, but also more precise, estimates (Gelman & Hill, 2016;
Hox, 2010). However, scholars conducting school achievement research have urged caution in
relying solely on multilevel modeling when the purpose of the research is to evaluate the
performance of individual teachers or schools (Fitz-Gibbon, 1991; Tate, 2004; Teddlie &
Reynolds, 2000). Since multilevel models pool the group estimates around the average, schools
with very good results may be pulled down towards the mean, while schools that are
underperforming will be pulled up (de Leeuw & Kreft, 1995; Fitz-Gibbon, 1996). Examination
of multilevel models alone to evaluate judges’ sentencing decisions raises similar concerns,
particularly in light of contemporary sentencing theories. The pooling of judge estimates for
offender and case characteristics is contrary to the idea that judges vary in their subjective
assessments of factors associated with the focal concerns of sentencing. Thus, estimates obtained
from individual judge regression models may provide information that allows for a more
comprehensive evaluation of each judge’s sentencing patterns.
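As a rough illustration of the shrinkage described above, the sketch below applies the standard reliability weighting for a random-intercept model, in which a group's estimate is a weighted average of its own mean and the overall mean. It is a simplified sketch, not the estimation routine used in this study: the function names, the variance components (`TAU2`, `SIGMA2`), and all numeric values are hypothetical and are not drawn from the PCS data.

```python
def reliability(n_j, tau2, sigma2):
    """Weight placed on a group's own mean: tau2 / (tau2 + sigma2 / n_j)."""
    return tau2 / (tau2 + sigma2 / n_j)


def shrunken_estimate(group_mean, n_j, grand_mean, tau2, sigma2):
    """Partially pooled estimate for one group (e.g., one judge)."""
    lam = reliability(n_j, tau2, sigma2)
    return lam * group_mean + (1.0 - lam) * grand_mean


# Hypothetical between-judge variance, within-judge variance, and overall mean.
TAU2, SIGMA2, GRAND_MEAN = 0.25, 4.0, 0.0

# Two hypothetical judges with the same raw estimate (1.0) but different caseloads.
small_caseload = shrunken_estimate(1.0, n_j=16, grand_mean=GRAND_MEAN, tau2=TAU2, sigma2=SIGMA2)
large_caseload = shrunken_estimate(1.0, n_j=1600, grand_mean=GRAND_MEAN, tau2=TAU2, sigma2=SIGMA2)

print(round(small_caseload, 3), round(large_caseload, 3))
```

Under these assumed values, the judge with 16 cases is pulled halfway toward the overall mean, while the judge with 1,600 cases retains roughly 99 percent of the raw estimate. This is the sense in which less reliable group estimates are shrunk more heavily.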
As noted above, sentencing theories are also concerned with whether interjudge variation
is meaningful, yet multilevel models are not well suited for providing this kind of information.
Gelman and Hill (2016) note that identifying statistically significant results for random effects is
not the primary purpose of multilevel analysis; rather, multilevel models are designed to obtain
the most precise estimate for each group, while taking into account uncertainty. This analytic
strategy may not be appropriate when predictors are hypothesized to produce effects for some
groups, but not others (Gelman & Hill, 2016). The latter is particularly salient since sentencing
theories suggest judges rarely have enough information to make fully informed punishment
decisions (Albonetti, 1991). Consequently, the decision-making process “allows for the subtle
influences of experiences, prejudices, and stereotypes, as well as idiosyncratic interpretations by
different judges” (Johnson, 2006: 267), which is likely to result in factors such as race and
gender affecting sentencing decisions for some judges, but not others.
Further, though multilevel analysis can be used to identify differences in legal and
extralegal effects across court communities, it provides little information about how individual
judges within court communities consider these factors. The court community perspective
suggests that judges’ subjective decision-making may be conditioned by local case processing
and sentencing norms, and the presence of, and adherence to, these norms may be associated with
differences in court actor autonomy based on court size. As such, exploring individual judges’
sentencing patterns within large, medium, and small court communities is needed to assess the
significance of court communities in sentencing decisions.
This research has a number of methodological and theoretical implications. Concerning
interjudge disparity, though multilevel analysis of sentencing data has identified variation in the
factors that influence punishment decisions across judges, the pooled estimates may be more
appropriate for drawing general conclusions about differences in judicial decision-making. The
individual judge analyses, on the other hand, allow for a more detailed examination of this
variation, and provide insight about whether these factors matter in sentencing outcomes.
Incorporating both analytic strategies is needed to assess whether individual analyses offer a
better test of the focal concerns perspective, which suggests judicial sentencing decisions vary,
but has not been fully explored using extant methodologies. The methodological approach used
in the current work is also a necessary first step in identifying the extent of the variation and
whether it is meaningful, which can then be followed by further developing theories to explain
why differences among judges exist.
Concerning the court community perspective, greater consistency in effects associated
with the key predictors of sentencing among judges in the same courts versus individual judge
model findings would suggest the court community influences punishment decisions. However,
greater interjudge disparity in large courts versus small courts, for example, may indicate that the
influence of court communities on sentencing is conditioned by court size. Substantial judicial
variation within all courts could suggest that court communities influence some court processes
(e.g., docket management, case assignment), but have less of an impact in the sentencing phase.
More generally, this research will contribute significantly to an area of sentencing
research and theory that has received little empirical attention. According to Wooldredge (2010:
564), “we need more comprehensive quantitative descriptions of how judges differ in their
sentencing decisions. This pursuit is necessary for assessing and informing theories of sentencing
disparities based on extralegal factors.” Similarly, in assessing the state of the research on
between-judge variation, Ulmer (2012: 26-27) notes, “[q]uite simply, the field needs more of
such research. It is likely that a substantial portion of the interesting variation in sentence
severity … and the effects of legally relevant, organizational, and extralegal factors on
sentencing exists at the level of individual judges.” Much of the extant research on
between-judge variation relies on aggregate analysis of cases across jurisdictions or states, which is useful
for assessing overall patterns and developing general theoretical perspectives about interjudge
variation. Yet, analysis of individual judges and their sentencing patterns within court
communities is necessary to further develop these perspectives.
Finally, this research will add to a growing body of literature that explores whether
sentencing guidelines systems have achieved their intended goals. Reformers recognized that the
sentencing of criminal offenders is a fundamental mechanism of formal social control in society,
and disparity in punishment for offenders with comparable criminal records convicted of the
same crime raises questions about the legitimacy of legal institutions (Reitz, 1998; Tonry, 1996).
Examining individual judges’ sentencing patterns will further our understanding of whether
sentencing guidelines promote uniformity and consistency in criminal sanctions.
Research Questions and Hypotheses
The following research questions and hypotheses guide the current inquiry:
1) How do legal and extralegal factors affect the decision to incarcerate offenders and the
length of the sentence imposed, and do these effects vary significantly across judges?
2) How do individual judges contribute to the legal and extralegal effects found in the
multilevel analysis?
3) To what extent do judges in the same courthouses exhibit similar sentencing patterns,
in terms of significant legal and extralegal effects?
The first research question will be explored using a three-level multilevel model with offenders
nested within judges nested within courts. The following hypotheses are derived from the prior
research (e.g., Johnson, 2006; Kramer & Ulmer, 2009):
H1: Offense severity and criminal history will be positively associated with sentence
severity.
H2: Younger offenders will be sentenced more harshly than older offenders.
H3: Female offenders will be sentenced more leniently than male offenders.
H4: Black offenders will be punished more harshly than white offenders.
H5: Offenders who plead guilty will receive less severe sentences than those convicted
after trial.
H6: Legal and extralegal effects will vary significantly across judges.1

1 Though the analysis uses three-level models, this work focuses on variation at the judge level. The court level is
included to control for differences across courts.

The second research question will be answered using individual judge regression models. The
data used in the analysis were obtained from a guidelines state that uses offense severity and
prior record to determine the sentencing range. Therefore, these variables should be positively
associated with sentence severity for most judges. Further, based on theories that suggest judges
engage in subjective decision-making concerning offender characteristics and prior work
showing differences in extralegal effects across individual judges (Wooldredge, 2010),
significant results for extralegal effects should be less consistent than findings associated with
legally relevant variables.2
The third research question will be answered by examining individual judges’ sentencing
patterns within their court communities. Though quantitative research using multilevel analysis
shows significant variation across courts (e.g., Britt, 2000; Ulmer & Johnson, 2004), these
studies provide little information about judicial decision-making within courts. Additional
research suggests interjudge disparity exists within courts (e.g., Anderson & Spohn, 2010;
Johnson, 2006), but these studies do not provide a comprehensive assessment of effects
associated with individual judges’ punishment decisions. Qualitative research, however, offers
some evidence of the existence of court communities and their impact on case processing and
sentencing (Eisenstein, Flemming, & Nardulli, 1988; Ulmer, 1997; Ulmer & Kramer, 1996), but
differences in court actor autonomy in small, medium, and large courts may condition this
relationship. Thus, lower autonomy among small court judges may result in judges exhibiting
similar sentencing patterns. Variation in sentencing patterns may be greater in medium sized
courts, and the largest variation in judicial sentencing patterns is likely in large courts as court
actor autonomy is highest.3

2 Since sentencing theory and research do not yet provide specific hypotheses about why individual judges vary in
effects associated with legal and extralegal factors, more detailed predictions are beyond the scope of this work.
3 Similar to research question two, prior literature does not offer specific expectations about how court communities
influence judicial consideration of legal and extralegal factors.

CHAPTER 3: DATA AND METHODOLOGY
Data
To examine the research questions and hypotheses outlined above, seven years of data
(2004-2010) were obtained from the Pennsylvania Commission on Sentencing (PCS).
Pennsylvania’s Courts of Common Pleas (the state’s county-level trial courts) are required by
law to submit all felony and misdemeanor convictions under the Pennsylvania Sentencing
Guidelines to the PCS on a yearly basis (PCS, n.d.). The PCS compiles the data into annual
datasets and makes them available to the public for a fee. The data provide detailed information
about offense type and severity, offender criminal history, and offender characteristics such as
age, race, and gender. The data also include information about mode of conviction, the sentence
imposed, the court in which the offender was sentenced, and the name of the judge who imposed
the sentence (PCS, n.d.).
Data Reduction and Missing Data
In line with prior research using the PCS data, the data were restricted to include only the
most serious offense per judicial transaction (Britt, 2000, 2009; Johnson, 2003, 2006, 2014;
Kramer & Ulmer, 2009; Ulmer & Johnson, 2004). Limiting the data to the most serious offense
per judicial transaction is also consistent with the way in which the PCS analyzes the data for its
annual reports (PCS, n.d.). Given that the focus of this research is on individual judges, the data were
further limited to black and white offenders because very few judges outside of the larger
jurisdictions sentence Hispanic offenders frequently enough for inclusion in the analysis.
Consistent with prior work using these data, cases with missing values for offense gravity score,
prior record score, guideline edition, and offender age, race, and gender were removed (Britt,
2000, 2009; Johnson, 2003, 2006, 2014; Kramer & Ulmer, 2009; Ulmer & Johnson, 2004).4
Though only a modest amount of data are missing for these variables, 18 percent of cases were
missing information on mode of conviction (i.e., whether the offender entered a guilty plea or
was convicted after trial). Because of the information loss associated with removing these cases,
extant work has employed a dummy variable adjustment (Johnson, 2003, 2006, 2014; Kramer &
Ulmer, 2009). However, research suggests that this approach can produce biased estimates of
regression coefficients (Allison, 2001). To address missing values for mode of conviction, the
current work utilized the ‘Amelia’ package in R (Honaker, King, & Blackwell, 2011) to
implement an iterative expectation-maximization algorithm with bootstrapping to substitute
plausible values for this variable. This produced a single dataset with maximum likelihood
imputed values for the mode of conviction variable. These criteria resulted in a sample of
532,440 cases, sentenced by 571 judges in 60 courts.5
Sample Selection
The current work seeks to examine individual judges’ contributions to legal and
extralegal disparity, and assess sentencing patterns among judges in the same courthouses. The
sample used for the current research was initially selected based on the number of cases judges
sentenced between 2004 and 2010. The ‘pwr’ package in R (Champely et al., 2015) was used to
determine that a minimum of 230 cases per judge were needed to detect relatively small effect
sizes (f² = 0.09) in multiple regression models.6 Of the 571 judges, 287 judges handled 230 or
more cases, though the vast majority of these judges handled many more cases (mean = 1,755
cases, median = 1,396).7 Notably, these 287 judges imposed sentences in 503,602 of the 532,440
cases (95 percent) across the 60 courts. Removing 284 judges (and retaining 95 percent of the
cases) is consistent with prior work that suggests a substantial number of judges serve as senior,
retired, or traveling judges who handle a very small number of cases each year (Johnson, 2006,
2014; Levin, 1977).

4 Missing values were minimal for offense gravity, prior record score, guidelines edition, and offender age and
gender (for each variable, fewer than 0.002% of all cases were missing). Three percent of cases had missing values
for offender race/ethnicity.
5 Though Pennsylvania has 67 counties, the state has only 60 county-level trial courts. The difference is the result of
seven of these courts handling cases from two counties.
6 R’s ‘pwr’ package uses Cohen’s (1988) suggestions for effect size. For multiple regression, Cohen (1988) used f²
values of 0.02, 0.15, and 0.35 to represent small, medium, and large effects, respectively.

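For readers more accustomed to R² than to Cohen's f², the conversion between the two can be sketched as follows. The relationship f² = R² / (1 - R²) follows Cohen's definition for multiple regression. The function names are illustrative, and the snippet does not reproduce the power calculation carried out with the ‘pwr’ package, which also depends on the number of predictors, the significance level, and the desired power.

```python
def f2_to_r2(f2):
    """Convert Cohen's f-squared to the equivalent R-squared: R2 = f2 / (1 + f2)."""
    return f2 / (1.0 + f2)


def r2_to_f2(r2):
    """Convert R-squared to Cohen's f-squared: f2 = R2 / (1 - R2)."""
    return r2 / (1.0 - r2)


# Effect size targeted in the power analysis, followed by Cohen's benchmarks.
effect_sizes = [("study target", 0.09), ("small", 0.02), ("medium", 0.15), ("large", 0.35)]
for label, f2 in effect_sizes:
    print(f"{label}: f2 = {f2:.2f}, R2 = {f2_to_r2(f2):.3f}")
```

On this scale, the targeted effect size of 0.09 corresponds to a model explaining roughly 8 percent of the variance, which falls between Cohen's small and medium benchmarks.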
The second step in selecting the sample included disproportionate stratified sampling to
examine sentencing patterns among judges in the same courthouses. Based on extant research
that suggests case processing and sentencing practices may be influenced by court size
(Eisenstein, Flemming, & Nardulli, 1988; Kramer & Ulmer, 2009; Ulmer, 1997), a sample of
small, medium, and large courts was selected for the analysis. Prior research using the PCS data
identifies small courts as having seven or fewer authorized judgeships, medium courts having
between eight and 15, and large courts having 16 or more (Johnson, 2006; Kramer & Ulmer,
2009; Ulmer, 1997; Ulmer & Johnson, 2004). This criterion results in splitting the 60
Pennsylvania courts into 44 small, 12 medium, and four large courts. Given the smaller number
of large and medium-sized courts, a disproportionate stratified sample that includes all four of
the large courts, six medium courts (50 percent), and 11 small courts (25 percent) was selected
for the analysis. To determine which medium and small courts would be included, R’s sample
function (R Core Team, 2016) was used to randomly generate numbers between the minimum
and maximum number of small and medium courts. The final sample includes 312,555 cases
sentenced by 161 judges in 21 Pennsylvania county courts. Table 1 shows the total number of
cases handled in each court, and the number of cases from each court included in the analysis
after removing judges with fewer than 230 cases.8 Table 1 also includes the percent of cases
handled in these courts after removing judges, and the number of judges in each court.9

7 These 287 judges handled at least 230 cases for both the incarceration decision and the sentence length decision.

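The court-size classification and the disproportionate stratified draw described above can be sketched as follows. This is an illustration in Python rather than the R `sample` call used in the study. The court labels, the function names, and the random seed are hypothetical, so the courts selected here bear no relation to the courts actually analyzed. Only the size thresholds, stratum sizes, and sample sizes are taken from the text.

```python
import random


def court_size(judgeships):
    """Classify a court by its number of authorized judgeships."""
    if judgeships <= 7:
        return "small"
    if judgeships <= 15:
        return "medium"
    return "large"


def draw_sample(courts_by_stratum, n_by_stratum, seed):
    """Draw a simple random sample without replacement within each stratum."""
    rng = random.Random(seed)
    return {
        stratum: sorted(rng.sample(courts, n_by_stratum[stratum]))
        for stratum, courts in courts_by_stratum.items()
    }


# Placeholder court labels; stratum sizes (4, 12, 44) follow the text.
courts_by_stratum = {
    "large": [f"large_{i:02d}" for i in range(1, 5)],
    "medium": [f"medium_{i:02d}" for i in range(1, 13)],
    "small": [f"small_{i:02d}" for i in range(1, 45)],
}
# Sample sizes from the text: all large, half of medium, a quarter of small courts.
n_by_stratum = {"large": 4, "medium": 6, "small": 11}

selected = draw_sample(courts_by_stratum, n_by_stratum, seed=2017)  # arbitrary seed
print({stratum: len(courts) for stratum, courts in selected.items()})
```

Sampling the strata at different rates keeps every large court in the analysis while still representing the far more numerous small courts.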
Table 1. Cases and Number of Judges by Court
Cases in the Sample (N=312,555)
Court             Total Cases   Cases in Analysis   Percent   # of Judges
Large Court 1          64,914              56,247    86.65%            19
Large Court 2          43,024              30,278    70.37%            29
Large Court 3          36,868              34,813    94.43%            11
Large Court 4          36,196              33,581    92.78%            16
Medium Court 1         10,570               9,830    93.00%             7
Medium Court 2         32,746              29,503    90.10%            10
Medium Court 3          9,113               8,158    89.52%             2
Medium Court 4         16,599              10,940    65.91%             6
Medium Court 5         13,591              11,078    81.51%             9
Medium Court 6         31,218              29,108    93.24%            10
Small Court 1          10,756              10,073    93.65%             5
Small Court 2           7,393               5,837    78.95%             4
Small Court 3           3,182               2,497    78.47%             2
Small Court 4           3,294               3,188    96.78%             2
Small Court 5           4,911               3,577    72.84%             2
Small Court 6           5,956               5,694    95.60%             6
Small Court 7           6,934               6,521    94.04%             5
Small Court 8           7,230               6,930    95.85%             5
Small Court 9           3,519               3,471    98.64%             2
Small Court 10          8,151               6,329    77.65%             4
Small Court 11          5,040               4,902    97.26%             4

8 Total cases are limited to the most serious offense per judicial transaction and cases resulting in conviction.
9 In some courts, the number of judges included in the analysis is less than what is outlined in the criteria concerning
court size. For example, though large courts have 16 or more judges, Large Court 3 has only 11 judges in the
analysis. For this court and Medium Courts 1, 3, and 4, the smaller number of judges is due to removing judges with
fewer than 230 cases.

Dependent and Independent Variables
Extant sentencing research suggests that punishment decisions occur in two stages
(Wheeler, Weisburd, & Bode, 1982). The first decision is whether to incarcerate an offender, and
the second involves determining the length of confinement for those who receive a custodial
sentence. Consistent with prior work, two dependent variables are used to model these decisions
separately (e.g., Britt, 2000; Dixon, 1995; Hauser & Peck, 2017; Johnson, 2006, 2014; Ulmer &
Johnson, 2004; Wolfe, Pyrooz, & Spohn, 2011). For the incarceration decision, a binary variable
indicates whether the offender was incarcerated (1 = incarceration, 0 = no incarceration). The
length of the sentence imposed is a continuous variable coded to represent the minimum number
of months the offender is sentenced to serve in jail or state prison. Given the positive skew of the
sentence length data, this variable was recoded to equal the natural logarithm of the minimum
number of months of incarceration (Britt, 2009; Bushway & Piehl, 2001; Johnson, 2006).	
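The coding of the two dependent variables can be sketched as follows. This is an illustrative sketch only: the function name `code_outcomes`, the example values, and the assumption that a non-custodial sentence is recorded as zero or missing minimum months are hypothetical and are not taken from the PCS documentation.

```python
import math


def code_outcomes(min_months):
    """Return (incarcerated, log_min_months) for a single case.

    Assumption: a non-custodial sentence is recorded as None or 0 months.
    """
    if min_months is None or min_months <= 0:
        # Not incarcerated; the case drops out of the sentence length model.
        return 0, None
    # The natural log of the minimum months reduces the positive skew.
    return 1, math.log(min_months)


# Hypothetical minimum sentences in months.
for months in (None, 3, 12, 120):
    print(months, code_outcomes(months))
```

Because the logarithm is defined only for positive values, the sentence length outcome exists only for incarcerated offenders, which is consistent with modeling the two decisions separately.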
Independent variables include offense severity, which is based on the offense gravity
score developed by the PCS and ranges from 1 (least serious) to 14 (most serious). Prior record
is a measure of the PCS’ prior record score, which is an eight-category scale of prior convictions
with points given for prior misdemeanors and felonies based on offense severity. Due to the
small number of cases in the two highest categories (repeat felony and repeat violent offenders),
these were combined into a single category in all analyses. Offender demographic variables
include a continuous variable for the offender’s age when the sentence was imposed, and binary
variables for female and black (reference = white). In addition, mode of conviction is captured
with a binary trial variable that combines negotiated and non-negotiated pleas into a plea
category (reference), and bench and jury trials into a trial category.
Control Variables
Consistent with prior research using the PCS data (e.g., Johnson, 2014; Kramer & Ulmer,
2009), several control variables are included in the analysis. Mandatory minimum represents
whether a mandatory sentencing provision was applied (reference = no), and changes to the
sentencing guidelines are captured through a guidelines edition dummy variable (reference = 6th
Edition). In line with extant work using the PCS data, offense type is a dummy variable that is
coded to reflect whether the offender was convicted of a violent, drug, or property offense, with
other offenses that do not fall into these crime types (i.e., bad checks, forgery, DUI) serving as
the reference (Johnson, 2006, 2014; Ulmer & Johnson, 2004). Further, two presumptive sentence
variables are included to capture presumptive guideline sentence recommendations (Engen &
Gainey, 2000). For the incarceration decision models, a binary variable is used to indicate
whether the guidelines prescribe incarceration (reference = no). In the sentence length models,
this variable represents the minimum number of months of incarceration recommended in the
guidelines (Ulmer, 2000; Ulmer & Johnson, 2004). Finally, year is a dummy variable that
controls for annual changes within the courts from 2004 to 2010 (reference = 2004).
Analytic Strategy
Due to the nested structure of sentencing data (i.e., offenders nested within judges nested
within courts), multilevel analysis of predictors associated with punishment outcomes has
become increasingly popular (Britt, 2000; Kautt, 2002; Johnson, 2006, 2014; Kramer & Ulmer,
2009; Ulmer & Johnson, 2004). With nested data, there is an assumption that contextual factors
influence individuals, and individuals from the same context likely share common influences
(Steenbergen & Jones, 2002). As a result, traditional regression models are not well suited for
analyzing these data. Because of the influence of contextual factors, the observations are not
truly independent; rather, they are clustered and duplicate one another to some extent, which
violates the regression assumption that errors are independent. When this assumption is violated,
incorrect standard errors and Type I errors are likely (Steenbergen & Jones, 2002). By
incorporating additional disturbance terms and their associated assumptions, multilevel models
produce appropriate error terms that control for potential dependency due to nesting effects
(Snijders & Bosker, 2012). However, while multilevel models offer an improvement over
classical regression analyses from a statistical standpoint (Fitz-Gibbon, 1996; Gelman & Hill,
2016; Teddlie & Reynolds, 2000), practical concerns remain, particularly when assessing
random group effects.
The random group effects in multilevel models are unobserved variables, as opposed to
statistical parameters (Snijders & Bosker, 2012), and are obtained using empirical Bayes
estimation. Empirical Bayes estimates are weighted averages of the specific group effect and the
overall model coefficient (Gelman & Hill, 2016; Hox, 2010; Snijders & Bosker, 2012). As noted
earlier, multilevel estimates are pooled (Gelman & Hill, 2016) or “shrunk” towards the mean for
the entire data set (Hox, 2010: 29). Shrinkage is determined by the reliability of the estimate, and
reliability is based on the group sample size and the difference between the group estimate and
the overall model estimate (Hox, 2010). Groups with small sample sizes and estimates far from
the overall estimate shrink more to the overall average, while groups with large sample sizes and
estimates near the overall estimate are close to the overall mean (Hox, 2010). For intermediate
groups, the multilevel estimates fall between these two (Gelman & Hill, 2016). As a result, more
variation is expected when looking at unbiased10 results from separate classical regression
analyses, compared to the more precise, but biased, multilevel values (Hox, 2010).
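
The weighting described above can be illustrated with a minimal sketch for the simplest case, a random-intercept model. The sketch assumes the standard textbook reliability formula, in which reliability equals the between-group variance divided by the sum of the between-group variance and the within-group variance over the group sample size. All function names and numeric values are hypothetical and are not taken from the PCS data.

```python
def reliability(tau2, sigma2, n_j):
    """Reliability of a group mean: between-group variance divided by
    the total variance of that group's mean (assumed textbook formula)."""
    return tau2 / (tau2 + sigma2 / n_j)


def shrunken_estimate(group_mean, overall_mean, tau2, sigma2, n_j):
    """Empirical Bayes estimate: a reliability-weighted average of the
    group's own mean and the overall mean."""
    lam = reliability(tau2, sigma2, n_j)
    return lam * group_mean + (1.0 - lam) * overall_mean


# Hypothetical values: two judges with the same raw mean (2.0) but very
# different caseloads, and an overall mean of 1.0.
overall = 1.0
tau2, sigma2 = 0.2, 3.5  # assumed between-judge and within-judge variances
small = shrunken_estimate(2.0, overall, tau2, sigma2, n_j=30)
large = shrunken_estimate(2.0, overall, tau2, sigma2, n_j=3000)
print(round(small, 3), round(large, 3))
```

In this sketch the judge with 30 cases is pulled much closer to the overall mean than the judge with 3,000 cases, which is the pooling pattern the individual judge models are intended to reveal.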
The current work uses multilevel modeling to replicate the findings of recent work that
shows legal and extralegal effects vary significantly across judges, and also employs individual
judge logistic and ordinary least squares (OLS) regression models to assess how judges
contribute to the results from the multilevel analysis. While extant sentencing research has used
either multilevel or single-level regression models for pooled cases (cf. Wooldredge, 2010),
Gelman and Hill (2016) suggest an iterative approach to statistical modeling. With multilevel
data structures, this can include separate models to obtain unadjusted values, and multilevel
analysis to examine pooling of random group effects (Gelman & Hill, 2016). Findings from these
models can also be used to identify groups with high or low estimates, and to get a general sense
of any patterns in the data (Kreft & Yoon, 1994; Snijders & Bosker, 2012).

10 Classical regression estimates (e.g., OLS) are unbiased, but error exists due to sampling and measurement
(Willms, 1992).

The analytic approach begins with multilevel modeling. As the incarceration decision
dependent variable is a binary outcome representing incarceration/no incarceration, hierarchical
generalized linear models (HGLM) were selected. For the sentence length outcome, which
represents the minimum months of incarceration (logged), linear mixed models (LMM) were
employed. Both analyses were conducted using R’s ‘lme4’ package (Bates et al., 2016). The first
step in the analysis includes unconditional models (one-way ANOVA with random effects) to
assess whether variation in the decision to incarcerate offenders and the length of the sentence
imposed exists at the judge- and court-levels. In addition, calculating the Akaike information
criterion (AIC) for the unconditional models provides a baseline that can be used to assess model
fit in subsequent analyses. The second step in the multilevel analysis includes random
coefficients ANCOVA models with independent and control variables to examine the fixed and
judge-level random effects of offender and case characteristics on the incarceration decision and
the length of the sentence imposed.
To assess judges’ contributions to findings from the multilevel analysis, separate logistic
and OLS regression models are employed for each of the 161 judges using R’s ‘stats’ package (R
Core Team, 2016). Since this portion of the analysis focuses on individual judges’ caseloads and
sentencing patterns, multilevel models are not needed to account for between-judge variation.
Finally, individual judges are grouped by court and results from the individual judge regression
models are used to assess similarities in statistically significant legal and extralegal effects
among judges in the same courts.
CHAPTER 4: RESULTS
The following section contains the findings for the current inquiry. First, descriptive
statistics for the sample are discussed. As outlined in Chapter 2, the majority of studies
examining interjudge disparity use multilevel models. Thus, results from unconditional and
random coefficients models are presented to assess effects associated with key predictors of the
decision to incarcerate offenders and the length of sentence imposed, and whether these effects
vary significantly across judges. However, since prior research shows this approach offers
limited information (Wooldredge, 2010), findings from individual judge regression models are
provided to gain a better understanding of whether and how judges use legal and extralegal
factors in sentencing decisions. Extant work also suggests that judicial decision-making may be
influenced in part by the court community in which sentencing occurs (e.g., Ulmer, 1997).
Consequently, the final section in this chapter provides results concerning judges’ sentencing
patterns within the same courts.
Descriptive Statistics
Table 2 provides descriptive statistics for the sample of offenders used in the analysis.
Approximately half of the offenders (49 percent) in the sample were sentenced to either jail or
prison, while the others (51 percent) received a non-custodial sentence. For those offenders who
were incarcerated, the mean sentence length was just under 10 months. The mean offense gravity
score is 3.57 (scale ranging from one to 14), and the mean prior record score is 1.45 (scale
ranging from one to six). The average offender is 33 years old at sentencing, and males make up
a larger portion of the sample than females (80 percent and 20 percent, respectively). Concerning
offender race, 66 percent of offenders are white, while the remaining 34 percent are black.
Further, the vast majority (95 percent) of the offenders in the sample entered guilty pleas; only
five percent were convicted after trial. Twenty-two percent of offenders were convicted of crimes
for which the guidelines prescribed a custodial sentence, and the presumptive minimum sentence
for incarcerated offenders is approximately 11 months. Twenty-three percent were convicted of
crimes that included mandatory minimum penalties. The largest share of offenders (43 percent)
was convicted of crimes other than drug (24 percent), property (20 percent), and violent
(13 percent) offenses. Just over half of offenders (53 percent) were sentenced under the sixth
edition of the guidelines, which were in effect from June of 2005 until December of 2008, while
30 percent and 17 percent were sentenced under the earlier fifth edition and later sixth edition
revised guidelines, respectively. Finally, the percent of offenders sentenced each year is
relatively consistent, ranging from 12 to 16 percent.

Table 2. Descriptive Statistics for Overall Sample (N = 312,555)

                                  M (SD) / N (%)
Dependent Variable
  Incarcerated                    154,001 (49%)
  Not Incarcerated                158,554 (51%)
  Sentence Length                 9.94 (19.13)
Independent Variables
  Offense Severity                3.57 (2.42)
  Prior Record                    1.45 (1.90)
  Age at Sentence                 33.17 (11.07)
  Male (reference)                250,766 (80%)
  Female                          61,789 (20%)
  White (reference)               205,590 (66%)
  Black                           106,965 (34%)
  Plea (reference)                295,760 (95%)
  Trial                           16,795 (5%)
  Presumptive Incarceration       69,424 (22%)
  Presumptive Sentence            11.30 (23.08)
  Mandatory Minimum Applied       73,122 (23%)
Offense Type
  Violent                         41,816 (13%)
  Drug                            74,718 (24%)
  Property                        62,810 (20%)
  Other (reference)               133,211 (43%)
Guidelines Edition
  5th Edition                     94,840 (30%)
  6th Edition (reference)         165,313 (53%)
  6th Edition, Revised            52,402 (17%)
Year
  2004 (reference)                36,926 (12%)
  2005                            42,483 (14%)
  2006                            44,817 (14%)
  2007                            47,158 (15%)
  2008                            49,155 (16%)
  2009                            49,752 (16%)
  2010                            42,264 (14%)

Multilevel Analysis
The analysis addressing the first research question used multilevel modeling to examine
the effects of legal and extralegal factors on the decision to incarcerate offenders and the length
of the sentence imposed, and whether these effects varied significantly across judges (while
controlling for court). Table 3 displays the results from the three-level unconditional models. The
significant variance component for the incarceration model in Table 3 suggests that a portion of
the variation in sentence severity is attributable to differences between judges and courts.11
Results from the sentence length model show that five percent of variation in the sentence length
is attributable to differences between judges, while seven percent is accounted for at the court
level. The incarceration and sentence length unconditional models also provide baseline AICs of
415,129 and 629,062, respectively.
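
The intraclass correlations can be reproduced from the variance components reported in Table 3. The following is a minimal sketch: the function name is hypothetical, and for the incarceration model the level 1 variance is set to π²/3 under the latent-variable assumption described in the accompanying footnote.

```python
import math


def icc_three_level(var_level1, var_judge, var_court):
    """Share of total variance at the judge and court levels."""
    total = var_level1 + var_judge + var_court
    return var_judge / total, var_court / total


# Sentence length: variance components as reported in Table 3.
judge_len, court_len = icc_three_level(3.46, 0.19, 0.27)

# Incarceration: level 1 variance assumed to be pi^2 / 3 (standard logistic).
judge_inc, court_inc = icc_three_level(math.pi ** 2 / 3, 0.11, 0.15)

print(f"{judge_len:.2f} {court_len:.2f} {judge_inc:.2f} {court_inc:.2f}")
```

Rounded to two decimals, these values correspond to the five and seven percent reported for sentence length and the three and four percent reported for incarceration.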

																																																								
11 The binary outcome for the incarceration model does not include an individual-level variance component.
However, Snijders and Bosker (1999) note that if the level 1 model is viewed as a latent variable, the random effects
at level 1 can be assumed to have a standard logistic distribution with a mean of 0 and a variance of π²/3. If this
assumption is valid, three percent and four percent of the variance in the likelihood of incarceration is attributable to
differences between judges and courts, respectively.

Table 3. Unconditional Models of Incarceration and Sentence Length

                              Incarceration             Sentence Length
Fixed effects                   b        SE               b        SE
  Intercept                    0.05     0.09             0.91     0.12***

Random effects              Variance    SD            Variance    SD
  Level 1                       --       --              3.46     1.86***
  Level 2                      0.11     0.33***          0.19     0.44***
  Level 3                      0.15     0.39***          0.27     0.52***

Intraclass Correlation
  Level 2                       --                       0.05
  Level 3                       --                       0.07

AIC                          415,129                   629,062

***p < .001; **p < .01; *p < .05

Table 4 provides results for the incarceration and sentence length models with fixed and
random effects associated with offender and case characteristics on sentence severity.12 Findings
from the fixed effects portion of the incarceration model are consistent with prior research using
the PCS data (e.g., Johnson, 2006, 2014; Kramer & Ulmer, 2009). The AIC for the fixed effects
model is lower than the AIC in the unconditional model (324,018 versus 415,129), which
indicates including the offender-level variables produces a better model fit. Results for the
legally relevant variables show a one-unit increase in offense severity and prior record multiplies
the odds of incarceration by 1.41 and 1.43, respectively. Further, offenders are more likely to be
incarcerated when the guidelines prescribe a jail or prison sentence, and being convicted of a
crime that requires application of a mandatory minimum sentence greatly increases the odds of
incarceration. Concerning extralegal effects, as age increases the likelihood of incarceration
decreases, and female offenders are less likely to be incarcerated than male offenders. Black
offenders and offenders sentenced after a trial verdict have odds of incarceration 1.36 times
those of whites and those who enter a guilty plea, respectively.

Table 4. Random Coefficients Models of Incarceration and Sentence Length

                                   Incarceration              Sentence Length
Fixed Effects                  Est.     SE      Odds            b       SE
Intercept                     -1.85    0.13     --***         -1.24    0.05***
Offense Severity               0.34    0.00     1.41***        0.44    0.00***
Prior Record                   0.36    0.00     1.43***        0.21    0.00***
Age                           -0.01    0.00     0.99***        0.00    0.00***
Female                        -0.37    0.01     0.69***       -0.11    0.01***
Black (a)                      0.31    0.01     1.36***       -0.02    0.01**
Trial (b)                      0.31    0.03     1.36***        0.22    0.01***
Presumptive Sentence           0.31    0.02     1.37***        0.04    0.00***
Mandatory Minimum              2.41    0.17    11.19***       -0.85    0.01***
Offense Type (c)
  Violent                      0.49    0.03     1.63***        0.01    0.01
  Drug                        -0.15    0.01     0.86***        0.20    0.01***
  Property                     0.25    0.02     1.28***        0.30    0.01***
Guidelines Edition (d)
  5th Edition                  0.21    0.02     1.23***       -0.01    0.01
  6th Edition, Revised        -0.01    0.02     0.99           0.04    0.01**
Year (e)
  2005                        -0.09    0.02     0.91***        0.16    0.01***
  2006                        -0.09    0.02     0.92***        0.11    0.01***
  2007                        -0.10    0.02     0.91***        0.07    0.02***
  2008                        -0.19    0.02     0.83***        0.09    0.02***
  2009                        -0.22    0.02     0.80***        0.06    0.02***
  2010                        -0.21    0.02     0.81***        0.01    0.02
AIC                          324,018                         464,775
N                            312,555                         154,001

***p < .001; **p < .01; *p < .05; (a) Reference category is white; (b) Reference category is
plea; (c) Reference category for all crime types is other crimes; (d) Reference category for all
guidelines editions is the 6th Edition; (e) Reference category for all years is year 2004.

Table 4 (cont’d).

                          Incarceration                     Sentence Length
Random Effects       Variance    SD       χ²            Variance    SD       χ²
Offense Severity       0.03     0.16   3392.60***         0.01     0.09   3384.60***
Prior Record           0.02     0.13   1719.10***         0.00     0.05    787.73***
Age                    0.00     0.01    814.90***         0.00     0.00    128.56***
Female                 0.02     0.13     47.32***         0.01     0.12     91.65***
Black                  0.18     0.43   1210.90***         0.02     0.13    215.40***
Trial                  0.22     0.47    211.96***         0.03     0.19    118.62***
AIC                  320,629 - 323,975                  461,394 - 464,687

12 Standard errors in the incarceration model were calculated using the delta method. The delta method is used when
reporting transformed regression parameters (Casella & Berger, 1991). Since the current work reports odds ratios
transformed from the incarceration model coefficients, the delta method is appropriate.

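
The conversion from logit coefficients to odds ratios, and the first-order delta-method standard error for a transformed coefficient, can be sketched as follows. The coefficients are the rounded values printed in Table 4, so the results may differ from the published odds ratios in the second decimal. The coefficient-scale standard error used at the end is a hypothetical value chosen for illustration, and the function names are not drawn from the original analysis.

```python
import math


def odds_ratio(b):
    """Odds ratio implied by a logit coefficient."""
    return math.exp(b)


def delta_method_se(b, se_b):
    """First-order delta-method SE of exp(b): |d exp(b) / db| * se_b."""
    return math.exp(b) * se_b


# Rounded logit coefficients as printed in Table 4.
coefficients = {
    "Offense Severity": 0.34,
    "Prior Record": 0.36,
    "Female": -0.37,
    "Black": 0.31,
}
for name, b in coefficients.items():
    print(name, round(odds_ratio(b), 2))

# Hypothetical coefficient-scale standard error (illustration only).
se_or = delta_method_se(0.31, 0.02)
print(round(se_or, 4))
```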
Findings from the sentence length model are generally consistent with prior studies using
the PCS data (e.g., Britt, 2009; Johnson, 2006, 2014; Kramer & Ulmer, 2009), with one
exception. In line with prior work, a one-unit increase in offense severity is associated with a 44
percent increase in sentence length, and a one-unit increase in prior record score results in
sentences that are 21 percent longer. The extralegal effects in the sentence length model show
age has almost no effect, female offenders are sentenced more leniently than males, and
conviction after a trial increases the sentence length by 22 percent. However, in contrast to extant
research on Pennsylvania sentencing (Johnson, 2006, 2014), black offenders receive slightly
shorter sentences (two percent) than whites. One other notable difference between the models is
that offenders sentenced for mandatory minimum crimes are treated much more leniently
in the sentence length model. This finding is likely due to the large number of DUI offenders in
the data receiving relatively short mandatory sentences (Britt, 2009). In addition, the AIC for the
fixed effects model is lower than the AIC in the unconditional model (464,775 versus 629,062),
which indicates including the offender-level variables produces a better model fit.
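
Because sentence length is logged, a coefficient b can be read as an approximate percent change (100 × b), which appears to be the reading used in the text, whereas the exact change is 100 × (exp(b) − 1). The two are close for small coefficients and diverge as the coefficient grows in magnitude. The sketch below compares them using the rounded sentence length coefficients from Table 4; the function names are hypothetical.

```python
import math


def approx_pct(b):
    """Approximate percent change in the outcome for a logged dependent variable."""
    return 100.0 * b


def exact_pct(b):
    """Exact percent change in the outcome for a logged dependent variable."""
    return 100.0 * (math.exp(b) - 1.0)


# Rounded sentence length coefficients as printed in Table 4.
coefficients = {
    "Offense Severity": 0.44,
    "Prior Record": 0.21,
    "Female": -0.11,
    "Black": -0.02,
    "Trial": 0.22,
}
for name, b in coefficients.items():
    print(f"{name}: approx {approx_pct(b):.0f}%, exact {exact_pct(b):.1f}%")
```

For coefficients near zero, such as the race effect, the two readings are practically identical; for larger coefficients, such as offense severity, the exact figure is noticeably larger than the approximation.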
Table 4 also shows the random effects associated with select predictors of sentence
severity.13 For both the incarceration decision and the length of the sentence imposed, all
relevant predictors varied significantly across judges. More specifically, effects associated with
offense severity, prior record, age, gender, race, and mode of conviction on sentence severity
vary depending on the judge handling the case. These findings are consistent with studies
analyzing the PCS data using random effects models (Britt, 2000; Johnson, 2006; Kramer &
Ulmer, 2009; Ulmer & Johnson, 2004). Including random effects also shows a modest
improvement in model fit for both sentencing outcomes. The AICs for the random effects
incarceration models ranged from 320,629 to 323,975, compared to the fixed effects model AIC
of 324,018. Similarly, the AIC for the fixed effects sentence length model is 464,775, whereas
the AICs for the random effects models range from 461,394 to 464,687.14
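
The likelihood ratio statistics and the AIC ranges in Table 4 can be related with simple arithmetic. Since AIC = 2k − 2 log L, adding Δk parameters lowers the AIC by χ² − 2Δk. The sketch below assumes Δk = 2 for each random effects model (for example, a slope variance plus a slope-intercept covariance); this value is an assumption that is not stated in the text, although it is consistent with the reported figures. The p-value uses the closed-form chi-square survival function for two degrees of freedom, exp(−x/2). Function names are hypothetical.

```python
import math


def aic_drop(chi2, delta_k):
    """Reduction in AIC implied by a likelihood ratio statistic
    when delta_k parameters are added."""
    return chi2 - 2 * delta_k


def chi2_sf_df2(x):
    """Chi-square survival function (upper tail) for 2 degrees of freedom."""
    return math.exp(-x / 2.0)


fixed_aic = 324_018  # incarceration fixed effects model (Table 4)

# Largest and smallest incarceration chi-square statistics in Table 4.
for label, chi2 in [("Offense Severity", 3392.60), ("Female", 47.32)]:
    implied_aic = fixed_aic - aic_drop(chi2, delta_k=2)
    print(label, round(implied_aic), chi2_sf_df2(chi2) < 0.001)
```

Under this assumption, the implied AICs match the endpoints of the 320,629 to 323,975 range reported for the incarceration random effects models.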
With the exception of one study (Wooldredge, 2010), prior work examining variation in the
effects of legal and extralegal factors across judges has been limited to assessing findings from
multilevel analyses. The following section provides results from the
multilevel model random group effects and 161 individual judge logistic and OLS regression
models to explore judges’ contributions to legal and extralegal effects found in the multilevel
analysis.

13 Given the current study’s focus on assessing variation in legal and extralegal effects across judges, only six
variables were considered for inclusion as random effects in the models: offense severity, prior record, age, gender,
race, and mode of conviction.
14 Models with six simultaneous random effects would not converge. Consequently, random effects analyses were
conducted by running a model with all predictors as fixed effects, and a single predictor as a random effect. This
random effects model was then compared to the fixed effects model with no random effects using a likelihood ratio
test to assess statistical significance (Snijders & Bosker, 2012). The process was repeated for all of the random
effects, and the range of AICs under the random effects in Table 4 reflects the model fit for each of the six random
effects models.

Individual Judge Analysis
Figures 1 through 12 offer a visual representation of the differences in judges’ estimates
from the multilevel analysis and findings from the individual judge logistic and OLS regression
models for offense severity, prior record, age, gender, race and mode of conviction. The Y-axis
represents the judges (ranging from 1 to 161 from top to bottom), and the X-axis reflects the
odds ratio for the incarceration models, and percents for the logged sentence length models. The
panels on the left include each judge’s estimates and standard errors obtained from the random
effects models, and the panels on the right display odds ratios or percents and standard errors for
the individual judge regression models. In both panels, estimates in black lie between two
standard deviations below and above the mean, while those in gray are outside this range. The
individual judge regression panels on the right also include information about whether a given
variable is statistically significant; triangles represent coefficients with a p-value of <.05, and
circles are above this threshold.
Incarceration Models
Figure 1 provides the odds ratio for a one-unit increase in offense severity for the
incarceration models. The odds ratios range from 0.56 to 2.51 (standard errors from 0.02 to 0.14)
in the HGLM, and 0.54 to 3.11 (standard errors between 0.02 and 0.38) in the judge logit
models. Findings also indicate that 95 percent of the values in the HGLM are predicted to lie
between 1.02 and 1.94, and within 0.93 and 2.13 for the logit models. As expected, more
variation exists in the logit models than the HGLM, where the estimates in the latter show some

Figure 1. HGLM and Logit Effects for Offense Severity

pooling to the overall mean of 1.41. Though this occurs to varying degrees across the judges, it is
most noticeable for the more extreme cases towards the bottom half of the judge distributions. To
the extent the p-value obtained in the logit models can be used to assess whether individual
judges consider offense severity when deciding whether to incarcerate offenders, 158 of the 161
judges are associated with a statistically significant increase in the odds of incarceration as offense
severity increases. Concerning prior record effects (Figure 2), the HGLM odds ratios range from
1.07 to 2.30 (standard errors 0.02 to 0.04) and 1.15 to 2.25 (standard errors 0.03 to 0.40) in the
logit models. Results also show that 95 percent of the values are predicted to lie between 1.10
and 1.85 and 1.02 and 2.02 in the HGLM and logit models, respectively. Similar to findings for
offense severity, the judge logit estimates shrink closer to the overall average of 1.43 in the
HGLM. In addition, the individual logit models indicate that 159 of the 161 judges exhibit
statistically significant effects associated with harsher punishment for offenders with prior
records.
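
The 95 percent predicted ranges reported for the HGLM are consistent with a normal-theory interval around the fixed effect on the logit scale, exp(γ ± 1.96τ), where γ is the fixed effect coefficient and τ is the standard deviation of the judge-level random effect. That this is the calculation used is an assumption inferred from the reported numbers. The sketch below uses the rounded values in Table 4, so results differ slightly from those in the text; the function name is hypothetical.

```python
import math


def predicted_or_range(gamma, tau, z=1.96):
    """Range expected to contain about 95 percent of judge-specific odds
    ratios, assuming normally distributed random effects on the logit scale."""
    return math.exp(gamma - z * tau), math.exp(gamma + z * tau)


# Prior record: fixed effect 0.36, random effect SD 0.13 (Table 4).
lower, upper = predicted_or_range(0.36, 0.13)
print(round(lower, 2), round(upper, 2))

# Offense severity: fixed effect 0.34, random effect SD 0.16 (Table 4).
sev_lower, sev_upper = predicted_or_range(0.34, 0.16)
print(round(sev_lower, 2), round(sev_upper, 2))
```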
The remaining figures examine differences between the HGLM and logit incarceration
models for extralegal factors. Figure 3 shows the findings for offender age are very similar, with
the odds ratios ranging just below 0.96 through 1.02 for both the HGLM and logit models. The
ranges of standard errors are also nearly identical (HGLM, 0.00 to 0.01; logit, 0.00 to 0.02).
Concerning predicted values across all judges, 95 percent of the estimates lie between 0.96 and 1.01
for the HGLM, and 0.96 and 1.02 for the logit models. Notably, there are a few judges in the top
portion of the HGLM plot that are moving in the opposite direction: rather than shrinking
towards the overall mean of 0.99, they moved farther away from it. In addition, the individual
judge logit models indicate that statistically significant findings associated with higher odds of
incarceration for younger offenders are limited to 80 of the 161 judges, and one judge is more
likely to incarcerate older offenders.

Figure 2. HGLM and Logit Effects for Prior Record

Figure 3. HGLM and Logit Effects for Age

The findings for female offenders are presented in Figure 4, and the HGLM estimates
show substantial pooling when compared to the judge logit models. Odds ratios range from 0.55
to 0.90 (standard errors from 0.04 to 0.09) in the random effects panel, and 0.19 to 1.48 (standard
errors from 0.05 to 0.72) in the logit models. Nearly all of the judges’ values in the HGLM fall in
the 95 percent predicted range of 0.53 to 0.90, while a number of judges are outside the 0.33 to
0.97 range provided by the logit models. In terms of statistically significant effects, 112 of the
161 judges are less likely to incarcerate female offenders than males.
Race effects can be found in Figure 5, and from a pooling standpoint, appear contrary to
previous results. Odds ratio ranges associated with offender race are much larger than what was
found in the analyses above, spanning 0.52 to 4.09 (standard errors 0.50 to 0.67) in the HGLM,
and 0.42 to 5.49 (standard errors 0.08 to 3.15) in the logit models. Ninety-five percent of the
values are predicted to fall between 0.58 and 3.20 and 0.93 and 2.09 in the HGLM and logit
models, respectively. Whereas analysis of the other sentencing predictors showed less variation
in the multilevel analysis compared to the judge logit models, the race models show an opposite
pattern. The difference is due to several judges’ multilevel estimates for race moving farther from, as
opposed to closer to, the overall mean of 1.36. Further, the individual judge logit model p-values
suggest that 90 of the 161 judges are more likely to incarcerate black offenders compared to
whites.
Figure 4. HGLM and Logit Effects for Female Offenders

Figure 5. HGLM and Logit Effects for Black Offenders

Figure 6. HGLM and Logit Effects for Trial Convictions

The final incarceration models provide findings concerning mode of conviction (Figure
6). Odds ratios for the HGLM are between 0.51 and 4.65 (standard errors ranging from 0.10 to
1.22), and 0.32 and 8.14 (standard errors ranging from 0.10 to 5.78) for the logit models. The
judge logit plot excludes three judges whose standard errors were extremely high (i.e., over 20)
and likely unreliable; due to pooling, these judges’ estimates are included in the HGLM and were
shrunk to the overall mean of 1.36. Results also show that 95 percent of the predicted values lie
between 0.53 and 3.50 in the HGLM, and 0.40 and 3.36 in the logit models. Comparing the
panels, the HGLM estimates show substantial pooling. This suggests that the small number of
offenders convicted after trial produces less reliable estimates across a large number of judges,
resulting in more shrinkage to the overall mean. Finally, despite the fixed effect trial estimate in
Table 4 being statistically significant, only 45 of 161 judges are more likely to incarcerate
offenders convicted after trial as opposed to those entering a guilty plea.
Sentence Length Models
The next set of findings provides information for legal and extralegal effects on the length
of the sentence imposed. Figure 7 shows the percent increase for a one-unit increase in offense
severity for the sentence length models. Percent increases range from 22 to 82 (standard errors
from 0.01 to 0.03) in the LMM, and 11 to 98 (standard errors 0.01 to 0.12) in the judge OLS
models. Ninety-five percent of the predicted values lie between 27 and 61 percent in the LMM,
and 23 and 63 percent across the OLS models. Similar to offense severity findings in the
incarceration models, slightly less variation is observed in the LMM estimates than the OLS
values. P-values from each judge’s individual OLS model indicate that offense severity
significantly affects the length of the sentence imposed for all 161 judges. For prior record
(Figure 8), the LMM findings show a one-unit increase in prior record score results in an
increase in sentence length ranging from 11 to 32 percent across the judges (standard errors 0.01
to 0.03), and 95 percent of the values are predicted to fall between 11 and 31 percent. In the OLS
models, prior record effects span between seven and 32 percent, with slightly higher standard
errors overall (0.01 to 0.09). Predicted values for 95 percent of the judges range from eight to 33
percent. Similar to offense severity, the LMM estimates show some pooling to the overall mean
of 21 percent, and the OLS models reveal that prior criminal history results in a significant
increase in sentence length across all 161 judges.

Figure 7. LMM and OLS Effects for Offense Severity

Figure 8. LMM and OLS Effects for Prior Record

Figures 9 through 12 examine differences between the LMM and OLS sentence length
models for extralegal factors. Concerning offender age (Figure 9), the LMM shows a range of a
two percent decrease in sentence length to a one percent increase (standard errors 0.00), while
the OLS models range from a two percent decrease to a three percent increase (standard errors
from 0.00 to 0.01). Predicted values for 95 percent of the judges lie between negative one
percent and one percent for both the LMM and the OLS models. As expected, the LMM
estimates are shrunk closer to the overall mean of zero, and statistically significant effects from
the OLS models (39 judges with p <.05) are less consistent than what was found for offense
severity and prior record. In addition, whereas nearly all judges with statistically significant age
effects in the logit models were more likely to incarcerate younger offenders, statistically
significant effects from the OLS models show that 33 of the 39 judges impose longer sentences
for older offenders.
As shown in Figure 10, estimates for female offenders range from a 52 percent decrease
in sentence length to a 21 percent increase (standard errors from 0.04 to 0.11) for the LMM, and
a 69 percent decrease to a 25 percent increase (standard errors 0.05 to 0.40) in sentence length
for the judge OLS models. Estimates in the LMM model are pooled closer to the model mean of
an 11 percent decrease in sentence length, and predicted values for 95 percent of the judges fall
between negative 35 percent and 13 percent. For the OLS models, predicted estimates range
from negative 44 to 20 percent. The judge OLS models indicate only two judges show significant
effects for sentencing females to longer periods of incarceration, while 43 of the
161 judges impose significantly shorter sentences for female offenders.

Figure 9. LMM and OLS Effects for Age

Figure 10. LMM and OLS Effects for Female Offenders

Offender race effects are displayed in Figure 11. Values range from negative 37 percent
to 25 percent (standard errors 0.04 to 0.11) in the LMM and negative 55 percent and 31 percent
(standard errors 0.05 to 0.28) in the OLS models. The lower end of the predicted values for 95
percent of the judges is the same in both the LMM and OLS models (negative 27 percent), and
the upper end is 24 percent and 25 percent in the LMM and OLS models, respectively. Once
again, the LMM estimates show some shrinkage to the model mean of negative two percent.
Among the 24 of 161 judges who exhibit statistically significant effects in the OLS models, 14
judges impose shorter sentences for black offenders, and 10 sentence black offenders to longer
periods of incarceration.
The final figure for the sentence length models shows effects associated with trial
convictions (Figure 12). Judge percents range from negative five to 75 (standard errors from 0.07
to 0.17) in the LMM, and negative 33 to 113 (standard errors 0.05 to 0.84) in the OLS models.
Substantial pooling occurs around the LMM mean of 22 percent, and predicted values for the
judges fall between negative 16 and 59 percent in the LMM, and negative 29 and 71 percent
across the OLS models. Though no judges sentence offenders convicted after trial to shorter
sentences, the statistically significant effects from the OLS models show that 55 of the 161
judges impose longer sentences for offenders convicted after trial.

Figure 11. LMM and OLS Effects for Black Offenders

Figure 12. LMM and OLS Effects for Trial Convictions

Before moving to the third research question, examination of the ways in which judges
consider offender and case characteristics for the incarceration decision and the length of the
sentence imposed revealed additional information that is worth noting. In the logit and OLS
models, nearly all judges showed statistically significant effects for offense severity and prior
record, indicating that increases in these legally relevant variables increased the odds of
incarceration and the length of the sentence imposed. Figures 13 through 16 provide side by side
comparisons of significant and non-significant effects from the logit and OLS models for
offender age, gender, race, and mode of conviction. The results suggest that some judges
consider these extralegal factors for both sentencing outcomes, others when deciding whether to
incarcerate but not when determining the appropriate sentence length (and vice versa), and still
others do not consider these factors in either decision.
Concerning offender age (Figure 13), the largest group of judges (65 of 161) shows nonsignificant age
effects in both decisions, followed by 57 judges who consider age in the incarceration decision
only (Table 5). Figure 14 displays effects for female
offenders and indicates that only 32 of the 161 judges consider gender in both decisions, 80 show
significant effects for incarceration alone, and 13 for just the sentence length decision.
Significant race effects (Figure 15) are most prevalent for the incarceration decision alone (77
of 161 judges), followed by 60 judges who show no significant race effects in either
decision. Finally, for just over half of the judges (84 of 161), whether
offenders enter a guilty plea or take their case to trial seems to have no bearing on either the
decision to incarcerate or sentence length (Figure 16). Remaining comparisons can be found in
Table 5, but overall these findings suggest individual judges’ consideration of extralegal factors
is conditioned by the sentencing outcome.
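The cross-classification summarized in Table 5 can be sketched as follows. This is a minimal illustration, not the procedure used in the analysis: the judge labels and p-values are hypothetical, and the only element taken from the text is the .05 significance threshold applied separately to each judge's incarceration (logit) and sentence length (OLS) model.

```python
from collections import Counter

ALPHA = 0.05  # significance threshold used in the text


def classify(p_incarceration, p_length, alpha=ALPHA):
    """Assign a judge to one of the four Table 5 categories for a single predictor."""
    inc = p_incarceration < alpha
    length = p_length < alpha
    if inc and length:
        return "Incarceration and Sentence Length"
    if inc:
        return "Incarceration Only"
    if length:
        return "Sentence Length Only"
    return "Neither Incarceration Nor Sentence Length"


# Hypothetical p-values (logit model, OLS model) for five made-up judges.
hypothetical_judges = {
    "Judge A": (0.001, 0.030),
    "Judge B": (0.020, 0.400),
    "Judge C": (0.600, 0.010),
    "Judge D": (0.250, 0.700),
    "Judge E": (0.049, 0.051),
}

counts = Counter(
    classify(p_inc, p_len) for p_inc, p_len in hypothetical_judges.values()
)

for category, n in counts.items():
    print(category, n)
```

Judge E illustrates how sensitive the categories are to the threshold: p-values of .049 and .051 place this judge in the "Incarceration Only" group even though the two results are nearly identical.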

Figure 13. Logit and OLS Effects for Age

Figure 14. Logit and OLS Effects for Female Offenders

Figure 15. Logit and OLS Effects for Black Offenders

Figure 16. Logit and OLS Effects for Trial Convictions

Table 5. Number of Judges with Significant Effects for Sentencing Outcomes (N = 161)

Sentencing Outcome                             Age   Female   Black   Trial
Incarceration and Sentence Length               24       32      13      23
Incarceration Only                              57       80      77      22
Sentence Length Only                            15       13      11      32
Neither Incarceration Nor Sentence Length       65       36      60      84


Analysis of Judges within Court Communities
The final portion of the analysis examined the extent to which judges in the same
courthouses exhibit similar sentencing patterns. Integrating the focal concerns and court
community perspective suggests that sentencing decisions may be influenced by the court
community in which punishment decisions occur, but differences in court actor autonomy in
large, medium, and small courts may condition this relationship. Specifically, court communities
in large courts have been characterized as diffuse, based in part on a high degree of autonomy
between members of the courtroom workgroup, whereas the close working relationship
developed among small court actors is expected to limit autonomy (Eisenstein, Flemming, &
Nardulli, 1988; Jacob, 1997). Court actor autonomy in medium courts is expected to fall
somewhere in between large and small courts. Figures 17 through 22 provide results from select
courts. The plots display individual judge logit (odds ratios and standard errors) and OLS
(percents and standard errors) findings for legal and extralegal factors. Estimates in black
represent coefficients with a p-value below .05, while those in gray are at or above this threshold.
Significant legal and extralegal effects for the remaining courts are included in Tables 6 through
8.
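The shading rule used in the plots can be sketched as follows. The sketch assumes a two-sided Wald test based on the normal approximation; OLS results would ordinarily rely on a t distribution, and the software and exact test behind the figures are not stated, so this is an illustration only. The coefficient and standard error values are hypothetical.

```python
import math


def two_sided_p(coef, se):
    """Two-sided p-value from a normal (Wald) approximation."""
    z = abs(coef / se)
    return math.erfc(z / math.sqrt(2))


def odds_ratio(logit_coef):
    """Convert a logit coefficient to an odds ratio."""
    return math.exp(logit_coef)


def shade(coef, se, alpha=0.05):
    """Black for estimates below the threshold, gray otherwise."""
    return "black" if two_sided_p(coef, se) < alpha else "gray"


# Hypothetical judge-level logit estimates: (coefficient, standard error).
examples = {"Judge A": (-0.50, 0.20), "Judge B": (0.10, 0.20)}

for judge, (coef, se) in examples.items():
    print(judge, round(odds_ratio(coef), 2), shade(coef, se))
```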
Figure 17 provides findings for the largest court in the sample (Large Court 2). The
results indicate that while legal factors are the primary determinants of punishment, substantial
variation exists in the role extralegal factors play in sentencing outcomes. Nearly all 29 judges
are associated with increasing the likelihood of incarceration and sentence length as offense
severity and prior record increase. Concerning extralegal factors, very few judges consider
offender age in the incarceration decision (five of 29) and when determining sentence length
(eight of 29). In addition, while five of the judges are associated with a very slight increase in
sentence length for younger offenders, the other three sentence older offenders to longer
sentences. Significant findings are more prevalent for gender, with just over half of the judges
being less likely to incarcerate female offenders. Fewer judges consider gender in the sentence
length decision (11 of 29), and one judge imposes longer sentences for female offenders
compared to males. Similar to age, very few judges are significant for race effects, though race in
this court appears to play a larger role in determining the sentence length than whether to
incarcerate black offenders. Finally, 13 of 29 judges are more likely to incarcerate, and 18 of 29
impose longer sentences on, offenders convicted after trial than those who enter a
guilty plea. With the exception of gender effects for the incarceration decision,
more judges exhibit significant effects associated with trial convictions than any other extralegal
variable.
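The judge counts given above for Large Court 2 correspond to the percentages reported for that court in Table 6. The short calculation below makes the conversion explicit; the counts are those stated in the text, and the dictionary layout is simply an illustrative way to organize them.

```python
N_JUDGES = 29  # judges in Large Court 2

# Counts of judges with significant effects, as stated in the text.
significant = {
    ("Age", "Incarceration"): 5,
    ("Age", "Sentence Length"): 8,
    ("Female", "Sentence Length"): 11,
    ("Trial", "Incarceration"): 13,
    ("Trial", "Sentence Length"): 18,
}

# Whole-number percentages, matching the rounding used in Table 6.
percents = {key: round(100 * n / N_JUDGES) for key, n in significant.items()}

for (predictor, outcome), pct in percents.items():
    print(f"{predictor}, {outcome}: {pct}%")
```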
Table 6 provides information for all of the large courts. The table includes the percent of
judges with significant effects for the key predictors of sentencing across both punishment
decisions. Findings show significant effects associated with offense severity and prior criminal
history for nearly all judges in these courts. Conversely, judges in these large court communities
vary more in significant effects associated with age, gender, race, and mode of conviction.
Further, with the exception of trial convictions in Large Court 4 and of age, race, and trial
convictions in Large Court 2, significant extralegal effects are more prevalent for the
incarceration decision than the sentence length decision.

Figure 17. Individual Judge Effects in Large Court 2

Table 6. Percent of Large Court Judges with Significant Effects

Court                   Offense Severity   Prior Record    Age   Female   Black   Trial
Large Court 1 (N=19)
  Incarceration                      89%           100%    74%      95%     74%     26%
  Sentence Length                   100%           100%     5%      32%     26%     16%
Large Court 2 (N=29)
  Incarceration                     100%            97%    17%      59%     10%     45%
  Sentence Length                   100%           100%    28%      38%     24%     62%
Large Court 3 (N=11)
  Incarceration                     100%           100%    36%      91%     64%     36%
  Sentence Length                   100%           100%     9%      18%     18%     27%
Large Court 4 (N=16)
  Incarceration                     100%           100%    31%      75%     69%     19%
  Sentence Length                   100%           100%    19%      31%     13%     69%

The next set of findings provides information on judges’ sentencing patterns in medium-sized
courts, where court actor autonomy exists but to a lesser degree than found in large courts.
Figure 18 provides findings for Medium Court 3, and shows highly consistent sentencing
patterns among the judges in this court. Specifically, both judges increase punishment severity as
offense severity and prior record increase, and sentence female offenders more leniently than
males. In addition, both judges are less likely to incarcerate older offenders, are more likely to
sentence blacks to jail or prison when compared to whites, and exhibit non-significant effects
associated with age, race, and trial in the sentence length decision. The only exception is trial
convictions in the incarceration models, where one judge is more likely to incarcerate offenders
convicted after trial and the other is not.

Figure 18. Individual Judge Effects in Medium Court 3

In contrast, Figure 19 provides results from another medium court where individual
judges exhibit substantial differences across extralegal effects. In Medium Court 6, while eight
of the 10 judges are less likely to incarcerate older offenders, only half of the judges consider age
in the sentence length decision. Notably, the latter increase sentence length as age increases, and
the four judges exhibiting significant effects for both outcomes are more lenient on older
offenders for the incarceration decision, but more punitive when determining the appropriate
sentence length. Seventy percent of judges are less likely to incarcerate female offenders, and 40
percent impose shorter sentences for females than males. Additional inconsistency is found
concerning race effects, where 60 percent of judges are more likely to incarcerate blacks than
whites, and 30 percent consider race in the sentence length decision. However, findings from the
sentence length models show judges in this court sentence black offenders in different ways,
with two judges imposing shorter sentences and one sentencing black offenders to longer periods
of incarceration. Finally, 30 percent and 40 percent of judges increase the odds of incarceration
and sentence length (respectively) for offenders convicted after trial, while the remaining judges
show non-significant effects.15

Figure 19. Individual Judge Effects in Medium Court 6

Table 7 provides information for judges in all of the medium courts. Overall, judges in
these courts exhibit similar patterns to those found in the large courts. The vast majority of
judges in these courts show significant effects for offense severity and prior criminal history. In
contrast, significant findings associated with age, gender, race, and mode of conviction vary
substantially across these judges, and extralegal factors are generally more influential for the
incarceration decision than for the length of the sentence imposed.

15 An additional judge exhibits significant effects for trial convictions in the incarceration decision, but was excluded
from the 30 percent due to an extremely high standard error.

Table 7. Percent of Medium Court Judges with Significant Effects

Court                   Offense Severity   Prior Record    Age   Female   Black   Trial
Medium Court 1 (N=7)
  Incarceration                     100%           100%    43%      86%     71%     29%
  Sentence Length                   100%           100%    29%      14%      0%      0%
Medium Court 2 (N=10)
  Incarceration                     100%           100%    10%      50%     30%     20%
  Sentence Length                   100%           100%    10%      10%     30%      0%
Medium Court 3 (N=2)
  Incarceration                     100%           100%   100%     100%    100%     50%
  Sentence Length                   100%           100%     0%     100%      0%      0%
Medium Court 4 (N=6)
  Incarceration                     100%           100%    83%      67%     67%     17%
  Sentence Length                   100%           100%    33%      67%      0%     50%
Medium Court 5 (N=9)
  Incarceration                     100%            89%    33%      67%     89%      0%
  Sentence Length                   100%           100%    44%      22%      0%      0%
Medium Court 6 (N=10)
  Incarceration                     100%           100%    80%      70%     60%     30%
  Sentence Length                   100%           100%    50%      40%     30%     40%

The last set of results provides information on judges in small courts. Court actors in
these court communities work closely with one another, which is likely to limit individual
autonomy. Findings for three of the 11 small courts conformed to these expectations, showing
consistent sentencing patterns among judges. As expected, judges in these small courts show
significant effects for legal factors across both outcomes. In Small Court 5 (Figure 20), both
judges are associated with a decrease in the odds of incarceration as offender age increases, and
are more likely to incarcerate blacks compared to whites. With the exception of mode of
conviction, where one judge imposes longer sentences for offenders convicted after trial, no
other extralegal predictors are significant for either judge. Similarly, in Small Court 9 (Figure
21), the only difference between the judges is that one judge is less likely to incarcerate female
offenders, while the other is not. Effects associated with the legal factors for both outcomes and
age and race in the incarceration models are significant and in the same direction, and the
remaining extralegal factors are non-significant. Judges in Small Court 4 (Figure 22) only differ
in terms of offender race and mode of conviction in the incarceration models, though these
findings should be interpreted with caution given the large standard errors. However, for the
sentence length models, only legal factors influence these judges’ decisions.

Figure 20. Individual Judge Effects in Small Court 5

Figure 21. Individual Judge Effects in Small Court 9

Finally, Table 8 includes findings for judges in all of the small courts. In contrast to
judges in large and medium courts (with the exception of Medium Court 3), there are a few
pockets of consistency among judges in small courts. For example, in Small Court 3, judges are
consistent in terms of significant effects for offense severity, prior record, age, and mode of
conviction in the incarceration models, and for all predictors in the sentence length models. In
addition, though judges in Small Courts 10 and 11 exhibit differences for the incarceration
decision, similar sentencing patterns are found for the sentence length models, where judges differ
only in race effects in Small Court 10 and only in mode of conviction in Small Court 11.

Figure 22. Individual Judge Effects in Small Court 4

Table 8. Percent of Small Court Judges with Significant Effects

Court                   Offense Severity   Prior Record    Age   Female   Black   Trial
Small Court 1 (N=5)
  Incarceration                      80%           100%    80%      80%     60%     20%
  Sentence Length                   100%           100%    40%      60%      0%     40%
Small Court 2 (N=4)
  Incarceration                     100%           100%    50%      75%     50%     75%
  Sentence Length                   100%           100%    25%      25%      0%     50%
Small Court 3 (N=2)
  Incarceration                     100%           100%   100%      50%     50%      0%
  Sentence Length                   100%           100%     0%       0%      0%      0%
Small Court 4 (N=2)
  Incarceration                     100%           100%     0%       0%     50%     50%
  Sentence Length                   100%           100%     0%       0%      0%      0%
Small Court 5 (N=2)
  Incarceration                     100%           100%   100%       0%    100%      0%
  Sentence Length                   100%           100%     0%       0%      0%     50%
Small Court 6 (N=6)
  Incarceration                     100%           100%    50%      83%     67%      0%
  Sentence Length                   100%           100%    33%       0%      0%     50%
Small Court 7 (N=5)
  Incarceration                     100%           100%    80%      20%     40%      0%
  Sentence Length                   100%           100%    20%      40%      0%     20%
Small Court 8 (N=5)
  Incarceration                     100%           100%   100%     100%     80%    100%
  Sentence Length                   100%           100%    20%       0%     20%     80%
Small Court 9 (N=2)
  Incarceration                     100%           100%   100%      50%    100%      0%
  Sentence Length                   100%           100%     0%       0%      0%      0%
Small Court 10 (N=4)
  Incarceration                     100%           100%    50%      75%    100%      0%
  Sentence Length                   100%           100%     0%       0%     25%      0%
Small Court 11 (N=4)
  Incarceration                     100%           100%    75%      75%     50%      0%
  Sentence Length                   100%           100%   100%       0%      0%     25%

CHAPTER 5: DISCUSSION
This chapter begins by briefly reviewing the purpose of the current inquiry, including the
research questions and hypotheses examined in the analyses. Next, an overview of the findings is
presented and theoretical and methodological implications are discussed. The final portion of this
chapter provides policy implications, limitations, and directions for future research.
The Current Inquiry
This study explored three research questions to advance knowledge of interjudge
disparity and judicial sentencing patterns within court communities. The first research question
involved multilevel analysis of all judges in the sample to examine legal and extralegal effects on
the decision to incarcerate offenders and the length of the sentence imposed, and whether these
effects varied significantly across judges. Drawing on extant research (e.g., Johnson, 2006;
Kramer & Ulmer, 2009), increases in offense severity and prior record were expected to increase
sentence severity. Younger offenders, males, black offenders, and offenders convicted after trial
were expected to be punished more harshly than older offenders, females, whites, and those who
entered a guilty plea, respectively.
The second research question employed individual judge logistic and OLS regression
models to assess judges’ contributions to legal and extralegal disparities found in the multilevel
analysis. Given the sentencing guidelines’ use of offense severity and prior criminal history in
determining the appropriate punishment, these legal factors were expected to significantly affect
most individual judges’ sentencing decisions. However, judicial consideration of extralegal
factors is influenced, at least in part, by individual judges’ subjective decision-making. Thus,
significant effects associated with extralegal predictors were expected to vary across judges more
so than effects associated with legal factors.
The final research question examined the sentencing patterns of judges in the same court
communities. Prior work suggests that court communities develop distinctive case processing
and sentencing norms that may influence punishment outcomes (Eisenstein, Flemming, &
Nardulli, 1988; Kramer & Ulmer, 2009; Ulmer, 1997), but differences in court actor autonomy in
small, medium, and large courts may condition this relationship. Consequently, similar
sentencing patterns may be most prevalent in small courts where court actor autonomy is
restricted; on the other hand, high levels of autonomy associated with large courts may result in
more diverse sentencing patterns among large court judges.
Theoretical and Methodological Implications
Multilevel Analysis of Judge Variation. Extant research on interjudge disparity using
multilevel analysis consistently shows both legal and extralegal factors influence sentence
severity (Anderson & Spohn, 2010; Johnson, 2006; Wooldredge, 2010). Using this approach, the
current work finds that one-unit increases in offense severity and prior record increase the likelihood
of incarceration and the length of the sentence imposed, and female offenders are treated more
leniently than male offenders. Prior research also indicates that younger offenders, black
offenders, and those convicted after trial are more likely to be sentenced to jail or prison and
receive longer sentences than older offenders, whites, and offenders pleading guilty, respectively
(e.g., Kramer & Ulmer, 2009; Ulmer & Johnson, 2004). The multilevel analysis offers support
for judges imposing a trial penalty, but findings associated with offender age and race produced
mixed results. More specifically, while younger offenders and black offenders are more likely to
be incarcerated, age has almost no effect on the length of the sentence imposed, and black
offenders receive shorter sentences than whites. Differences in these results may be attributable
to using data from different time periods and/or over a longer period of time, or may be unique to
the sample of judges selected for this analysis. Ultimately, these findings offer support for
hypotheses one, three, and five, and partial support for hypotheses two and four.
Research employing multilevel analysis also shows that effects associated with legal and
extralegal factors differ across judges (Anderson & Spohn, 2010; Johnson, 2006; Wooldredge,
2010). In line with hypothesis six, the current research indicates that effects associated with
offense severity, prior record, age, gender, race, and mode of conviction vary significantly across
judges. Overall, the results from this portion of the analysis provide support for the notion that
judges rely on legal and extralegal factors when assessing the focal concerns, and effects
associated with these offender and case characteristics vary across judges.
Individual Analysis of Judge Variation. To explore this variation across judges in more
detail, the second research question employed individual judge logistic and OLS regression
models to assess judges’ contributions to legal and extralegal disparities found in the overall
models. As expected, nearly all judges mete out harsher punishments for offenders committing
more serious crimes and those with lengthier criminal records, whereas significant extralegal
effects were less consistent. Further, consistent with the limited prior work on individual judges
(Wooldredge, 2010), comparing findings from the overall models with the judge logit and OLS
models indicates variation at the judge level is masked when using multilevel analysis.
Specifically, very few individual judge effects from the logit and OLS models were in
line with the fixed effects estimates for the legal and extralegal factors found in the multilevel
models. In addition, statistically significant extralegal effects vary widely across individual
judges, despite age, race, gender, and mode of conviction being significant in the multilevel fixed
effects. This should be expected to some degree, given the overall estimates are just the predicted
values across all judges (Hox, 2010). Still, the overall estimates are generally the focus of much
of the existing sentencing literature (Anderson & Spohn, 2010; Johnson, 2006; Wooldredge,
2010), and have played a substantial role in developing theories that explain the differential
treatment of similarly situated offenders (i.e., those convicted of the same crimes, with similar
criminal histories). Results from the random effects portion of the multilevel analyses provide
more information about individual judge effects for a given variable, and offer three primary
findings with methodological and theoretical implications.
First, as discussed previously, the individual judge estimates in the multilevel analysis are
weighted averages that take into account group information and the overall model mean, and less
reliable estimates are shrunk closer to the model average. Consequently, values from multilevel
models are biased, but also more precise. Yet, this is not always the case. Though the majority of
the findings from the current work show shrinkage towards the mean, results from the multilevel
and individual judge logit incarceration models for offender race revealed a number of judges’
odds ratios in the multilevel model increased (when compared to the logit models) and were
pushed farther away, as opposed to closer to, the overall model mean. Burnham (2017) notes that
while shrinkage estimators are optimal for the overall set of model parameters (i.e., they
minimize the mean squared error for all groups), they may not be for all individual parameters.
Consequently, individual estimates may be “incorrectly shrunk” or move in the wrong direction
(Burnham, 2017: 20; see also Lipsky et al., 2011), resulting in misleading estimates about how
judges consider offender and case characteristics in sentencing decisions.
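The weighted-average logic described above can be illustrated with a deliberately simplified sketch. It assumes a single effect per judge and treats the overall mean and the between-judge standard deviation as known; all numeric values are hypothetical. This is not the estimator used in the multilevel models, which handle several effects jointly, and this one-parameter version always moves estimates toward the mean, so it does not reproduce the movement away from the mean noted for some judges.

```python
def shrink(judge_estimate, judge_se, overall_mean, between_judge_sd):
    """Precision-weighted compromise between a judge's estimate and the overall mean."""
    # Reliability is near 1 for precise estimates and near 0 for noisy ones.
    reliability = between_judge_sd**2 / (between_judge_sd**2 + judge_se**2)
    return reliability * judge_estimate + (1 - reliability) * overall_mean


OVERALL_MEAN = 0.22      # hypothetical overall effect
BETWEEN_JUDGE_SD = 0.19  # hypothetical spread of true judge effects

# Hypothetical judges: one large but noisy estimate, one moderate but precise estimate.
noisy = shrink(1.13, 0.84, OVERALL_MEAN, BETWEEN_JUDGE_SD)
precise = shrink(0.60, 0.05, OVERALL_MEAN, BETWEEN_JUDGE_SD)

print(f"noisy judge:   1.13 -> {noisy:.2f}")
print(f"precise judge: 0.60 -> {precise:.2f}")
```

In this illustration the noisy estimate is pulled almost all the way to the overall mean, while the precise estimate barely moves, which is the pattern of pooling described for the less reliable judge estimates.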
The second finding indicates that even when shrinkage estimators are operating as
expected, the random effects estimates are of limited value when the purpose of the research is to
assess individual judge decision-making. For example, the multilevel and logit incarceration
models (and to a lesser extent the sentence length models) for gender show the differences in
values for female offenders in the logit models are reduced to a substantially tighter grouping
around the overall mean in the multilevel model. As such, the random effects allow for drawing
general conclusions about leniency for female offenders, but understate the extent of the
variation across judges. More noticeable, however, are the shrinkage effects for the mode of
conviction variable in the incarceration models. Strong pooling should be expected, since
shrinkage estimators are designed to obtain the most precise estimates, and the individual models
show a number of judges with large effects and high standard errors. Yet, comparing the results
from the different methodological approaches raises questions about how mode of conviction
should be used in sentencing research, and what can be interpreted from findings associated with
offenders convicted after trial. Including mode of conviction as a predictor of punishment
severity is ubiquitous in sentencing research, and findings consistently show that trial
convictions result in harsher punishment (e.g., Dixon, 1995; Johnson, 2003, 2006; for reviews,
see Ulmer, 2012; Ulmer & Bradley, 2006). However, to the extent these results are driven by
unreliable estimates and/or a small number of judges who impose extremely high penalties, as
the individual models suggest, drawing general conclusions from multilevel models about how
judges sentence offenders convicted after trial should be reconsidered.
The third finding highlights an additional limitation when relying exclusively on
multilevel analyses to examine interjudge disparity. Multilevel models are designed to obtain the
most precise estimate for each group, but may not be appropriate when predictors are
hypothesized to produce effects for some groups, but not others (Gelman & Hill, 2016). This is
particularly important for sentencing research on individual judges because sentencing theories
are not only concerned with whether judges vary in effects associated with offender and case
characteristics, but also whether these factors matter in punishment decisions. Specifically, focal
concerns suggests that the influence of legal and extralegal factors on punishment decisions is
likely to differ based on judges’ subjective assessments of blameworthiness and community
threat. Findings from extant studies have generated broad conclusions about sentencing
predictors, such as females receiving more lenient sentences than males, and blacks being
punished more harshly than whites. Yet, the current work finds substantial differences in
significant effects associated with extralegal factors in the individual judge analyses, indicating
that more variation exists among certain sentencing predictors than previously understood. These
include effects for age (80 of 161 judges significant at p < .05), gender (112 of 161), race (90 of 161), and
mode of conviction (45 of 161) for the incarceration decision, and even fewer significant effects
for the sentence length decision (with the exception of mode of conviction). Ultimately, these
findings suggest that extralegal factors matter in sentencing outcomes, but the ways in which
they influence punishment severity are conditioned by the individualized nature of the judicial
decision-making process. As such, analysis of individual judges provides new insights about the
key predictors of sentencing beyond what is traditionally found using multilevel analysis, and offers
important implications for theory.
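Expressed as shares of the 161 judges, the incarceration counts above can be computed as follows. The counts come from the text; the variable names are illustrative.

```python
N_JUDGES = 161

# Judges with significant incarceration effects (p < .05), as reported in the text.
significant_incarceration = {
    "age": 80,
    "gender": 112,
    "race": 90,
    "mode of conviction": 45,
}

shares = {
    predictor: round(100 * n / N_JUDGES, 1)
    for predictor, n in significant_incarceration.items()
}

for predictor, share in shares.items():
    print(f"{predictor}: {share} percent of judges significant")
```

For every predictor, a substantial share of judges shows no significant effect, ranging from roughly three in ten for gender to roughly seven in ten for mode of conviction.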
Similar to findings from the overall model, the results from the individual judge models
offer support for the focal concerns perspective. Judges rely primarily on legal factors when
assessing offender blameworthiness and community threat, but also engage in subjective
decision-making based on attributions associated with extralegal characteristics when
considering the focal concerns. Yet, given that few individual judges are in line with the overall
multilevel model estimates, the pooling of estimates in the random effects, and that individual
judges’ significant effects vary in ways that are not represented in the overall model or the
random group effects, the results suggest that focal concerns may be more appropriately tested
using separate judge models. In particular, individual analyses would provide a better
understanding of whether and how judges consider the key predictors of sentencing when
assessing the focal concerns (see also Wooldredge, 2010).
However, findings from the analytic strategies used in the current work highlight
additional problems with the focal concerns perspective. Additional analyses (not shown)
indicate that some judges are associated with significant effects for legal factors only. Others are
significant for legal factors and some extralegal factors, and still other judges exhibit significant
effects for all legal and extralegal variables. Moreover, the influence of extralegal factors is
conditioned by the sentencing outcome (e.g., incarceration, sentence length). In some sense, all
of these judges’ sentencing decisions can be explained by the focal concerns perspective. This is
because focal concerns recognizes that judges will vary in their assessments of the focal
concerns, and the factors used to assess them. For some judges, protection of the community may
be assessed based on prior criminal history, while others might consider the nature of the offense
(e.g., violent versus property crime). Others may consider one or both of these legal factors in
addition to offender characteristics such as race and gender if they view certain offenders as
posing a greater threat to the community.
In another sense, the variation found in the individual judge models concerning
significant effects suggests this perspective may be too parsimonious because it cannot explain
these patterns in sentencing decisions. Focal concerns outlines broad concepts associated with
punishment decisions: blameworthiness, protection of the community, and practical constraints.
The perspective also notes legal and extralegal factors that may be associated with assessing the
focal concerns, but acknowledges that judges are likely to vary in how they consider legal and
extralegal factors in relation to the focal concerns. It does not, however, provide a clear
indication of the specific factors that influence each of the focal concerns, or why certain factors
may be relevant for the incarceration decision but not the sentence length decision (and vice
versa), which limits testing this perspective using existing data and quantitative methods (see
Hartley, Maddan, & Spohn, 2007). Moreover, the perspective relies heavily on judges’
perceptions and subjective decision-making, but provides little information about how judges
develop their sentencing philosophies. Additional theoretical perspectives that may address this
gap are discussed in the current work’s section on directions for future research.
Sentencing Within Court Communities. The third research question examined judges
grouped by court to assess whether and how judges in the same court communities consider legal
and extralegal factors in the decision to incarcerate offenders. The current study hypothesized
that less autonomy among small court judges would result in similar sentencing patterns among
these judges. Differences in sentencing patterns were expected to increase in medium courts, and
the largest variation was predicted in large courts where autonomy is highest.
Overall, findings concerning the relationship between court size and sentencing patterns
provide limited support for the current work’s hypotheses. As expected, judges in large courts
exhibit substantial variation in terms of significant extralegal effects. These findings are in line
with limited prior work suggesting that, while certain aspects of case processing are tightly coupled
in large courts (e.g., docket management, courtroom assignment), judges exercise discretion in
punishment decisions in ways they feel promote justice (Jacob, 1997).
More similarities in judges’ sentencing patterns were expected in medium-sized courts,
but the findings show a mix of patterns. While individual judge legal and extralegal effects are
nearly identical in one medium court, judges in the remaining five medium courts look similar to
those found in the large courts; that is, judges vary substantially in effects associated with
offender age, gender, race, and mode of conviction. Notably, the medium court where judges
exhibit very similar sentencing patterns has a total of 11 authorized judgeships, but only two
judges handle the majority of the criminal caseload (roughly 90 percent of cases). Thus, though
the court community is categorized as medium, it may operate in ways that are more reflective of
small court communities.
With prior research suggesting that small courts are composed of very few court actors
who work closely with one another (Eisenstein, Flemming, & Nardulli, 1988; Ulmer, 1997), the
current work expected judges in small courts to exhibit more similar sentencing patterns than those found in
large and medium courts. Findings offer some support for this expectation, with judges in three
of the 11 small courts exhibiting relatively similar sentencing patterns in terms of statistically
significant effects associated with offender and case characteristics. Three additional courts
show pockets of consistency across judges for most extralegal effects, offense severity and/or
prior record, but much of this is limited to the sentence length decision.
Though the hypothesized conditioning effect of court size on individual judges’
sentencing patterns garnered limited support, the findings overall are consistent with the focal
concerns and court community perspectives. As with focal concerns theory alone, this is
because current theorizing about the court community influence on sentencing has not been well
defined. Recall that integrating these perspectives suggests that differences in judicial
consideration of legal and extralegal factors in sentencing decisions can be explained in part by
the distinctive case processing and sentencing norms present in the court community in which
punishment decisions occur. However, scholars also note that the court community influence is
likely dependent on the presence of, and adherence to, shared norms within these communities
(Ulmer, 2012). As such, similarities between individual judges’ sentencing decisions in the same
court communities can be interpreted as consistent with the focal concerns and court community
perspectives, but differences in judges’ sentencing patterns can as well.
Overall, the findings from this portion of the analysis suggest court size is too broad a
measure to explain the relationship between court communities and sentencing decisions.
Though extant qualitative work provides clear evidence of court actors developing working
relationships and case processing strategies, it stops short of explaining how these court
community elements affect sentencing. Thus, the current research highlights a need for
additional theoretical development to explain the ways in which court communities influence
sentencing decisions.
Implications for Policy
The current work has implications for sentencing law and policy. The sentencing of
criminal offenders is a fundamental mechanism of formal social control in society, and disparity
in punishment raises questions about the legitimacy of legal institutions (Reitz, 1998; Tonry,
1996). Perceived illegitimacy in the application of criminal sanctions may have a significant
impact on crime rates, the deterrent capacity of the criminal justice system, race relations, and
the generation and reproduction of social inequalities (Anderson, 1999; Gottschalk, 2008;
Klinger, 1994; LaFree, 1998; Russell, 1998; Ruth & Reitz, 2003; Tyler, 1990; Western, 2006).
Sentencing guidelines were developed to increase uniformity in punishment and reduce
unwarranted disparity (Kramer & Scirica, 1986). Since the present study does not compare
sentencing decisions pre- and post-guidelines, it is unclear whether the guidelines have achieved
their intended goals. However, findings suggest that extralegal disparity associated with age,
gender, race, and mode of conviction continues to exist under structured sentencing systems, and
the ways in which these factors influence sentencing vary across judges. As a result, the
probability of incarceration and the sentence length imposed for two similar offenders may be
significantly different depending on the judge who sentences them.
Despite the existence of legal and extralegal disparities, sentencing guidelines offer a
compromise between eliminating judicial discretion entirely and sentencing bounded by only
wide-ranging statutory minimums and maximums. The former would include policies such as
mandatory minimum sentencing provisions, which have been widely criticized as unduly harsh
(Tonry, 1996). Further, some research suggests mandatory minimums simply shift sentencing
discretion to other court actors, such as prosecutors (Tonry, 1996). Sentencing bounded by only
statutory minimums and maximums would grant judges nearly unfettered discretion, which
would almost guarantee disparate treatment of similar offenders.
Still, policy changes may be necessary to achieve more uniformity in criminal sanctions.
These changes may include training for judges to ensure that sentencing is based primarily on the
guidelines rather than extralegal criteria, as well as having judges provide some explanation of
their reasons for imposing the selected sentence. In addition, since Pennsylvania’s guidelines
allow for more judicial discretion than any other state operating under a guidelines system
(Kramer & Ulmer, 2009), the current work may signify a need for stricter appellate review
standards. Finally, the Pennsylvania Commission on Sentencing’s (PCS) Annual Reports are
currently limited to descriptive analyses of offender and case characteristics. If the PCS is
concerned with guideline compliance and accountability, more rigorous examinations of
individual judges’ sentencing decisions are necessary.
Limitations
Though the current research offers significant theoretical and methodological
contributions to the sentencing literature, like any research it has several limitations. As noted by
others who have used the PCS data (Johnson, 2005, 2006; Kramer & Ulmer, 2009; Ulmer &
Kramer, 2004), the data do not include information on charging decisions, bail outcomes,
offender socioeconomic status, and victim characteristics, all of which may predict variation in
punishment severity (e.g., Baumer, 2013). Further, some research shows that offender
characteristics interact to produce greater disparity than is found when exploring direct effects
alone (e.g., Doerner & Demuth, 2010; Spohn & Holleran, 2000; Steffensmeier, Ulmer, &
Kramer, 1998). As such, it is possible that effects associated with extralegal factors from the
individual judge models would be more prevalent if age, gender, and race were examined in
combination.
In addition, the analyses are limited to a sample of judges from large, medium, and small
courts in Pennsylvania. Consequently, findings are only generalizable to these judges in these
courts. It is possible that research on individual judges in other courts in Pennsylvania, as well as
judges in other states, would produce different findings.
Similar to other studies that have applied the focal concerns and court community
perspectives (Johnson, 2006; Kramer & Ulmer, 2009; Ulmer & Johnson, 2004), the present work
lacks direct measures of judicial sentencing philosophies, as well as information about judges’
perceptions associated with offender and case characteristics. In addition, the current work does
not include measures of court community features, such as information about other courtroom
actors (e.g., prosecutors, defense counsel), workgroup relationships, case processing strategies,
and sentencing norms. Thus, interpretations of workgroup autonomy, judges’ sentencing
patterns, and the role the court community plays in influencing punishment outcomes serve only
as inferences.
However, the current work’s use of multilevel modeling is generally consistent with the
way in which other studies have examined interjudge variation and court communities (e.g.,
Anderson & Spohn, 2010; Johnson, 2006). More importantly, the present study is the first to
apply these perspectives to analyze individual judges’ sentencing patterns from a relatively large
sample of courts differing in size. As such, it offers a substantial contribution in terms of
identifying individual judge variation, and further advances knowledge of judges’ sentencing
patterns within court communities. Thus, despite these limitations, this work provides a number
of avenues for future research.
Directions for Future Research
Given that most theories of sentencing recognize individual differences in the ways judges
consider legal and extralegal factors in sentencing decisions, multilevel models have become
increasingly popular in sentencing research. However, results from the current research suggest
that future studies should consider using separate judge models to gain a better understanding of
variation across judges, to examine extreme cases and patterns in the data, and to assess whether
and how judges consider extralegal factors, which are likely to influence some judges’ decisions,
but not others. This is not to say that multilevel analyses are inappropriate for sentencing
research, but they are likely better suited to some research questions than others. Multilevel
models may be beneficial when examining effects of sentencing predictors that theory and
empirical research suggest are highly influential for all judges, such as offense severity and prior
criminal history. These factors consistently and significantly affect
punishment severity, and the shrinkage estimators used in multilevel analysis may provide
precise estimates for judges with varying sample sizes. Multilevel analysis may also be useful for
drawing general conclusions about variation for other sentencing predictors, particularly when
some judges sentence a small number of offenders. Yet, for testing theories predicting that effects
associated with offender and case characteristics are likely to vary based on judges’ subjective
decision-making, individual judge models offer some advantages. Though using separate
regression models depends on having access to judge information and sufficient sample sizes, this kind
of research would complement the larger body of sentencing literature that has focused mostly
on multilevel analyses.
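
The contrast between separate judge estimates and shrinkage estimates can be illustrated with a deliberately simplified sketch. It is not the modeling approach used in this study, which relies on regression models; it only compares raw incarceration rates per judge with rates pulled toward the pooled rate in proportion to each judge's caseload. The judge labels, the counts, and the `PRIOR_STRENGTH` constant are hypothetical values chosen for illustration.

```python
# Illustrative only: hypothetical counts, not data from the study.
# Each judge maps to (offenders incarcerated, offenders sentenced).
judges = {
    "judge_a": (9, 10),       # small caseload, high raw rate
    "judge_b": (450, 1000),   # large caseload
    "judge_c": (2, 10),       # small caseload, low raw rate
}

# Hypothetical constant controlling how strongly small caseloads
# are pulled toward the pooled rate (an assumption, not an estimate).
PRIOR_STRENGTH = 50.0


def separate_rate(incarcerated, sentenced):
    """No pooling: the judge's own incarceration rate."""
    return incarcerated / sentenced


def shrunken_rate(incarcerated, sentenced, pooled_rate, prior_strength):
    """Partial pooling: weighted average of the judge's rate and the pooled rate.

    The weight on the judge's own rate grows with the number of cases,
    so judges with few cases are shrunk more toward the pooled rate.
    """
    weight = sentenced / (sentenced + prior_strength)
    own_rate = separate_rate(incarcerated, sentenced)
    return weight * own_rate + (1.0 - weight) * pooled_rate


# Complete pooling: one rate computed across all judges' cases.
total_incarcerated = sum(inc for inc, _ in judges.values())
total_sentenced = sum(n for _, n in judges.values())
pooled_rate = total_incarcerated / total_sentenced

# Store (separate estimate, shrunken estimate) for each judge.
estimates = {
    name: (
        separate_rate(inc, n),
        shrunken_rate(inc, n, pooled_rate, PRIOR_STRENGTH),
    )
    for name, (inc, n) in judges.items()
}

for name, (raw, shrunk) in estimates.items():
    print(f"{name}: separate={raw:.3f} shrunken={shrunk:.3f}")
```

In this sketch the two small-caseload judges move substantially toward the pooled rate while the large-caseload judge barely moves, which is one way to see how shrinkage can stabilize estimates for judges with few cases and, at the same time, mute the judge-level differences that separate models leave visible.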
To move beyond describing whether and how judges consider offender and case
characteristics, future work would benefit from additional theoretical integration and
development to better understand why certain legal and extralegal factors influence sentencing
outcomes. For example, future research might take an organizational view of case processing,
which suggests that individual judges’ sentencing patterns are influenced by the cases they
handle over time. According to Emerson (1983: 425), “the individual case provides an adequate
unit of analysis only if social control agents themselves examine and dispose of cases as discrete
units, treating each on its own merits independently … of other cases.” What is more likely, he
argued, is that individual cases are not treated independently, but rather viewed in connection
with the agents’ overall flow of cases (Emerson, 1983). In the context of sentencing, judges who
handle a larger number of violent cases over time may become desensitized; as a result, these
judges may sentence violent offenders less harshly than judges who encounter violent cases less
often (Johnson, 2006). Additional work indicates that attributions associated with extralegal
factors may also be influenced by the overall flow of cases (Maynard-Moody & Musheno, 2003).
Thus, examining individual judges’ sentencing decisions in relation to their caseload may further
understanding of why judges consider legal and extralegal factors in different ways. Future
research in this area may include trajectory analysis to examine judges’ caseloads and changes in
sentencing patterns over time, and qualitative work to gain an in-depth understanding of how
judges’ overall flow of cases affects sentencing decisions.
In addition, more qualitative research is needed to better understand the relationship
between court communities and sentencing decisions. Research with judges should explore
whether judges are aware of their colleagues’ sentencing decisions, and the extent to which those
decisions influence their own. Additional work is also needed to assess the prosecutor’s role in
punishment decisions, and particular attention should be devoted to the courtroom workgroup’s
approach to prosecutor-recommended sentences as part of plea agreements. Limited work
suggests this varies across courts (Eisenstein, Flemming, & Nardulli, 1988; Ulmer, 1997), and it
is an important component of understanding the root causes of differences in sentencing
outcomes.
More generally, the findings from the current work highlight the need for more research
at the individual court actor level (see also Ulmer, 2012). Current theories of sentencing view
punishment decision-making as an individualized process, where judges and potentially other
court actors assess offender blameworthiness and community threat in their own ways. The
factors that influence these decisions, as well as the weight afforded to these factors, are likely to
vary across decision-makers. In addition, contextual theories concerning court influences
recognize that court communities are unique, and they develop their own distinctive case
processing strategies and sentencing norms. Yet, much of the extant research has taken these
theories and tested them at the aggregate level, using large datasets, and pooling cases across all
judges in a jurisdiction or a state. Though research at the individual judge level may limit
generalizability, it has the potential to further refine current sentencing perspectives and advance
theoretical development.

REFERENCES

Albonetti, C. A. (1991). An integration of theories to explain judicial discretion. Social
Problems, 38, 247-66.
Albonetti, C. A. (1997). Sentencing under the federal sentencing guidelines: Effects of defendant
characteristics, guilty pleas, and departures on sentence outcomes for drug offenses,
1991-1992. Law and Society Review, 31, 789-822.
Albonetti, C. A. (2002). The joint conditioning effects of defendant’s gender and ethnicity on
length of imprisonment under federal sentencing guidelines for drug trafficking/manufacturing
offenders. Journal of Gender, Race, and Justice, 6, 39-60.
Allison, P. D. (2001). Missing data. Thousand Oaks, CA: Sage.
Anderson, D. A. (1999). The aggregate burden of crime. Journal of Law and Economics, 42(2),
611-642.
Anderson, A. L., & Spohn, C. (2010). Lawlessness in the federal sentencing process: A test for
uniformity and consistency in sentencing outcomes. Justice Quarterly, 27(3), 362-393.
Bates, D., Maechler, M., Bolker, B., Walker, S., Christensen, R., Singmann, H., … Green, P.
(2016). lme4: Linear Mixed-Effects Models using 'Eigen' and S4. R package version 1.1-12.
https://cran.r-project.org/web/packages/lme4/index.html
Baumer, E. P. (2013). Reassessing and redirecting research on race and sentencing. Justice
Quarterly, 30(2), 231-261.
Britt, C. L. (2000). Social context and racial disparities in punishment decisions. Justice
Quarterly, 17(4), 707-732.
Britt, C. L. (2009). Modeling the distribution of sentence length decisions under a guidelines
system: An application of quantile regression. Journal of Quantitative Criminology,
25(4), 341-370.
Burnham, K. P. (2017). Appendix D: Variance components and random effects models in
MARK. In E. G. Cooch & G. C. White (Eds.), Program MARK: A gentle introduction
(D1-D45). Retrieved from
http://www.phidot.org/software/mark/docs/book/pdf/app_4.pdf
Casella, G., & Berger, R. L. (1990). Statistical inference. Pacific Grove, CA: Wadsworth.
Chambliss, W. J., & Seidman, R. B. (1982). Law, order, and power. Reading, Massachusetts:
Addison-Wesley.
Champely, S., Ekstrom, C., Dalgaard, P., Gill, J., Wunder, J., & De Rosario, H. (2015). pwr:
Basic functions for power analysis. R package version 1.1-3.
https://cran.r-project.org/web/packages/pwr/index.html
Chiricos, T. G., & Waldo, G. P. (1975). Socioeconomic status and criminal sentencing: An
empirical assessment of a conflict proposition. American Sociological Review, 40, 753-772.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Lawrence
Erlbaum.
de Leeuw, J., & Kreft, I. G. (1995). Questioning multilevel models. Journal of Educational and
Behavioral Statistics, 20(2), 171-189.
Dixon, J. (1995). The organizational context of criminal sentencing. American Journal of
Sociology, 100, 1157-1198.
Doerner, J. K., & Demuth, S. (2010). The independent and joint effects of race/ethnicity, gender,
age on sentencing outcomes in U.S. federal courts. Justice Quarterly, 27(1), 1-27.
Eisenstein, J., Flemming, R. B., & Nardulli, P. F. (1988). The contours of justice: Communities
and their courts. Boston, MA: Little, Brown.
Eisenstein, J., & Jacob, H. (1977). Felony justice: An organizational analysis of criminal courts.
Boston, MA: Little, Brown.
Emerson, R. M. (1983). Holistic effects in social control decision-making. Law & Society
Review, 17(3), 425-455.
Fitz-Gibbon, C. T. (1991). Multilevel modeling in an indicator system. In S. W. Raudenbush &
J. D. Willms (Eds.), Schools, classrooms, and pupils: International Studies of Schooling
from a multilevel perspective. San Diego, CA: Academic Press, Inc.
Fitz-Gibbon, C. T. (1996). Monitoring education: Indicators, quality and effectiveness. London,
UK: Continuum.
Gelman, A., & Hill, J. (2016). Data analysis using regression and multilevel/hierarchical
models. Cambridge, UK: Cambridge University Press.
Gottschalk, M. (2008). Hiding in plain sight: American politics and the carceral state. Annual
Review of Political Science, 11, 235-260.
Hagan, J. (1974). Extra-legal attributes and criminal sentencing: An assessment of a sociological
viewpoint. Law and Society Review, 8, 357-383.
Hagan, J. (1989). Why is there so little criminal justice theory? Neglected macro- and micro-level
links between organization and power. Journal of Research in Crime and
Delinquency, 26(2), 116-135.
Hartley, R. D., Maddan, S., & Spohn, C. (2007). Concerning conceptualization and
operationalization: Sentencing data and the focal concerns perspective—a research note.
The Southwest Journal of Criminal Justice, 4(1), 58-78.
Hauser, W., & Peck, J. H. (2017). The intersection of crime seriousness, discretion, and race: A
test of the liberation hypothesis. Justice Quarterly, 34(1), 166-192.
Honaker, J., King, G., & Blackwell, M. (2011). Amelia II: A program for missing data. Journal
of Statistical Software, 45, 1-47.
Hox, J. J. (2010). Multilevel analysis: Techniques and Applications. New York, NY: Routledge.
Jacob, H. (1997). The governance of trial judges. Law and Society Review, 31(1), 3-30.
Johnson, B. D. (2003). Racial and ethnic disparities in sentencing departures across modes of
conviction. Criminology, 41(2), 449-489.
Johnson, B. D. (2005). Contextual disparities in guidelines departures: courtroom social context,
guidelines compliance, and extralegal disparities in criminal sentencing. Criminology,
43(3), 761-796.
Johnson, B. D. (2006). The multilevel context of criminal sentencing: Integrating judge and
county level influences in the study of courtroom decision making. Criminology, 44, 259-298.
Johnson, B. D. (2014). Judges on trial: A reexamination of judicial race and gender effects across
modes of conviction. Criminal Justice Policy Review, 25(2), 159-184.
Johnson, B. D., Ulmer, J., & Kramer, J. (2008). The social context of guideline circumvention:
The case of federal district courts. Criminology, 46, 711-783.
Kautt, P. M. (2002). Location, location, location: Interdistrict and intercircuit variation in
sentencing outcomes for federal drug-trafficking offenses. Justice Quarterly, 19(4), 633-671.
Kim, B., Spohn, C., & Hedberg, E. C. (2015). Federal sentencing as a complex and collaborative
process: Judges, prosecutors, judge-prosecutor dyads, and disparity in sentencing.
Criminology, 53(4), 597-623.
Kleck, G. (1981). Racial discrimination in criminal sentencing: A critical evaluation of the
evidence with additional evidence on the death penalty. American Sociological Review,
46(6), 783-805.
Kleck, G. (1985). Life support for ailing hypotheses: Modes of summarizing the evidence for
racial discrimination in sentencing. Law and Human Behavior, 9, 271-85.
Klinger, D. A. (1994). Demeanor or crime? Why "hostile" citizens are more likely to be arrested.
Criminology, 32, 475-493.
Kramer, J., & Scirica, A. (1986). Complex policy choices: The Pennsylvania commission on
sentencing. Federal Probation, 50, 15-23.
Kramer, J. H., & Ulmer, J. T. (2009). Sentencing guidelines: Lessons from Pennsylvania.
Boulder, CO: Lynne Rienner.
Kreft, I. G., & Yoon, B. (1994). Are multilevel techniques necessary? An attempt at
demystification. Paper presented at the Annual Meeting of the American Educational
Research Association, New Orleans, LA. Retrieved from
http://files.eric.ed.gov/fulltext/ED371033.pdf
LaFree, G. (1998). Losing legitimacy: Street crime and the decline of social institutions in
America. Boulder: Westview Press.
Levin, M. A. (1977). Urban politics and the criminal courts. Chicago, IL: University of Chicago
Press.
Lipsky, A. M., Gausche-Hill, M., Vienna, M., & Lewis, R. J. (2011). The importance of
“shrinkage” in subgroup analyses. Annals of Emergency Medicine, 55(6), 544-552.
Lizotte, A. (1978). Extra-legal factors in Chicago’s criminal courts: Testing the conflict model of
criminal justice. Social Problems, 25, 564-580.
Maynard-Moody, S., & Musheno, M. (2003). Cops, teachers, counselors: Stories from the front
lines of public service. Ann Arbor, MI: University of Michigan Press.
MacKenzie, D. L. (2001). Corrections and sentencing in the 21st century: evidence-based
corrections and sentencing. Prison Journal, 81, 3-17.
Miethe, T. D., & Moore, C. A. (1985). Socioeconomic disparities under determinate sentencing
systems: A comparison of preguideline and postguideline practices in Minnesota.
Criminology, 23, 337-363.
Paternoster, R., Brame, R., Mazerolle, P., & Piquero, A. (1998). Using the correct statistical test
for equality of regression coefficients. Criminology, 36(4), 859-866.
Pennsylvania Commission on Sentencing (PCS). (n.d.). Sentencing Guidelines Manuals.
Retrieved from http://pcs.la.psu.edu/guidelines/sentencing/sentencing-guidelines-and-implementation-manuals
Pratt, T. (1998). Race and sentencing: A meta-analysis of conflicting empirical research results.
Journal of Criminal Justice, 26(6), 513-523.
R Core Team (2016). R: A language and environment for statistical computing. R Foundation for
Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Reitz, K. (1998). Sentencing. In M. Tonry (Ed.), The handbook of crime and punishment (pp.
543-546). New York: Oxford University Press.
Russell, K. (1998). The color of crime: Racial hoaxes, white fear, black protectionism, police
harassment, and other macro-aggressions. New York: New York University Press.
Ruth, H. S., & Reitz, K. (2003). The challenge of crime: Rethinking our response. Cambridge:
Harvard University Press.
Savelsberg, J. (1992). Law that does not fit society: Sentencing guidelines as a neoclassical
reaction to the dilemmas of substantivized law. American Journal of Sociology, 97, 1346-1381.
Snijders, T.A., & Bosker, R.J. (2012). Multilevel analysis: An introduction to basic and
advanced multilevel modeling. London: Sage.
Spohn, C. (2000). Thirty years of sentencing reform: The quest for a racially neutral sentencing
process. Criminal Justice: The National Institute of Justice Journal, 3, 427-501.
Spohn, C., & Holleran, D. (2000). The imprisonment penalty paid by young, unemployed black
and Hispanic male offenders. Criminology, 38(1), 281-306.
Steenbergen, M. R., & Jones, B. S. (2002). Modeling multilevel data structures. American Journal
of Political Science, 46(1), 218-237.
Steffensmeier, D., & Demuth, S. (2001). Ethnicity and judges’ sentencing decision:
Hispanic-black-white comparisons. Criminology, 39(1), 145-178.
Steffensmeier, D., & Demuth, S. (2006). Does gender modify the effects of race-ethnicity on
criminal sanctioning? Sentences for male and female white, black, and Hispanic
defendants. Journal of Quantitative Criminology, 22, 241-261.
Steffensmeier, D., Ulmer, J. T., & Kramer, J. H. (1998). The interaction of race, gender, and age
in criminal sentencing: The punishment cost of being young, black, and male.
Criminology, 36(4), 763-98.
Tate, R. L. (2004). A cautionary note on shrinkage estimates of school and teacher effects.
Florida Journal of Educational Research, 42, 1-21.
Teddlie, C., & Reynolds, D. (2000). The international handbook of school effectiveness
research. New York, NY: Falmer Press.
Tillyer, R., Hartley, R. D., & Ward, J. T. (2015). Differential treatment of female defendants:
Does criminal history moderate the effect of gender on sentence length in federal
narcotics cases? Criminal Justice and Behavior, 42, 703-721.
Tonry, M. (1996). Sentencing matters. New York, NY: Oxford University Press.
Tyler, T. R. (1990). Why people obey the law. New Haven: Yale University Press.
Ulmer, J. T. (1997). Social worlds of sentencing: Court communities under sentencing
guidelines. Albany, NY: State University of New York Press.
Ulmer, J. T. (2012). Recent developments and new directions in sentencing research. Justice
Quarterly, 29(1), 1-39.
Ulmer, J. T., & Bradley, M. S. (2006). Variation in trial penalties among serious violent offenses.
Criminology, 44(3), 631-670.
Ulmer, J. T. & Johnson, B. D. (2004). Sentencing in context: A multilevel analysis. Criminology,
42(1), 137-175.
Ulmer, J. T., & Kramer, J. H. (1996). Court communities under sentencing guidelines: Dilemmas
of formal rationality and sentencing disparity. Criminology, 34, 383-408.
Western, B. (2006). Punishment and inequality in America. New York: Russell Sage Foundation.
Wheeler, S., Weisburd, D., & Bode, N. (1982). Sentencing the white-collar offender: Rhetoric
and reality. American Sociological Review, 47, 641-659.
Wilbanks, W. (1987). The myth of a racist criminal justice system. Monterey, CA: Brooks/Cole.
Willms, J. D. (1992). Monitoring school performance: A guide for educators. Washington, DC:
Falmer Press.
Wolfe, S. E., Pyrooz, D. C., & Spohn, C. (2011). Unraveling the effect of offender citizenship
status on federal sentencing outcomes. Social Science Research, 40, 349-362.
Wooldredge, J. (2010). Judges’ unequal contributions to extralegal disparities in imprisonment.
Criminology, 48(2), 539-567.
Zatz, M. (1987). The changing forms of racial/ethnic bias in sentencing. Journal of Research in
Crime and Delinquency, 24(1), 69-92.
Zatz, M. (2000). The convergence of race, ethnicity, gender, and class on court decisionmaking:
Looking toward the 21st century. Criminal Justice: The National Institute of Justice
Journal, 3, 503-552.