SCHOOL INSPECTION IN THE UNITED STATES: POTENTIAL FOR SCHOOL REFORM AND LASTING INSTITUTIONAL CHANGE

By

Pablo Bezem

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Education Policy–Doctor of Philosophy

2021

ABSTRACT

SCHOOL INSPECTION IN THE UNITED STATES: POTENTIAL FOR SCHOOL REFORM AND LASTING INSTITUTIONAL CHANGE

By

Pablo Bezem

In an era of test-based accountability, school inspection may offer a nuanced understanding of school performance and actionable information for improvement. Yet, little empirical evidence exists on its effectiveness in advancing performance, particularly in the United States. This dissertation examines the potential of school inspection as an alternative accountability mechanism through a series of studies based in one of the only U.S. districts to experiment with inspection. Three papers evaluate: 1) how inspectors arrive at their decisions, 2) whether inspections promote principals’ attitudes that are associated with lasting institutional change, and 3) the effects of inspection on the direction of school planning. Each of these papers offers a significant contribution to the literature on school inspection and provides evidence regarding the inspection process and its potential to enable school improvement.

The first paper inquires about how inspectors evaluate schools and reach their determinations. To shed light on inspectors’ decision-making processes, the case of a U.S. district is contrasted with two long-established international systems. Results reveal that decisions are strongly influenced by local culture and professional traditions. Despite efforts to introduce alternative means for school assessment in the U.S., a test-based accountability mindset dominates and limits the potential of inspection.

The second paper investigates whether principals’ attitudes towards inspection are those associated with lasting institutional change. This study brings insights from the organizational change literature into the education field. Semi-structured interviews with 20 principals in the selected U.S. district inquire about the perceived effectiveness of the diagnosis, the appropriateness of inspection feedback, and readiness for implementing changes. Results show strongly positive attitudes toward inspection that are associated with lasting change. A majority of principals view inspection favorably since its breadth and depth contribute to a more accurate diagnosis of key challenges. Holistic evaluation and actionable findings are not feasible through test-based accountability alone. Principals also express a strong commitment to implementing changes based on inspection feedback. These results provide the first empirical evidence of the influence of school inspection on sustained institutional change.

The third essay examines the effect of inspection on school planning. Prior to the implementation of reforms, priorities are set through the school planning process. Despite the wide use of inspection globally, no previous study has tested whether a causal relationship exists between inspection and school planning. This study uses mixed methods to examine whether inspection shifts the areas of focus in school planning documents. The study sample comprises 160 public schools in the selected U.S. district. In-depth interviews with school principals reveal that inspections are perceived as useful and led to planned reforms focused on two areas: instructional practices and school climate.
Results from a difference-in-differences analysis suggest that inspection shifted the focus of planning documents towards these two areas. Inspection led to nearly a doubling of keywords in improvement plans related to these two focus areas. This study provides empirical evidence regarding the potential of inspection to inform school planning.

Overall, this dissertation advances understanding of the potential of school inspection to offer insights for improved school reforms within a high-stakes, test-based accountability system. Findings demonstrate how local and professional culture influence and condition inspection practices. In addition, I show that principals demonstrate strong positive attitudes towards inspection, which are favorable to sustainable change. Finally, I provide causal evidence that inspection shapes the focus of school improvement plans. These results have implications for U.S. accountability policies and the potential for a more comprehensive approach, beyond test-based accountability.

With all my love to Maurita—my dearest partner in this journey—Aidan—my new inspiration—and Hernán & Alicia—the bedrock for all my endeavors.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
Introduction
Paper 1: Informing New Approaches to School Accountability in the United States: Inspectors’ Decision-making from a Comparative Perspective
    Introduction
    Background: U.S. Education Accountability
    Literature Review: Inspectors’ Decision Making
    Theoretical Framework
    Methods
        Case Selection
        International Comparison
            U.S. District
            Rio Negro, Argentina
            The Netherlands
        Data Collection and Analysis
    Results
        Indicators of School Quality
            U.S. Case: Focusing on Rubrics and Avoiding Bias
            Comparison: Role of Indicators of School Quality
        Multi-Informant Approach
            U.S. Case: Finding Trends and Discarding Outliers
            Comparison: Role of Multi-Informant Approach
        Interactions Among Inspectors
            U.S. Case: Consensus Building & Guided Sensemaking
            Comparison: Role of Interactions among Inspectors
        Local Context Information
            U.S. Case: Minor Role in Inspectors’ Thinking
            Comparison: Role of Local Context Information
        Inspectors’ Perspectives
            U.S. Case: Personal Judgement within the Scope of the Protocol
            Comparison: Role of Inspectors’ Perspectives
    Conclusions and Policy Implications
        The Legacy of Test-Based Accountability in the United States
    APPENDIX
    REFERENCES
Paper 2: Principals’ Attitudes towards School Inspection in a U.S. District: Contribution to Sustained School Reform
    Introduction
    Background
    Literature Review
    Theoretical Framework
    Case Study: Description of School Inspection System
    Methods
    Results
        Perceptions about the District Diagnosis Effectiveness excluding Inspection
        School Inspections: Positive Attitudes among Principals
            Perceptions of Diagnosis Effectiveness
            Sentiments of Appropriateness
            Readiness for Change
        Mixed Sentiments and Ambivalence
            Concerns about Diagnosis Effectiveness
            Concerns about Appropriateness
            Uncertainty about Readiness for Change
        Negative Attitudes
    Conclusions
    APPENDICES
        Appendix A: Interview Protocol
        Appendix B: Coding Scheme
    REFERENCES
Paper 3: The Effect of Inspection on School Improvement Planning: Evidence from a U.S. District
    Introduction
    Literature Review
        School Change based on Inspection Feedback
        The Uses of School Improvement Plans
    District Background
        School Inspections
        School Improvement Plans
    Research Design
        Stage I. Interviews with School Principals
            Influential Areas based on Inspection Feedback
        Stage II. Content Analysis
            Word Frequencies in SIPs - Most Influential Areas
        Stage III. Statistical Analysis
    Results
        Interview Analysis - Perceived Usefulness of Inspections
        Difference-in-Differences Analysis
    Conclusions
    APPENDICES
        Appendix A – Codebook for Interviews to School Principals
        Appendix B – Content Analysis Dictionary
        Appendix C – Most Frequent Phrases on School Improvement Plans – School Years 2016-17 and 2018-19
        Appendix D – Test Score Results
        Appendix E – Dictionary for Placebo Test
    REFERENCES

LIST OF TABLES

Table 1. Inspectors’ Background and Experience
Table 2. Sources Guiding Inspectors’ Thinking
Table 3. Principals’ Experience and Education
Table 4. Principals’ Attitudes towards School Inspection
Table 5. Summary of Principals’ Views
Table 6. Influential Areas - Changes Implemented in Schools based on Inspection Feedback
Table 7. Content Analysis Coverage and Term Frequency
Table 8. Summary Statistics
Table 9. Non-parametric Difference-in-Differences
Table 10. Content Analysis Coverage for Panel
Table 11. DD Regression Results
Table 12. Balance Tests
Table 13. Placebo Tests

LIST OF FIGURES

Figure 1. Comparison: Influence of Information Sources on Inspectors’ Thinking
Figure 2. Research Design
Figure 3. Parallel Trends: Word Count for Inspected vs. Not Inspected Schools

Introduction

Test-based accountability (TBA) prevails as the central paradigm for school improvement efforts across the United States (e.g. Figlio & Loeb, 2011; Hanushek & Raymond, 2005). Schools are incentivized to raise student achievement on standardized tests (e.g. Ladd & Figlio, 2008). Yet, test scores alone offer limited insight into specific reforms that might benefit a given school (e.g. Gagnon & Schneider, 2019).

An alternative approach to accountability is school inspection, which is widely used outside of the United States. Instead of relying primarily on test scores, inspection consists of holistic, in-school evaluations conducted by expert educators. These evaluations include classroom observations, school document reviews, and interviews with school staff, students, and families. By closely observing school operations, inspectors can gain better insight into factors that might help or hinder improvement (e.g. Barber, 2005). Despite the promise of inspection to enable reforms, there is little empirical evidence on its effectiveness (de Wolf & Janssens, 2007; Ehren, 2016b; Klerks, 2012). No prior empirical studies have assessed inspection in the United States.

This dissertation examines the potential of school inspection as an alternative accountability mechanism by developing a case study of one of the only U.S. districts to experiment with inspection. Through three papers, different aspects of inspection are evaluated: 1) inspectors’ decision making from an international comparative perspective, 2) principals’ attitudes towards inspection associated with lasting institutional change, and 3) the effects of inspection on school planning.
Each of these papers offers a significant contribution to the literature on school inspection and provides evidence regarding the potential of inspection to enable school improvement.

The first paper focuses on school inspectors’ decision-making and the role of sensemaking in their evaluations. Using a comparative case study of a U.S. district and two long-established international systems, this study examines the information sources that guide inspectors’ thinking. It assesses their use of professional judgement and the implications of inspectors’ sensemaking for school improvement. Results show that decisions are strongly influenced by local and professional culture. In the U.S. case, a “test-based accountability mindset” dominates the inspection process. This leads to inspections that strictly adhere to protocols and reduces inspectors’ professional insights in an effort to avoid bias. In contrast, in the two international cases, inspectors rely more on their professional judgment and delve into complex issues beyond the limited procedures outlined in protocols. Despite efforts to introduce alternative means for school assessment in the United States, the prevalent mindset might limit the potential of inspection to gain insights for school improvement.

The second paper investigates whether principals’ attitudes towards inspection are those associated with lasting institutional change. This study is grounded in organizational change theory and uses semi-structured interviews with principals. Responses are compared between inspected and not inspected schools in the selected U.S. district. Interviews inquire about the perceived effectiveness of the diagnosis, the appropriateness of inspection feedback, and readiness for making changes. Results show that strongly positive attitudes toward inspection lead to dispositions that are associated with lasting change. A majority of principals highlight that the breadth and depth of inspection contribute to a more accurate diagnosis of schools’ problems. Such a holistic evaluation with actionable findings is not available through test-based accountability alone. Principals also express a strong commitment to making changes based on the inspection feedback. These results provide the first empirical evidence of the effects of school inspection in enabling sustained institutional change.

The third paper uses mixed methods to examine the causal effect of inspection on the focus areas in school planning. No prior study provides evidence of the causal effect of inspection on school planning. A step prior to implementing school reforms is typically setting priorities through the school planning process (e.g. Matthews & Sammons, 2004). In-depth interviews with school principals reveal that inspections were generally perceived as useful for planning purposes and led to anticipated or actual reforms. Inspection particularly influenced two areas of reform: instructional practices and a school climate conducive to learning. Next, the study evaluates the influence of inspection on these two areas of reform, using content analysis and difference-in-differences analysis. Content analysis evaluated the presence of terms related to these areas in school improvement plans. A difference-in-differences analysis finds that inspection shifted the focus of planning documents. Inspection led to nearly a doubling of keywords in school improvement plans related to these focus areas.
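To make the difference-in-differences design concrete, the sketch below estimates the treatment-by-post interaction on keyword counts in improvement plans. This is a minimal illustration with invented data; the variable names, values, and specification are assumptions and do not reproduce the dissertation’s actual model.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Illustrative two-period panel: one row per school-year. 'inspected' marks
    # treated schools, 'post' marks the year after inspection feedback, and
    # 'keyword_count' is the number of dictionary terms found in that year's
    # school improvement plan (hypothetical values).
    df = pd.DataFrame({
        "school_id":     [1, 1, 2, 2, 3, 3, 4, 4],
        "inspected":     [1, 1, 1, 1, 0, 0, 0, 0],
        "post":          [0, 1, 0, 1, 0, 1, 0, 1],
        "keyword_count": [12, 25, 10, 21, 11, 13, 9, 10],
    })

    # The coefficient on inspected:post is the difference-in-differences
    # estimate of the shift in keyword counts, under parallel trends.
    # (With real data one would also cluster standard errors by school.)
    model = smf.ols("keyword_count ~ inspected * post", data=df).fit()
    print(model.params["inspected:post"])

In this toy panel, the treated schools’ keyword counts roughly double while the comparison schools stay flat, mirroring the pattern described above.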
This dissertation reveals several aspects of inspection relevant to inform policymaking: contrasting global practices, implementing sustained reforms, and the effect of inspection in shaping school planning. The first paper offers a contrast between the U.S. case and long-established inspection systems. It shows that even when there are apparent similarities between formal inspection mechanisms, culture influences practice and decision-making. It reveals a trade-off between strict adherence to protocols to avoid bias and greater flexibility and reliance on professional judgement. It shows that the potential to gain insights from inspections in the U.S. system may be constrained by the professional culture. Despite this evidence, the second paper shows that the principals in the U.S. case perceive that the larger scope of inspection offers a more accurate and actionable diagnosis than the TBA system. Further, most principals demonstrate strong positive attitudes that are associated with lasting institutional change. The third paper builds causal evidence that inspections led to actual changes in school improvement plans.

Overall, this dissertation sheds light on the potential of school inspection to offer insights for improved school reforms. As the implementation of the Every Student Succeeds Act sparked debate over the design of more comprehensive accountability systems, inspection could be viewed as an alternative or complement to test-based accountability. Given the lack of a consistent body of literature on holistic, on-site evaluations of schools in the United States, this dissertation contributes to this debate by shedding light on the potential and limitations of school inspection.

Paper 1: Informing New Approaches to School Accountability in the United States: Inspectors’ Decision-making from a Comparative Perspective 1

1 This paper was led by Pablo Bezem and co-authored with Dr. Anne Piezunka and Dr. Rebecca Jacobsen.

Introduction

High-stakes testing prevails as the central paradigm for school improvement efforts across the United States. This is driven, in part, by an audit culture that emphasizes performance measurement as a primary policy focus (Apple, 2005; Clarke & Ozga, 2011). An appealing feature of test-based accountability (TBA) is the perception that it objectively measures educational performance and thus allows comparisons across districts and years (cf. Bloem, 2015). Yet, TBA typically does not provide nuanced information to identify why certain schools fall behind; it does not capture the myriad factors that influence school quality (e.g. Darling-Hammond et al., 2016; Gagnon & Schneider, 2019). As a stand-alone policy, TBA often incentivizes schools to focus narrowly on tested subject areas (e.g. Fitchett & Heafner, 2010; Jacob, 2005) and on strategies to boost scores, which might not promote substantive learning (Rothstein, Jacobsen, & Wilder, 2008).

School inspection (SI) is an alternative approach for monitoring and improving school quality that emerged in national policy discussions on how to redesign accountability systems after the 2015 enactment of the Every Student Succeeds Act (Darling-Hammond et al., 2016; Klein, 2016; Ladd, 2016; K. Ryan et al., 2013). Unlike TBA, school inspection is not limited to standardized tests to evaluate student performance. Inspection evaluations usually assess a variety of school processes through observation and direct contact with school stakeholders, such as teachers, students, and parents (van Bruggen, 2010).
In this way, inspections can provide a summative assessment of overall school quality while also uncovering factors that help or hinder school improvement. Several U.S. districts have experimented with inspection in some form, including New York City, Los Angeles, Oakland, and Cleveland. In the U.S., inspections are often referred to as Quality Reviews. Lessons from previous experience with inspection can inform the debate about redesigning the U.S. accountability system.

School inspections might serve as an alternative mechanism to achieve more nuanced accountability and insights for improvements. Yet, a major concern is its perceived subjective nature (Glazerman, 2016), which can raise questions about its reliability. A tradeoff exists between obtaining insights from inspectors with subjective perspectives versus achieving greater reliability with standardized metrics. To understand how inspection works, it is critical to shed light on inspectors’ decision processes and how school quality is judged. No previous study has evaluated what drives inspectors’ decision-making in the U.S.; internationally, scarce empirical literature exists regarding this aspect of SI. While TBA relies on evaluation of academic content knowledge to determine school quality, SI depends on the complex decision-making processes of individual inspectors. In this way, inspectors are the linchpin for SI reliability and effectiveness. Understanding inspectors’ thinking and how personal perspectives are utilized can shed light on this process. This can then inform policy discussions regarding the advantages and limitations of SI, compared to TBA.

This study poses three research questions: (1) What are the sources guiding inspectors’ thinking during inspections? (2) How do personal perspectives of inspectors influence school evaluations? (3) What implications does inspectors’ sensemaking have for school improvement and school quality?

To address these questions, we develop a comparative case study of inspection in a U.S. school district and two long-established international SI systems. Our case in the U.S. is one of the few districts in the country to consistently use inspection for over a decade. The two international cases include a province in Argentina (Rio Negro) and the Netherlands. The Argentinean case has a system of continuous support, flexible procedures, and low stakes; the Dutch case has a high-stakes system with a formal protocol and expert-based flexibility for inspectors. These three systems represent distinct contexts and can illuminate evaluation processes and inspectors’ decision making.

Background: U.S. Education Accountability

The United States has assembled one of the most developed education accountability systems in the world (Figlio & Loeb, 2011). This system has been shaped by the New Public Management principles of the 1980s, which promoted greater rationalization, evidence-based change, output orientation, and rigorous accountability. The emphasis on accountability in policy reform was evident in the 1990s, with some states implementing high-stakes accountability systems. These became widespread with the passage of the No Child Left Behind (NCLB) Act of 2001, which mandated the nationwide use of test scores to measure school quality.
This legislation led to a new definition of school reform that was broadly supported by elected representatives across the political spectrum (Figlio & Loeb, 2011; Ravitch, 2016) and became common sense among school reformers. This situation has been described as the audit culture (Apple, 2005). Subsequently, calls for a broader set of quality indicators and state-level flexibility led to the passage of the Every Student Succeeds Act in 2015. This sparked a nationwide discussion regarding how to redesign U.S. accountability systems. Already, most states and districts use a greater variety of school quality metrics (Edgerton, 2019; Portz & Beauchamp, 2020). Although the U.S. continues to emphasize TBA (Mathis & Trujillo, 2016), greater flexibility for states and districts creates new possibilities for more holistic approaches to accountability and school improvement (Darling-Hammond et al., 2016). Inspection mechanisms emerged as an alternative approach to evaluate school quality (Darling-Hammond et al., 2016; Klein, 2016; Ladd, 2016, 2017; K. Ryan et al., 2013). SI considers a greater variety of factors that influence education quality, such as inputs, expert observation of school processes, and interaction with school stakeholders.

Literature Review: Inspectors’ Decision Making

Policy efforts to incorporate inspection mechanisms into U.S. accountability systems are limited by scarce empirical evidence (de Wolf & Janssens, 2007; Ehren, 2016b; Klerks, 2012). Despite the fact that inspection systems have long existed around the world, most previous research focuses narrowly on European systems. Furthermore, the wide variety of SI arrangements that exist makes it challenging to build a coherent body of literature that converges on key findings. The local nature of inspection has reinforced a tendency to focus inspection research on local systems, which is often then published in country-specific journals in the native language. SI research published in more widely read journals, in the English language, has expanded during the last five years. Yet, the literature remains limited and fragmented. The empirical literature that does exist has tended to focus on the effects and side effects of SI (e.g. Altrichter & Kemethofer, 2015; Klerks, 2012). Despite this growing body of research, limited empirical research has centered on school inspectors themselves and their influence on the evaluation process.

Most early studies published in English were conducted primarily in the UK, where the inspectorate, the Office for Standards in Education, Children's Services and Skills (OFSTED), is a longstanding institution. Despite using a highly standardized inspection procedure and a reliable system of classroom observation (Matthews et al., 1998), various studies conclude that the professional judgement of OFSTED inspectors played a key role in their evaluations (Gilroy & Wilcox, 1997; Lee & Fitz, 1997; Woods & Jeffrey, 1998). It was found that inspectors’ feedback to schools is influenced by perceived constraints that the local context imposes on teachers (Woods & Jeffrey, 1998). In addition, inspectors’ professional background impacts their judgement. For example, prior experience serving as a classroom teacher can increase empathy and a sense of collegiality with teachers (Baxter, 2013; Millett & Johnson, 1998).

Since professional judgement plays a role in inspectors’ decision making, it is relevant to explore how individual judgement varies.
Although this question has not been directly studied, Silcock and Wyness (1998) shed some light on the issue, finding a wide diversity in inspectors’ beliefs about education and current reforms, as well as in their empathy with challenges faced by teachers. These differences were apparent despite standardized training and evaluation tools. This early research demonstrated that profound differences in core beliefs regarding education can persist in a highly standardized system. However, it does not attempt to link how these beliefs influence evaluations and school feedback.

Recent studies have focused on the process of judgement formation in SI systems, where feedback is decided through consensus among a group of inspectors (Dedering & Sowada, 2017; Lindgren, 2015; Rutz et al., 2017). Despite the use of protocols and standards, inspectors have some discretion, both as individuals and as an overall group, when working towards a consensus and making decisions (Dedering & Sowada, 2017; Rutz et al., 2017). Lindgren (2015) demonstrates that in the highly standardized Swedish system, there is a stark contrast between how decisions are formed during the inspection process (the “backstage” of inspection) versus how final feedback is presented to the school and community (the “front stage”). Even when inspectors present hard evidence to justify decisions in the “front stage,” there is negotiation among inspectors in the “backstage,” where their judgments encompass a mix of uncertainty, adaptation, and creativity. These findings show that the human element and professional judgement remain central in the inspection process, regardless of efforts to standardize processes and procedures. Despite knowing that variation does occur, it is not yet understood how specific personal aspects of the inspectors and institutional features of the school system influence the inspection process. This study aims to provide initial insights into this critical aspect of inspection systems.

Theoretical Framework

This study draws on sensemaking theory as a conceptual framework (Weick, 1995). A growing body of literature in education draws on this theory to understand teachers’ and administrators’ interpretative frameworks when enacting educational policies (Coburn, 2005; Halverson et al., 2004; Rigby, 2015; Spillane et al., 2002). Sensemaking theory is particularly useful for understanding how individual actors comprehend a situation, make meaning of it, and then act based on this interpretation (Weick, 1995; Weick et al., 2005). Educational studies that draw on this approach have addressed how this process is influenced by preexisting worldviews, prior knowledge, experience, formal and informal networks, and the organizational and social context within which sense-makers work (Ball & Bowe, 1992; Coburn, 2001; Hill, 2001; Porac et al., 1989; Spillane et al., 2002).

Sensemaking literature related to policy implementation has focused on how knowledge structures are accessed and applied in practical situations. One finding is that observations made by individuals who implement policy can often focus on the superficial aspects of a situation that then trigger a memory of another situation. This jeopardizes the ability to dive into the deeper significance of what is observed (Spillane et al., 2006). This literature has also found that individuals’ reasoning about complex judgements tends to be biased toward interpretations that are consistent with their beliefs and values (Spillane et al., 2002).
Research has also found that sensemaking processes are mediated by considerations about organizational structures (e.g. work environment, norms, and rules), professional affiliations and networks, and traditions (e.g. Coburn, 2001; Spillane, 1999; Spillane et al., 2006). Policy implementation studies have shown the relevance of socially mediated sensemaking. For example, when teachers implement instructional policies, sensemaking is mediated by school leaders’ participation in the interpretation of the policies (Coburn, 2005) as well as by interactions with other teachers (Coburn, 2001; Hill, 2001).

A separate body of literature in organizational studies focuses on sensemaking within organizations. Sensemaking theory has been used in this field to understand confusing or ambiguous events within organizations (Maitlis & Christianson, 2014; Sandberg & Tsoukas, 2015; Weick, 1995). Similar to sensemaking research on education policy implementation, organizational studies highlight the importance of constructing intersubjective meaning, which occurs when various actors within an organization, such as managers and peers, shape each other’s understanding (Gioia et al., 1994).

Using these perspectives and building upon past research, our study draws on sensemaking theory to understand how inspectors interpret situations that they observe in schools and arrive at judgements regarding school quality. School inspectors must reconcile government guidelines, best practices, and inspection protocols with the situations they find in the schools. Therefore, sensemaking theory provides a useful lens to understand this process. Sensemaking is likely mediated by inspectors’ own experience and beliefs about education, the interaction with other inspectors, and organizational culture. The sensemaking literature provides useful constructs to capture the variety of factors that influence how inspectors reconcile policies with practice (e.g. Coburn, 2005; Hill, 2001). While protocols do exist, there is flexibility for inspectors to use professional judgement (Dedering & Sowada, 2017; Gilroy & Wilcox, 1997; Lindgren, 2015). Therefore, we rely on the sensemaking literature to analyze how inspectors interpret complex situations observed at schools and how they arrive at decisions.

Methods

To investigate school inspectors’ decision-making process, this study uses a comparative, multi-site case study approach. A district in the U.S. serves as the main focus. We then conduct a horizontal examination of the decision process of inspectors across sites (Phillips & Schweisfurth, 2014; Vavrus & Bartlett, 2016). This comparison highlights contrasts and similarities to the U.S. case, where SI experience is limited. Through these cross-site comparisons, we characterize a diversity of SI arrangements and practices, which can advance understanding of the broad spectrum of inspection thinking processes (Chabbott & Elliott, 2003). The other two cases provide a broader view of various aspects of the inspectors’ decision-making process, showing similarities and differences with the U.S. and identifying aspects of inspection not captured by the U.S. case. The analysis focuses mostly on inspectors’ thinking processes. It takes into consideration less formal aspects of these processes, including inspectors’ personal perspectives, such as preferences, beliefs, and professional judgement.

Case Selection

We selected one district in the U.S.
since it is one of the few cases in the country to consistently use inspection mechanisms for over a decade. We then contrast this case with two long-established international SI systems, in Argentina and the Netherlands. Within all three SI systems, a group of experts conducts in-school evaluations using several modes of data collection: classroom observation, school stakeholder interviews, and document analysis. Yet, these cases also differ from one another in key aspects relevant to our study objectives. Notable differences include the purpose that inspection serves (accountability vs. support) and the severity of consequences resulting from inspection results. Differences in protocols for conducting inspections are also present: the frequency and length of inspection visits, the number of inspectors, and the public availability of inspection reports. These differences may influence the information sources that inspectors consider when evaluating schools.

International Comparison

U.S. District. The U.S. case is a large, urban school district that relies heavily on a high-stakes framework, based on test scores, for accountability purposes. The district began experimenting with inspection processes more than a decade ago as part of school reforms. The inspection program, referred to as Quality Reviews, primarily targets low-performing schools. Unlike the other cases included in this study, inspection is outsourced to private consulting firms and is not directly managed by a governmental office. Since 2012, the process has been led by a company we will refer to as QualiEv. This inspection program gathers qualitative evidence about school programs for accountability and formative reviews. School visits are conducted by groups of three to four inspectors. At least one is a representative from QualiEv, a full-time inspector who leads the process, and the others are certified reviewers from the District Department of Education. The team is guided by a detailed protocol which outlines the evaluation process and includes research-based standards regarding effective school practices. Inspection activities consist of school document reviews, classroom observations, as well as interviews and focus groups with teachers and administrators. Immediately after inspection visits, inspectors share main findings in an oral report to school administrators. Then, inspectors and school administrators work jointly in a planning process, discussing school strengths and areas of growth, establishing next steps, defining strategies, setting measures to establish success, and a timeline to achieve these goals. A written report summarizing conclusions is provided to schools, which includes suggestions for priority areas, but not specific recommendations for improvement.

Rio Negro, Argentina. Each province in Argentina manages its own educational system. Inspection is the main mechanism for school accountability. Standardized tests are low-stakes and only used for diagnostic purposes. Inspectors hold the highest position on the teaching ladder. They are full-time public officials who report directly to the provincial Ministry of Education and must have considerable experience: 12 years in teaching, 2 years in leadership, and inspectorate training (Concurso de Supervisores Rio Negro, 2013). The main purpose of SI in Rio Negro Province is to provide support to all schools. In the process, school administrators are held accountable.
Inspectors develop inspection projects, and while they must follow broad guidelines, there are no specific protocols for school visits or inspection activities. Inspectors are assigned to a group of schools to conduct administrative controls and provide continuous support. School visits occur at least three times a year and can be more frequent if a school requires more support (Resolución del Consejo Provincial de Educación de Río Negro N 1053, 1994). Inspectors consult with their technical team of professionals in education to inform their work. All inspectors go through basic training and are accountable for following legal standards. Inspectors prepare reports for the schools, which are not publicly available. No sanctions are imposed for poor academic performance. Furthermore, inspection does not track standardized educational outcomes, such as test scores, nor must it follow specific standards regarding education processes.

The Netherlands. The Dutch Ministry of Education coordinates educational policy with municipalities. Accountability relies on both outcome- and school-based components (Nusche et al., 2014). High-stakes testing has an important role (Scheerens et al., 2012; van der Sluis et al., 2017), while at the same time, inspection is a central instrument for monitoring standards. Inspectors are full-time public officials and receive specific training. The Netherlands emphasizes SI of low-performing schools.2 While all schools are inspected at least once every four years, the lowest performing receive more frequent and rigorous visits. To determine the frequency and type of inspections, inspectors use a risk-based model. This model assesses school risk based on administrative information, including standardized test scores, accountability documents, and failure signals, such as parents’ complaints or negative media reports (Education Inspectorate - Ministry of Education, 2010). Inspectors follow an assessment framework covering legal aspects, process quality, and outcomes. The framework pays particular attention to learning outcomes, educational process, school environment, quality assurance and ambition, and financial management (Education Inspectorate - Ministry of Education, 2017a, 2017b). Each of these areas includes a set of standards that is operationalized based on statutory requirements. Results of inspection are shared with the school and the public through a summary report. If schools do not demonstrate improvement for two years, inspectors can recommend administrative and/or funding sanctions to the Ministry of Education. In the most extreme cases, this can lead to school closure (Ehren, Altrichter, McNamara, & O’Hara, 2013; OECD, 2015).

2 At the time of the interviews, the Dutch system was transitioning to School Board inspections in addition to continuing with the risk-oriented school inspections. In our interviews, we focused on the on-site school inspections as implemented until the academic year 2016-17.

Data Collection and Analysis

We conducted semi-structured interviews with inspectors of K-12 schools in the three study locations. We inquired about the inspectors’ backgrounds, activities performed during the inspection process, and outcomes of inspection. In addition, we asked about how they make decisions regarding school quality and what aspects of quality they value the most.
Emphasis was placed on capturing the inspectors’ thought processes through the use of probes that asked for concrete examples to illustrate their thinking, and we provided inspectors with scenarios to gauge how they would respond to a given situation. In the United States, we interviewed inspectors from the district Department of Education who were certified by QualiEv. We invited all 29 certified reviewers who had previously conducted inspections. For the other sites, we selected a purposive sample of inspectors (Teddlie & Yu, 2007). The objective was to conduct 6 to 10 interviews at each site. In total, we completed 23 interviews: 8 in the United States, 9 in Argentina, and 6 in the Netherlands. Interviews lasted an average of 72 minutes. In the United States and Argentina, interviews were conducted in-person in the local language, English and Spanish, respectively. In the Netherlands, interviews were conducted via videoconference in English. Interviews in Argentina were transcribed in Spanish and then translated into English. Interviews from the Netherlands and the U.S. sites were transcribed and checked for accuracy. Participants were informed that interview responses were anonymous, transcripts would not be shared, and a pseudonym would be used to cite them. In the United States and Argentina, inspectors were given a US$ 25 gift card after participation. In the Netherlands, we were advised by SI researchers not to offer incentives. Participants in each study site are in line with the characteristics of inspectors in their location with respect to years of experience and demographics. Descriptive information about interviewees is presented in Table 1.

Table 1. Inspectors’ Background and Experience

Variable                                     U.S. Case   Netherlands Case   Argentina Case
                                             (n=8)       (n=6)              (n=9)
Individual inspectors
  Inspector experience, in years             2.3         9.8                6.2
  Education experience, in years             14.3        22.8               31.0
  Classroom teaching experience, in years1   8.9         10.7               13.1
  Administrative experience, in years1       2.7         7.0                15.2
% of inspectors, within case
  % former classroom teachers                100%        50%                100%
  % former administrators                    38%         20%                100%
  % with graduate degree                     100%        75%                56%

1 Only those inspectors with experience as teachers/administrators were included in these indicators.

The interview transcripts were coded using deductive and inductive codes. Deductive codes were formulated based on our theoretical framework, mainly from concepts related to sensemaking theory (Coburn, 2005; Maitlis & Christianson, 2014; Spillane et al., 2002, 2006; Weick, 1995). Inductive codes stemmed from interviews in the three sites. Responses were coded according to Miles, Huberman, and Saldaña’s (2014) approach to qualitative analysis by observing patterns and themes within and across case studies. We used DEDOOSE qualitative data analysis software for coding and analysis. To ensure the reliability of codes, we used an independent-coder method. First, interview transcripts were independently coded by two researchers; the coding was then compared for agreement. We followed an iterative process until at least 75% agreement was achieved. We conducted two rounds of coding. The first round focused on: i) sources of information used during inspection, ii) use of local context information, and iii) inspectors’ definition of good quality education. These codes were defined inductively. The second round relied more heavily on deductive codes that were more abstract and required more interpretation.
The main codes include iv) inspectors’ perceptions of school administrators, v) types of recommendations, and vi) sources guiding thinking. This latter code is emphasized in our analysis (Table 2). The sub-codes are based on the inspection procedures and sensemaking theory. The theory was used to define the foci on the knowledge structures accessed by inspectors when facing practical situations, especially when they have to make sense of complex situations.

Table 2. Sources Guiding Inspectors’ Thinking

A. Indicators of school quality: Standardized rubrics, indicators, or metrics used to evaluate school quality during inspection
B. Multi-informant approach: Simultaneous use of multiple sources of information, of the same or different kind, to validate evidence
C. Interactions among inspectors: Interactions among inspectors, or between inspectors and technical personnel, to discuss findings
D. Local context information: References to local context information, including students’ demographics, characteristics of the neighborhood, school history, and change of staff
E. Inspectors’ perspectives: References to personal experience, beliefs, or professional judgment

After the coding procedure was complete, we identified general and country-specific patterns in the data using descriptors in DEDOOSE. The data were examined visually in the form of code clouds, cross-tabulations, and charts showing the frequency with which codes occurred as well as the presence or absence of codes within and across interviews, for each of the case studies. To ensure that reported patterns are an accurate representation of each study site, we shared initial findings with interviewees and incorporated their feedback (Miles et al., 2014).

Results

Overall, our comparative analysis indicates that personal perspectives influence inspectors’ evaluations. The extent of influence is affected by local culture, professional traditions, and values. In particular, our U.S. case relies on high-stakes testing for accountability, and its audit culture emphasizes performance measurement and data-driven decision making. U.S. inspectors’ thinking is infused with this culture, and their decision making is highly influenced by a high-stakes accountability and standardization mindset that leads to strict adherence to protocols and reduces inspectors’ professional insights in an effort to avoid bias. Despite opportunities presented by SI to dig deeper and identify unique strengths and weaknesses at schools, U.S. inspectors actively disregarded insights that did not fit within the confines of the protocol. In contrast, the Dutch and the Argentinean cases illustrate approaches in which inspectors have more flexibility and rely on their professional background and judgement. Inspectors routinely pursued “surprising” observations even when these did not fit neatly into a protocol or the anticipated focus areas. Instead, these inspectors adopt a more holistic understanding of school quality and avenues for improvement.

In this section we present findings for the categories of sources that guide inspectors’ thinking: indicators of school quality, multi-informant approach, interactions among inspectors, local context information, and inspectors’ perspectives (see definitions in Table 2). Key findings regarding the influence of these categories on inspectors’ thinking across our cases are summarized in Figure 1.

Figure 1. Comparison: Influence of Information Sources on Inspectors’ Thinking
[Figure: a grid rating the degree of influence (high to low) of each information source (indicators of school quality, multi-informant approach, interactions among inspectors, local context information, and inspectors’ perspectives) for the U.S., Netherlands, and Argentina cases. Note: The relative influence of each source was determined by the number of mentions in interviews.]
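As the note to Figure 1 indicates, the influence ratings were derived from how often each source category was mentioned in interviews. The sketch below illustrates such a tally with hypothetical counts and an illustrative median-split rule; the actual counts and the cutoff applied to the DEDOOSE output are not reported here and are assumptions.

    import pandas as pd

    # Hypothetical excerpt counts per source code (A-E in Table 2) and case;
    # in the study these came from DEDOOSE code frequencies across transcripts.
    counts = pd.DataFrame(
        {
            "US": [41, 30, 28, 6, 9],
            "Netherlands": [15, 27, 12, 18, 25],
            "Argentina": [4, 10, 14, 22, 29],
        },
        index=[
            "Indicators of school quality",
            "Multi-informant approach",
            "Interactions among inspectors",
            "Local context information",
            "Inspectors' perspectives",
        ],
    )

    # Bin each source into a coarse influence level within each case, here
    # relative to the case median (an illustrative rule, not the authors'
    # stated cutoff).
    influence = counts.apply(
        lambda col: col.ge(col.median()).map({True: "High", False: "Low"})
    )
    print(influence)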
Indicators of School Quality

U.S. Case: Focusing on Rubrics and Avoiding Bias

In the U.S. case, indicators of school quality in the form of a standardized rubric are the cornerstone of the evaluation process, and all of the inspectors mentioned them repeatedly when explaining their thought processes. Their data collection is structured around various aspects of school quality, a series of metrics to measure them, guidelines for making observations, and questions for schools in order to evaluate each indicator. Unlike the two other study sites, we found that U.S. inspectors restrict data collection almost exclusively to sources specified in the protocol: classroom observations, interviews and focus groups, and school documents. Only two inspectors mentioned that they seek publicly available school information before their visit (e.g. test results and the school website) or during the visit (e.g. teachers’ planning documents, students’ files). This stands in stark contrast with our other two sites, where developing a deep knowledge of a school before the actual visit is a vital part of the inspection process.

Moreover, in the U.S. case, the protocol explicitly determines the structure and nature of classroom observations and the questions to be asked in interviews and focus groups. For example, when inspector Amy-US was asked about what information she personally looks for or asks to see apart from the required data, she emphasized the importance of adhering to the protocol:

I just followed the … protocol. I ask only the questions that are outlined for teachers and only the questions that are outlined for students. If a student [response] needs to be elaborated, I would say “could you tell me more …,” but I don’t bring my own questions to the process or anything like that. I just tried to follow what is asked of me.

We asked inspectors about the value they placed on different information sources to evaluate school quality. We found that observing classroom instruction is consistently the most valued source of information used by inspectors to evaluate school quality. In contrast to the other study sites, most U.S. inspectors considered other sources of information to play a secondary role, if considered at all. School planning documents were deemed the least valuable source of information to evaluate school quality and were used mostly for triangulation purposes. Similarly, the use of school climate observations (such as culture, interactions among students and teachers, and facility condition) was also secondary.

More than in the other study sites, when the U.S. inspectors explained their thought processes, they repeatedly made direct remarks about how they try to avoid personal bias when completing the rubric and following how the protocol defines good instruction. This was highlighted by Sarah-US and Donna-US:

Sarah-US: We all come to the table with our own expertise, with our own beliefs and values, with our own biases or preconceived notions about what school should look like … the rubric then helps people put those things aside, understand their influence, and then really ground themselves in both evidence and the rubric to get to a shared understanding.
Donna-US: It's really important … as a reviewer, to not have a bias … I might have a bias towards what good instruction looks like. So instead of using the rubric in front of me, I'm going towards what I think is good. Or I might have a bias towards what a functioning school environment looks like and sounds like. So instead of using the evidence in front of me, I'm just going towards what I think … I think that can be both positive and negative.

Across all U.S. interviews, we found a rigid emphasis on observing the protocol, sometimes in ways that appear to impede important insights. Most inspectors recognized the relevance of their own expertise and experience; they acknowledged which indicators best capture school quality and their preferred information sources. Yet, as we show in sub-section “E) Inspectors’ perspectives,” U.S. inspectors dismissed this wealth of knowledge and equated personal views with bias. Incorporating such information into evaluations was viewed as a validity threat to the inspection process. U.S. inspectors claim to actively suppress the influence of their education knowledge and experience during inspection by stating that they strictly adhere to protocol. Furthermore, unlike the other research sites, when explaining their thinking in specific situations, they were rarely able to share concrete examples based on their own experience. Instead, they repeatedly referred to “good practices” outlined in the rubric.

Comparison: Role of Indicators of School Quality

Similar to the U.S. case, Dutch inspectors utilize indicators of school quality in all their school visits. However, these indicators were not a central aspect in inspectors’ narratives about their thinking process. While the policies and procedures used by the Dutch share similarities with the U.S. case and its QualiEv protocol, Dutch inspectors expressed greater flexibility in the inspection process, including the stages of data collection, choosing which criteria to focus on and emphasize, and how to interpret indicators of school quality. When collecting information, Dutch inspectors use standardized data only as a starting point. Our interviews revealed that data collection and usage are guided by inspectors’ choices to probe more deeply into key areas, as their insights and understanding of a given school evolve over the course of the visit. (We explore this further in the sub-section below, which addresses the multi-informant approach.) No consensus exists among the Dutch inspectors regarding the most valuable information sources. Several inspectors found this question difficult to answer, in contrast to the U.S. inspectors who promptly referenced protocols. Some Dutch inspectors especially valued interviews with teachers and administrators as well as classroom observations. Only a few mentioned school climate or student interviews. Dutch inspectors were more likely to view the usefulness of information sources as highly contextual, based on their experience, school context, and specific issues that emerged during a school visit.

In contrast to these two cases, in Argentina, a standardized set of indicators of school quality has almost no role in the inspection process. When indicators are collected, they are mainly used for administrative purposes. Inspectors do not use standardized metrics to evaluate quality systematically. This is not to say that the inspectors lack standards for evaluating school quality.
Rather, they have considerable freedom to decide what information to collect and how to evaluate schools. Consistent with this freedom, we observed immense heterogeneity in terms of the information sources used and which sources are most valued. Inspectors emphasized the importance of gaining an understanding of how the school functions, emphasizing the importance of “being present” in the school, “walking around,” and “living in the moments of the institutional life.” Being present allows inspectors to be critical and provide support to the school. Without insights from school visits, inspectors do not feel they could truly understand a school. Therefore, they would be unable to provide the necessary assistance to help schools improve. Planning documents are considered useful for evaluation and collaborative work between inspectors and schools.

Bias was rarely a concern among inspectors from Argentina and the Netherlands. In these two countries, inspectors did not hesitate to make use of their professional judgement, experience as instructional experts, and familiarity with schools gained from multiple visits. Rather than being a cause for concern, this was viewed as exactly what enables them to be effective. Furthermore, in the Netherlands, utilizing this type of knowledge is viewed as necessary to be a good inspector. Part of the inspector training process includes extensive shadowing of experienced inspectors, where those who are new have an opportunity to further develop and learn to use their expertise. This is not to say inspectors in Argentina and the Netherlands do not reflect critically on their own practices and maintain a concern for integrity in the process. In both countries, nearly all inspectors spontaneously expressed a concern about being thorough in their analysis and the importance of justifying their conclusions in their feedback to schools. But attempts to standardize the process and a concern for validity and reliability were less prominent in our conversations.

Multi-Informant Approach

U.S. Case: Finding Trends and Discarding Outliers

Once U.S. inspectors collect data, the data are turned over to QualiEv, where staff conduct a standardized rating process. The processed data are then used to assess trends and patterns in the school. This multi-informant approach seeks evidence that is confirmed repeatedly using the same type of information and then triangulated with other sources. For example, evidence from only one or two classrooms is insufficient to make a claim. Affirmation must be found in multiple classes and then triangulated with evidence from additional sources, such as interviews and school documents. Many inspectors highlighted that this approach provides a holistic view, as Michelle-US explains:

[The multi-informant approach] really gives you that holistic view of, “Okay, this is what we saw in the classroom. This is what the teachers and students are doing.” But then, what are people actually saying about it? What are the parents saying, the students, and the staff? And how did those stories support one another, or how are they different?

Importantly, most inspectors expressed confidence in the focus on trends, as opposed to outliers, as a reliable approach to evaluate the overall quality of the schools. Overall, most inspectors seem to embrace this approach as an effective way to evaluate the quality of the school.
As Laura-US explained: Both times that I've done it [(the inspection)], it's been really clear, even after the first half of one day, what the trends are. It's been kind of shocking, because you think, "Oh, obviously, these classrooms [are] different than that." No, they're never different. It's all the same. It's always been really shocking how quickly you can come to what the big problem is. Usually it's actually been pretty easy to pick the top two or three, and because the schools that they pick to do these things are ... literally on fire, so it's pretty clear. Interviews demonstrated how systematically the multi-informant approach is applied: Evidence is gathered, inspectors focus on major trends, and discard information that does not fit within these broad trends. Donna-US emphasize that they “are looking for trends and consistencies, versus an outlier of something that might strike you as wrong.” Contradictory information observed while data is being collected might rise a red flag and can help narrowing the focus. When this happens, there is some leeway for further inquiries as long as the search and sources of information to be used are part of the protocol. Yet, in contrast to the other research sites, we did not find examples of inspectors pursuing professional hunches that lead to additional question being asked, nor focusing on exceptional observations, nor conducting additional interviews that might lead to new discoveries. Comparison: Role of Multi-Informant Approach Inspectors in the Netherlands act as investigators. Each has considerable freedom in deciding which focus areas to emphasize during school visits. Several inspectors explained that after reviewing school information, prior to the visit, they try to anticipate the main difficulties at the school. Hypotheses are developed that they then seek to verify or disprove during the visitation day. Several inspectors noted how their expertise can assist in developing these initial assessments. They actively draw upon their prior knowledge and vast experiences with a wide array of schools to help them anticipate and hone in on issues that the school is facing. Rather than a standardized approach 23 that eliminates variation both between and within a school, the Dutch approach results in a dynamic process only guided by their protocols, not dictated by them. Lotte-NETH illustrates the questions that inspectors ask themselves during inspections: I try to see what the most important papers are, and I read them. I try to think about what I might see in the school. I have some hypotheses in my head and I also see what I can make of the context of the school. For example… In what kind of neighborhood is it? What can I expect of the school? What’s the difficult thing over there? Then I go to the school and be as open minded as possible because sometimes, when you already think you know what it will be, you will be very much surprised by what happens in the school … I just have that in mind, somehow, but not have that on the front of my head. I just be open and see what happens during the day, but… I’ve got a [starting] schema in my head. Dutch inspectors’ interviews revealed that they too look for patterns and trends through a multi- informant approach. However, they also strive to identify conflicting evidence so that deeper inquiry can be made and observations and impressions during interviews can be confirmed. 
Rather than dismissing discrepancies or outliers, the Dutch inspectors view such findings as critical points for further investigation. Thus, inspectors use these insights to identify problems that often lurk beneath the surface. For example, Lars-NETH explained how he actively looks for points of disagreement:

[An important source of information is] talking to the teachers, like how they tell the [way] the school really works, how they perceive how the managements makes them work and doesn’t really work, … and do the teachers understand that vision and do they really use that vision inside their classrooms? And good thing is, we always visit the classes first, before we talk to all the people. So we can give back to the teachers and to the team leaders and to the director, what we saw in the classes. And so they can immediately give back how they perceive it.

A multi-informant approach guides thinking in both the Dutch and the U.S. cases (see Figure 1). However, there is a great difference in how information is corroborated. In the Netherlands, there is less emphasis on accumulating evidence through a rigid prescribed process, and more on finding hidden problems and testing whether evidence can confirm nascent hypotheses. Thus, the process is dynamic and evolves during the visit. Inspectors determine in real time which additional documents to request, which questions to ask, and which aspects of classroom observations to emphasize.

The Argentinean inspectors also seek corroborating evidence, often comparing formal planning documents to actual practices. As in the Dutch case, and unlike the U.S. case, inconsistent findings are viewed as a critical window into key issues faced by the school. In our interviews, inspectors provided several examples of how unexpected cues during a visit can lead to additional sources. This was illustrated by Monica-AR:

I value walking in the schools. The fact of being present. Because face-to-face you can get to ask a new question about something specific, and you can be surprised. You can find something that you hadn’t thought. … [Sometimes you find that] the pedagogic proposals don’t correspond with what you see in the visits, when you see they are not [using] the methodology they say they are applying … If you take a child’s workbook and you see mistakes in the corrections made by the teachers, or there are no corrections made by the teachers, you say: what is going on here?

Unlike in the Dutch and U.S. cases, however, the multi-informant approach was not stressed as a central aspect of the inspection process by most Argentinean inspectors.

Interactions Among Inspectors

U.S. Case: Consensus Building & Guided Sensemaking

In our U.S. case, data collection and synthesis are followed by a consensus-building process led by QualiEv. During the group discussion, QualiEv reinforces the previously mentioned factors that guide thinking in order to avoid bias: focusing on the rubric and the trends while discarding outliers. In this phase, U.S. inspectors have an opportunity to explain their observations. QualiEv guides the discussion and consensus-building process. Most inspectors rely on and trust the contracted organization for facilitating the discussion and “pushing their thinking” (Amy-US). In this process, QualiEv ensures all claims are aligned with the rubric. This process was explained by Sarah-US:

So the [QualiEv] team leads a collaborative consensus building process, but they lead that in alignment with their practice and process.
So it's a collaborative effort that is heavily guided by the contracting organization… So they sit as experts on how the rubric should be utilized and how things then should be scored, but they go through a process of team consensus building. Everyone presents their evidence; they do that in a group setting, and then everyone talks through it and then determines where the preponderance of evidence fits on the rubric, which then leads to the scoring process.

Inspectors must discuss evidence until arriving at a consensus regarding the evaluation. The discussion procedure starts from the quality claims and evaluation criteria based on best practices. Inspectors then discuss whether there is enough evidence to support each claim and how to weigh that evidence. At this stage, they compile the collected data together with the syntheses and trends identified by QualiEv. The inspectors emphasize that any claim must be supported by evidence. This dynamic was explained by Aidan-US:

So I think the factors that usually go into play would be, “Which ones do we have the most evidence from our observations about? How strong is that evidence?” If we didn't see checks for understanding in one classroom that's not enough that we can make a priority claim around checks for understanding whatever the case might be. And so it's usually about what is the weight of the evidence that we have.

When asked whether inspectors gather further information if they have not yet found evidence to support or disprove a quality claim, Heather-US responded:

No, there's no return observation it just, the claim is tweaked based upon what you did see. So one of the norms they [(QualiEv)] often use is see the donut, not the hole. So it's not about what you didn't see, it's about what you did see. So if you didn't see any evidence towards that claim, you go with the evidence you did see.

Avoiding personal bias is a focus during group discussions. During the formal evaluation, inspectors strive not to introduce their views regarding alternative criteria that might be informative when assessing quality. This happens during the discussion process, as illustrated by Michelle-US:

When you're collaborating with a team… you share something that you saw or heard ... you're looking for another example of it. And if you don't, then you let it go. … You don't want to be biased or making comments based on personal opinion. So, you do very much keep it factual, and you make it collaborative so it's not just one person saying one thing.

Most inspectors perceive the process of reaching a consensus to be straightforward. We did not find evidence of heated group discussions or of inspectors challenging the evaluation results. Furthermore, several inspectors indicated that for many quality claims that require consensus, QualiEv develops preliminary statements before convening inspectors for a discussion. This appears to be a feature of the way the process is structured. Multiple U.S. inspectors visit a given school and attend selected classes. Thus, each inspector observes only a portion of instruction, which might not encompass all domains and factors to be evaluated. For example, when we asked Aidan-US whether he maintains his position when observing something that differs from the bulk of the data, he stated that he looks at "the overall picture”: “I could've gone to three classrooms in the morning and in those I didn't see a particular aspect, but other people did.
What I saw is just one part of all the data that's collected.” This partial observation may limit inspectors' ability to develop a full view of the school and might lead them to adopt the narrative of the contracting organization. This situation fits the concept of guided sensemaking, in which leaders actively build a narrative that promotes understandings and explanations of events (Maitlis, 2005). At the same time, this configuration restricts individual sensemaking and the scope of the intersubjective construction of meaning during group discussions (Gioia et al., 1994).

Comparison: Role of Interactions among Inspectors

In Argentina and the Netherlands, inspectors are employed as dedicated government staff and are placed in regional offices. In the Argentinean case, each inspector leads a team whose members bring varied educational expertise: pedagogy, psychology, school administration, and social work. The group dynamic is established by each inspector and varies substantially. In some cases, the technical team actively participates in decision making and discusses which strategies are generally effective. Most inspectors use the team to make school visits in contentious or complex situations and to make specific interventions in schools. Inspectors may also initiate consultations with other inspectors; these mostly occur in complex situations or when inspectors have doubts about regulations. Some consult regularly with their colleagues in the office, others through text messages, and others in occasional provincial meetings.

In the Netherlands, inspectors receive ongoing group training and meet weekly to discuss current education issues, research, and the inspection process. Some of these meetings focus on specific practices and feature invited speakers, while others offer training videos on classroom observation. When asked how inspectors learn to conduct inspections, Lotte-NETH commented on interactions among colleagues:

I’ll read articles, and we’ve got information sessions at the office, where someone tells you something he’s been working on or some interesting… with some colleagues of mine, we organize lunch sessions in which we’ve invited someone from outside the inspectorate to tell us and inform us about certain subjects…. [for specific subjects] we try to invite someone from outside our office. We get new input and we also use our team to discuss about the standards … “How do I interpret what I see? How do you interpret? Okay. Do we come to the same judgments or do we judge something differently? What is the difference between us?”

Most Dutch inspectors mentioned that they generally consult with colleagues, but not for the purpose of achieving consensus. Dutch inspectors relied the least on interactions with other inspectors to inform their thinking during an inspection. In secondary schools, where inspections are conducted in groups, inspectors naturally interact with each other. However, none of the respondents emphasized a role for consensus building during the school visit. In primary schools, where usually only one inspector visits the school, the process was described as solitary, as Emma-NETH put it: “You go to a school alone. You arrive alone… You think alone.” Nonetheless, some inspectors noted that they consult with colleagues in the inspection office when they face complex or challenging situations.

Interaction among inspectors plays a distinct role in the sensemaking process within each of the three cases.
In the Netherlands and Argentina, inspectors work in more stable groups that sustain a long-term conversation about inspection practices in general rather than about specific schools. These ongoing conversations shape the construction of meaning within the inspectorates (Rouleau, 2005). In the Dutch case, these conversations also have a more formal component in inspectors' ongoing training. In both cases, interactions among inspectors regarding specific schools occur when they face controversial or complex situations. In this regard, inspector teams act as a sounding board that provides feedback and suggestions, but no formal consensus is required. This contrasts with the U.S. case, where interactions are used to systematically integrate information and identify trends, and where controversy and complexity tend not to be addressed.

Local Context Information

U.S. Case: Minor Role in Inspectors’ Thinking

Consideration of the “local context” plays a minor role in U.S. inspectors’ narratives about their thinking. Local context includes student demographics, neighborhood characteristics, and a school’s history. Several inspectors explained that after an extended time working in the district, they had interacted with most schools at some point and had become familiar with the schools' local context. Some mentioned that they hold this knowledge but view it only as background information, as illustrated by Aidan-US, who said that this contextual information is “in the back of their head.” However, inspectors never mentioned this type of information as factoring into their understanding of a school’s functioning, and the interviews did not reveal many specific examples of how this knowledge influences thinking during inspection. Considerations of school context are not explicitly described in the formal protocol. Therefore, most inspectors actively strove to exclude this information, stating that “it should not matter” in their evaluations. When asked for further explanation, some inspectors highlighted that they must closely follow the rubric and avoid bias, assuming that the objectivity of the process might be compromised by considering context. Several inspectors went further, arguing that inspected schools are low-performing and thus major differences in school context are not present.

Comparison: Role of Local Context Information

Unlike U.S. inspectors, the Argentinean and Dutch inspectors consider contextual information critical for understanding a school’s functioning. Argentina is the only research site in which each inspector is permanently assigned to a group of schools. Therefore, inspectors become deeply acquainted with the local context, student demographics, school history, and school staff. In most interviews, Argentinean inspectors highlighted that inspections are “situated”: locally oriented and grounded in the school's reality. As Alejandra-AR pointed out: “Everything you do in the school is based on the context, in the situational aspect, that is what I ask for, that the pedagogic project depart from there.” To a greater extent than in the other cases, inspectors interviewed in Argentina continuously reference their knowledge of school context when interpreting problems, prioritizing information sources, interpreting student performance indicators, and determining recommendations.

In the Netherlands, inspectors also work with a fixed group of schools.
Yet, after several years, inspectors switch groups as a way to ensure objectivity. Dutch inspectors exhibited more knowledge of student demographics than U.S. inspectors. They also offered detailed descriptions of challenges faced by schools under specific contextual circumstances, such as a large immigrant population for whom learning Dutch is critical, or parents with little educational capital to support learning at home. Inspectors use this local knowledge to interpret the various sources of information collected during inspection. For example, inspectors consider whether a school with a high proportion of immigrant students should develop provisions for language education in its planning documents. As Sven-NETH explained:

If you have a school with parents who speak at home another language, the schools have to invest more in curriculum in vocabulary of Dutch for those children. Then, the expectation about the quality of curriculum are different… You cannot put that into strict criteria. ... [Another example,] if you are in a small school that has to put children of several [grades together] in one group, …. you know it's a very hard job for the teacher to organize the lessons in a way that he challenges all the children … so, this kind of situation plays into the way you judge the quality of instruction.

In the U.S. case, the protocols do not include context as part of the evaluation. Explicit consideration of how local context influences the evaluation of school quality played a very minor role in inspectors’ narratives. This configuration downplays the role of situational factors that might spark sensemaking processes in inspectors (Sandberg & Tsoukas, 2015).

Inspectors’ Perspectives

U.S. Case: Personal Judgement within the Scope of the Protocol

We find that U.S. inspectors prioritize objectivity and reliability; the emphasis on standardized rubrics constrains the use of personal and professional knowledge. Yet nearly all U.S. inspectors believe their background in education provides a necessary qualification for their role. All interviewed inspectors had experience as classroom teachers, and their work at the district Department of Education involves evaluation of classroom instruction (see Table 1). Fewer inspectors (less than half) have experience as school administrators; among those who do, the average experience is less than three years. Some inspectors noted that an instructional background is necessary because it tells them what to look for during school visits. This was illustrated by Sarah-US:

I think that having [an] instructional background is critically important… being an educator, someone who is highly familiar with the instructional aspect of education … folks who have that instructional-specific lens, who carry with them the lens of what high-quality teaching and learning looks like. If you know what it means to stand up in front of students and deliver instructional content and assess students. I mean there's a lot of insider language in the rubrics. … You have to know what you are looking for, so you have to know what teaching looks like.

In addition, three of the interviewees had experience as administrators, and they believed this was important preparation for their role as inspectors.
Lisa-US explained how her judgment is informed by administrative experience, which can provide a more systemic view:

I really think that my years as assistant principal helped me because I'm able to see the school as an entire system and not just as one specific part, and so I think that's a great qualification [for] …. understanding ... So if some classrooms are having anger management issues, if it's not at trend across the entire school, if there's a bigger trend arising that the instruction isn't rigorous, that that's a bigger focus for the school in trying to work with some individual classrooms.

Experience working in the district office is an additional factor that inspectors feel prepares them. Several inspectors mentioned that this experience allows them to “have a sense of what the schools look like.”

We found that in most cases inspectors rely on their professional judgement in ways that fall within the scope of the protocol, even when those approaches are not explicitly stated in it. Furthermore, in some cases, interviews revealed a tension between using professional judgement to complete the rubric and maintaining an unbiased and uniform process. This tension was illustrated by Michelle-US, who noted how she must reconcile the evaluation rubric with the wide variety of elements she personally considers during classroom observation:

Michelle-US: When I'm in the classroom, I look for student engagement, and comfort, and listening, and learning. … I think, because too often we can focus just on the teachers or the adults. … we really have to look at the kids. When you're in a school environment, it's holistic… you're using your senses, right? You're looking, and you're feeling, and you're hearing, and all of these different things that you get when you're in a place that is not necessarily on any rubric, but you get the vibe, and the feeling of it. And then, you kind of couple that with what people are saying in the interviews, and what their body language is, and their emotional level, and how they respond to things.

Interviewer: … how do you put all of this together in the rubric and the feedback to the schools?

Michelle-US: Well, those things I was just sharing, I do personally. So, those aren't necessarily on the rubric. But I think that's what comes out when you're collaborating with a team. You don't want to be biased or make comments based on personal opinion.

Michelle’s comments illustrate the tension many of the U.S. inspectors expressed between valuing their professional background in education and trying to adhere to the rubrics so as not to appear biased. Finally, we found that when crafting feedback for inspected schools, U.S. inspectors use their judgement mainly for diagnostic purposes. Unlike at the other sites, we did not find many explicit considerations of how feedback and outcomes from the inspection affect the schools. This shows that U.S. inspectors’ sensemaking process is delimited by the scope prescribed by the protocols and is essentially retrospective rather than prospective (Gioia et al., 1994; Sandberg & Tsoukas, 2015).

Comparison: Role of Inspectors’ Perspectives

In direct contrast to the U.S. case, inspectors in Argentina and the Netherlands rely more heavily on their personal perspectives. In Argentina, inspectors openly shared the ways that their personal experience, beliefs, and professional judgement influence multiple aspects of inspection.
The value placed on this wealth of knowledge might be due to inspectors’ positions being the highest step on the professional teaching ladder in Argentina. More than at the other research sites, inspectors frequently made explicit remarks about how they rely on their experience to inform decisions. This was illustrated by Alejandra-AR, who explained that her recommendations to schools are not based only on government norms:

Alejandra-AR: Based on what the educational policy is posing, but mixed with my perspective and stance, what I’ve learned all these years. Obviously, the educational policy gives you a framework in many regards. You can’t stray from what is stipulated. But within these limits, my experience and knowledge are also important when the time comes to make suggestions.

Inspectors in Argentina did not shy away from explaining how their personal perspectives influence their thought process. Several explained that the process is informed by their views on what they consider critical issues in education. Inspectors see their role as more political; several described how they act as a bridge between macro-level policies and the micro level of schools. Since inspectors in Argentina determine their own procedures, in contrast to the other cases, they have considerably more leeway to use their personal judgement.

As in the U.S. district, inspectors rely on their experience as teachers to judge teaching quality in the classroom. This was expressed by Marcelo-AR, who said that “classroom presence, [allows you] to verify the processes…. After so many years, you trust your intuitive knowledge. And you can realize very quickly whether the kid learned or not what he should have.” But unlike in the U.S. case, Argentinean inspectors’ use of professional judgement goes beyond classroom instruction and extends to a wide range of aspects of institutional life, including observations of the climate, interactions among teachers, and the relationships of the school with families and the community. Since inspectors make recommendations for interventions in schools, they are obligated to go a step further and advise on how to correct the problems identified. They must judge which practices are likely to be effective at a given school. Accordingly, we found that interview excerpts in Argentina coded as “personal experience and beliefs” show high co-occurrence with the parent codes “recommendations to the schools” and “responses to struggling schools.” This differs from the U.S. case, where inspectors only evaluate aspects included in the protocol and restrict their personal judgement, with the exception of classroom instruction evaluation.

In the Netherlands, inspectors’ leeway to manage and direct the inspection process within their framework offers opportunities to rely on their personal preferences and use their professional judgement. Several inspectors explained how they determine what the problems are by relying on their expertise and “gut feeling.” Some inspectors distinguish between “hard data” found in school statistics and documents and “soft data” that is more reliant on their judgement. This was illustrated by Lars-NETH:

Some [documents are] just results, like how much of the children are on the right level when they’ve left primary school and are in secondary school now.
So you can’t argue with that… you can argue about … “how did you come to these results?” That’s the hard part, but the other parts, the soft parts, like giving chances to children… those are not always in the papers, so you can only see that in [person], when you’re at the school, and well sometimes you can get a feeling of how it should be at the school… it’s a bit of a gut feeling ...

All the Dutch inspectors provided examples of how they rely on their professional judgement to inform the process and decide on final feedback to schools. However, Dutch inspectors’ narratives about their thinking process did not refer to their prior professional experience as heavily as the Argentinean inspectors’ did.

We found that inspectors in Argentina and the Netherlands tend to provide holistic judgements of school quality and are more vocal about their personal views regarding the higher-level goals of education, what constitutes a good-quality education, and how schools should function. In their narratives, their thinking is mediated by holistic judgements focused on what they believe is important for a school. In the Netherlands, for example, to evaluate the quality of a school, Lars-NETH asks “what is important for the kids?” and “what is the school administration doing to give the best education they can?” And Lotte-NETH asks how she “would feel if she had kids in the school.” In Argentina, when there is a specific conflict situation in a school, Carlos-AR listens to students and tries to view the situation from their perspective. In the U.S. case, inspectors avoid mentioning this type of thinking, which they fear poses a risk of introducing bias.

Unlike in the U.S. case, Argentinean and Dutch inspectors’ narratives about their thinking focus not only on diagnosing the current situation, but also on how the school has progressed and how the feedback might affect the school. Their sensemaking process is therefore both prospective and retrospective. Familiarity with schools from previous inspections facilitates the prospective emphasis (Kaplan & Orlikowski, 2013). For example, we found that inspectors in both countries use their personal knowledge of stakeholders as an indicator of school quality. In Argentina and the Netherlands, nearly half of the inspectors believe that a key indicator of school quality is their “confidence that the school administrators understand and address the main problems faced by the school,” or, more generally, “trust in the administrators.” This was illustrated by Sven-NETH when he was asked how he responds when a school shows weaknesses but is not failing:

I think that has to do with trust. Then you try to predict the future. You look at the quality of the staff and the quality of the management and you ask yourself the question ‘if they are not at the level they have to be at the moment, but do I trust improvement process, do I think the improvement process will go on and they will really improve, the quality education will improve in one or two years.’

The fact that inspectors from Argentina and the Netherlands highlight trust in school administrators as a key indicator of school quality might be attributable to sustained relationships between inspectors and school stakeholders. In this way, these countries differ from the U.S. case, where the inspection process is designed to avoid repeat interactions between inspectors and the same schools as a way of enabling an objective process.
Conclusions and Policy Implications

This study focuses on school inspectors’ decision-making and the role of sensemaking in their evaluations. Not all inspection systems operate in the same way. Even when processes appear quite similar, the actual work of inspectors can be starkly different. We found that sensemaking mechanisms shape inspectors’ evaluations in different ways at each of the three study sites. Opportunities for sensemaking in the U.S. case are limited by strict adherence to the protocol and avoidance of personal bias, disregard of local context and outliers, and avoidance of complexity. The strong guidance of the contracting organization also limits the scope of socially mediated sensemaking. Inspectors do rely on their experience as specialists in instruction to make sense of what they observe during evaluations of classrooms. However, this activity is limited by the scope of the rubric and does not seem to further inform the focus of the inspection. In contrast, individual and socially mediated sensemaking play a key role in the Dutch and Argentinean cases. The evaluation process relies heavily on inspectors' perspectives, experience, and intuition, as well as on local context information. Complexities tend to be addressed by corroborating information sources, and this corroboration influences the focus of the evaluation. In addition, inspectors consider progress already made within the school and the potential impact of their feedback. Therefore, the sensemaking process is not only retrospective, but also prospective.

Consistent with previous research, our study found that inspectors’ personal perspectives influence evaluations in all three case studies (Dedering & Sowada, 2017; Gilroy & Wilcox, 1997; Lindgren, 2015), albeit to varying degrees. This influence depends not only on the structure of the inspection process, but also on local culture and professional traditions. In the U.S. case, the audit culture aims to reduce this influence to a minimum. Yet, in the two international cases, inspectors regard their professional experience as a means to strengthen their evaluations.

The Legacy of Test-Based Accountability in the United States

This study informs the debate regarding how to hold schools accountable and foster improvement. School inspection is widely used around the world, yet only recently has it been adopted in U.S. education systems. During the past forty years, the shift towards a New Public Management approach and the paradigm of test-based accountability has forged a path dependence in educational institutions that is difficult to abandon, both within administrative structures and street-level practices (McDonnell, 2008, 2013; see also Spillane et al., 2011). Despite the potential that inspection holds for introducing a more robust way of examining school quality and fostering school improvement, the legacy of test-based accountability continues to prevail in U.S. inspectors’ thinking. Their thinking is dominated by efforts to preserve the objectivity of the inspection process through strict adherence to the protocols. If something does not fit, it is discarded. The high-stakes accountability and standardization mindset influences the way inspectors think at all stages of the inspection process. Our contrasting international cases have relevant policy implications as more U.S. districts experiment with inspection systems.
The more flexible inspection models we explore capitalize on the professional expertise of the inspectors as well as the rich and detailed portrait of each school that they are able to paint through the inspection process. In the Dutch and Argentinean cases, inspectors rely on their personal perspectives and judgement in a more open way: they investigate further when they find possible cues, seek more comprehensive sources of information, and delve into the complexities of the local context. Evaluations in these cases lead to more holistic judgements of the key challenges faced by schools and therefore offer additional directions for improvement strategies. On the other hand, heavy reliance on inspectors’ professional judgement risks introducing conscious and unconscious bias, which might lead school administrators to question the fairness of inspections. Future shifts towards alternatives such as inspection systems will likely need to make explicit and ongoing efforts to account for past educational paradigms that might influence how these systems operate and perform. Overall, our results suggest that a more flexible approach can allow inspection to reach its full potential. Greater latitude for inspectors can allow this approach to accountability to better uncover the underlying factors that hinder advancement in under-performing schools and to offer new insights for improvement.

APPENDIX

Interview Protocol

IRB application ID#: STUDY00001267

SECTION I – INSPECTOR TRAINING AND PROCESS

Inspector Preparation, Training and Educational Experience

1. First, tell me a bit about your background in education.
   a. How did you come to be a school inspector?
2. In general, what qualifications must someone possess (knowledge and/or skills) in order to be an inspector in [SITE]?
3. Did you receive specialized training to become an inspector?
   i. What experiences or information did it include?
4. Given your expertise, what qualities or qualifications do you think are critical for an outstanding inspector?
5. What sort of formal interaction is there between inspectors in [SITE]?
   IF YES:
   a. i. How often do you interact with other inspectors?
      ii. Around what topics or purposes do you interact?
      iii. Do you ever seek the advice of another inspector when you are making a decision about a school?
   IF NO FORMAL INTERACTION:
   b. Are there other inspectors you work with informally? How did those relationships develop?

The SI Process

6. While I’ve read about the SI process in [SITE]’s official documents, I know that how things work in reality can be different. So, can you describe for me the school SI process as you experience it?
   FOLLOW UP AS NEEDED
   a. How often are schools inspected?
   b. How are schools assigned to inspectors?
      i. Do you stay with schools you’ve inspected previously?
   c. What do you know about a school before a SI?
      i. In your experience, are reports and information from schools at risk as trustworthy as those from other schools? If not, in what ways do they differ?
      ii. How much do you know about the local context of a school – SES, race/ethnicity demographics, ELL population, other contextual challenges, etc.?
   d. How much time do you spend at a school during an inspection?
   e. Do you visit a school alone or as a team?
   f. Are there other things you look for or people you talk to beyond what’s officially required?
      i. IF YES: How did you select those things?
   g. What do you spend most of your time doing during a SI?
      (interviewing, observing classes, reviewing documents, reviewing test scores, reviewing other school performance indicators, checking safety related items, etc.)

Recommending Reform based on SI Observations

7. Are specific reform suggestions included in your final report?
   IF NO – Skip to 8
   IF YES:
   a. How detailed are the suggestions?
      i. What are some typical recommendations you’ve made?
      ii. Do you recommend both what to reform and how to reform it?
      iii. What are some of the most extreme recommendations you’ve made?
   b. How do you know what to recommend? What knowledge or experience do you draw upon?
      i. Are there established reform strategies that have been approved by the government?
      ii. Do you read any research on school reform?
         IF NO – Skip to c
         IF YES:
         1. What do you read? What are your sources of research?
         2. How often do you read research?
         3. How do you find research on school reform?
            a. Does anyone send it to you?
            b. Does DPS send you research?
         4. How do you then use that research in your SI process?
      iii. Do inspectors ever discuss what reforms work and don’t work? In what context?
   c. To what degree are your suggested reforms co-developed with the school (leaders and/or teachers)?
      i. IF YES CO-DEVELOP: Is there a difference in this process for low-performing schools?
   d. When you consider sanctions or a bad report for a school, do you prefer to be more conservative with your recommendations or do you tend to propose major changes? Why do you prefer that?
8. IF NO: Does someone else make these recommendations? Or is this left to the school leadership to decide?

SECTION II – SI SOURCES OF INFORMATION

Next I’d like to talk more about the information or data you use when making your decision about school quality.

9. First, can you tell me the required sources of information or data you must collect, observe, or review during a SI?
10. What else do you personally look for or ask to see?
    a. Why do you ask for/look at these additional things?
    b. What do they tell you that the officially required sources don’t?
    IF TEST SCORES NOT MENTIONED IN THE ABOVE, ASK: What role do student test scores play in your SI process? Are these standardized tests that all schools/students take?
11. Thinking about all of the different sources of information about school quality that you gather during a SI, which three sources do you think are most informative to evaluate school quality?
    a. Why?
    b. If you had to rank these three, which is most important, second, and third?
12. Of all of the information sources you gather during a SI, which three sources are least important?
    a. Why?
13. Are there official guidelines for which sources of information are supposed to be most highly valued?
    a. IF YES: What?
    b. Do the official guidelines match what you think as an expert reviewer?
       i. Why or why not?
    c. IF NO: Are there unwritten rules amongst inspectors regarding which sources of information ought to be more highly valued?
14. Are there sources of information you’d like to include in your assessment of school quality but cannot?
    a. What are the barriers to their inclusion? (official regulation, lack of time, lack of human resources, lack of cooperation by school staff, other)
15. Is there any type of finding that automatically leads to a poor rating or formal report (if they have no ratings)? For example, if a school has poor performance on their test scores, must they automatically receive a poor rating?
16. What role does the specific context of the school play in shaping how you interpret or value different sources of information?
    a. For example, if a school is located in a very poor neighborhood, does that shape how you view school test scores?
    b. Are there times when some information is more important because of a school’s context?

SECTION III – ACCOUNTABILITY PRESSURE

17. How is your final report circulated?
    a. Is it publicly posted or shared with the school community?
    b. Is there a rating that is posted somewhere?
18. Are there any benefits or rewards for a positive SI report?
19. What are the consequences if your inspection results in a poor performance/rating?
    a. Are there any secondary consequences, such as better teachers transferring to a new school or more able parents transferring their children to a new school?
    b. How much time does the school have to address the issues raised in your report?
    c. Do you or other inspectors return to assess progress?
20. Do you personally receive any rewards or sanctions for school performance or improvement?
    a. IF YES: Can you describe what you receive and how it’s determined?
    b. Are you ever reluctant to give a negative report/rating to a school?
       i. If yes, when?
21. I’m sure that there are clear cases of both outstanding and failing schools that are easy to spot, but I’m wondering about those schools that are on the margin. What might persuade you to give a school on the margin of failure a more positive report/rating?
    i. How much professional judgment do you feel you can use in these cases?
    ii. Can you give me an example of when you’ve been faced with a school that is right on the border of failure?
       1. How did you decide what to write in your report?
       2. What factors did you consider?
       3. Did you feel like you needed to leave anything out or stress anything in particular?
    iii. For those schools on the margin, is there any particular information that you find tips your decision in one direction or the other?
22. Do you rely more heavily on the government standards to make recommendations when a school is of poor quality?
    a. If there are sanctions: What role do potential government sanctions play in shaping your reliance on government standards?

SECTION IV – INSPECTOR BACKGROUND

23. What is your official title?
24. What educational level do you inspect?
25. How many years have you worked as a school inspector?
26. How many years total do you have working in the field of education?
27. What is your level of education? (specify if it is post-secondary or university level, and years of education)
28. What specific education degrees do you hold?
29. Do you have experience as a classroom teacher? How many years? What level/subject?
30. Do you have experience as a school administrator? What position/s? How many years?
31. Do you have any other education related experience?

REFERENCES

Ahuvia, A. (2001). Traditional, interpretive, and reception based content analyses: Improving the ability of content analysis to address issues of pragmatic and theoretical concern. Social Indicators Research, 54, 139–172. https://doi.org/10.1023/A:1011087813505

Allen, R., & Burgess, S. (2012). How should we treat under-performing schools? A regression discontinuity analysis of school inspections in England (No. 12; 87).

Altrichter, H., & Kemethofer, D. (2015). Does accountability pressure through school inspections promote school improvement? School Effectiveness and School Improvement, 26(1), 32–56. https://doi.org/10.1080/09243453.2014.927369
Apple, M. (2005). Education, markets, and an audit culture. Critical Quarterly, 47(1–2), 11–29. https://doi.org/10.1111/j.0011-1562.2005.00611

Armenakis, A., Bernerth, J., Pitts, J., & Walker, H. (2007). Organizational change recipients’ beliefs scale. The Journal of Applied Behavioral Science, 43(4), 481–505. https://doi.org/10.1177/0021886307303654

Armenakis, A., & Harris, S. (2009). Reflections: Our journey in organizational change research and practice. Journal of Change Management, 9(2), 127–142. https://doi.org/10.1080/14697010902879079

Armenakis, A., Harris, S., Cole, M., Fillmer, L., & Self, D. (2007). A top management team’s reactions to organizational transformation: The diagnostic benefits of five key change sentiments. Journal of Change Management, 7(3–4), 273–290. https://doi.org/10.1080/14697010701771014

Armstrong, J. (1982). The value of formal planning for strategic decisions: Review of empirical research. Strategic Management Journal, 3, 197–211.

Ball, S., & Bowe, R. (1992). Subject departments and the ‘implementation’ of National Curriculum policy: An overview of the issues. Journal of Curriculum Studies, 24(2), 97–115. https://doi.org/10.1080/0022027920240201

Barber, M. (2005). The virtue of accountability: System redesign, inspection, and incentives in the era of informed professionalism. Journal of Education, 185(1), 7–38. https://doi.org/10.1177/002205740518500102

Baxter, J. A. (2013). Professional inspector or inspecting professional? Teachers as inspectors in a new regulatory regime for education in England. Cambridge Journal of Education, 43(4), 467–485. https://doi.org/10.1080/0305764X.2013.819069

Behnke, K., & Steins, G. (2017). Principals’ reactions to feedback received by school inspection: A longitudinal study. Journal of Educational Change, 18(1), 77–106. https://doi.org/10.1007/s10833-016-9275-7

Bengston, D., & Xu, Z. (1995). Changing national forest values: A content analysis (Research Paper NC-323). http://www.nrs.fs.fed.us/pubs/rp/rp_nc323.pdf

Berry, F. S., & Wechsler, B. (1995). State agencies’ experience with strategic planning: Findings from a national survey. Public Administration Review, 55(2), 159. https://doi.org/10.2307/977181

Bitan, K., Haep, A., & Steins, G. (2014). School inspections still in dispute – an exploratory study of school principals’ perceptions of school inspections. International Journal of Leadership in Education, 18(4), 1–22. https://doi.org/10.1080/13603124.2014.958199

Bloem, S. (2015). The OECD Directorate for Education as an independent knowledge producer through PISA. In H. G. Kotthoff & E. Klerides (Eds.), Governing educational spaces (pp. 169–185). SensePublishers. https://doi.org/10.1007/978-94-6300-265-3_10

Brier, A., & Hopp, B. (2011). Computer assisted text analysis in the social sciences. Quality & Quantity, 45(1), 103–128. https://doi.org/10.1007/s11135-010-9350-8

Chabbott, C., & Elliott, E. J. (2003). Understanding others, educating ourselves: Getting more from international comparative studies in education. The National Academies Press. https://doi.org/10.17226/10622

Chun, Y. H., & Rainey, H. G. (2005). Goal ambiguity and organizational performance in U.S. federal agencies. Journal of Public Administration Research and Theory, 15(4), 529–557. https://doi.org/10.1093/jopart/mui030

Clarke, J., & Ozga, J. (2011). Governing by inspection? Comparing school inspection in Scotland and England. Social Policy Association Conference, 25.
Coburn, C. (2004). Beyond decoupling: Rethinking the relationship between the institutional environment and the classroom. Sociology of Education, 77(3), 211–244. https://doi.org/10.1177/003804070407700302

Coburn, C. (2005). Shaping teacher sensemaking: School leaders and the enactment of reading policy. Educational Policy, 19(3), 476–509. https://doi.org/10.1177/0895904805276143

Cole, M. S., Harris, S., & Bernerth, J. B. (2006). Exploring the implications of vision, appropriateness, and execution of organizational change. Leadership & Organization Development Journal, 27(5), 352–367. https://doi.org/10.1108/01437730610677963

Concurso de Supervisores Rio Negro (2013). Resolución del Consejo Provincial de Educación de Río Negro N° 1053, Pub. L. No. 1053 (1994).

Conway, M. (2006). The subjective precision of computers: A methodological comparison with human coding in content analysis. Journalism & Mass Communication Quarterly, 83(1), 186–200. https://doi.org/10.1177/107769900608300112

Cook, T. D., Campbell, D. T., & Shadish, W. (2002). Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin.

Cuckle, P., Hodgson, J., & Broadhead, P. (1998). Investigating the relationship between OFSTED inspections and school development planning. School Leadership & Management, 18(2), 271–283. https://doi.org/10.1080/13632439869691

Darling-Hammond, L., Bae, S., Cook-Harvey, C. M., Lam, L., Mercer, C., Podolsky, A., & Stosich, E. L. (2016). Pathways to new accountability through the Every Student Succeeds Act. http://learningpolicyinstitute.org/our-work/publications-resources/pathways-new-accountability-every-student-succeeds-act

De Vries, H., Elliott, M. N., Kanouse, D. E., & Teleki, S. S. (2008). Using pooled kappa to summarize interrater agreement across many items. Field Methods, 20(3), 272–282. https://doi.org/10.1177/1525822X08317166

de Wolf, I., & Janssens, F. (2007). Effects and side effects of inspections and accountability in education: An overview of empirical studies. Oxford Review of Education, 33(3), 379–396. https://doi.org/10.1080/03054980701366207

Dedering, K., & Müller, S. (2011). School improvement through inspections? First empirical insights from Germany. Journal of Educational Change, 12(3), 301–322. https://doi.org/10.1007/s10833-010-9151-9

Dedering, K., & Sowada, M. G. (2017). Reaching a conclusion—procedures and processes of judgement formation in school inspection teams. Educational Assessment, Evaluation and Accountability, 29(1), 5–22. https://doi.org/10.1007/s11092-016-9246-9

Deng, Q., Hine, M., Ji, S., & Sur, S. (2019). Inside the black box of dictionary building for text analytics: A design science approach. Journal of International Technology and Information Management, 27(3), 119–159.

Doud, J. (1995). Planning for school improvement: A curriculum model for school based evaluation. Peabody Journal of Education, 70, 175–187.

Edgerton, A. K. (2019). The essence of ESSA: More control at the district level? Phi Delta Kappan, 101(2), 14–17. https://doi.org/10.1177/0031721719879148

Education Inspectorate – Ministry of Education, Culture and Science. (2010). Risk-based inspection as of 2009 – Primary and secondary education.

Education Inspectorate – Ministry of Education, Culture and Science. (2017a). Inspection framework primary education.

Education Inspectorate – Ministry of Education, Culture and Science. (2017b). Inspection framework secondary education.

Ehren, M. (2016a). Methods and modalities of effective school inspections (M. Ehren, Ed.). Springer International Publishing. https://doi.org/10.1007/978-3-319-31003-9
Ehren, M. (2016b). Methods and modalities of effective school inspections. In M. C. M. Ehren (Ed.), Methods and modalities of effective school inspections. Springer International Publishing. https://doi.org/10.1007/978-3-319-31003-9

Ehren, M., Altrichter, H., McNamara, G., & O’Hara, J. (2013). Impact of school inspections on improvement of schools—describing assumptions on causal mechanisms in six European countries. Educational Assessment, Evaluation and Accountability, 25, 3–43. https://doi.org/10.1007/s11092-012-9156-4

Ehren, M., Gustafsson, J.-E., Altrichter, H., Skedsmo, G., Kemethofer, D., & Huber, S. (2015). Comparing effects and side effects of different school inspection systems across Europe. Comparative Education, 51(3), 375–400. https://doi.org/10.1080/03050068.2015.1045769

Ehren, M., Perryman, J., & Shackleton, N. (2015a). School effectiveness and school improvement. School Effectiveness and School Improvement – An International Journal of Research, Policy and Practice, 26(2), 296–327.

Ehren, M., Perryman, J., & Shackleton, N. (2015b). Setting expectations for good education: How Dutch school inspections drive improvement. School Effectiveness and School Improvement, 26(2), 296–327. https://doi.org/10.1080/09243453.2014.936472

Ehren, M., & Shackleton, N. (2016). Risk-based school inspections: Impact of targeted inspection approaches on Dutch secondary schools. Educational Assessment, Evaluation and Accountability, 28(4), 299–321. https://doi.org/10.1007/s11092-016-9242-0

Ehren, M., & Visscher, A. (2006). Towards a theory on the impact of school inspections. British Journal of Educational Studies, 54(1), 51–72. https://doi.org/10.1111/j.1467-8527.2006.00333.x

Ehren, M., & Visscher, A. (2008). The relationships between school inspections, school characteristics and school improvement. British Journal of Educational Studies, 56(2), 205–227. https://doi.org/10.1111/j.1467-8527.2008.00400.x

Fernandez, K. E. (2011). Evaluating school improvement plans and their affect on academic performance. Educational Policy, 25(2), 338–367. https://doi.org/10.1177/0895904809351693

Figlio, D., & Loeb, S. (2011). School accountability. In Handbook of the economics of education (pp. 383–421).

Fitchett, P., & Heafner, T. (2010). A national perspective on the effects of high-stakes testing and standardization on elementary social studies marginalization. Theory & Research in Social Education, 38(1), 114–130. https://doi.org/10.1080/00933104.2010.10473418

Gagnon, D. J., & Schneider, J. (2019). Holistic school quality measurement and the future of accountability: Pilot-test results. Educational Policy, 33(5), 734–760. https://doi.org/10.1177/0895904817736631

Gilroy, P., & Wilcox, B. (1997). OFSTED, criteria and the nature of social understanding: A Wittgensteinian critique of the practice of educational judgement. British Journal of Educational Studies, 45(1), 22–38. https://doi.org/10.1111/1467-8527.00034

Gioia, D., Thomas, J., Clark, S., & Chittipeddi, K. (1994). Symbolism and strategic change in academia: The dynamics of sensemaking and influence. Organization Science, 5(3), 363–383. https://doi.org/10.1287/orsc.5.3.363

Glazerman, S. (2016). The false dichotomy of school inspection. Mathematica Policy Research – Blog Post. https://www.mathematica-mpr.com/commentary/the-false-dichotomy-of-school-inspections

Gray, C., & Gardner, J. (1999). The impact of school inspections. Oxford Review of Education, 25(4), 455–468. https://doi.org/10.1080/030549899103928
Gray, J., & Wilcox, B. (1995). In the aftermath of inspection: The nature and fate of inspection report recommendations. Research Papers in Education, 10(1), 1–18. https://doi.org/10.1080/0267152950100102

Greene, J. C., Caracelli, V. J., & Graham, W. F. (1989). Toward a conceptual framework for mixed-method evaluation designs. Educational Evaluation and Policy Analysis, 11(3), 255–274.

Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3), 267–297. https://doi.org/10.1093/pan/mps028

Grimolizzi-Jensen, C. J. (2018). Organizational change: Effect of motivational interviewing on readiness to change. Journal of Change Management, 18(1), 54–69. https://doi.org/10.1080/14697017.2017.1349162

Gustafsson, J.-E., Ehren, M., Conyngham, G., McNamara, G., Altrichter, H., & O’Hara, J. (2015). From inspection to quality: Ways in which school inspection influences change in schools. Studies in Educational Evaluation, 47, 47–57. https://doi.org/10.1016/j.stueduc.2015.07.002

Halverson, R., Kelley, C., & Kimball, S. (2004). Implementing teacher evaluation systems: How principals make sense of complex artifacts to shape local instructional practice. Educational Administration, Policy, and Reform: Research and Measurement, 3, 153–188.

Hanushek, E. A., & Raymond, M. E. (2005). Does school accountability lead to improved student performance? Journal of Policy Analysis and Management, 24(2), 297–327. https://doi.org/10.1002/pam.20091

Herscovitch, L., & Meyer, J. P. (2002). Commitment to organizational change: Extension of a three-component model. Journal of Applied Psychology, 87(3), 474–487. https://doi.org/10.1037/0021-9010.87.3.474

Hill, H. (2001). Policy is not enough: Language and the interpretation of state standards. American Educational Research Journal, 38(2), 289–318. https://doi.org/10.3102/00028312038002289

Hines, R. T. (2017). An exploration of the effects of school improvement planning and feedback systems: School performance in North Carolina.

Holt, D., Armenakis, A., Feild, H., & Harris, S. (2007). Readiness for organizational change. The Journal of Applied Behavioral Science, 43(2), 232–255. https://doi.org/10.1177/0021886306295295

Husfeldt, V. (2011). Wirkungen und Wirksamkeit der externen Schulevaluation: Überblick zum Stand der Forschung [The impact of school inspection – Does it really work? State of research]. Zeitschrift für Erziehungswissenschaft, 14(2), 259–282. https://doi.org/10.1007/s11618-011-0204-5

Hussain, I. (2015). Subjective performance evaluation in the public sector: Evidence from school inspections. The Journal of Human Resources, 50(1), 189–221.

Jacob, B. (2005). Accountability, incentives and behavior: The impact of high-stakes testing in the Chicago Public Schools. Journal of Public Economics, 89(5–6), 761–796. https://doi.org/10.1016/j.jpubeco.2004.08.004

Jones, K., & Tymms, P. (2014). Ofsted’s role in promoting school improvement: The mechanisms of the school inspection system in England. Oxford Review of Education, 40(3), 315–330.

Jones, K., Tymms, P., Kemethofer, D., O’Hara, J., McNamara, G., Huber, S., Myrberg, E., Skedsmo, G., & Greger, D. (2017). The unintended consequences of school inspection: The prevalence of inspection side-effects in Austria, the Czech Republic, England, Ireland, the Netherlands, Sweden, and Switzerland. Oxford Review of Education, 43(6), 805–822. https://doi.org/10.1080/03054985.2017.1352499

Kaplan, S., & Orlikowski, W. J. (2013). Temporal work in strategy making. Organization Science, 24(4), 965–995. https://doi.org/10.1287/orsc.1120.0792
Klein, A. (2016). School inspections offer a diagnostic look at quality. Education Week. https://www.edweek.org/ew/articles/2016/09/28/school-inspections-offer-a-diagnostic-look-at.html

Klerks, M. (2012). The effect of school inspections: A systematic review. http://janbri.nl/wp-content/uploads/2014/12/ORD-paper-2012-Review-Effect-School-Inspections-MKLERKS.pdf

Kluger, A. N., & DeNisi, A. (1996). The effects of feedback interventions on performance: A historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychological Bulletin, 119(2), 254–284. https://doi.org/10.1037/0033-2909.119.2.254

Koretz, D. (2008). Measuring up. Harvard University Press.

Krippendorff, K. (2013). Content analysis: An introduction to its methodology (3rd ed.). SAGE Publications.

Ladd, H. F. (2016). Now is the time to experiment with inspections for school accountability. Brookings. https://www.brookings.edu/blog/brown-center-chalkboard/2016/05/26/now-is-the-time-to-experiment-with-inspections-for-school-accountability/

Ladd, H. F. (2017). NCLB: Response to Jacob. Journal of Policy Analysis and Management, 36(2), 477–480. https://doi.org/10.1002/pam.21979

Ladd, H. F., & Figlio, D. (2008). School accountability and student achievement. In Handbook of research in education finance and policy (pp. 166–182).

Lee, J., & Fitz, J. (1997). HMI and OFSTED: Evolution or revolution in school inspection. British Journal of Educational Studies, 45(1), 39–52. https://doi.org/10.1111/1467-8527.00035

Lewin, A. Y., & Minton, J. W. (1986). Determining organizational effectiveness: Another look, and an agenda for research. Management Science, 32(5), 514–538. https://doi.org/10.1287/mnsc.32.5.514

Lindgren, J. (2015). The front and back stages of Swedish school inspection: Opening the black box of judgment. Scandinavian Journal of Educational Research, 59(1), 58–76. https://doi.org/10.1080/00313831.2013.838803

Luginbuhl, R., Webbink, D., & de Wolf, I. (2009). Do inspections improve primary school performance? Educational Evaluation and Policy Analysis, 31(3), 221–237. https://doi.org/10.3102/0162373709338315

Maitlis, S. (2005). The social processes of organizational sensemaking. The Academy of Management Journal, 48(1), 21–49. https://doi.org/10.2307/20159639

Maitlis, S., & Christianson, M. (2014). Sensemaking in organizations: Taking stock and moving forward. The Academy of Management Annals, 8(1), 57–125. https://doi.org/10.1080/19416520.2014.873177

March, J. G., & Olsen, J. P. (2011). The logic of appropriateness. In R. E. Goodin (Ed.), The Oxford handbook of political science (pp. 1–22). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199604456.013.0024

Mathis, W., & Trujillo, T. (2016). Lessons from NCLB for the Every Student Succeeds Act. http://nepc.colorado.edu/publication/lessons-from-NCLB

Matthews, P., & Sammons, P. (2004). Improvement through inspection: An evaluation of the impact of Ofsted’s work. Ofsted.

Matthews, P., Holmes, J. R., Vickers, P., & Corporaal, B. (1998). Aspects of the reliability and validity of school inspection judgements of teaching quality. Educational Research and Evaluation, 4(2), 167–188. https://doi.org/10.1076/edre.4.2.167.6959

McDonnell, L. (2008). The politics of educational accountability: Can the clock be turned back? In K. E. Ryan & L. A. Shepard (Eds.), The future of test-based educational accountability. Routledge.

McDonnell, L. (2013). Educational accountability and policy feedback. Educational Policy, 27(2), 170–189. https://doi.org/10.1177/0895904812465119
McDonnell, L. (2013). Educational accountability and policy feedback. Educational Policy, 27(2), 170–189. https://doi.org/10.1177/0895904812465119
Meyers, C. V., & VanGronigen, B. A. (2019). A lack of authentic school improvement plan development. Journal of Educational Administration, 57(3), 261–278. https://doi.org/10.1108/JEA-09-2018-0154
Miles, M. B., Huberman, A. M., & Saldaña, J. (2014). Qualitative data analysis: A methods sourcebook (3rd ed.). SAGE Publications.
Millett, A., & Johnson, D. C. (1998). Expertise or “baggage”? What helps inspectors to inspect primary mathematics? British Educational Research Journal, 24(5), 503–518. https://doi.org/10.1080/0141192980240502
Mintrop, H., MacLellan, A. M., & Quintero, M. F. (2001). School improvement plans in schools on probation: A comparative content analysis across three accountability systems. Educational Administration Quarterly, 37(2), 197–218. https://doi.org/10.1177/00131610121969299
Morse, J. (2010). Procedures and practice of mixed method design: Maintaining control, rigor, and complexity. In A. M. Tashakkori & C. B. Teddlie (Eds.), Handbook of mixed methods in social & behavioral research (pp. 339–352). SAGE Publications.
Neuendorf, K. A. (2017). The content analysis guidebook. SAGE Publications. https://doi.org/10.4135/9781071802878
Nusche, D., Braun, H., Halász, G., & Santiago, P. (2014). OECD reviews of evaluation and assessment in education: Netherlands 2014. OECD. https://doi.org/10.1787/9789264211940-en
OECD. (2015). Education at a glance 2015: OECD indicators. https://doi.org/10.1787/19991487
Ouston, J., Fidler, B., & Earley, P. (1997). What do schools do after OFSTED school inspections - or before? School Leadership & Management, 17(1), 95–104. https://doi.org/10.1080/13632439770195
Penninckx, M., & Vanhoof, J. (2015). Insights gained by schools and emotional consequences of school inspections: A review of evidence. School Leadership & Management, 35(5), 477–501. https://doi.org/10.1080/13632434.2015.1107036
Penninckx, M., Vanhoof, J., De Maeyer, S., & Van Petegem, P. (2014). Exploring and explaining the effects of being inspected. Educational Studies, 40(4), 456–472. https://doi.org/10.1080/03055698.2014.930343
Penninckx, M., Vanhoof, J., De Maeyer, S., & Van Petegem, P. (2015). Effects and side effects of Flemish school inspection. Educational Management Administration & Leadership. https://doi.org/10.1177/1741143215570305
Perryman, J. (2007). Inspection and emotion. Cambridge Journal of Education, 37(2), 173–190. https://doi.org/10.1080/03057640701372418
Perryman, J. (2009). Inspection and the fabrication of professional and performative processes. Journal of Education Policy, 24(5), 611–631.
Phillips, D., & Schweisfurth, M. (2014). Comparative and international education: An introduction to theory, methods, and practice (2nd ed.). Continuum International Publishing Group.
Piderit, S. K. (2000). Rethinking resistance and recognizing ambivalence: A multidimensional view of attitudes toward an organizational change. The Academy of Management Review, 25(4), 783. https://doi.org/10.2307/259206
Pond, S., Armenakis, A., & Green, S. (1984). The importance of employee expectations in organizational diagnosis. The Journal of Applied Behavioral Science, 20(2), 167–180. https://doi.org/10.1177/002188638402000207
Porac, J. F., Thomas, H., & Baden-Fuller, C. (1989). Competitive groups as cognitive communities: The case of Scottish knitwear manufacturers. Journal of Management Studies, 26(4), 397–416. https://doi.org/10.1111/j.1467-6486.1989.tb00736.x
Portz, J., & Beauchamp, N. (2020). Educational accountability and state ESSA plans. Educational Policy. https://doi.org/10.1177/0895904820917364
Ravitch, D. (2016). The death and life of the great American school system: How testing and choice are undermining education. Basic Books.
Redding, C., & Searby, L. (2020). The map is not the territory: Considering the role of school improvement plans in turnaround schools. Journal of Cases in Educational Leadership, 23(3), 63–75. https://doi.org/10.1177/1555458920938854
Riffe, D., Lacy, S., & Fico, F. (2014). Analyzing media messages: Using quantitative content analysis in research. Routledge.
Rigby, J. G. (2015). Principals’ sensemaking and enactment of teacher evaluation. Journal of Educational Administration, 53(3), 374–392. https://doi.org/10.1108/JEA-04-2014-0051
Rosenthal, L. (2004). Do school inspections improve school quality? Ofsted inspections and school examination results in the UK. Economics of Education Review, 23, 143–151.
Rothstein, R., Jacobsen, R., & Wilder, T. (2008). Grading education: Getting accountability right. Economic Policy Institute and Teachers College Press.
Rouleau, L. (2005). Micro-practices of strategic sensemaking and sensegiving: How middle managers interpret and sell change every day. Journal of Management Studies, 42(7), 1413–1441.
Rutz, S., Mathew, D., Robben, P., & Bont, A. (2017). Enhancing responsiveness and consistency: Comparing the collective use of discretion and discretionary room at inspectorates in England and the Netherlands. Regulation & Governance, 11(1), 81–94. https://doi.org/10.1111/rego.12101
Ryan, K., Gandha, T., & Ahn, J. (2013). School self-evaluation and inspection for improving U.S. schools? National Education Policy Center. http://nepc.colorado.edu/publication/school-self-evaluation
Sandberg, J., & Tsoukas, H. (2015). Making sense of the sensemaking perspective: Its constituents, limitations, and opportunities for further development. Journal of Organizational Behavior, 36(S1), S6–S32. https://doi.org/10.1002/job.1937
Scheerens, J., Ehren, M., Sleegers, P., & de Leeuw, R. (2012). OECD review on evaluation and assessment frameworks for improving school outcomes.
Shaw, I., Newton, D. P., Aitkin, M., & Darnell, R. (2003). Do OFSTED inspections of secondary schools make a difference to GCSE results? British Educational Research Journal, 29(1), 63–75.
Spillane, J. P. (1999). External reform initiatives and teachers’ efforts to reconstruct their practice: The mediating role of teachers’ zones of enactment. Journal of Curriculum Studies, 31(2), 1–33. https://doi.org/10.1080/002202799183205
Spillane, J. P., Parise, L. M., & Sherer, J. Z. (2011). Organizational routines as coupling mechanisms. American Educational Research Journal, 48(3), 586–619. https://doi.org/10.3102/0002831210385102
Spillane, J. P., Reiser, B. J., & Gomez, L. M. (2006). Policy implementation and cognition: The role of human, social, and distributed cognition in framing policy implementation. In M. I. Honig (Ed.), New directions in education policy implementation (pp. 47–64). State University of New York Press.
Spillane, J. P., Reiser, B. J., & Reimer, T. (2002). Policy implementation and cognition: Reframing and refocusing implementation research. Review of Educational Research, 72(3), 387–431. https://doi.org/10.3102/00346543072003387
Stiglitz, J. (2000). Economics of the public sector (3rd ed.). Norton.
Strunk, K. O., Marsh, J. A., Bush-Mecenas, S., & Duque, M. R. (2016). The best laid plans. Educational Administration Quarterly, 52(2), 259–309. https://doi.org/10.1177/0013161X15616864
Teddlie, C., & Tashakkori, A. (2009). Foundations of mixed methods research: Integrating qualitative and quantitative approaches in the social and behavioral sciences. SAGE.
Teddlie, C., & Yu, F. (2007). Mixed methods sampling: A typology with examples. Journal of Mixed Methods Research, 1(1), 77–100. https://doi.org/10.1177/1558689806292430
UNESCO. (2017). Global education monitoring report - Accountability in education: Meeting our commitments.
van Bruggen, J. C. (2010). Inspectorates of education in Europe: Some comparative remarks about their tasks and work.
van der Sluis, M. E., Reezigt, G. J., & Borghans, L. (2017). Implementing new public management in educational policy. Educational Policy, 31(3), 303–329.
Vavrus, F. K., & Bartlett, L. (2016). Rethinking case study research: A comparative approach (1st ed.). Routledge.
Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Using school performance feedback: Perceptions of primary school principals. School Effectiveness and School Improvement, 21(2), 167–188. https://doi.org/10.1080/09243450903396005
Visscher, A. J., & Coe, R. (2003). School performance feedback systems: Conceptualisation, analysis, and reflection. School Effectiveness and School Improvement, 14(3), 321–349. https://doi.org/10.1076/sesi.14.3.321.15842
Weick, K. E. (1995). Sensemaking in organizations. SAGE Publications.
Weick, K. E., Sutcliffe, K. M., & Obstfeld, D. (2005). Organizing and the process of sensemaking. Organization Science, 16(4), 409–421. https://doi.org/10.1287/orsc.1050.0133
Weiner, B. J. (2009). A theory of organizational readiness for change. Implementation Science, 4(1), 67. https://doi.org/10.1186/1748-5908-4-67
Woods, P., & Jeffrey, B. (1998). Choosing positions: Living the contradictions of OFSTED. British Journal of Sociology of Education, 19(4), 547–570. https://doi.org/10.1080/0142569980190406

Paper 2: Principals’ Attitudes towards School Inspection in a U.S. District: Contribution to Sustained School Reform

Introduction

Accountability aims to improve education quality by providing oversight of school performance. High-stakes testing and school inspection (SI) are the most widely used accountability approaches around the world. In the United States, high-stakes testing is the primary instrument for accountability, although a few states and cities have experimented with SI. Both approaches aim to influence stakeholders’ actions to spur improved outcomes. Yet, these approaches have different theories of action for improvement. In the case of inspection, principals’ acceptance of feedback is an intermediate step before implementing reforms (Ehren et al., 2013). Principals’ attitudes towards inspection feedback likely influence their readiness to take improvement actions. Further, certain attitudes among those who implement reforms in an organization are critical to the sustainability of those reforms (Armenakis & Harris, 2009). This study assesses how inspections affect principals’ attitudes associated with lasting change.

Many countries use SI as the main mechanism for accountability. Rather than relying solely on standardized tests, inspection considers a variety of evidence on school processes and practices. These evaluations are conducted by inspectors, who in the best cases are experienced educators who maintain close contact with schools.
Inspectors are able to provide informed feedback taking into consideration the local context, system-wide evidence, and academic research (Altrichter & Kemethofer, 2015; UNESCO, 2017). By visiting schools and observing practices, inspectors can obtain a deeper understanding of the existing capacity for improvement and of current efforts to raise performance. Such observations, in combination with test scores, enable a more in-depth understanding of school operations and nuanced insights into what might help or hinder improvement (Barber, 2005). School leaders’ perceptions about the validity of inspection influence their response to results (e.g., Bitan et al., 2014). If inspections are viewed as relevant, accurate, and fair, then school leaders may be compelled to implement suggestions.

Despite the potential advantages of inspection, few studies investigate its role in supporting school reform. It is not clear whether it is an effective mechanism for long-term change. No peer-reviewed study provides empirical evidence of the ability of inspection to enable sustained institutional change. Bridging insights from the organizational change literature into the education field, this paper assesses how inspection affects the attitudes and sentiments of school principals (agents of change), whose decisions are crucial for achieving lasting institutional change (Armenakis & Harris, 2009). These attitudes include 1) beliefs about diagnosis effectiveness, 2) perceived appropriateness of feedback, and 3) readiness for making changes (Armenakis & Harris, 2009; Holt et al., 2007; Weiner, 2009).

I draw on case study data collected in a large urban district that is one of the few in the U.S. to use school inspection. With one of the most well-developed inspection systems in the country, this district presents a unique research opportunity to understand how school leaders perceive and act upon inspection feedback. Test-based accountability has been subject to debate and criticism, and it is unclear whether inspection is viewed by school administrators as more accurate and fair. This study examines how inspection influences school principals’ attitudes that are associated with lasting institutional change. Understanding these underlying attitudes can shed light on whether inspection offers potential to create long-term school improvements. I address the following research questions:

1) How do inspection and test-based accountability compare in terms of principals’ perceived accuracy in evaluating areas for improvement?
2) Do school inspections promote principals’ attitudes associated with lasting institutional change?
   a. Do principals feel that inspections provide an effective diagnosis of the main challenges and factors that hinder improvement?
   b. Do principals perceive suggested reforms as appropriate?
   c. Do inspections promote principals’ readiness for change?

Background

As the implementation of the Every Student Succeeds Act in the U.S. has sparked debate over the design of accountability schemes, inspection could be viewed as an alternative to test-based accountability. Yet, one major barrier to expanded inspection in the U.S. is limited empirical evidence about its impact (Husfeldt, 2011; Klerks, 2012). Furthermore, there is no evidence about its role in supporting lasting school reforms. Test-based accountability and inspection systems rely on different sets of assumptions regarding school stakeholder behavior.
In addition, these two systems of accountability may influence school stakeholders’ attitudes and decisions in different ways. The theories of action behind these contrasting accountability schemes shed light on these differences.

The theory of action of test-based accountability for improvement is grounded in the principal-agent problem. This problem theorizes that inefficiencies in the public sector arise when agents (i.e., government bureaucrats, such as school leaders) focus on their own interests, which are not necessarily the same as those of the principals (i.e., the citizens they are supposed to serve) (Stiglitz, 2000). This theory reasons that a correct incentive structure is an effective way of aligning the interests of agents with those of the principals. Grounded in this rationale, test-based accountability uses standardized test results as a metric for determining rewards and/or sanctions to schools based on aggregate results. The set of incentives can be explicit actions enforced by the government (e.g., positive incentives such as bonuses; negative incentives such as school closure) or implicit through community pressure to improve after test results are published (Figlio & Loeb, 2011). Thus, standardized test results can provide an incentive structure for schools to make a concerted effort to improve the outcomes that are measured (Jacob, 2005; Ladd & Figlio, 2008).

Despite its promise, the rationale of test-based accountability is often inadequate when local circumstances prevent administrators and teachers from responding to incentives as policymakers intended. Barriers can include low-resource settings or cultural and language factors (Figlio & Loeb, 2011; Koretz, 2008). Also, pressure to improve test results in the short run may compel school administrators to implement changes that improve measured outcomes. Yet, these short-term reforms may displace more holistic strategies that require a longer timeframe to produce results (e.g., Figlio & Loeb, 2011; Ladd & Figlio, 2008). As a result, administrators of low-performing schools might feel pressure to focus on narrow, short-term improvements and implement changes that they do not fully support (Ladd & Figlio, 2008). This might reduce the sustainability and long-term success of school reforms.

In contrast, SI relies on different mechanisms for school improvement. There is less consensus on theories of action for SI, partly due to the wide variety of inspection systems (Husfeldt, 2011). Ehren et al. (2013) developed a comprehensive framework that describes worldwide variations in school inspection. According to this framework, accountability pressure plays a role in school improvement, especially in high-stakes inspection systems. Feedback is another mechanism for school improvement, common to all types of inspection arrangements, and it is the focus of this study. In theory, feedback compels schools to make improvements in areas evaluated as weak. For this theory of action to function, it is crucial for school inspectors to provide relevant input for school improvement and for school stakeholders to accept the feedback and act on it. Unlike test-based accountability, on-site feedback mechanisms have the potential to offer relevant information and insights as a basis for practitioner-led improvement actions (Visscher & Coe, 2003). Information is more localized and also considers the specific context, processes, and practices (Altrichter & Kemethofer, 2015).
This entails a relative advantage: inspection can cover a much broader range of education quality aspects, accounting for the multi-dimensional nature of school quality (Ryan, Gandha, & Ahn, 2013). Thus, inspections can offer nuanced insight into what might hinder or enable school improvement (Barber, 2005). While test results show that performance is poor, inspection information is better able to determine why. This can give schools greater ability to understand the specifics of problem areas and to take improvement actions based on a more holistic evaluation. In addition, inspection processes have the potential to open dialogue between district leaders and school principals. This interaction can shed light on specific school problems and enable inspectors to propose alternative reforms that allow administrators to retain ongoing successful practices. This in turn can lead to smoother incorporation of reforms and avoid abrupt changes in administration and conflicts with school culture (Ehren, 2016).

As the literature above describes, school inspections can provide relevant information and insights that lead to school improvements. However, the success of this process hinges on principals accepting and acting upon inspection feedback. Extant research has not yet evaluated whether and how school principals evaluate and use inspection feedback to implement long-term reforms.

Literature Review

Few studies address the influence of principal attitudes toward inspection on institutional change in schools (Behnke & Steins, 2017; Bitan et al., 2014; C. Gray & Gardner, 1999). Other studies mention attitudinal reactions toward inspections (e.g., de Wolf & Janssens, 2007; Jones et al., 2017; Penninckx & Vanhoof, 2015). Generalizing findings from this literature is challenging because studies address a wide variety of attitudes and side-effects in countries with varied inspection systems.

Principals’ attitudes towards inspection influence their acceptance of feedback. Most prior studies find positive sentiments towards inspection feedback (Behnke & Steins, 2017; Bitan et al., 2014; C. Gray & Gardner, 1999). Principals feel inspection feedback is relevant (Bitan et al., 2014), accurate and fair (C. Gray & Gardner, 1999), constructive, and supportive (Behnke & Steins, 2017). In some cases, principals’ positive attitudes stem from inspection serving as a source of legitimacy for reforms they propose (Behnke & Steins, 2017; Ehren, Perryman, et al., 2015a).

Negative sentiments have been found in literature focused on the effects and side-effects of inspection. These sentiments include emotional distress (i.e., stress, anxiety, fear), concerns due to work overload, tension between staff members, and reduced self-efficacy (e.g., de Wolf & Janssens, 2007; Jones et al., 2017; Penninckx et al., 2015; Penninckx & Vanhoof, 2015; Perryman, 2007, 2009). Much of the past literature addressing the emotional consequences of inspections has been conducted in the UK (Penninckx & Vanhoof, 2015), which has a high-stakes inspection system. No study in this review indicated an overall negative attitude from school principals towards inspection.

The way in which feedback is delivered can influence principal attitudes. Findings indicate that administrators tend to be more ready to take improvement actions when inspection identifies weaknesses and creates an ongoing dialogue between inspectors and school administrators (Bitan et al., 2014; Ehren & Visscher, 2008).
Past work has also consistently shown that inspection feedback better enables school development and improvement when it is combined with agreement on targets with schools (Behnke & Steins, 2017; Dedering & Müller, 2011; Ehren, Perryman, et al., 2015b; Ehren & Visscher, 2008). School leaders will more readily implement improvement plans if inspectors report weaknesses in a straightforward manner, make written recommendations, and encourage the development of improvement plans (Ehren & Visscher, 2006, 2008; Penninckx et al., 2014). Yet, critical feedback can have negative effects on the self-worth of those evaluated, which might lead them to dismiss the feedback despite its value (Kluger & DeNisi, 1996).

Accountability pressures may also influence principals’ attitudes towards inspection feedback. Higher accountability pressure can lead to greater sensitivity in schools regarding inspectors’ quality expectations, which translate into developmental activities (Altrichter & Kemethofer, 2015). Nonetheless, higher pressure is also associated with unintended consequences such as narrowing the curriculum and instructional strategies (Altrichter & Kemethofer, 2015; Jones et al., 2017).

These studies have explored a wide range of factors and demonstrated an association between inspection feedback, school stakeholders’ attitudes, and the undertaking of improvement actions. Yet, none have focused on the specific attitudes that are relevant to lasting institutional change. This study addresses this gap in the literature.

Theoretical Framework

The organizational change literature offers useful constructs to analyze the role of attitudes and beliefs in promoting change in schools. Over time, this literature has empirically identified key factors that facilitate sustained organizational change. Especially relevant is a paper by Armenakis and Harris (2009), which reviews studies from the past 30 years. Building on their work, I focus on factors relevant to educational systems. Within this framework, I consider the educational system as an organization, with school principals as local agents of change.

Past literature has identified beliefs or sentiments that underlie actors’ motives to support change within organizations and increase the likelihood of sustained change (e.g., Armenakis et al., 2007; Lewin & Minton, 1986). Reforms are more likely to be successful when actors believe that changes are needed to improve organizational performance. In addition, specific changes gain support when actors trust that they are suitable to address problems and can be successfully implemented.

Commitment to reform is another critical factor associated with sustained change. In an influential paper, Herscovitch and Meyer (2002) argue that what motivates change is associated with the level of commitment. They propose that organizational members’ commitment to change occurs because they “want to” (affective commitment), “have to” in order to avoid failure (continuance commitment), or “ought to” out of a sense of obligation (normative commitment). Affective and continuance commitment are found to be associated with a higher level of commitment (Herscovitch & Meyer, 2002).

Building on these findings, the organizational change literature has identified central factors for sustained organizational change:

1) Diagnosis effectiveness. An effective organizational diagnosis is accurate, recognizes the main problem, and identifies the root cause (Armenakis & Harris, 2009).
This is a critical condition for convincing an individual that a change is needed (Armenakis & Harris, 2009; Pond et al., 1984).

2) Perceived appropriateness of feedback. An effective organizational diagnosis leads to changes that are appropriate to address the problems in an organization (Armenakis & Harris, 2009; Holt et al., 2007). Appropriateness captures how courses of action are perceived as natural, rightful, or legitimate means of pursuing the organizational need for change and vision (Cole et al., 2006; March & Olsen, 2011). It is possible for individuals to embrace a vision yet not believe that a specific change will fulfill that vision (Cole et al., 2006).

3) Readiness for change. Reforms must be perceived to be appropriate to succeed (Holt et al., 2007). Yet, organizational members can have ambivalent attitudes. Conflicts can arise across multiple dimensions: cognitive, emotional, and behavioral (Grimolizzi-Jensen, 2018; Piderit, 2000). If there is insufficient planning to prevent stalled decisions, reforms are less likely to succeed (Armenakis & Harris, 2009). Readiness for change refers to the commitment to, and belief in the efficacy of, organizational changes (Weiner, 2009). This belief is necessary to overcome inertia due to ambivalent attitudes and increase the chances of sustainable organizational change.

These constructs and findings directly inform the research questions. I examine principals’ perceptions of diagnosis effectiveness, both for inspection and for the district accountability framework, which is largely based on standardized tests. Then, I assess principals’ sentiments regarding the appropriateness of the SI feedback. Finally, I assess whether the diagnosis and changes based on the feedback influence principals’ readiness for change. These are key conditions associated with sustainable institutional change.

Case Study: Description of School Inspection System

Like all U.S. districts, the selected case study has a high-stakes accountability system that relies heavily on standardized testing. All schools are part of the district accountability framework. This framework rates schools based on a series of performance metrics, with standardized test results as the most influential component. In addition, the district also features support mechanisms, including principals’ supervisors and instructional partners.

For almost a decade, the district has combined its accountability framework with school inspection. Low-performing schools are targeted for inspection visits, yet other schools can opt in. Since 2012, the inspections have been led by a private company that holds a contract with the district. Inspections are conducted for accountability and support purposes. Qualitative evidence is compiled via a school visit conducted by groups of three to four inspectors. Inspectors include personnel from an outside contractor and certified inspectors from the District Department of Education. The onsite inspections last one or two days and include document reviews, classroom observations, focus groups, and interviews with school staff, students, and parents. Inspection areas include instruction, students’ opportunities to learn, educators’ opportunities to learn, and leadership and community. Evaluation criteria emphasize instructional quality, support, and assessment; classroom climate; and school-wide practices and culture.
To conclude the school visit, a feedback meeting between inspectors and school administrators allows discussion of findings and improvement strategies. A written report summarizing conclusions, including suggestions for priority areas, is provided to schools.

Methods

This study aims to evaluate whether SI influences a series of key factors important for lasting institutional change. It uses a large U.S. district as a case study to investigate how inspection influences principal attitudes towards sustainable institutional change. In-depth semi-structured interviews were conducted with 20 principals. All principals from schools in the tier-support system were invited to participate. Schools within the district’s tier-support system are primarily low-performing schools. Most have had at least one inspection visit (55 out of 78 schools in the tier-support system). Interviews were completed with 16 principals from inspected schools and 4 from not-yet-inspected schools. Participants were informed that interview responses were anonymous and that a pseudonym would be used to cite them. Principals were given a US$25 gift card after participation. Descriptive data about interviewees are presented in Table 3.

Table 3. Principals’ Experience and Education

                              Inspected schools   Not inspected schools
                              Number (%)          Number (%)
Principal experience
  0 to 4                      4 (25.0)            4 (100.0)
  5 to 8                      7 (43.8)            -
  9 to 12                     3 (18.8)            -
  13+                         2 (12.5)            -
Years working in the school
  0 to 4                      9 (56.3)            4 (100.0)
  5 to 8                      4 (25.0)            -
  9 to 12                     3 (18.8)            -
  13+                         -                   -
Teacher experience
  0 to 4                      3 (18.8)            1 (25.0)
  5 to 8                      9 (56.3)            3 (75.0)
  9 to 12                     3 (18.8)            -
  13+                         1 (6.3)             -
Highest Degree
  Masters                     12 (75.0)           4 (100.0)
  Specialist                  2 (12.5)            -
  Ph.D. / Doctorate           2 (12.5)            -
n                             16                  4

Interviews inquired about principals’ attitudes towards inspection and its influence on institutional change. Inspection is the main focus, yet some questions address the district accountability framework, which is used in all schools and relies heavily on high-stakes test results to rate schools. This allows for a direct comparison of principals’ views on these two accountability approaches. Interviews inquired about what the principals learned from the SIs, the perceived legitimacy of feedback, and how the process aligned with the principal’s vision for the school, specific programming, and long-term goals. Principals discussed their motivations for change, reforms already implemented based on the inspections, and their expectations for reform success and continuity. The interview protocol is presented in Appendix A.

Interview questions were informed by the organizational change literature (see the Theoretical Framework section). Interviews were coded primarily through deductive coding, although inductive coding was used to a lesser extent. Deductive codes were formulated using an a priori scheme developed from constructs in organizational change theory (e.g., Armenakis, Bernerth, et al., 2007; Holt et al., 2007). Constructs were adapted to the context of school reform implementation. Principals act both as agents of change and as change recipients; changes stemming from the SI are mostly decided by the principals, as opposed to the top-down changes typically considered in the organizational change literature. Inductive codes were added to reflect key ideas being investigated that were not captured by a priori codes.
These codes were developed to analyze principals’ perceptions of SI feedback, for example, its usefulness and how it influenced reform implementation in the school. Details of the coding scheme are provided in Appendix B.

To ensure coding reliability, an independent-coder method was employed. First, a subset of interview transcripts was independently coded by two researchers using DEDOOSE software. Then, codes were compared for agreement. An iterative process was followed until at least 80% agreement was achieved on the pooled Cohen’s kappa (De Vries et al., 2008), as sketched below. After this reliability check, the remaining transcripts were coded by the lead author. Using DEDOOSE, patterns were identified both within and across interviews. Data were examined visually in the form of cross-tabulations and frequency charts in order to determine the presence and absence of codes both within and across interviews.
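The reliability check above can be made concrete with a minimal sketch. This is not the study’s analysis pipeline (coding and comparison were done in DEDOOSE); it merely illustrates a pooled Cohen’s kappa in the spirit of De Vries et al. (2008), treating each (excerpt, code) pair as one binary application decision. All excerpt IDs, code names, and ratings below are hypothetical.

from itertools import product

def pooled_cohens_kappa(coder_a, coder_b, excerpts, codes):
    # Each coder maps an (excerpt, code) pair to True if the code was applied.
    decisions = list(product(excerpts, codes))
    n = len(decisions)
    # Observed agreement: share of decisions on which the two coders match.
    p_o = sum(coder_a[d] == coder_b[d] for d in decisions) / n
    # Marginal "applied" rates, pooled across all codes rather than per code.
    p_a = sum(coder_a[d] for d in decisions) / n
    p_b = sum(coder_b[d] for d in decisions) / n
    # Chance agreement: both apply a code, or both withhold it.
    p_e = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings: three excerpts, two deductive codes.
excerpts = ["excerpt_1", "excerpt_2", "excerpt_3"]
codes = ["diagnosis_effectiveness", "readiness_for_change"]
coder_a = {
    ("excerpt_1", "diagnosis_effectiveness"): True,
    ("excerpt_1", "readiness_for_change"): False,
    ("excerpt_2", "diagnosis_effectiveness"): True,
    ("excerpt_2", "readiness_for_change"): True,
    ("excerpt_3", "diagnosis_effectiveness"): False,
    ("excerpt_3", "readiness_for_change"): False,
}
coder_b = dict(coder_a)
coder_b[("excerpt_2", "readiness_for_change")] = False  # one disagreement

print(round(pooled_cohens_kappa(coder_a, coder_b, excerpts, codes), 2))  # 0.67

Under this procedure, coders would recode and reconcile until the pooled statistic met the 80% criterion reported above.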
A limitation of this study is its reliance on self-reporting of planned reforms. The interviews rely on self-reported reforms and are therefore limited to actions that principals can recall or choose to report. The study does not capture reforms actually implemented due to SI. Another possible limitation is that principals’ attitudes towards inspection might be positively biased in order to satisfy the district. Several factors alleviate these concerns. Principals were informed that interviews were anonymous and that no identifying information for the school would be reported. There are indications that principals’ responses were quite candid and honest: many respondents did not shy away from expressing critical views of the district, its accountability system, and school inspections. I had no indication in the interviews that principals were using coded language or seemed hesitant to respond to particular questions. This alleviates some concern regarding self-censoring. In addition, inspection visits were relatively recent and a major component of schools’ planning processes; principals provided detail about their experiences, which lessens concern over recall bias.

Results

Results indicate that principals hold attitudes towards inspection that are associated with lasting institutional change. Most principals from inspected schools exhibit positive attitudes and believe that on-site visits enable inspectors to attain a holistic understanding. In contrast, a majority of principals acknowledge that the district accountability framework alone has limited ability to understand schools and perform an accurate assessment. Interviewees believe that the inspection diagnosis is effective (69%), that the feedback is appropriate for their schools (69%), and that they are ready to support changes based on the inspections (75%). These attitudes were strongly interrelated within cases. A minority of principals (25%) hold negative attitudes towards inspection and question its validity. These negative attitudes are driven by perceptions that changes suggested by inspections are not well aligned with school goals and have a negative tone. These principals are not ready to make changes based on inspection feedback. Results are summarized in Table 4.

Table 4. Principals’ Attitudes towards School Inspection

Principal    Diagnosis effectiveness   Feedback appropriateness   Readiness for changes
Nicholas     yes                       yes                        yes
Mary         yes                       mostly                     yes
Tyler        yes                       mostly                     yes
Thomas       yes                       mostly                     yes
Monica       mostly                    yes                        yes
Linda        mostly                    yes                        yes
Sarah        mostly                    yes                        yes
Mark         yes                       yes                        mostly
Amy          yes                       mostly                     mostly
Heather      no mention                yes                        mostly
Matthew      mostly                    mostly                     mostly
David        mostly                    mostly not                 mostly
Sebastian    mostly not                no                         mostly not
Brian        no                        no                         no
Paul         no mention                no                         no
Ashley       no mention                no                         no
Yes          69%                       69%                        75%
No           19%                       31%                        25%
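As a check on the arithmetic, the summary rows of Table 4 can be reproduced by counting “yes” and “mostly” as positive responses among the 16 inspected-school principals. The short sketch below, included purely for illustration, makes that tally explicit; treating “mostly” as positive is the assumption needed to recover the reported shares.

# Rows of Table 4: (principal, diagnosis, appropriateness, readiness).
rows = [
    ("Nicholas", "yes", "yes", "yes"),
    ("Mary", "yes", "mostly", "yes"),
    ("Tyler", "yes", "mostly", "yes"),
    ("Thomas", "yes", "mostly", "yes"),
    ("Monica", "mostly", "yes", "yes"),
    ("Linda", "mostly", "yes", "yes"),
    ("Sarah", "mostly", "yes", "yes"),
    ("Mark", "yes", "yes", "mostly"),
    ("Amy", "yes", "mostly", "mostly"),
    ("Heather", "no mention", "yes", "mostly"),
    ("Matthew", "mostly", "mostly", "mostly"),
    ("David", "mostly", "mostly not", "mostly"),
    ("Sebastian", "mostly not", "no", "mostly not"),
    ("Brian", "no", "no", "no"),
    ("Paul", "no mention", "no", "no"),
    ("Ashley", "no mention", "no", "no"),
]

POSITIVE = {"yes", "mostly"}  # "mostly" counts toward the positive share

for label, col in [("diagnosis", 1), ("appropriateness", 2), ("readiness", 3)]:
    share = sum(r[col] in POSITIVE for r in rows) / len(rows)
    print(f"{label}: {share:.0%}")  # diagnosis: 69%, appropriateness: 69%, readiness: 75%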
Perceptions about the District Diagnosis Effectiveness excluding Inspection

Besides inspection, all schools are part of the district accountability framework. This framework rates schools based on a series of performance metrics, with standardized test results as the most influential component. In addition, the principals’ supervisors also assess schools to provide support and keep them accountable. Half of the principals think that this framework can produce an accurate evaluation. Other principals argue that this information is limited and assessments are unlikely to be accurate. The consensus among principals is that the district relies heavily on quantitative metrics from the district accountability framework for diagnostic purposes, yet these data might not best characterize specific conditions within schools.

Most principals feel that an accurate assessment could be made by the few people who know the school. Nearly 75% of principals acknowledged that several individuals are deeply familiar with their school, including principals’ supervisors and, to a lesser degree, instructional leaders and senior district leaders. Some interviewees point out the downside of information not being widely available and residing in “pockets” within the district. Principal Thomas voices this concern, explaining that a diagnosis of the school “would probably not be very accurate because the information lives in pockets instead of it living in some centralized place, where I have confidence that everybody has access to the information that they need to have.” Furthermore, several principals note that the accessibility and continuity of this concentrated knowledge is vulnerable to high turnover in school leadership.

Limitations of district knowledge of school conditions are acknowledged by nearly all principals interviewed (19 out of 20). When asked specifically about what the district overlooks with the performance framework, principals most frequently mentioned that relying on test scores can fail to capture specific school challenges, strengths, and ongoing initiatives. More than half of principals raised this view. For example, some principals described limitations of district metrics in capturing impediments to school improvement. Often, assessments are unable to identify specific factors that limit student achievement. Several principals attribute this to the district not having full knowledge of the student body and communities, particularly in terms of socio-emotional factors. Principals Mary and Linda illustrate these perspectives:

Principal Mary: [The central office] looks at the data. … They have plenty of data. … But if you want to come in and dig deep into why kids don't come to school, [and] why kids are not succeeding … then I think they may not be as familiar…. Other social and emotional issues that factor into the students' success, I would say does not come out very well in a data chart.

Principal Linda: I'm not sure how they're looking at the data because when you have children who come to this school with trauma and no language skills, that isn't always showing up in the data… It's the social emotional data that I don't see them looking at.

Other principals note that the district is not very familiar with school specifics: the school program and academic focus, what the school is doing well, and how much the school has improved. Principal Sarah voices this concern:

I think for an accurate diagnostic, someone would need to know my school on a deeper level, and there would need to be commitment from folks from [the] central office to really spend some time in our school, to understand, and to be able to speak to … different initiatives that we have, the way that our school culture runs.

Principals reported that test scores remained the dominant information considered. Overall, a majority of principals believe that diagnosis accuracy could be improved by mechanisms that provide a close look at individual schools, rather than a system-level evaluation of quantitative metrics. Many principals highlighted the district’s incomplete knowledge of schools’ challenges and strengths, as well as of why students fail.

School Inspections: Positive Attitudes among Principals

Principals’ attitudes towards SIs can be categorized as positive, mixed, or negative. A summary of findings is presented in Table 5. I assess these attitudes in terms of perceived diagnosis effectiveness, sentiments of appropriateness, and readiness for change.

Perceptions of Diagnosis Effectiveness

A majority of the principals (70%) believe that SIs can effectively evaluate schools. Most of these principals explain that inspection is effective due to the thoroughness of the process. School visits offer “boots on the ground” (Principal David) and provide a “more holistic” diagnosis (Principal Thomas). Principal Monica illustrates this point in more detail:

[The inspectors] actually lived the life of the school … they were in the building two days; they were in every single classroom. So they had an opportunity to see every single teacher and the fact that they were able to touch every single piece, it meant that they knew everything that was going on in that building …, I really felt at the end, we had a really good, strong picture of the school.
Table 5. Summary of Principals’ Views

Positive attitudes
  Diagnosis effectiveness: accurate knowledge of schools; holistic; thorough; contrasting different sources; “real”
  Appropriateness: aligned with school vision & long-term goals; opportunity to assess areas of interest; insights from staff; confirms what schools are doing well; source of legitimacy to implement reforms; enables smooth changes
  Readiness to change: strong commitment; changes based on convictions; no sense of obligation; perceptions of change efficacy; already have evidence that changes worked

Ambivalent attitudes
  Diagnosis effectiveness: limits to knowing the school in a short visit; limits of any external review
  Appropriateness: disagreement with feedback; disagreement on “high leverage” areas; disagreement on specific feedback; already implemented different plans; discussed other changes with district leaders; staff should decide handling of specific issues
  Readiness to change: some doubts about self-efficacy; not sure school staff will “buy in”; staff tired of constant reforms; changes are not central for the district

Negative attitudes
  Diagnosis effectiveness: inability to know the “true culture” of schools
  Appropriateness: doesn’t focus on what is important for the school; negative tone of evaluations
  Readiness to change: questions the validity of the whole process

Principals highlighted the value of the different mechanisms used during the school visit for attaining an accurate diagnostic. Interestingly, there was no consensus regarding which mechanisms are the most valuable. Principals offered varied perspectives on how the different sources of information led to a thorough knowledge of the school. For example, Principal Mary valued inspections as an “honest insight to what was actually going on inside the classrooms,” referring to classroom observations. Principal Nicholas explained that through focus groups and interviews they “learned a lot about just how … [the teachers] were feeling and how… [school leaders] could help with their morale.” Principal Monica thought that by analyzing both school data and school planning documents prior to the visit, the inspectors “walked in the door” able to say: “Yeah. This is exactly what it looks like.” Despite differing in their perception of the most valuable information sources, principals highlight how these sources can reveal different aspects of the schools.

Several principals drew direct comparisons between the SIs and the district accountability framework to illustrate their views regarding perceived accuracy. Principal Amy notes that the inspectors “come into our space to ask the questions, whereas [with the district accountability framework], you look at a school on paper. You can't read between the lines [about] a school on paper.” Principal David explains that “the outer inspection really provided … [an] on the ground assessment [in a way that the accountability framework] just isn't capable of providing.” Principal Thomas mentioned that he felt like the inspectors actually “asked more questions than a lot of people of the district asked.” Similarly, several principals drew direct comparisons with the work of their supervisors, who also visit the school and see the local context in greater detail. Some principals believe that the inspections provide a more thorough and objective assessment. Other principals acknowledge that their supervisors know the schools well and trust their assessments. Yet, most principals agree that these sentiments depend greatly on the individual supervisors.

Overall, most principals deem the SI diagnoses more effective than the district accountability framework.
In comparison to the district supervisors, it is unclear which is perceived as more accurate; opinions differ among the interviewed principals.

Sentiments of Appropriateness

To assess sentiments of appropriateness, the interviews inquired whether reforms based on the SI are aligned with the school vision, programming, and long-term goals. Nearly 70% of principals confirm that the feedback aligns with at least one of these aspects.

Many principals think that the inspection feedback aligns with their programming and vision (56% of respondents). They explain that this alignment is due to the evaluation’s broad scope and the insights gained from school members (administrators, teachers, and students). Principals feel that the SI encompasses a wide variety of issues that go beyond the district accountability framework. This broadens the possibilities for informing and strengthening school programming. For example, Principal Matthew’s programming has been focusing on teachers’ growth mindset, although he feels these efforts have been “falling short.” The SI addressed this focus and provided a chance to strengthen it; Principal Matthew has “capitalized” on the SI feedback as an opportunity for collegial learning. He explains that the SI has been “fitting with where the school needs to go” through a “very helpful and time efficient” process. Principal Linda provides another example. She manages a school with a high proportion of immigrant students. Her school programming addresses trauma and socio-emotional wellbeing; this is viewed as a necessary condition for improving student performance. She explains that the wider scope of the SI is appropriate in multiple ways: it aligns with their programming, affirmed that they “honor the diversity,” and identifies areas of strength and improvement.

In addition, several principals emphasized that on-site SI evaluations can obtain critical insights from school staff and students. This wealth of information aligns well with the schools’ values and has been used to inform, support, and adjust the schools’ programming. Principal Nicholas provides an example of key insights helping to pursue their programming:

I think [the SI] has been very supportive of our overall vision as a school. Our vision is to empower students to be self-agents, to be independent, to have a voice. And teachers don't seem to exercise enough of their [own voice] … the fact that it came out in the SI that they felt powerless, that they felt like they had no voice in decision-making and decisions that affected them, really shook me … That's why we made such a huge change very quickly to address their concern.

I also assess the alignment between SI feedback and long-term goals. To do so, I considered all changes in the school vision and planning documents, as well as explicit mentions of long-term goals in the interviews. About 56% of respondents believe there is alignment between the SI and the long-term goals of the school. Three of the interviewed principals based their school’s vision on SI feedback. One of these principals, Principal Tyler, illustrates this modification:

I think it [the SI] laid the groundwork for us to start to have [a vision] …. We really didn't have a school vision or mission, or we didn't have a strategic way of attack of what we're doing. So I think that helped uncover we needed to have it.

Some principals mention that the SI aligns with their long-term goals better than the district accountability framework.
For example, Principal Nicholas explains that the SI focus on teacher voice, rather than performance data, fits with their “values and vision as a school.” He continues: “That's why I found it very encouraging, because they did focus more on the people and on the long-term rather than the short-term gains.” A similar point is made by Principal Thomas:

[The district accountability] framework freaks me out because, sometimes, when I feel like I'm going to focus on the long-term, then I don't think that it will necessarily show up on [the framework]. And then I think I'm going to get lots of questions about whether I'm a good leader or whether my school is going in the right direction. I think the SI has validated that we're doing really good things here [at our school] and I would like to have more holistic tools to be able to showcase the work that's happening here.

Other principals appreciated the discretion that they have to make gradual changes; this enables them to maintain long-term goals. Principal Mary illustrates this point, saying that the SI did not say they had to “make some radical changes”; on the contrary, it said: “here are some strategies” that we can “implement with fidelity that will make the greatest impact in your classroom, in our school, school-wide.”

Readiness for Change

The analysis of readiness for change emphasizes two elements: commitment to change and change efficacy (the sentiment that a change will be successfully implemented). About 75% of principals expressed their commitment to reforms suggested by the SI. In interviews, principals were directly asked about their commitment to these changes, and most conveyed a strong commitment in their responses. At various points in the interviews, principals expressed their degree of commitment. In most cases, principals’ commitment appeared to be based on their own convictions, not on a sense of obligation. For example, Principal Tyler said that he is “super convinced” of the changes, and Principal Linda that she is “totally committed” and “took to heart” parts of the feedback. Following Herscovitch and Meyer’s (2002) typology, I find that most expressions of commitment were either of the affective or continuance type. Affective commitment arises when principals want to implement the changes, while continuance commitment implies that reforms must be implemented in order to avoid failure. Besides these two motivations, only a few principals acknowledged some normative type of commitment, which is spurred by a sense of obligation. This distribution of responses, with a low incidence of the normative type, is associated with a higher level of commitment (Herscovitch & Meyer, 2002).

Most principals expressed commitment to specific changes they had already implemented based on the SIs. About 88% of principals had implemented or planned reforms based on the SI at the time of the interview. In a majority of these cases, the changes were part of key reform areas in the schools. This commitment was evident in the three schools where the SI led to establishing a new mission, vision, or strategic plan. This was illustrated by Principal Tyler, who restructured his strategic plan around the four areas that the SI focused on, which he considered “changed the trajectory of the school.” In other cases, the SI led to the inclusion of a more targeted focus or new areas for reform.
Principal Sarah illustrates the commitment toward these reforms:

We talked about instructional rigor, and that was one that we really latched onto, and that we really took, and thought about, and turned over in our heads, and tried to figure out for the next year, for us to say, "Our school-wide focus, we're going to focus on rigorous instruction," that helped us out.

Additionally, some of the strongest expressions of commitment coincide with perceptions of the SI as a source of legitimacy to carry out school reform plans. This arises in nearly a third of the interviews. These perceptions refer to increased legitimacy with respect to either the school staff and/or the district. Principal Nicholas mentioned both of these perspectives. He implemented “drastic” changes in teachers’ schedules, reducing time devoted to professional development and adding planning time. This decision was based on SI feedback that indicated teachers had low morale and felt their voices were not heard. Principal Nicholas explains how the SI served as a source of legitimacy to move forward with this reform:

I'm completely committed. [This initiative] was really my idea. I really wanted this to happen long before. I didn't think it was possible this kind of change. I didn't think the District would allow us, but because the sentiment came across so strongly in the SI …. I felt like we had a lot stronger case to present to the District to say, "Hey, this is what the school wants, the staff is asking for. It came across in the SI as a huge concern and a challenge and we have all this staff input into this plan." And so, I think all that helped to push the District on a decision that I thought would've been impossible without all of that work from the SI to the staff input …. So, where we might've felt tempted to just do away with it, now there's just a lot more excitement to continue with it.

The interviews also asked about principals’ perceived efficacy of reforms due to the SI. Among principals who expressed commitment to change, half also spoke favorably about change efficacy. In most of these cases, confidence in efficacy was based on what they had observed in previously implemented changes. This was the case for Principals Linda and David:

Principal Linda: I'm totally committed … Because I've seen a change in the students and in the teachers and in the number of suspensions. All of that has changed.

Principal David: we were able to really … use that [(the SI report)] to dig into our formative assessment practices. And then I think we ended up seeing a lot of really strong results out of that.

This subset of principals shows a strong readiness for changes based on the SIs, even when there are specific aspects of the feedback that they do not embrace or whose efficacy they are uncertain about. The next section describes these instances of mixed sentiments towards changes.

Mixed Sentiments and Ambivalence

While most principals show overall positive attitudes towards the SI diagnosis, feedback, and readiness for changes based on it, many also raise some concerns about at least one of these categories (see Table 4).

Concerns about Diagnosis Effectiveness

Specific concerns about diagnosis effectiveness were raised but not emphasized, and the reasons behind them varied considerably. Two types of concerns were mentioned in several interviews: 1) the capacity to understand schools during a short visit, and 2) general limitations of an external review in making an effective diagnosis.
First, a few principals questioned the ability of the SI to understand the school in the short time of the visit. Principal Matthew, while appreciating insights from the SI, explains that after a short visit, “calling out” behavioral management in the school “doesn't have a lot of validity.” Similarly, Principal David thinks that while specific aspects of the diagnostic were accurate, others were not, for the following reason: “I didn't feel it was the most helpful … around some “soft skills” or areas around culture and climate … I didn't think that in the two days they were really able to pick up a lot of necessary context and know the best direction for us to go.”

Second, two principals raised general concerns about the limitations of diagnostic tools that are external to the school. Principal Linda mentioned the lack of emphasis placed on the social emotional aspects of a school, arguing that an improvement would be to assess the distress of the students. In this regard, she claims that this is not captured by the district accountability framework, district leadership, or the SI. Principal Sarah perceives that there are different “pockets of data,” but nobody can tell the full story. Thus, there are principals who feel that neither the SI nor the school district can capture school conditions in a comprehensive way.

Concerns about Appropriateness

Among the principals who find the SI to be appropriate overall for their school, half identify specific aspects that are not well suited. A general sentiment of appropriateness of the feedback does not translate into buy-in of every aspect of the inspection. The most common reason for concern is that principals have a different focus or do not think the suggested path is the best one to take or “high leverage” (Principal Thomas). This stance is also illustrated by Principal Matthew:

A couple years ago, we had [an inspection that] … focused a lot on behavior … and the umbrella of culture. To me, the issue wasn't around anything but expectations. The expectations were too low … I pretty much ignored it. I continued to focus on making sure that we have standards-based instruction, to make sure that we have high expectations. That was my means to improve culture.

In a couple of cases, principals thought that some areas of the SI feedback did not align with the strategic planning the school had decided on or had already discussed with the school supervisor. This was illustrated by Principal David:

Principal David: [What I discussed with my supervisor] didn't always match with what the inspection surfaced as the biggest gaps. And at times it got a little tricky to try to balance those two and saying: “this is what I know our school needs, and this is what our team has decided.” ... [Because] there's only so much that you can have the capacity to change at a certain time. ... I think that led to some in-depth conversations with my [supervisor].

Similarly, Principal Thomas decided not to focus on specific SI feedback and instead relied on the professional decisions of his staff:

One of the recommendations was around better structures related to lesson planning. ... with a really professional staff who's taught for a long time, we chose not to focus on that because that felt constraining to some of our staff members … they have their own way to plan.

Uncertainty about Readiness for Change

Among principals who were committed to the changes based on the SI, some expressed uncertainty about reform efficacy.
In general, the prevalence of ambivalent feelings about the changes was very limited. The major source of uncertainty about the success of changes based on the SI was the challenge the reforms themselves posed:

Principal Matthew: [I am] … about 70 percent [confident that these changes can be successfully implemented]. ... Because we're a hard school. … To have positions that are half in the classroom, half out of the classroom [(a reform implemented based on the SI feedback)], it sounds to me like a hell job, and I don't know how it'll work, quite frankly. So it's a little bit of an experiment.

Principal Heather: When I first saw the [SI] report, I wasn't sure how much we could really get done quickly, because there was a lot that [was] needed. And when I was hired it was already June, so teachers had left, and we were really scrambling to pull a team together … and not lose school culture while we did these changes.

Several principals expressed more concerns about whether the school community would buy into the changes. This stance was illustrated by Principals David and Heather:

Principal David: We had some people who are ready to jump in right away. Some people who took more time, especially teachers who had been at the school for a while, I think it's going to felt more in that line of, we've been here, we've ridden this roller coaster before, how are we going to drop and then go back up that sort of thing.

Principal Heather: It took a good share of one entire school year for people to completely buy into this. …. And the same was true of students, they weren't necessarily loving some of the changes either, because they felt like some of their freedoms had been taken away.

The lack of centrality of the SI as an accountability and improvement mechanism in the district is another factor that seems to increase ambivalence towards changes. The district accountability framework and supervisors' recommendations are the central mechanisms. This was the case for Principal Amy: she found that the SI offered an accurate diagnosis and appropriate feedback, and she was committed to the proposed reforms. However, she noted that the SI "doesn't feel central." Similarly, Principal Sebastian, who acknowledges the effort put into the SI and finds some specific aspects useful, still considers the SI secondary: "I felt like I should honor that work … but it was not as important to me as my district leaders' feedback." Finally, when Principal Linda was asked whether the SI facilitates the implementation of long-term goals, she explained that it gave her some ideas, but "there's district mandates and they don't always align with what the schools can do."

Despite principals' positive attitudes toward the SI, and their commitment stemming from their own convictions, some principals feel pressure to improve. Several principals who show overall positive attitudes still perceived the process to be judgmental at times (Principal Matthew); some felt "like you were under the gun" (Principal Sarah).
Principal David explains how he experiences pressure:

Pressure of leading a red school [(the lowest performing school in the district accountability framework)], it's very much implicit … you get the SI, you get this big assessment that tells you everything's wrong with your school that does, I think in any principal really provide a source of motivation and pressure to want to improve.

Despite the pressure, explicit or implicit, when most of these principals did not deem SI reforms to be appropriate or a priority, they maintained their previous focus and ignored the feedback. Even so, this group of principals deemed the feedback appropriate overall and remained prepared to implement changes.

Negative Attitudes

A group of four principals holds attitudes towards the whole SI process that run contrary to lasting change. The reasons underlying these attitudes are varied (see Table 4). Interestingly, comments about diagnosis effectiveness were not as prevalent. Only two principals directly questioned the accuracy of the diagnosis and the inspectors' capacity to know the school. Principal Sebastian argues that he "didn't feel like [the SI] observed all of the important systems" they had in place; however, he does not question the overall diagnosis effectiveness. Principal Brian poses the strongest criticism, directly questioning the SI's capacity to become familiar with schools, due to the short visit and lack of follow-up:

I didn't feel [they] understood our true culture and our school. The amount of time spent... there's got to be an ongoing process where they're spending more time in our schools, more time understanding true context, just getting follow-up, after the SI is completed, for support … [The SI resources] would have been invaluable; but somebody coming in for a couple of days …, and then getting back on the plane, that's not helpful.

In contrast, all principals in this group criticized the SI feedback as not being appropriate for their school. The basis for these attitudes varies. Several principals criticize the focus, or lack of focus, of the evaluation instrument. For example, Principal Ashley characterized the SI as "just a compilation of best practices that you would expect to see in pretty much any school across the planet, [that] doesn't say anything innovative or riveting." Similarly, Principal Paul argues that the SI is "only interested in a handful of items that never really took our school-wide focus into account." Principal Brian emphasizes the lack of usefulness: "I don't think they told us anything that we didn't already know. They didn't give us anything that led directly to anything that we're currently doing instructionally and culturally."

Three principals expressed a lack of commitment to the SI process, dismissing its overall validity. This is illustrated by Principals Paul and Brian:

Principal Paul: They came into the building for about three days, they used the room to conduct interviews, they went to multiple classrooms to take notes, and they formulated their own report and gave it back to us. So, it didn't seem authentic, I guess.

Principal Brian: I didn't find it a very useful process or tool. You know, I just didn't feel like it was authentic. … I wouldn't put a lot of weight into the impact it had. It was a report that I probably read one time and that was it and then we moved on.

Accountability pressure and decision-making control appear as underlying themes across most interviews. These perceptions might influence attitudes towards SIs.
Interestingly, three out of the four principals who demonstrated strongly negative attitudes participated in the process only once, during the one year when the SI carried high stakes. In contrast, among principals with positive attitudes towards SIs, only one out of 12 participated solely during the year when high stakes were in effect. In all of these cases, there is a perception of high pressure and a feeling that they were required to implement recommended reforms. For example, Principal Ashley says that she is not going to "disrupt the school improvement cycle of our entire school based upon one metric." Principal Paul remarked on the lack of transparency in how quality was evaluated:

There was a lack of transparency. They didn't want to talk about the questions that they wanted to ask teachers or students. … They didn't provide a rubric for how things were scored. … Say what it is that you're going to look for in a SI …. "Here is the scoring system in which we are going to use this inspection."

Similarly, this group of principals also mentioned that the inspection felt evaluative and highlighted negative aspects of the process. Further, there is a sense of perceived unfairness. Principal Sebastian explains that "the process was informative," yet he "also felt vulnerable as the principal." Principal Paul says that the SI "wasn't about guiding feedback for a development of instructional learning. It was more almost a negative, these are the things that you're not doing." Principal Ashley voices her frustration in a more personal tone: "I just find it belittling that individuals think the external tool is going to make us somehow improve our buildings when it's something that we do every single day of our lives as public educators." She goes a step further, questioning the purpose of the SI: "I believe the SI is a passive tool to confirm the district's perception of a school so that they can … support their decisions that they already have of closing schools."

Overall, while the basis for negative attitudes varies, it leads principals to dismiss the validity of the process. Principals with negative attitudes do not perceive the feedback as providing useful information. They discard most results, in part due to feelings of accountability pressure and a sense of obligation to make changes based on the feedback.

Conclusions

Inspections may offer an alternative to test-based accountability to help schools gain insight into promising directions for improvement. Results indicate that the majority of interviewed principals have positive attitudes towards inspection. The positive attitudes demonstrated by principals are ones associated with lasting institutional change; this suggests that inspection might enable sustained school reforms. In addition to inspection, all schools in the study are also subject to high-stakes accountability that emphasizes standardized testing. Yet, a majority of principals questioned the ability of testing alone to effectively evaluate their school. Test scores alone do not explain low performance and provide only limited direction for specific reforms. A major contribution of this study is that it demonstrates the connection between positive attitudes and known dispositions that lead to lasting change. It finds that most principals perceive the diagnosis as effective, feel the suggested changes to be appropriate, and are ready to take action.
These results are consistent with previous studies that find principals have positive attitudes regarding inspections (Behnke & Steins, 2017; Bitan et al., 2014; C. Gray & Gardner, 1999). This study goes beyond past efforts, focusing on attitudes associated with sustained change. Principals appear to form positive attitudes based on aspects of inspection that are absent from test-based accountability. Perceived effectiveness of the diagnosis is attributable to the thoroughness of the inspection process. Inspections result in an accurate picture of the school and identify key challenges, strengths, and improvement areas. Perceived appropriateness of the feedback is associated with the SI considering a comprehensive set of reform areas that inform school planning. Many principals view SI findings as an opportunity to support and refine their plans for the school to achieve long-term goals. Finally, readiness for changes recommended in the inspection feedback is evident; principals expressed commitment to these improvement areas. In addition, they show great confidence that reforms will be effectively implemented.

However, most principals also expressed some ambivalent feelings towards aspects of inspection. Some argued that brief, external evaluations can be limited in their ability to accurately assess school conditions. Others felt that the specific feedback did not align well with their strategic planning, or that it undermined the decision-making power of school staff. In addition, several principals have some doubts about the efficacy of changes, recognizing that change is hard and reforms can fail. Overall, most principals have strong positive attitudes towards inspection, yet some note the limitations of certain aspects.

Principal attitudes appear to be associated with accountability pressure and perceived control over decision-making. When principals perceive greater accountability pressure, they are more critical of the process and less likely to implement recommended reforms. This was evident for a small number of principals who received only one inspection visit, during a year when stakes were higher. While specific criticisms varied, these principals questioned the validity of the overall process and the motives for the inspections. Yet, most schools had at least one inspection in years with lower accountability pressure and faced only an implicit pressure to improve. Most principals do not feel obliged to implement changes based on the results. Principals are selective and target feedback that is useful to support their long-term plans.

This paper sheds light on how principals perceive inspection. I contrast these perceptions with views of the district's main accountability mechanisms: test-based accountability and school supervisors. Test-based accountability is viewed as a transparent way to measure school performance, yet it is limited in its ability to identify specific strengths and weaknesses. School supervisors, in turn, are familiar with the local context and school practices; this offers the possibility of open dialogue on improvement strategies. Yet principals view supervisors' individual assessments as less transparent than inspections. In addition, this in-depth knowledge relies on specific individuals and risks being lost. In contrast, inspections appear to offer improved transparency and insights for reforms.
On-site evaluations assess school operations following a protocol and produce formal reports; the process is clearly defined, and principals find the assessments useful. Lastly, principals highlighted limitations of the district inspection policy to inform school reforms. First, inspection lacks centrality compared with the district accountability framework. Principals have strong incentives to respond to district ratings, which are heavily based on standardized test results. They do not exhibit the same urgency to implement changes based on inspection feedback. Second, many principals feel that the lack of follow-up procedures after inspection limits the ability of the district to support reforms they implement in response to inspection feedback.

Overall, this study presents evidence of the potential for school inspection to enable sustained reforms in systems dominated by high-stakes accountability. It shows that brief inspection visits are perceived by principals as effective and provide motivation to implement reforms based on feedback. It shows the value of evaluating schools holistically and considering school stakeholders' perspectives, to provide insight for improvement that is well aligned with school goals.

APPENDICES

Appendix A. Interview Protocol

IRB application ID#: STUDY00001267

SECTION I – DISTRICT DIAGNOSIS AND KNOWLEDGE OF THE LOCAL CONTEXT

1. First, tell me a bit about your background in education. How did you come to be a principal?
2. How much do you think the District Office knows about your school and its context?
a. If the district decided to do an integral diagnosis of your school with the information they have now, how accurate do you think it would be?
b. What sources of information would the district use for this diagnosis (considering information the district already has)?
c. What information would the district be missing to make an accurate diagnosis of your school?
d. Would you consider this district diagnosis to be fair?
3. What feedback regarding school performance or how well your school works have you received from the district?
a. Which of those were most useful to you? Why?
b. To what degree can you (or you and a school-based committee) determine or interpret what changes should be implemented in your school based on this feedback?

SECTION II – SCHOOL IMPROVEMENT AND DECISION MAKING

4. Can you tell me about an important area of your school you are currently or have recently been working to improve?
a. Why did you select FILL IN as the area to focus on?
b. Thinking back, how did you select FILL IN as a focus for improvement?
i. Were there any resources, including people, that helped inform your decision?
c. What information did you examine prior to selecting FILL IN as an area to focus on?
d. IF NOT ALREADY SAID: What have you and your faculty been doing to address this area?
i. What resources, including people, helped you decide this course of action?
5. Looking back, are there any resources or information you wish you had had to help you plan strategically for school improvement?
6. What is the most important district feedback or support mechanism that led you to change how you see the main problems in your school?
FOLLOW-UP:
• Performance framework, school supervisor, and SI.

SECTION III – SCHOOL INSPECTION (SI): PERCEPTIONS & RESPONSE

7. Can you tell me about the time you received an SI visit in your school?
a. How was the process, and how did you personally experience it?
b. In what ways was this process useful for you?
FOLLOW-UP:
• To what degree has the SI led you to change how you "see" the main problems that hinder improvement in your school?
• Has the SI helped you become aware of any significant changes that were needed at your school in order to improve education quality?
• To what degree did SI feedback confirm what you already suspected?
§ If the SI feedback confirmed what you already knew, was it helpful to have this confirmation? Why?
• In what ways was it not helpful enough?
8. Do you think that the changes that should be implemented based on the SI feedback are aligned with your school history, vision, and current programming?
a. How are the changes that should be implemented based on the SI feedback more or less aligned with your values and current programming in comparison to the changes based on other district assessment and support mechanisms? [if not mentioned, ask about the Performance Framework and the School Supervisor]
b. How convinced are you that these changes can be successfully implemented?
c. How motivated and committed are you to these changes? Why?
Follow-up: To what extent are you motivated because you feel these changes are valuable/appropriate vs. because you are required to carry these out (& could face consequences)?
d. How invested was your staff in following these changes?
9. Can you tell me about the most relevant changes in focus or strategy you implemented based on the SI feedback? Can you give me an example of how you used the information in the SI to create change?
a. Why have you decided to focus on FILL IN?
b. How was the SI useful in implementing this change?
FOLLOW UP: Was the SI helpful to define specific improvement strategies? How? What was the role of the school supervisor in this process?
10. What significant change could have been implemented based on the SI feedback, but you decided not to?
a. What circumstances led to these changes not being implemented?
11. Due to accountability pressure to improve, have you ever considered changing strategies that you think would deliver improvements in the long term, but not in the short term?
a. Has the SI changed this idea in any way?
12. I'm sure you spend a lot of time reviewing data and other indicators of your school's quality. How do you sort through all of this information? Can you walk me through how you approach the different sources of data and how you weigh different sources?
a. Compared to FILL IN, how useful is the SI feedback for the school improvement planning process?
b. Compared to FILL IN, how useful is the school supervisor for the school improvement planning process?
c. Compared to FILL IN, how useful is the School Performance Framework for the school improvement planning process?
d. In what ways is the SI information different from other sources of school quality indicators?
13. Overall, how valuable do you think the SIs are? How could they be improved?

SECTION IV – PRINCIPAL BACKGROUND

14. How many years total have you been a teacher?
15. PRIOR to this school year, how many years did you serve as the principal of THIS OR ANY OTHER school? (Count part of a year as 1 year)
16. PRIOR to this school year, how many years did you serve as the principal of THIS school? (Count part of a year as 1 year)
17. What is the highest degree you have earned? IF NOT STATED: Is that degree specific to education leadership?

Appendix B. Coding Scheme

1. Diagnosis accuracy / knowledge of school
1.1. Accurate / Good knowledge
1.1.1. Inspection
1.1.2. District
1.2. Inaccurate / Incomplete knowledge
1.2.1. Inspection
1.2.2. District
2. Diagnosis usefulness
2.1. Useful / New insights
2.1.1. Better prioritize
2.1.2. Reaffirm existing goals / Confirm diagnosis
2.1.3. Gain legitimacy
2.1.4. Somewhat useful
2.2. Not useful / Not relevant
3. Appropriateness
3.1. Most appropriate
3.1.1. Aligned with school history, vision, values
3.1.2. Focus on the long term
3.1.3. Consistent with the District
3.2. Not a priority
3.3. Not appropriate
4. Commitment / Trust in efficacy
4.1. Commitment / Trust in efficacy
4.2. Ambivalent attitudes
4.3. Lack of commitment / Distrust efficacy
5. Changes – What motivates them?
5.1. Principal initiatives / Staff initiatives
5.2. Inspections
5.3. Test results, performance framework, evaluations
5.4. School supervisors
5.5. District (excluding 5.3 & 5.4)
5.6. Other sources
5.7. Did not implement changes based on inspection (explicit)
6. Principals' views
6.1. Critical views
6.1.1. Inspections
6.1.2. District
6.1.3. School supervisors
6.2. Positive views
6.2.1. District
6.2.2. School supervisors

REFERENCES

Ahuvia, A. (2001). Traditional, interpretive, and reception based content analyses: Improving the ability of content analysis to address issues of pragmatic and theoretical concern. Social Indicators Research, 54, 139–172. https://doi.org/10.1023/A:1011087813505

Allen, R., & Burgess, S. (2012). How should we treat under-performing schools? A regression discontinuity analysis of school inspections in England (No. 12; 87).

Altrichter, H., & Kemethofer, D. (2015). Does accountability pressure through school inspections promote school improvement? School Effectiveness and School Improvement, 26(1), 32–56. https://doi.org/10.1080/09243453.2014.927369

Apple, M. (2005). Education, markets, and an audit culture. Critical Quarterly, 47(1–2), 11–29. https://doi.org/10.1111/j.0011-1562.2005.00611

Armenakis, A., Bernerth, J., Pitts, J., & Walker, H. (2007). Organizational Change Recipients' Beliefs Scale. The Journal of Applied Behavioral Science, 43(4), 481–505. https://doi.org/10.1177/0021886307303654

Armenakis, A., & Harris, S. (2009). Reflections: Our journey in organizational change research and practice. Journal of Change Management, 9(2), 127–142. https://doi.org/10.1080/14697010902879079

Armenakis, A., Harris, S., Cole, M., Fillmer, L., & Self, D. (2007). A top management team's reactions to organizational transformation: The diagnostic benefits of five key change sentiments. Journal of Change Management, 7(3–4), 273–290. https://doi.org/10.1080/14697010701771014

Armstrong, J. (1982). The value of formal planning for strategic decisions: Review of empirical research. Strategic Management Journal, 3, 197–211.

Ball, S., & Bowe, R. (1992). Subject departments and the 'implementation' of National Curriculum policy: An overview of the issues. Journal of Curriculum Studies, 24(2), 97–115. https://doi.org/10.1080/0022027920240201

Barber, M. (2005). The virtue of accountability: System redesign, inspection, and incentives in the era of informed professionalism. Journal of Education, 185(1), 7–38. https://doi.org/10.1177/002205740518500102

Baxter, J. A. (2013). Professional inspector or inspecting professional? Teachers as inspectors in a new regulatory regime for education in England. Cambridge Journal of Education, 43(4), 467–485. https://doi.org/10.1080/0305764X.2013.819069
Behnke, K., & Steins, G. (2017). Principals' reactions to feedback received by school inspection: A longitudinal study. Journal of Educational Change, 18(1), 77–106. https://doi.org/10.1007/s10833-016-9275-7

Bengston, D., & Xu, Z. (1995). Changing national forest values: A content analysis (Research Paper NC-323). http://www.nrs.fs.fed.us/pubs/rp/rp_nc323.pdf

Berry, F. S., & Wechsler, B. (1995). State agencies' experience with strategic planning: Findings from a national survey. Public Administration Review, 55(2), 159. https://doi.org/10.2307/977181

Bitan, K., Haep, A., & Steins, G. (2014). School inspections still in dispute – an exploratory study of school principals' perceptions of school inspections. International Journal of Leadership in Education, 18(4), 1–22. https://doi.org/10.1080/13603124.2014.958199

Bloem, S. (2015). The OECD Directorate for Education as an independent knowledge producer through PISA. In H. G. Kotthoff & E. Klerides (Eds.), Governing Educational Spaces (pp. 169–185). SensePublishers. https://doi.org/10.1007/978-94-6300-265-3_10

Brier, A., & Hopp, B. (2011). Computer assisted text analysis in the social sciences. Quality & Quantity, 45(1), 103–128. https://doi.org/10.1007/s11135-010-9350-8

Chabbott, C., & Elliott, E. J. (2003). Understanding others, educating ourselves: Getting more from international comparative studies in education. https://doi.org/10.17226/10622

Chun, Y. H., & Rainey, H. G. (2005). Goal ambiguity and organizational performance in U.S. federal agencies. Journal of Public Administration Research and Theory, 15(4), 529–557. https://doi.org/10.1093/jopart/mui030

Clarke, J., & Ozga, J. (2011). Governing by inspection? Comparing school inspection in Scotland and England. Social Policy Association Conference, 25.

Coburn, C. (2001). Beyond decoupling: Rethinking the relationship between the institutional environment and the classroom. Sociology of Education, 77, 211–244. https://doi.org/10.1177/003804070407700302

Coburn, C. (2005). Shaping teacher sensemaking: School leaders and the enactment of reading policy. Educational Policy, 19(3), 476–509. https://doi.org/10.1177/0895904805276143

Cole, M. S., Harris, S., & Bernerth, J. B. (2006). Exploring the implications of vision, appropriateness, and execution of organizational change. Leadership & Organization Development Journal, 27(5), 352–367. https://doi.org/10.1108/01437730610677963

Concurso de Supervisores Rio Negro (2013). Resolución del Consejo Provincial de Educación de Río Negro N° 1053, Pub. L. No. 1053 (1994).

Conway, M. (2006). The subjective precision of computers: A methodological comparison with human coding in content analysis. Journalism & Mass Communication Quarterly, 83(1), 186–200. https://doi.org/10.1177/107769900608300112

Cook, T. D., Campbell, D. T., & Shadish, W. (2002). Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin.

Cuckle, P., Hodgson, J., & Broadhead, P. (1998). Investigating the relationship between OFSTED inspections and school development planning. School Leadership & Management, 18(2), 271–283. https://doi.org/10.1080/13632439869691

Darling-Hammond, L., Bae, S., Cook-Harvey, C. M., Lam, L., Mercer, C., Podolsky, A., & Stosich, E. L. (2016). Pathways to new accountability through the Every Student Succeeds Act. http://learningpolicyinstitute.org/our-work/publications-resources/pathways-new-accountability-every-student-succeeds-act
De Vries, H., Elliott, M. N., Kanouse, D. E., & Teleki, S. S. (2008). Using pooled kappa to summarize interrater agreement across many items. Field Methods, 20(3), 272–282. https://doi.org/10.1177/1525822X08317166

de Wolf, I., & Janssens, F. (2007). Effects and side effects of inspections and accountability in education: An overview of empirical studies. Oxford Review of Education, 33(3), 379–396. https://doi.org/10.1080/03054980701366207

Dedering, K., & Müller, S. (2011). School improvement through inspections? First empirical insights from Germany. Journal of Educational Change, 12(3), 301–322. https://doi.org/10.1007/s10833-010-9151-9

Dedering, K., & Sowada, M. G. (2017). Reaching a conclusion—procedures and processes of judgement formation in school inspection teams. Educational Assessment, Evaluation and Accountability, 29(1), 5–22. https://doi.org/10.1007/s11092-016-9246-9

Deng, Q., Hine, M., Ji, S., & Sur, S. (2019). Inside the black box of dictionary building for text analytics: A design science approach. Journal of International Technology and Information Management, 27(3), 119–159.

Doud, J. (1995). Planning for school improvement: A curriculum model for school based evaluation. Peabody Journal of Education, 70, 175–187.

Edgerton, A. K. (2019). The essence of ESSA: More control at the district level? Phi Delta Kappan, 101(2), 14–17. https://doi.org/10.1177/0031721719879148

Education Inspectorate, Ministry of Education, Culture and Science. (2010). Risk-based inspection as of 2009 – Primary and secondary education.

Education Inspectorate, Ministry of Education, Culture and Science. (2017a). Inspection framework primary education.

Education Inspectorate, Ministry of Education, Culture and Science. (2017b). Inspection framework secondary education.

Ehren, M. (2016a). Methods and modalities of effective school inspections (M. Ehren (Ed.)). Springer International Publishing. https://doi.org/10.1007/978-3-319-31003-9

Ehren, M. (2016b). Methods and modalities of effective school inspections. In M. C. M. Ehren (Ed.), Methods and Modalities of Effective School Inspections. Springer International Publishing. https://doi.org/10.1007/978-3-319-31003-9

Ehren, M., Altrichter, H., McNamara, G., & O'Hara, J. (2013). Impact of school inspections on improvement of schools—describing assumptions on causal mechanisms in six European countries. Educational Assessment, Evaluation and Accountability, 25, 3–43. https://doi.org/10.1007/s11092-012-9156-4

Ehren, M., Gustafsson, J.-E., Altrichter, H., Skedsmo, G., Kemethofer, D., & Huber, S. (2015). Comparing effects and side effects of different school inspection systems across Europe. Comparative Education, 51(3), 375–400. https://doi.org/10.1080/03050068.2015.1045769

Ehren, M., Perryman, J., & Shackleton, N. (2015a). School Effectiveness and School Improvement. School Effectiveness and School Improvement – An International Journal of Research, Policy and Practice, 26(2), 296–327.

Ehren, M., Perryman, J., & Shackleton, N. (2015b). Setting expectations for good education: How Dutch school inspections drive improvement. School Effectiveness and School Improvement, 26(2), 296–327. https://doi.org/10.1080/09243453.2014.936472

Ehren, M., & Shackleton, N. (2016). Risk-based school inspections: Impact of targeted inspection approaches on Dutch secondary schools. Educational Assessment, Evaluation and Accountability, 28(4), 299–321. https://doi.org/10.1007/s11092-016-9242-0

Ehren, M., & Visscher, A. (2006). Towards a theory on the impact of school inspections. British Journal of Educational Studies, 54(1), 51–72. https://doi.org/10.1111/j.1467-8527.2006.00333.x
Ehren, M., & Visscher, A. (2008). The relationships between school inspections, school characteristics and school improvement. British Journal of Educational Studies, 56(2), 205–227. https://doi.org/10.1111/j.1467-8527.2008.00400.x

Fernandez, K. E. (2011). Evaluating school improvement plans and their affect on academic performance. Educational Policy, 25(2), 338–367. https://doi.org/10.1177/0895904809351693

Figlio, D., & Loeb, S. (2011). School accountability. In Handbook of the Economics of Education (pp. 383–421).

Fitchett, P., & Heafner, T. (2010). A national perspective on the effects of high-stakes testing and standardization on elementary social studies marginalization. Theory & Research in Social Education, 38(1), 114–130. https://doi.org/10.1080/00933104.2010.10473418

Gagnon, D. J., & Schneider, J. (2019). Holistic school quality measurement and the future of accountability: Pilot-test results. Educational Policy, 33(5), 734–760. https://doi.org/10.1177/0895904817736631

Gilroy, P., & Wilcox, B. (1997). OFSTED, criteria and the nature of social understanding: A Wittgensteinian critique of the practice of educational judgement. British Journal of Educational Studies, 45(1), 22–38. https://doi.org/10.1111/1467-8527.00034

Gioia, D., Thomas, J., Clark, S., & Chittipeddi, K. (1994). Symbolism and strategic change in academia: The dynamics of sensemaking and influence. Organization Science, 5(3), 363–383. https://doi.org/10.1287/orsc.5.3.363

Glazerman, S. (2016). The false dichotomy of school inspections. Mathematica Policy Research – Blog Post. https://www.mathematica-mpr.com/commentary/the-false-dichotomy-of-school-inspections

Gray, C., & Gardner, J. (1999). The impact of school inspections. Oxford Review of Education, 25(4), 455–468. https://doi.org/10.1080/030549899103928

Gray, J., & Wilcox, B. (1995). In the aftermath of inspection: The nature and fate of inspection report recommendations. Research Papers in Education, 10(1), 1–18. https://doi.org/10.1080/0267152950100102

Greene, J. C., Caracelli, V. J., & Graham, W. F. (1989). Toward a conceptual framework for mixed-method evaluation designs. Educational Evaluation and Policy Analysis, 11(3), 255–274.

Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3), 267–297. https://doi.org/10.1093/pan/mps028

Grimolizzi-Jensen, C. J. (2018). Organizational change: Effect of motivational interviewing on readiness to change. Journal of Change Management, 18(1), 54–69. https://doi.org/10.1080/14697017.2017.1349162

Gustafsson, J.-E., Ehren, M., Conyngham, G., McNamara, G., Altrichter, H., & O'Hara, J. (2015). From inspection to quality: Ways in which school inspection influences change in schools. Studies in Educational Evaluation, 47, 47–57. https://doi.org/10.1016/j.stueduc.2015.07.002

Halverson, R., Kelley, C., & Kimball, S. (2004). Implementing teacher evaluation systems: How principals make sense of complex artifacts to shape local instructional practice. Educational Administration, Policy, and Reform: Research and Measurement, 3, 153–188.

Hanushek, E. A., & Raymond, M. E. (2005). Does school accountability lead to improved student performance? Journal of Policy Analysis and Management, 24(2), 297–327. https://doi.org/10.1002/pam.20091

Herscovitch, L., & Meyer, J. P. (2002). Commitment to organizational change: Extension of a three-component model. Journal of Applied Psychology, 87(3), 474–487. https://doi.org/10.1037/0021-9010.87.3.474
Hill, H. (2001). Policy is not enough: Language and the interpretation of state standards. American Educational Research Journal, 38(2), 289–318. https://doi.org/10.3102/00028312038002289

Hines, R. T. (2017). An exploration of the effects of school improvement planning and feedback systems: School performance in North Carolina.

Holt, D., Armenakis, A., Feild, H., & Harris, S. (2007). Readiness for organizational change. The Journal of Applied Behavioral Science, 43(2), 232–255. https://doi.org/10.1177/0021886306295295

Husfeldt, V. (2011). Wirkungen und Wirksamkeit der externen Schulevaluation: Überblick zum Stand der Forschung [The impact of school inspection – does it really work? State of research]. Zeitschrift für Erziehungswissenschaft, 14(2), 259–282. https://doi.org/10.1007/s11618-011-0204-5

Hussain, I. (2015). Subjective performance evaluation in the public sector: Evidence from school inspections. The Journal of Human Resources, 50(1), 189–221.

Jacob, B. (2005). Accountability, incentives and behavior: The impact of high-stakes testing in the Chicago Public Schools. Journal of Public Economics, 89(5–6), 761–796. https://doi.org/10.1016/j.jpubeco.2004.08.004

Jones, K., & Tymms, P. (2014). Ofsted's role in promoting school improvement: The mechanisms of the school inspection system in England. Oxford Review of Education, 40(3), 315–330.

Jones, K., Tymms, P., Kemethofer, D., O'Hara, J., McNamara, G., Huber, S., Myrberg, E., Skedsmo, G., & Greger, D. (2017). The unintended consequences of school inspection: The prevalence of inspection side-effects in Austria, the Czech Republic, England, Ireland, the Netherlands, Sweden, and Switzerland. Oxford Review of Education, 43(6), 805–822. https://doi.org/10.1080/03054985.2017.1352499

Kaplan, S., & Orlikowski, W. J. (2013). Temporal work in strategy making. Organization Science, 24(4), 965–995. https://doi.org/10.1287/orsc.1120.0792

Klein, A. (2016). School inspections offer a diagnostic look at quality. Education Week. https://www.edweek.org/ew/articles/2016/09/28/school-inspections-offer-a-diagnostic-look-at.html

Klerks, M. (2012). The effect of school inspections: A systematic review. http://janbri.nl/wp-content/uploads/2014/12/ORD-paper-2012-Review-Effect-School-Inspections-MKLERKS.pdf

Kluger, A. N., & DeNisi, A. (1996). The effects of feedback interventions on performance: A historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychological Bulletin, 119(2), 254–284. https://doi.org/10.1037/0033-2909.119.2.254

Koretz, D. (2008). Measuring up. Harvard University Press.

Krippendorff, K. (2013). Content analysis: An introduction to its methodology (3rd ed.). SAGE Publications.

Ladd, H. F. (2016). Now is the time to experiment with inspections for school accountability. Brookings. https://www.brookings.edu/blog/brown-center-chalkboard/2016/05/26/now-is-the-time-to-experiment-with-inspections-for-school-accountability/

Ladd, H. F. (2017). NCLB: Response to Jacob. Journal of Policy Analysis and Management, 36(2), 477–480. https://doi.org/10.1002/pam.21979

Ladd, H. F., & Figlio, D. (2008). School accountability and student achievement. In Handbook of research in education finance and policy (pp. 166–182).

Lee, J., & Fitz, J. (1997). HMI and OFSTED: Evolution or revolution in school inspection. British Journal of Educational Studies, 45(1), 39–52. https://doi.org/10.1111/1467-8527.00035

Lewin, A. Y., & Minton, J. W. (1986). Determining organizational effectiveness: Another look, and an agenda for research. Management Science, 32(5), 514–538. https://doi.org/10.1287/mnsc.32.5.514
Lindgren, J. (2015). The front and back stages of Swedish school inspection: Opening the black box of judgment. Scandinavian Journal of Educational Research, 59(1), 58–76. https://doi.org/10.1080/00313831.2013.838803

Luginbuhl, R., Webbink, D., & de Wolf, I. (2009). Do inspections improve primary school performance? Educational Evaluation and Policy Analysis, 31(3), 221–237. https://doi.org/10.3102/0162373709338315

Maitlis, S. (2005). The social processes of organizational sensemaking. The Academy of Management Journal, 48(1), 21–49. https://doi.org/10.2307/20159639

Maitlis, S., & Christianson, M. (2014). Sensemaking in organizations: Taking stock and moving forward. The Academy of Management Annals, 8(1), 57–125. https://doi.org/10.1080/19416520.2014.873177

March, J. G., & Olsen, J. P. (2011). The logic of appropriateness. In R. E. Goodin (Ed.), The Oxford Handbook of Political Science (pp. 1–22). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199604456.013.0024

Mathis, W., & Trujillo, T. (2016). Lessons from NCLB for the Every Student Succeeds Act. http://nepc.colorado.edu/publication/lessons-from-NCLB

Matthews, P., & Sammons, P. (2004). Improvement through inspection: An evaluation of the impact of Ofsted's work. Ofsted.

Matthews, P., Holmes, J. R., Vickers, P., & Corporaal, B. (1998). Aspects of the reliability and validity of school inspection judgements of teaching quality. Educational Research and Evaluation, 4(2), 167–188. https://doi.org/10.1076/edre.4.2.167.6959

McDonnell, L. (2008). The politics of educational accountability: Can the clock be turned back? In K. E. Ryan & L. A. Shepard (Eds.), The future of test-based educational accountability. Routledge.

McDonnell, L. (2013). Educational accountability and policy feedback. Educational Policy, 27(2), 170–189. https://doi.org/10.1177/0895904812465119

Meyers, C. V., & VanGronigen, B. A. (2019). A lack of authentic school improvement plan development. Journal of Educational Administration, 57(3), 261–278. https://doi.org/10.1108/JEA-09-2018-0154

Miles, M. B., Huberman, A. M., & Saldaña, J. (2014). Qualitative data analysis: A methods sourcebook (3rd ed.). SAGE Publications.

Millett, A., & Johnson, D. C. (1998). Expertise or "baggage"? What helps inspectors to inspect primary mathematics? British Educational Research Journal, 24(5), 503–518. https://doi.org/10.1080/0141192980240502

Mintrop, H., MacLellan, A. M., & Quintero, M. F. (2001). School improvement plans in schools on probation: A comparative content analysis across three accountability systems. Educational Administration Quarterly, 37(2), 197–218. https://doi.org/10.1177/00131610121969299

Morse, J. (2010). Procedures and practice of mixed method design: Maintaining control, rigor, and complexity. In A. M. Tashakkori & C. B. Teddlie (Eds.), Handbook of mixed methods in social & behavioral research (pp. 339–352). SAGE Publications.

Neuendorf, K. A. (2017). The content analysis guidebook. SAGE Publications. https://doi.org/10.4135/9781071802878

Nusche, D., Braun, H., Halász, G., & Santiago, P. (2014). OECD Reviews of Evaluation and Assessment in Education: Netherlands 2014. OECD. https://doi.org/10.1787/9789264211940-en

OECD. (2015). Education at a glance 2015 – OECD indicators. https://doi.org/10.1787/19991487

Ouston, J., Fidler, B., & Earley, P. (1997). What do schools do after OFSTED school inspections – or before? School Leadership & Management, 17(1), 95–104. https://doi.org/10.1080/13632439770195
Penninckx, M., & Vanhoof, J. (2015). Insights gained by schools and emotional consequences of school inspections: A review of evidence. School Leadership & Management, 35(5), 477–501. https://doi.org/10.1080/13632434.2015.1107036

Penninckx, M., Vanhoof, J., De Maeyer, S., & Van Petegem, P. (2014). Exploring and explaining the effects of being inspected. Educational Studies, 40(4), 456–472. https://doi.org/10.1080/03055698.2014.930343

Penninckx, M., Vanhoof, J., De Maeyer, S., & Van Petegem, P. (2015). Effects and side effects of Flemish school inspection. Educational Management Administration & Leadership. https://doi.org/10.1177/1741143215570305

Perryman, J. (2007). Inspection and emotion. Cambridge Journal of Education, 37(2), 173–190. https://doi.org/10.1080/03057640701372418

Perryman, J. (2009). Inspection and the fabrication of professional and performative processes. Journal of Education Policy, 24(5), 611–631.

Phillips, D., & Schweisfurth, M. (2014). Comparative and international education: An introduction to theory, methods, and practice (2nd ed.). Continuum International Publishing Group.

Piderit, S. K. (2000). Rethinking resistance and recognizing ambivalence: A multidimensional view of attitudes toward an organizational change. The Academy of Management Review, 25(4), 783. https://doi.org/10.2307/259206

Pond, S., Armenakis, A., & Green, S. (1984). The importance of employee expectations in organizational diagnosis. The Journal of Applied Behavioral Science, 20(2), 167–180. https://doi.org/10.1177/002188638402000207

Porac, J. F., Thomas, H., & Baden-Fuller, C. (1989). Competitive groups as cognitive communities: The case of Scottish knitwear manufacturers. Journal of Management Studies, 26(4), 397–416. https://doi.org/10.1111/j.1467-6486.1989.tb00736.x

Portz, J., & Beauchamp, N. (2020). Educational accountability and state ESSA plans. Educational Policy. https://doi.org/10.1177/0895904820917364

Ravitch, D. (2016). The death and life of the great American school system: How testing and choice are undermining education. Basic Books.

Redding, C., & Searby, L. (2020). The map is not the territory: Considering the role of school improvement plans in turnaround schools. Journal of Cases in Educational Leadership, 23(3), 63–75. https://doi.org/10.1177/1555458920938854

Riffe, D., Lacy, S., & Fico, F. (2014). Analyzing media messages: Using quantitative content analysis in research. Routledge.

Rigby, J. G. (2015). Principals' sensemaking and enactment of teacher evaluation. Journal of Educational Administration, 53(3), 374–392. https://doi.org/10.1108/JEA-04-2014-0051

Rosenthal, L. (2004). Do school inspections improve school quality? Ofsted inspections and school examination results in the UK. Economics of Education Review, 23, 143–151.

Rothstein, R., Jacobsen, R., & Wilder, T. (2008). Grading education: Getting accountability right. Economic Policy Institute and Teachers College Press.

Rouleau, L. (2005). Micro-practices of strategic sensemaking and sensegiving: How middle managers interpret and sell change every day. Journal of Management Studies, 42(7), 1413–1441.

Rutz, S., Mathew, D., Robben, P., & Bont, A. (2017). Enhancing responsiveness and consistency: Comparing the collective use of discretion and discretionary room at inspectorates in England and the Netherlands. Regulation & Governance, 11(1), 81–94. https://doi.org/10.1111/rego.12101
Ryan, K., Gandha, T., & Ahn, J. (2013). School self-evaluation and inspection for improving U.S. schools? National Education Policy Center. http://nepc.colorado.edu/publication/school-self-evaluation

Sandberg, J., & Tsoukas, H. (2015). Making sense of the sensemaking perspective: Its constituents, limitations, and opportunities for further development. Journal of Organizational Behavior, 36(S1), S6–S32. https://doi.org/10.1002/job.1937

Scheerens, J., Ehren, M., Sleegers, P., & de Leeuw, R. (2012). OECD Review on Evaluation and Assessment Frameworks for Improving School Outcomes.

Shaw, I., Newton, D. P., Aitkin, M., & Darnell, R. (2003). Do OFSTED inspections of secondary schools make a difference to GCSE results? British Educational Research Journal, 29(1), 63–75.

Spillane, J. P. (1999). External reform initiatives and teachers' efforts to reconstruct their practice: The mediating role of teachers' zones of enactment. Journal of Curriculum Studies, 31(2), 1–33. https://doi.org/10.1080/002202799183205

Spillane, J. P., Parise, L. M., & Sherer, J. Z. (2011). Organizational routines as coupling mechanisms. American Educational Research Journal, 48(3), 586–619. https://doi.org/10.3102/0002831210385102

Spillane, J. P., Reiser, B. J., & Gomez, L. M. (2006). Policy implementation and cognition: The role of human, social, and distributed cognition in framing policy implementation. In M. I. Honig (Ed.), New directions in education policy implementation (pp. 47–64). State University of New York Press.

Spillane, J. P., Reiser, B. J., & Reimer, T. (2002). Policy implementation and cognition: Reframing and refocusing implementation research. Review of Educational Research, 72(3), 387–431. https://doi.org/10.3102/00346543072003387

Stiglitz, J. (2000). Economics of the public sector (3rd ed.). Norton.

Strunk, K. O., Marsh, J. A., Bush-Mecenas, S., & Duque, M. R. (2016). The best laid plans. Educational Administration Quarterly, 52(2), 259–309. https://doi.org/10.1177/0013161X15616864

Teddlie, C., & Tashakkori, A. (2009). Foundations of mixed methods research: Integrating qualitative and quantitative approaches in the social and behavioral sciences. SAGE.

Teddlie, C., & Yu, F. (2007). Mixed methods sampling: A typology with examples. Journal of Mixed Methods Research, 1(1), 77–100. https://doi.org/10.1177/1558689806292430

UNESCO. (2017). Global Education Monitoring Report – Accountability in education: Meeting our commitments.

van Bruggen, J. C. (2010). Inspectorates of education in Europe: Some comparative remarks about their tasks and work.

van der Sluis, M. E., Reezigt, G. J., & Borghans, L. (2017). Implementing New Public Management in educational policy. Educational Policy, 31(3), 303–329.

Vavrus, F. K., & Bartlett, L. (2016). Rethinking case study research: A comparative approach (1st ed.). Routledge.

Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Using school performance feedback: Perceptions of primary school principals. School Effectiveness and School Improvement, 21(2), 167–188. https://doi.org/10.1080/09243450903396005

Visscher, A. J., & Coe, R. (2003). School performance feedback systems: Conceptualisation, analysis, and reflection. School Effectiveness and School Improvement, 14(3), 321–349. https://doi.org/10.1076/sesi.14.3.321.15842

Weick, K. E. (1995). Sensemaking in organizations. SAGE Publications.

Weick, K. E., Sutcliffe, K. M., & Obstfeld, D. (2005). Organizing and the process of sensemaking. Organization Science, 16(4), 409–421. https://doi.org/10.1287/orsc.1050.0133
Weiner, B. J. (2009). A theory of organizational readiness for change. Implementation Science, 4(1), 67. https://doi.org/10.1186/1748-5908-4-67

Woods, P., & Jeffrey, B. (1998). Choosing positions: Living the contradictions of OFSTED. British Journal of Sociology of Education, 19(4), 547–570. https://doi.org/10.1080/0142569980190406

Paper 3: The Effect of Inspection on School Improvement Planning: Evidence from a U.S. District

Introduction

Prioritization of educational reform areas is an issue of national concern. Districts in the United States overwhelmingly rely on test-based accountability to promote school improvement (Figlio & Loeb, 2011; Hanushek & Raymond, 2005). Schools are incentivized to raise student achievement on standardized tests (e.g. Ladd & Figlio, 2008), yet test scores alone offer limited insight into specific reforms that might benefit a given school (e.g. Gagnon & Schneider, 2019). An alternative approach to accountability is school inspection, which is widely used outside of the United States. Such an approach, using on-site evaluations, allows for a deeper assessment of schools. In this way, inspection feedback can guide planning and implementation of reforms (Ehren et al., 2013; Jones & Tymms, 2014).

Gains in standardized test scores have been used to evaluate the effectiveness of school inspection (e.g. Allen & Burgess, 2012; Ehren & Shackleton, 2016; Hussain, 2015; Luginbuhl et al., 2009). Yet this offers limited insight regarding the influence of inspection on school planning and reform. In contrast, effectiveness can be evaluated by focusing on whether inspection feedback leads to school reforms. This approach considers whether comprehensive on-site evaluations can inform principals' actions. Further, it makes it possible to see which areas of inspection are most influential in promoting school reforms.

A crucial step prior to implementation of reforms is the school planning process (Matthews & Sammons, 2004). At this stage, school leaders prioritize areas for improvement and set strategic goals. Despite the wide use of inspection globally, no empirical evidence exists regarding its effect on school planning documents.

Prior studies have investigated the influence of school inspection on reform implementation in European countries (e.g. Altrichter & Kemethofer, 2015; Dedering & Müller, 2011; Ehren & Visscher, 2008; Gray & Wilcox, 1995; Ouston et al., 1997).
In-depth interviews with school principals examine whether and how inspection has been useful for planning purposes and identify which inspection topics were more influential to inform school reforms. Then, a content analysis is used to the establish the incidence of these topics on school improvement plans. Finally, a difference-in-differences approach is used to determine how inspection shapes the emphasis on these topics in school improvement plans. This is the first study to measure the causal effect of school inspection on school planning. In addition, this study provides empirical evidence in the U.S. of the potential of school inspection to inform school planning reforms beyond standardized tests. 112 Literature Review School Change based on Inspection Feedback Few empirical studies address the influence of inspection on school planning. Prior work indicates that schools tend to implement reforms after inspection (Cuckle et al., 1998; Dedering & Müller, 2011; Ehren & Visscher, 2008; J. Gray & Wilcox, 1995; Ouston et al., 1997; Verhaeghe et al., 2010). However, results are not consistent across studies. Considerable variation exists regarding the extent to which inspection recommendaitons lead to implementation of improvements (de Wolf & Janssens, 2007). For example, a study in Germany found that inspection led to an increase in reforms in a majority of schools (Dedering & Müller, 2011). Yet, a U.K. study found that only a small portion of inspection recommendations were implemented (J. Gray & Wilcox, 1995). Prior research has found that how feedback is delivered can influence whether improvements are implemented (e.g. Ehren & Visscher, 2008; Gustafsson et al., 2015; Matthews & Sammons, 2004; Ouston et al., 1997; Penninckx et al., 2015) . Greater implementation occurs when feedback is clear and explicit (Matthews & Sammons, 2004), shows school weaknesses (Ouston et al., 1997; Penninckx et al., 2015), and when shared goals are established between schools and inspectors (Ehren & Visscher, 2008; Ouston et al., 1997). Reforms are also influenced by accountability pressure. There is evidence that principals who feel greater accountability pressure tend to be more attentive to inspectors’ expectations and more responsive in terms of improvement actions (Altrichter & Kemethofer, 2015). Yet, pressure that is viewed as ill-intentioned might be detrimental to reforms. Implementation of reforms is less likely if inspection uses coercive methods (Gustafsson et al., 2015) or if the process is perceived as threatening (Visscher & Coe, 2003). Similarly, it was found that differentiated inspection models— where low performance schools have more intensive inspection—tend to be more effective to enabling reforms (Ehren, Gustafsson, et al., 2015). 113 A separate body of literature analyzes the causal effect of inspection on student achievement. This literature is also thin and far from conclusive. Most of these studies have found small positive effects on student achievement (e.g. Allen & Burgess, 2012; Ehren & Shackleton, 2016; Hussain, 2015; Klerks, 2012; Luginbuhl et al., 2009; Shaw, Newton, Aitkin, & Darnell, 2003). Yet, others have found no significant impact (e.g. Rosenthal, 2004). The two bodies of literature described present a contrast. First, the existing literature on school responses to inspection, are based only on post-inspection observations and therefore do not estimate a causal effect of school inspection on developmental actions (de Wolf & Janssens, 2007; Dedering & Müller, 2011). 
Most of these studies focus on acceptance of inspection feedback and do not evaluate reforms within specific areas (Altrichter & Kemethofer, 2015; Ehren & Visscher, 2008; J. Gray & Wilcox, 1995; Ouston et al., 1997). Second, the literature on the causal effect of school inspection on student achievement does not address what aspects of inspection are responsible for gains. This study fills this gap and evaluates the causal effect of school inspection feedback on school improvement planning, identifying the effect on influential areas.

The Uses of School Improvement Plans

SIPs are strategic management instruments used as a road map for school improvement. Although SIP content varies across districts, considerable similarities have been found (Mintrop et al., 2001). Common content areas in SIPs include: 1) establishing a vision, 2) assessing needs, 3) setting strategic goals and actions, and 4) using measurable performance metrics to evaluate past performance and monitor progress (Fernandez, 2011; Redding & Searby, 2020; Strunk et al., 2016). A key function of SIPs is establishing school priorities. In the management literature, this has been associated with better organizational performance (Chun & Rainey, 2005; Hines, 2017). Regarding the effectiveness of planning documents in education, the empirical literature is very thin. Fernandez (2011) found a strong association between the quality of school planning and student performance in reading and math.

SIPs have been a central instrument in high-stakes accountability systems in the United States (Mintrop et al., 2001). The New Public Management reforms of the 1980s promoted the use of strategic planning in public agencies. This management technique, used by successful corporations, was viewed as a practice that would enable rational planning and greater efficiency in the public sector (Berry & Wechsler, 1995). Improvement plans emphasize performance quantification and promote accountability. This aligned well with the emphasis of test-based accountability on data-driven decision making and a focus on outcomes (Fernandez, 2011; Redding & Searby, 2020).

SIPs have played a central role in U.S. education reforms (Armstrong, 1982; Doud, 1995; Fernandez, 2011; Mintrop et al., 2001; Strunk et al., 2016). Federal and state mandates have widened their use. At the national level, the Elementary and Secondary Education Act (1965), No Child Left Behind (2001), and the Every Student Succeeds Act (2015) have successively advanced requirements for submitting SIPs to state education agencies in order for low-performing schools to access federal funds (Meyers & VanGronigen, 2019; Mintrop et al., 2001). In addition, state agencies often provide rules, guidelines, and templates for developing these plans. SIPs have been used as a management tool to align the goals of accountability systems and individual schools. This has resulted in widespread internalization of state goals into the operations of individual schools. For example, a content analysis of SIPs from low-performing schools in three states found relatively uniform goals and activities, as schools adopted goals mandated by state agencies (Mintrop et al., 2001). While filing SIPs is mandatory, this does not necessarily mean that they will be a key planning tool for schools (Cuckle et al., 1998; Meyers & VanGronigen, 2019). One study found that the use of inflexible bureaucratic practices resulted in 80% of principals adopting "satisficing behavior"
“good enough” practices) for SIPs; this included resubmitting previous years’ plans or focusing goals solely on test scores (Meyers & VanGronigen, 2019).

District Background

This research focuses on a large urban school district in the United States that has used inspection as a supplemental mechanism for school accountability and improvement during the last 10 years. Like other school districts in the United States, my case study relies primarily on high-stakes testing for accountability purposes. The main accountability instrument is the Performance Framework, which rates schools based on a variety of performance indicators. Yet, standardized test results are the most influential rating component. These ratings guide incentives, sanctions, and support actions for schools.

School Inspections

The district has used school inspection since 2012. Inspections are focused on low-performing schools, based on the Performance Framework. However, the district has discretion in selecting schools for inspection. This study focuses on inspections conducted in the school years 2016-17 and 2017-18. These are the two years with the most school inspections. The procedures used for inspection remained the same during these school years; the process was changed for visits in 2018-19 and later.

School inspections are facilitated by a contractor. Teams of 3 to 4 inspectors make school visits; each team includes two contracting staff and at least one representative from the district department of education. A protocol guides the process. Inspections entail a two-day, on-site evaluation of school quality. During this visit, inspectors review school documents, observe classrooms, and conduct interviews and focus groups with administrators, teachers, parents, and students. The scope of topics covered by the inspection is broad, yet it has an instructional lens. Areas covered by the inspection include classroom instruction, support to students, professional development, school climate and culture, leadership, and relationships with families and the community. The process is highly structured. Inspectors use rubrics to evaluate classroom observations and questionnaires to conduct interviews and focus groups. At the completion of the visit, the inspection team meets with school administrators to provide feedback on findings and discuss improvement strategies.

After the inspection, a written report summarizing the findings is sent to the school. The final report includes an evaluation of each domain (e.g., “Instruction”), a rating of each quality criterion (e.g., for the criterion “Classroom instruction is intentional, engaging, and challenging for all students,” the school “does not meet,” “partially meets,” “meets,” or “exceeds” expectations), and an evaluation of the topics within each quality criterion (e.g., “Instruction does not require students to use and develop higher order thinking skills”). The report highlights the school’s “strengths” and “areas of growth.” Finally, based on the discussion between the inspection team and the school leadership team, the document reports areas to prioritize, goals set, and measures to evaluate success. There are no follow-up instances after inspections.

School Improvement Plans

The district follows the state requirement for all schools to submit a SIP every year. High-performing schools can request to submit plans only every two years.
The SIPs build on schools’ strategic planning, providing a consistent format to capture planning efforts and aligning with state and federal requirements for multiple programs and grants. The state provides a template as well as detailed guidelines and assistance for developing the plans. In addition, the SIP is intentionally designed to provide enough flexibility so that the planning process is meaningful for the schools. A major part of the SIP is a narrative section, which is unstructured. Schools describe their plan in detail, giving them the opportunity to build a coherent case that summarizes the overall plan. This narrative typically includes a description of the school; mission and vision; climate and culture; instructional models; family and community engagement; leadership and staff; diagnosis of challenges and performance problems; current activities, programs, and partners; past support and grants; a deep analysis of the prioritization of areas of focus; strategies for improvement; and future plans. All other SIP sections are highly structured and rely on performance indicators. These sections include trend analyses, performance challenges, root causes of performance challenges, prioritized areas, improvement strategies, action plans, and monitoring of the impact and progress of the action plan.

This study focuses on the narrative section of the SIPs. Analyzing this section, as opposed to the whole plan, enables me to focus on the main message school leaders decide to highlight. Furthermore, it avoids the repetition of topics across sections. Its unstructured format facilitates content analysis, which is used to capture the message of this type of text while avoiding a focus on the topics set by the plans’ templates.

Research Design

This study uses mixed methods with a quantitatively driven design (Morse, 2010). The research design is sequential (Greene et al., 1989; Teddlie & Tashakkori, 2009) and comprises three stages (Figure 2). The first stage consists of in-depth interviews with principals of inspected schools, looking to identify their perceptions regarding the usefulness of inspection and its influential areas (i.e., areas of reform that were planned or implemented based on feedback). Based on interview responses, the most influential areas are identified. The second stage uses content analysis to measure the presence of these topics in the SIPs. Quantitative content analysis involves coding text into categories and counting the frequencies of occurrences within each category (Ahuvia, 2001), which are used as a proxy of topic importance. This is an intermediary step for the following stage. Lastly, a difference-in-differences analysis tests for evidence of a causal effect of inspections on SIP improvement areas. I analyze whether there is a change in focus within SIPs of inspected schools, compared to schools not inspected.

Figure 2. Research Design
Stage I (qualitative analysis): Interviews with principals on the usefulness of inspection feedback. Outcome: influential areas of inspection.
Stage II (quantitative analysis): Quantitative content analysis. Outcome: word frequencies in SIPs.
Stage III (quantitative analysis): Difference-in-differences analysis. Outcome: causal impact of inspection on SIPs.

Stage I. Interviews with School Principals

In-depth semi-structured interviews were conducted with principals of schools that were inspected in school years 2016-17 to 2018-19.
The goal of the interviews is to evaluate how inspection feedback was useful and which areas covered by inspection were most influential in leading to changes in the schools. A total of 55 schools were inspected at least once during this period (44 schools were inspected once, 10 schools twice, and 1 school three times). All 55 principals were invited to participate in the interviews; 16 were interviewed. Participants were informed that interview responses would be anonymous and would be attributed to a pseudonym. They received a US$25 gift card after participation.

Interview questions inquired about the perceived usefulness of inspection feedback, main ongoing or recent changes implemented in schools, what motivated the changes, and which changes were based on inspection feedback. To assess responses, a codebook was developed inductively (see codebook in Appendix A). Codes of usefulness capture the ways in which the inspection was useful for planning purposes, such as narrowing the improvement focus, reaffirming existing goals, or gaining legitimacy. Codes for changes implemented capture the different sources that led to changes in the schools, such as “principal initiative,” “data analysis,” or “inspection feedback.” To ensure reliability, an independent-coder approach was used. First, interview transcripts were independently coded by two researchers. Then, codes were compared for agreement. An iterative process was followed until reaching at least 80% agreement, based on a pooled Cohen’s Kappa indicator (De Vries et al., 2008). We achieved 83% agreement. Next, I used individual codes to search for themes and patterns both within individual interviews and across interviews. I identified the presence and absence of codes based on frequency charts and cross-tabulations.

Influential areas based on inspection feedback

Changes implemented or planned based on “inspection feedback” are coded by area of change. These areas stem from analyzing the inspection protocols. Eight areas of inspection were identified: 1) Community Involvement, 2) Climate & Culture conducive to Learning, 3) Instructional Practices, 4) Leadership, 5) Professional Development, 6) Support to Students, 7) Teachers-Administrators Collaboration, and 8) Other Organizational Issues.

Interviews reveal that most principals (88%) implemented or planned changes based on inspection feedback. About 80% of principals focused on changes related to Instructional Practices and/or Climate & Culture conducive to Learning (from now on, “Climate & Culture”). In most cases, several areas of change are addressed simultaneously. About 70% of principals who mention reforms in these two areas also implemented changes in other areas. Instructional Practices is the most commonly mentioned reform area that principals address as a result of inspection feedback (see Table 6). This is not surprising, since instruction is at the center of the inspection process. Instructional topics mentioned repeatedly by principals include promoting higher order thinking, depth of questioning in the classroom, setting clear expectations, and improving formative assessments. This improvement area is closely followed by Climate & Culture, which half of principals address in implemented and planned reforms. The scope of topics was wide, including “how the school culture impact[s] student learning,” attitudes and expectations toward students, behavioral interventions, and social emotional learning.
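To make the reliability step above concrete, the following is a minimal sketch (not the study’s actual code) of a pooled-kappa computation; it assumes each coder’s decisions are stored as a 0/1 matrix of interview segments by codes, following the pooling logic of De Vries et al. (2008):

```python
import numpy as np

def pooled_cohens_kappa(coder_a, coder_b):
    """Pooled Cohen's kappa across many codes (De Vries et al., 2008).

    coder_a, coder_b: 0/1 arrays of shape (segments, codes) indicating
    whether each coder applied each code to each interview segment.
    All code-level decisions are pooled into a single agreement table
    before kappa is computed, rather than averaging per-code kappas.
    """
    a = np.asarray(coder_a).ravel()
    b = np.asarray(coder_b).ravel()
    p_observed = np.mean(a == b)  # share of pooled decisions that match
    # Chance agreement, from each coder's marginal rate of applying codes
    p_expected = a.mean() * b.mean() + (1 - a.mean()) * (1 - b.mean())
    return (p_observed - p_expected) / (1 - p_expected)
```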
Table 6. Influential Areas – Changes Implemented in Schools based on Inspection Feedback

Area                                       Number of Principals   Percentage
Instructional Practices                                       9          56%
Climate & Culture conducive to Learning                       8          50%
Other Organizational Issues                                   5          31%
Leadership                                                    4          25%
Professional Development                                      3          19%
Support to Students                                           3          19%
Staff Collaboration                                           3          19%
Community Involvement                                         -            -

None of the other specific categories was mentioned by more than four principals. The category Other Organizational Issues (31%) includes changes to “teacher schedules,” “building a strong vision,” or “data meaning.” Some principals mentioned changes that appear to be too broad or vague, such as “setting systems in place” and “consistency in structures.” To capture the influence of school inspection and to limit overlap across categories, the research focuses on the two areas where most principals implemented changes: Instructional Practices and Climate & Culture. These identified areas were then used to develop categories for the content analysis and the difference-in-differences analysis in the following stages.

Stage II. Content Analysis

A dictionary-based, quantitative content analysis (Krippendorff, 2013; Riffe et al., 2014) of SIPs identifies the presence of words associated with the influential areas emphasized by school principals. In order to evaluate the impact of inspections on school planning during the school years 2016-17 and 2017-18, the analysis includes 399 improvement plans from all 205 K-12 public schools in the district. The areas of Instructional Practices and Climate & Culture are the categories of the content analysis. To define the scope of these areas, I analyze the inspection protocols, leading to the following definitions for inclusion:

i. Instructional Practices. Focus on high quality instructional practices and interactions. Purposeful, intentional, and engaging teaching. Emphasis on rigor and higher order thinking skills. Group work and cooperation. Feedback to students and ongoing assessments. Alignment with the Common Core State Standards.

ii. Climate & Culture. General school culture and climate conducive to learning. High behavioral and academic expectations. Rewards for positive behavior and consequences for misbehavior. Clear expectations, respect for school norms. Consideration for the whole child and support for emotional learning. Supportive, collaborative, and caring interactions with students.

The content analysis uses a semi-automatic dictionary-building process (Brier & Hopp, 2011; Deng et al., 2019; Grimmer & Stewart, 2013; Neuendorf, 2017). All phrases and words included in the dictionary stem from the SIPs. The dictionary-building process was conducted using WordStat 8 software, following these steps in an iterative process:

1) Corpus creation: All narrative sections of the SIPs for school years 2016-17 and 2018-19 constitute the corpus of analysis.

2) Initial word frequency list: A frequency list with all the words included in the documents is created. I identify 8,309 unique words.

3) Pre-processing: Removes all “stop words,” or function words that do not convey meaning, such as conjunctions and prepositions. I also exclude words and phrases that appear in less than 5% of the documents (i.e., fewer than 20 SIPs). This results in 1,201 words.

4) Initial phrase frequency list: A frequency list with all the phrases—at least 2 words together—included in the documents is created. I identify 398 unique phrases.
5) Entry identification and classification: Words and phrases are the basic units for classification. A selection of all phrases and individual words was conducted manually, eliminating those that clearly do not belong to the categories of analysis. The process started with the most frequent phrases, which are usually more “context resistant” (Conway, 2006; Deng et al., 2019). This results in 411 phrases and words.

6) Consolidation: Words included in the preliminary dictionary were further reduced through word stemming (i.e., the stem emotion* counts the words “emotions,” “emotional,” and “emotionally”). Alternative spellings and acronyms were added as substitutes for dictionary words and phrases. This results in 362 phrases, words, word stems, and acronyms.

7) Contextual validation: A key-words-in-context analysis was conducted. I assessed each term in context—reading the whole sentence in which the term is used—to decide whether it belongs to the category of analysis. If fewer than 50% of the occurrences belong, I exclude the word from the dictionary. If 50% to 80% of the occurrences belong to the category of analysis, further analysis was conducted. This included checking other word forms and co-occurrence with other words to attain more precision. This refining process was conducted until over 80% of key-words-in-context rendered true-positive results; at this point, the word is considered to belong to the main categories of analysis (Bengston & Xu, 1995; Deng et al., 2019). This step led to directly including 52 terms (>80% true positives), directly excluding 246 terms (<50% true positives), and further considering 64 terms. Partial result after iterations: 104 terms.

8) Extensions: The dictionary was extended considering word co-occurrence and synonyms and antonyms of pre-selected words, to include words that might have been overlooked or excluded due to low frequency. A misspelling identification analysis was conducted to detect false negatives. Then, the key-words-in-context validation described above was repeated. Final result: 119 terms.

The 80% cut-off criterion used to determine word inclusion or exclusion in the dictionary is based on precedent in prior literature (Bengston & Xu, 1995). Any threshold risks dropping potentially relevant entries (Deng et al., 2019). In addition, a semi-automatic dictionary-building process will inevitably include some categorization errors. To assess the overall validity of the dictionary, I compare software and human coding for a random sample of 10% of the SIP documents. I find a Scott’s Pi of 79%, indicating a good level of agreement. The outcome from this stage is a frequency of words within the categories of analysis—the most influential areas—for each SIP.

Word Frequencies in SIPs – Most Influential Areas

The dictionary includes 119 terms—phrases, words, word stems, and acronyms: 62 terms for Instructional Practices and 57 terms for Climate & Culture (see the dictionary in Appendix B). For the classification of Instructional Practices, the categorization focuses on transversal issues across subjects. In contrast, it omits subject-specific issues—such as “literacy,” “phonics,” or “manipulatives”; it also omits references to standardized tests, the performance framework, and specific support and curricular programs.
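As an illustration of how such dictionary entries behave in text, the following is a minimal sketch of the key-words-in-context check from step 7 above. It is illustrative only—the study conducted this step in WordStat 8—and the kwic helper below is hypothetical:

```python
import re

def kwic(corpus, term, window=60):
    """Return key-word-in-context snippets for one dictionary entry.

    corpus: list of SIP narrative strings. term: a dictionary entry in
    which '*' marks a stem, as in Appendix B (e.g., 'emotion*' matches
    'emotions', 'emotional', and 'emotionally').
    """
    pattern = re.compile(
        r"\b" + re.escape(term).replace(r"\*", r"\w*"), re.IGNORECASE
    )
    snippets = []
    for doc in corpus:
        for match in pattern.finditer(doc):
            lo, hi = max(0, match.start() - window), match.end() + window
            snippets.append(doc[lo:hi])
    return snippets

# Per the cut-off described above, a term is kept when more than 80% of
# its (manually read) snippets are true positives for its category.
```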
For example, the dictionary includes terms such as “checking for understanding,” “grouping,” and “formative,” but excludes “SAT,” “school performance,” or the program “Amplify Science.” For the classification of Climate & Culture, the dictionary excludes terms for professional staff who provide support to students, such as “psychologist” or “social worker.” The dictionary was designed to make the main categories mutually exclusive (Riffe et al., 2014). Terms that overlap the two main categories, such as “student engagement,” were classified as Climate & Culture. Since the whole inspection process has an instructional lens, this decision was made to increase the chance of capturing climate and cultural issues.

The narrative section of the SIPs provides flexibility for principals to address a wide range of issues. To provide a sense of the scope of topics, clouds of the most frequent terms covered in SIPs for the school years 2016-17 and 2018-19 were created (see Appendix C). Some of the most frequent topics include leadership, English language learners (i.e., “English language”), professional development, demographics (i.e., “reduced lunch”), and performance (i.e., “student achievement”).

Table 7 presents the content analysis coverage and the frequency of the main categories. Overall, the dictionary terms cover close to 2% of the total words in both years (see Table 7). Nonetheless, these topics appear in 12% of sentences in 2016-17 and 18.7% in 2018-19. The word frequency of the Instructional Practices category remained stable over the two years. In contrast, the frequency of the Climate & Culture category more than doubled. The frequency list of key terms for each SIP is the input for the statistical analysis.

Table 7. Content Analysis Coverage and Term Frequency

                              School Year 2016-17   School Year 2018-19
Dictionary Coverage
  % of words                                 2.0%                  2.1%
  % of sentences                            12.0%                 18.7%
Category Frequency
  Instructional Practices                     612                   649
  Climate and Culture                         600                 1,278
N                                             196                   203
Note: % of SIP words excludes “stop words.”

Stage III – Statistical Analysis

I use a difference-in-differences analysis to evaluate the causal impact of inspection on the presence of key influential areas in the SIPs. The outcome measure stems from the content analysis in the areas of 1) Instructional Practices and 2) Climate & Culture, as a proxy of attention. The “word count” is the sum of both categories. The analysis focuses on the impact of inspections conducted in school years 2016-17 and 2017-18. In a given year, the SIPs that guide school planning were prepared at the end of the previous school year. For example, the SIPs that guide schools in year 2016-17 were prepared in late spring or summer of 2016. SIPs that reflect the pre-intervention period in my study are those that guide schools in 2016-17; these were prepared before the inspections occurred. Post-intervention SIPs are those from 2018-19; these were written after inspections. I include schools that have SIPs available for both the year reflecting pre-intervention plans (2016-17) and post-intervention plans (2018-19). I exclude schools with inspections within two years prior to the study period; this excludes 26 schools with inspections in 2014-15 and/or 2015-16. My final sample comprises 160 public schools (79% of public schools in the district). My treatment group has 31 schools with at least one inspection in 2016-17 or 2017-18. My comparison group has 129 schools that did not have inspections and serve as controls.
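To make the sample construction concrete, a minimal pandas sketch is shown below; the file and column names are hypothetical stand-ins, not the study’s actual data files:

```python
import pandas as pd

# Hypothetical inputs: per-SIP keyword counts and inspection records.
sips = pd.read_csv("sip_word_counts.csv")   # school_id, year, word_count
visits = pd.read_csv("inspections.csv")    # school_id, inspection_year

# Pre-period SIPs guide 2016-17; post-period SIPs guide 2018-19.
panel = sips[sips["year"].isin(["2016-17", "2018-19"])].copy()
panel["post"] = (panel["year"] == "2018-19").astype(int)

# Treated schools: at least one inspection in 2016-17 or 2017-18.
treated = set(
    visits.loc[visits["inspection_year"].isin({"2016-17", "2017-18"}),
               "school_id"]
)
panel["inspected"] = panel["school_id"].isin(treated).astype(int)

# Keep schools with SIPs in both periods (balanced panel of 160 schools,
# after also dropping schools inspected in 2014-15 or 2015-16).
panel = panel.groupby("school_id").filter(lambda g: g["post"].nunique() == 2)
```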
Table 8 presents summary statistics at the school level for inspected and not inspected schools in the periods before and after inspection. (School demographic and school characteristic data were obtained from the state Department of Education website. Since the school district required not to be named, the state is not disclosed either, to avoid facilitating identification of the district.) On average, in the post-treatment period, inspected schools have a significantly higher word count on influential topics, compared to not inspected schools (p<.001). Regarding school characteristics, inspected schools have, on average, a higher proportion of low-income students, as indicated by the proportion of students receiving free and reduced-price lunch. No other observed characteristic differs significantly between treatment groups. Student achievement, measured by the state standardized tests, is also considered. Test scores are unavailable for 31 schools in the sample. Average test scores in English Language Arts and Math are presented in Appendix D; only 128 of the panel schools have test score data available. Schools not inspected have higher test scores, on average, in both subject areas and in both years (p<.001).

Table 8. Summary Statistics

                       School Year 2016-17         School Year 2018-19
                    Not Inspected   Inspected   Not Inspected   Inspected
Outcome Variable
  Word count             6.02          6.29          7.60         13.45***
                        (7.84)        (6.68)        (7.02)        (5.37)
School Characteristics
  Enrollment           458.93        524.19        471.81        495.90
                      (310.31)      (274.32)      (328.16)      (248.36)
  FRL (%)                0.66          0.77**        0.66          0.75**
                        (0.28)        (0.20)        (0.28)        (0.19)
  Black (%)              0.13          0.14          0.13          0.14
                        (0.12)        (0.16)        (0.12)        (0.16)
  White (%)              0.24          0.19          0.25          0.18
                        (0.25)        (0.17)        (0.25)        (0.17)
  Hispanic (%)           0.56          0.59          0.55          0.59
                        (0.29)        (0.25)        (0.29)        (0.25)
  n                       129            31           129            31
Notes: Cells report means, with standard deviations in parentheses. FRL = free and reduced-price lunch. **: significant difference at the 5% level; ***: significant difference at the 1% level; differences are between inspected and not inspected schools in the indicated school year, based on two-sample t-tests.

The parametric DD analysis aims to capture the causal impact of inspection on the presence of key topics in the SIPs, controlling for covariates and school fixed effects. The outcome variable (word count) distribution shows a positive skew, more pronounced in 2016 (skewness = 3.2) than in 2018 (skewness = 0.9). To address skewness, rather than use a canonical DD model, I considered three alternatives: a log-linear model, a multilevel mixed-effects generalized linear model with a negative binomial distribution, and a multilevel mixed-effects generalized linear model with a Poisson distribution. The log-linear model was chosen given its better fit, minimizing error dispersion. My DD model is specified as:

ln(Y_it) = β0 + β1 Post_t + β2 Insp_i + β3 (Insp_i × Post_t) + γ X_it + δ_i + ε_it

where Y_it represents the key word count within a SIP for school i in year t; Post_t indicates the post-inspection time period (school year 2018-19); Insp_i indicates that the school had an inspection; X_it is a vector of school-level characteristics; δ_i are the school fixed effects; and ε_it is the random error term. Insp_i × Post_t is an interaction term between inspected schools and an indicator for the post-inspection period; β3 is the DD treatment effect. Since this is a log-linear model, regression estimates are calculated in log points.
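A minimal sketch of how this specification could be estimated is shown below. The estimation software actually used is not stated in the text, so this sketch uses statsmodels with hypothetical column names; the time-invariant Insp_i main effect is absorbed by the school fixed effects, so only the interaction enters explicitly:

```python
import numpy as np
import statsmodels.formula.api as smf

# How zero word counts were handled before logging is not described in
# the text; adding 1 here is an assumption made only for this sketch.
panel["ln_wc"] = np.log(panel["word_count"] + 1)

# Model (5): log-linear DD with demographic covariates and school fixed
# effects via C(school_id) dummies.
fit = smf.ols(
    "ln_wc ~ post + post:inspected + frl + np.log(enrollment)"
    " + pct_black + pct_white + pct_hispanic + C(school_id)",
    data=panel,
).fit(cov_type="HC1")  # robust standard errors

beta3 = fit.params["post:inspected"]    # DD estimate, in log points
pct_change = (np.exp(beta3) - 1) * 100  # e.g., 0.666 -> about a 94.6% increase
```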
A transformation is needed to interpret results in percentage terms. For example, β3, the DD effect, has the following interpretation: on average, inspected schools show a (e^β3 − 1) × 100 percent increase (or decrease) in key words, in comparison to non-inspected schools in the post-treatment period. The school fixed effects capture the effect of unobserved, time-invariant factors that might influence the outcome, such as principals’ ability to plan. School-level characteristics include the number of students enrolled, the percent of students receiving free and reduced-price lunch, and racial and ethnic composition (% of white, black, and Hispanic students).

Results

Interview Analysis – Perceived Usefulness of Inspections

Most principals find that inspection is useful for planning purposes. Responses are grouped in three non-mutually exclusive categories, which indicate that inspection allowed principals to: 1) prioritize the school focus of planned improvement (80% of principals), 2) confirm their prior diagnosis or existing goals (75%), and 3) increase legitimacy among school staff and with the school district to implement changes (50%). Principals who highlight how inspection brought legitimacy within the school to implement changes explain that inspection facilitated a collective process of planning. Only three principals out of the sixteen interviewed did not find inspections useful. Overall, interview responses demonstrate that the inspection process supports planning, and this planning goes beyond the areas identified in the district’s Performance Framework, which is heavily focused on standardized test results.

Improved Prioritization. The inspection feedback helped 80% of principals plan strategically, through improved prioritization of the school focus. This included identifying new improvement areas and narrowing the current focus. Many principals highlight that the wide variety of issues covered by inspection allows for more comprehensive reforms. Several principals note that the inspection provided an opportunity to make changes that their schools “needed.” Principal Tyler explains that the inspection not only uncovered areas that they needed to work on, but also gave them “the time and space to actually work” and “restructure the strategic planning.” More concretely, Principal Linda recounts that the inspectors consistently saw disciplinary issues in the classroom and a lack of systems to deal with them, which led staff “throughout the school” to ask questions, such as what they are “doing wrong” and what they should change. She sees these questions as an opportunity to define their focus as well as action steps.

Facilitating understanding of the problems and providing evidence were other aspects that many principals highlight as useful. Principal Sebastian reflects that the inspection informed his “understanding as a principal, of some deeper issues inside of the school.” Principal Thomas explains how he selected a focus area for improvement after the inspection illuminated a specific challenge and provided evidence:

[After receiving the inspection feedback] we were able to work as a team and have things broken down in such a way that people felt like it was something that we could and needed to focus on. So I think that the whole process just made it really tangible for us to have clear things to focus on for us to then say, "Okay.
Rigor is the one that we keep hearing and seeing, and so that's the one that we're going to take as our next step as a school to really look to move forward with."

Most principals used inspection feedback to prioritize and define their focus in improvement plans. Principal Mark explains that there are many worthy areas he “could choose to tackle and [the inspections] really kind of helped … hone in on two areas for my major improvement strategies for my school improvement plan. They really were crafted around that.” In addition, many principals provide examples of how the inspection was useful for advancing toward implementation. For example, Principal Mary explains that the feedback helped them “to start writing out some action steps based on the highest leveraged area that the school could focus on.”

Diagnosis Confirmation & Goal Reaffirmation. About 75% of principals state that the inspection feedback was useful in confirming their diagnosis or reaffirming existing goals. Feedback often confirmed what they already suspected and served to validate that they were “headed in the right direction” (Principal Amy), “doing lots of really good things” (Principal Thomas), or “working on the right things” (Principal Tyler). This confirmation was useful for Principal Nicholas in justifying their current focus:

Before the [inspection] year, we had included Social-Emotional Learning as a major improvement strategy for our improvement plan. [The inspection] … helped confirm that that was a valid area of focus, to invest in. So, where we might've felt tempted to just do away with it, now there's just a lot more excitement to continue with it and to keep in the actual improvement plan that we submit to the state. The fact that it was called out so explicitly in the [inspection] was pretty surprising.

Inspection also confirmed what areas were problematic, as explained by Principal David:

… what the inspection did help do is provide more clarity and specifics around things that maybe I thought were gaps... And I think that provided opportunities for me to kind of get back the dots.

Gained Legitimacy. Half of the principals found that the inspection brought legitimacy with the staff and the school district to implement changes in the strategic planning. Inspections accomplished this through three main pathways: 1) providing evidence to justify selected focus areas for improvement planning; 2) incorporating the views of school staff; and 3) establishing a common ground for planning among the school staff. Many principals agree that the inspections legitimized their improvement areas and strategies. In most cases, gained legitimacy seems to play a more relevant role within the school, as illustrated by Principals Sarah and David:

Principal Sarah: we talked about instructional rigor, and that was one that we really latched onto… it was useful to show the data to our teachers, because our teachers felt that they were very rigorous in their instruction, and for us to go back, and say, "Here's this piece of [inspection], that's actually not the case." Because sometimes when you say something, people don't always believe it, but when you have the data behind it, it really hammers home in a different way. Then, for us to say, "Our school-wide focus, we're going to focus on rigorous instruction," that helped us out.

Principal David: …the biggest place where the inspection was useful was for me to be able to say no.
We as a leadership team have seen this gap, and it's confirmed by this outside source. And it's confirmed by our data. This is something we need to address because clearly what we're doing is not working.

The inclusion of input from a variety of school stakeholders during the inspection visit was used as an additional source of legitimacy to promote changes in the schools. This is the case made by Principal Matthew:

I knew that we were falling short on some of our work on observation feedback in terms of what I wanted. What I didn't understand was the teachers were also wanting it …And then…, you're honoring teacher voice, … so you're able to say … "Remember when you guys all shared with the folks that you wanted more observation feedback, well I've created a teacher lead position that's going to help with that."

Half of the principals found that the inspection facilitated setting a common ground with the staff for planning and implementing changes. The process was useful to check whether the staff followed and understood instructional processes (Principal Monica), to “start conversations about institutional practices” (Principal Sarah), and to establish a “common understanding” (Principal Mary). Finally, for some principals, the inspection report was also perceived as a source of legitimacy for the district. Principal Nicholas makes the case:

We've been investing in areas that have traditionally not been supported outside of the school or by the district, because the push has always been academic. And so, by focusing on these other areas on the whole child, on Social-Emotional Learning, on the school culture, is a big risk that we would not be accepted as good leaders. So, it's very helpful … to bring more support to us, to encourage District leadership to be more supportive of our effort.

One size does not fit all. Although a majority of principals (81%) found the inspection process useful for planning purposes, three did not. Their reasons include that inspections cannot uncover the “true culture” of a school in a short visit (Principal Brian), that they are “unnecessary” because school administrators could do the evaluation themselves (Principal Ashley), or that they are not informative enough (Principal Paul). In addition, other principals mentioned specific aspects of the feedback they found unhelpful. The most common reason was the timing for implementing changes or the fact that they already had reforms in place (Principals Monica, Tyler, Sarah, and David). Two principals also found the quantity of information “overwhelming” (Principal Nicholas) or not helpful enough in providing “actionable next steps” (Principal Sebastian).

Difference-in-Differences Analysis

The DD analysis demonstrates a significant increase in the use of key words in the SIPs of inspected schools. First, I calculate the non-parametric difference-in-differences value (Table 9). On average, inspected schools have 5.6 more words related to Instructional Practices and Climate & Culture in their SIPs in the post-inspection period. To put this quantity of words in context, in school year 2018-19, on average 21% of sentences were classified as Instructional Practices or Climate & Culture (Table 10), corresponding to the 13.5 key words in Table 9.
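As a worked check of the non-parametric estimate reported in Table 9, the two-by-two computation is simply:

```python
# Group means from Table 9 (average key-word counts per SIP)
inspected_pre, inspected_post = 6.3, 13.5
control_pre, control_post = 6.0, 7.6

first_diff_inspected = inspected_post - inspected_pre  # 7.2
first_diff_control = control_post - control_pre        # 1.6
dd = first_diff_inspected - first_diff_control         # 5.6
```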
Table 9. Non-parametric Difference-in-Differences

Group                    School Year   Word Count   First Difference   Diff-in-Diff
Inspected schools        2016-17              6.3
(N = 31)                 2018-19             13.5                7.2
                                                                                5.6
Not inspected schools    2016-17              6.0
(N = 129)                2018-19              7.6                1.6
Notes: The first difference is the average word count for 2018-19 minus the average word count for 2016-17, for each group. The difference-in-differences is the first difference of inspected schools minus that of not inspected schools.

Table 10. Content Analysis Coverage for Panel

Treatment        School Year   % SIP words   % SIP sentences
Inspected        2016-17             1.9%            11.4%
(n = 31)         2018-19             2.6%            21.1%
Not inspected    2016-17             2.0%            12.2%
(n = 129)        2018-19             2.0%            17.5%
Note: % of SIP words is the average keyword count over total words, excluding “stop words,” in the SIPs; % SIP sentences is the proportion of sentences including a keyword over all sentences in the SIPs.

Table 11 presents the results of the log-linear DD model with five different specifications: (1) the base model without school fixed effects or covariates (Model 1), (2) the base model with selected school demographics (Model 2), (3) Model 2 with test scores (Model 3), (4) Model 3 with school fixed effects (Model 4), and (5) Model 2 with school fixed effects (Model 5). The DD estimates are statistically significant in all five models. Model 3 and Model 4, which include test results, do not include the 31 schools that lack test scores; this affects the treatment group (5 schools) and the comparison group (26 schools). In Model 5, the preferred model, the DD estimate is 0.666 log points (p<.01), which indicates that inspection, on average, results in a 94.6 percent increase in key words (calculated as (e^0.666 − 1) × 100 = 94.6). These results show that inspection has a significant impact on the focus of school planning.

Comparing Model (1) with Models (2) and (5) shows that the basic model is sensitive to adding covariates and school fixed effects; including these controls reduces the magnitude of the DD estimate. In all models, the DD estimate is statistically significant (p<.01). Model (2) indicates that larger schools show a smaller effect than smaller schools, suggesting that school inspection might be more effective in informing and redirecting school planning in smaller schools. This significance disappears when school fixed effects are added in Model (5). The sensitivity of the DD estimate to school fixed effects indicates that there are unobservable factors within the school that influence the results (e.g., principals’ ability to plan). Overall, results appear to be robust: coefficient estimates are consistent across model specifications, with varying covariates and sample composition.
Table 11. DD Regression Results

Dependent variable: Word Count (Logged)
                          (1)         (2)         (3)         (4)         (5)
Post                    0.319***    0.327***    0.414***    0.303***    0.326***
                       (0.090)     (0.090)     (0.098)     (0.105)     (0.088)
Inspected * Post        0.757***    0.731***    0.765***    0.533**     0.666***
                       (0.188)     (0.186)     (0.201)     (0.213)     (0.186)
FRL (%)                             0.171      -0.790      -7.238**    -2.581
                                   (0.638)     (1.015)     (3.242)     (1.999)
Logged Enrollment                  -0.395***   -0.091      -1.067*     -0.444
                                   (0.114)     (0.122)     (0.576)     (0.378)
% Black                            -0.493       0.801      -4.796      -3.148
                                   (2.029)     (2.238)     (5.320)     (4.512)
% White                            -0.290       0.348      -7.233      -2.769
                                   (1.781)     (2.045)     (4.560)     (3.999)
% Hispanic                         -0.113       1.026       0.848       0.382
                                   (1.748)     (1.920)     (5.151)     (4.289)
English Language Arts                          -0.016*      0.006
                                               (0.010)     (0.017)
Math                                            0.007      -0.005
                                               (0.010)     (0.017)
Constant                1.480***    3.917**     8.342      13.812       6.738
N                         320         320         258         258         320
R-squared (within)      0.237       0.244       0.288       0.364       0.269
School Fixed Effects      No          No          No          Yes         Yes
Note: Robust standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1

A series of robustness checks were conducted to provide evidence in favor of the causal interpretation of the DD estimates: 1) graphical examination of parallel trends, 2) balance tests, and 3) placebo tests.

I visually examine the parallel trends assumption, which is critical to the validity of DD models. I compare the outcome variable, word count, in SIPs reflecting pre-inspection years (2014-15 and 2016-17) between inspected schools and those not inspected. Since high-performing schools can opt to submit a plan every two years, this robustness check relies on school year 2014-15 rather than 2015-16. This alleviates the concern that schools submit the same report in the pre- and post-inspection years. Reports prior to 2014-15 are excluded since a narrative section was not required in those early years. Figure 3 indicates that the average word count was similar for inspected and non-inspected schools prior to inspections. This evidence provides support in favor of the parallel trends assumption underlying the DD model. In the post-inspection year (2018-19), inspected schools have a higher count of inspection-related words.

Figure 3. Parallel Trends: Word Count for Inspected vs. Not Inspected Schools (average word count by school year: 2014-15, 2016-17, 2018-19)
Note: Word count represents the average number of keywords related to Instructional Practices and Climate & Culture in the SIPs (see Research Design section).

A balance test examines whether differences in attributes of inspected and not inspected schools are stable over time and whether there is any association between treatment exposure and the covariate distribution. The test uses DD models that take the covariates from the original model as outcome variables. Table 12 presents the results of these models; the DD coefficient estimate is not significant for any of the outcomes considered: enrollment; percent FRL; percent black, white, and Hispanic; and test results. There is no evidence of attribute imbalances in my DD models.

Table 12. Balance Tests

                       Logged                    %        %          %    English
                     Enrollment   % FRL      Black    White   Hispanic   Language      Math
                                                                              Arts
Inspected * 2018-19      -0.06     -0.01     -0.00    -0.00       0.01       1.20     -2.60
                         (0.05)    (0.01)    (0.01)   (0.01)     (0.01)     (1.72)    (1.92)
N                          320       320       320      320        320        258       258
Number of Schools          160       160       160      160        160        129       129
Note: Robust standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1

Lastly, I conduct two placebo tests to examine alternative explanations (Cook et al., 2002).
The tests include: (i) a DD model using the total word count of the SIPs as the dependent variable; and (ii) a DD model using the count of words related to inspection but not included in the set of words associated with Instructional Practices and Climate & Culture (see Appendix E for the list of alternative, inspection-related words). Apart from the change in dependent variables, the placebo tests have the same specification as Model (5) of Table 11, including school fixed effects. Table 13 presents the DD estimators for the two placebo tests. Results show a statistically insignificant DD estimator for both placebo models. This provides additional support for a causal interpretation of the DD estimate.

Table 13. Placebo Tests

                        (1)           (2)
                    All Words     Key Words
                     (logged)      (logged)
Inspected * Post       0.172         0.024
                      (0.143)       (0.137)
N                        320           320
R-squared              0.284         0.350
Number of Schools        160           160
Note: Robust standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1

Overall, the DD estimates indicate that inspected schools nearly double their use of keywords related to instructional practices and climate & culture conducive to learning. All robustness checks—visual examination of parallel trends, balance tests, and placebo tests—support the causal interpretation of the difference-in-differences estimates. This provides evidence of the significant impact that inspection can have on school planning and the selection of priority areas for improvement.

Conclusions

This study finds evidence that school inspection can influence school planning. Results indicate that inspection shifted the focus of planning documents. In addition, principals perceived inspection as useful for planning purposes. The value of this study is twofold. First, it is the first to assess the causal impact of inspection on school planning. Second, it provides empirical evidence from the United States of the effectiveness of inspection in influencing school reform, based on comprehensive on-site evaluations that go beyond standardized test scores.

Principals indicated in interviews that inspections were useful in informing reforms, both in terms of planning and implementation. Inspector feedback helped school leaders prioritize improvement areas, reaffirm prior diagnoses and goals, and gain legitimacy with school staff and the district. These factors are relevant for planning. First, setting priorities is the primary function of improvement plans and is associated with better organizational performance (Chun & Rainey, 2005; Hines, 2017). Second, obtaining evidence confirming principals’ diagnoses and goals provides support to sustain long-term reforms (Armenakis & Harris, 2009). Finally, staff participation in the inspection process served to establish common ground. Through planning as a team, proposed reforms had greater legitimacy among staff.

The broad scope of inspection offered an opportunity to consider areas of reform not addressed by the district accountability framework, which is heavily based on standardized test results. Most principals decided to implement changes as a result of the inspection feedback. Interviews indicated that despite the wide array of schooling issues evaluated by inspections, 80% of principals decided to implement changes in two areas of inspection—instructional practices and climate & culture. The difference-in-differences analysis found that inspection led to measurable changes in school planning. Inspection shifted the focus of school planning documents.
Inspectors’ evaluations led to a significant increase in text devoted to topics within the most influential areas—instructional practices and climate & culture conducive to learning. The amount of text devoted to a topic is used as a proxy of its importance to school principals. Inspected schools devoted 11% of sentences in planning documents to these topics prior to inspection, rising to 21% after the intervention. Word frequency related to these two topics increased by an average of 95 percent due to inspection. These results are robust to visual examination of parallel trends, balance tests, and falsification tests using alternative outcomes.

These findings are relevant for U.S. education policy. Currently, school districts rely primarily on standardized tests for accountability purposes. Test-based accountability aims to improve standardized test results through incentives (Figlio & Loeb, 2011). In this context, schools are accountable for test results, but not for their specific improvement actions. How schools choose to address low test scores is not emphasized. The sole focus on test results is associated with unintended consequences, such as a narrowed curriculum (e.g., Fitchett & Heafner, 2010; Jacob, 2005), gaming strategies to improve measured outcomes (Figlio & Loeb, 2011), and neglect of other critical aspects of learning that are not tested (Jacob, 2005; Rothstein et al., 2008). In contrast, school inspection not only creates incentives for improvement, but also provides specific feedback on school processes and outcomes (Ehren et al., 2013). This offers more nuanced information on school strengths and weaknesses to guide reforms.

An increase in text related to influential inspection topics provides evidence that inspection was effective in informing school reforms. While this study does not address reform implementation, it is assumed that SIPs represent schools’ intended reforms. A potential concern is that schools may present inauthentic goals in school plans in order to please the district (Meyers & VanGronigen, 2019). This is unlikely to be the case for several reasons. First, test-based accountability remains the primary mechanism to which principals are responsive, as emphasized in interviews. Further, the SIPs emphasize standardized test results, since they are a state requirement relevant for federal funding, which is linked to test performance. Second, the district does not track the reforms implemented after inspection; thus, there are no direct incentives to implement changes based on inspection feedback. Finally, the scope of inspection is much broader than the two influential areas identified in this study. These two areas were not singled out by inspectors, nor by the district; thus, it does not seem to be the case that principals would be incentivized to mention these two areas in order to please the district.

These findings also contribute to the limited literature that has assessed the effect of inspection on school reform efforts (Cuckle et al., 1998; Dedering & Müller, 2011; Ehren & Visscher, 2008; J. Gray & Wilcox, 1995; Ouston et al., 1997; Verhaeghe et al., 2010). School planning is a crucial step in deciding the direction of reforms (Matthews & Sammons, 2004). Yet, prior literature has not evaluated the causal effect of inspection on planning. My study fills this gap by taking advantage of the availability of SIPs before and after inspections take place and using quasi-experimental methods.
This allows me to identify the causal impact of school inspection on the focus of intended reforms in specific areas. Overall, this study provides evidence regarding the potential of school inspection to guide school reforms. Heavy reliance on standardized test results offers limited insight into specific, beneficial reforms and creates incentives to narrow the scope of reforms (e.g., Gagnon & Schneider, 2019). This study demonstrates the potential of on-site evaluation in the U.S. to inform the school planning process, providing a broad diagnosis of schools' strengths and weaknesses, identifying areas that hinder improvement, and involving school staff in the planning process.

APPENDICES

Appendix A – Codebook for Interviews with School Principals

1. Usefulness
1.1. Useful / New insights
1.1.1. Better prioritize
1.1.2. Reaffirm existing goals / Confirm diagnosis
1.1.3. Gain legitimacy
1.1.4. Somewhat useful
1.2. Not useful / Not relevant

5. Changes – What motivates them?
5.1. Principal initiatives / Staff initiatives
5.2. Inspections
5.2.1. Community Involvement
5.2.2. Climate & Culture conducive to Learning
5.2.3. Instructional Practices
5.2.4. Leadership
5.2.5. Professional Development
5.2.6. Support to Students
5.2.7. Teachers-Administrators Collaboration
5.2.8. Other Organizational Issues
5.3. Test results, performance framework, evaluations
5.4. School supervisors
5.5. District (excluding 5.3 & 5.4)
5.6. Other sources
5.7. Did not implement changes based on inspection (explicit)

Appendix B – Content Analysis Dictionary

1) Instructional Practices Terms: Active Learning, CCSS (Common Core State Standards), Check* for understanding, Class* inst*, Class size*, Class time, Common core, Conducive to learning, Consistent expectations, Coop*, Co-Teach*, Culturally responsive, Differentiat* instruction, Differentiat* learning, Differentiation, Direct instruction, Embedded assessment, Exit ticket*, Experiential learning, Feedback to students, Formative, Grouping, Growth mindset, Growth mind set, Individualize*, Individualizing, Inquiry based, Instructional method*, Instructional practices, Instructional strategies, Intentional, Lesson design, Lesson* plan*, Misconceptions, Mistakes, Misunderstanding, PBL (Project Based Learning), Pedagog*, Peer to peer, Plan* lesson*, Prior knowledge, Problem solving, Project based, Questioning, Quiz*, Real-Life, Reasoning, Regular*_Assess*, Re-Teaching, Rigor, Shelter*_Instruction, Small group*, Standard, Structured learning, Student centered, Targeted instruction, Teacher created, Teacher led, Thinking, Unite assessement*, Whole group, Time in class*

2) Culture & Climate conducive to Learning Terms: *Safe*, Abuse, Academic culture, Addiction*, Alcohol, Attitude*, Behav*, Build relationships, Bully*, Class* climate, Class* culture, Classroom environment, Classroom environment*, Collaborative systems, Collective, Conflict*, Dean of culture, Drug*, Emotion*, Empath*, High expectations, Improvement culture, Instruction culture, Interaction*, Interpersonal, Learning culture, Learning environment*, Marijuana, Norms, PBIS (Positive Behavior Intervention Supports), Positive climate, Positive culture, Positive relationships, Positive school culture, Relationships between, Relationships with students, Respect, Respectful, Restorative, RJ (Restorative Justice), Routines, Rules, School's culture, School climate, School culture, School wide culture, SEAL (Social, Emotional, And Academic Learning), SEL (Social And Emotional Learning), Student engagement, Student
culture, Student voice, Suicide, Trauma*, Truan*, Trust, Wellness, Whole child.

Appendix C – Most Frequent Phrases on School Improvement Plans – School Years 2016-17 and 2018-19

Appendix D – Test Score Results

                         School Year 2016-17         School Year 2018-19
                      Not Inspected   Inspected   Not Inspected   Inspected
                       Mean (SD)      Mean (SD)    Mean (SD)      Mean (SD)
English Language Arts   737.9***      725.6***     741.4***      730.3***
                        (20.85)       (15.39)      (20.11)       (13.69)
Math                    732.9***      722.9***     734.8***      722.3***
                        (18.18)       (12.14)      (18.47)       (11.85)
n                         102            26           102            26
***: significant difference at the 1% level from two-sample t-tests between inspected and not inspected schools.

Appendix E – Dictionary for Placebo Test

Other Inspection Related Terms: African American, After school, American Indian, At risk, Authorizer, Autism, Bilingual, Bilingual parent advisory committee, Biliteracy, Black student*, Candidate*, Chinese, Club*, Coach*, Collaborative planning, Community event*, Community partnership, Compliance, Conference*, Decision-Mak*, Department meeting*, Disab*, Distributed leadership, Dual language, Educator* need*, Effective teacher*, ELD, ELL, English as a second language, English language development, English language learner*, ESL, Extra-curricular, Faculty input, Families, Family, Father*, Financial, Food Service*, Frequent communication*, Grade level meeting*, Granparent*, Guardian*, High quality teach*, Hire*, Hispanic, Home visit*, Immigrant*, In need, Intervention*, Job embedded, Language acquisition, Language immersion, Latin*, Lead teacher*, Leadership meeting*, Leadership model, Leadership support, Lesson observation*, Lesson planning, Meet frequently, Mentor*, Minorit*, Mother*, Multilingual, Native*, Neighbor*, Observ* other teacher*, Observ* teacher*, Open communication*, Operational, Organizational goal*, Parent*, Professional development, Professional growth, Professional learning, Professional standard*, Race*, Recruit*, Reflective process*, Refugee*, Response to Intervention, Retain*, Reten*, School event*, School leadership team*, School staff, School* operation*, SLT*, SOC, Spanish speaking, Special education, Special needs, Sports, Staff evaluation*, Staff input, Staff meeting*, Staff review*, Staff superv*, Staff support, Strategic conversation*, Strategic plan*, Student* need*, Student* of color, Student* support, Summer program*, Supplemental services, Support to student*, Support* staff, System*, Teacher meeting*, Teaching staff, Team meeting*, TNLI, Training*, Transitional native language instruction, Turnover, Tutor*, Underrepresented, Volunteer*, White student*, Workshop*, Youth center*

REFERENCES

Ahuvia, A. (2001). Traditional, interpretive, and reception based content analyses: Improving the ability of content analysis to address issues of pragmatic and theoretical concern. Social Indicators Research, 54, 139–172.

Allen, R., & Burgess, S. (2012). How should we treat under-performing schools? A regression discontinuity analysis of school inspections in England (No. 12; 87).

Altrichter, H., & Kemethofer, D. (2015). Does accountability pressure through school inspections promote school improvement? School Effectiveness and School Improvement, 26(1), 32–56. https://doi.org/10.1080/09243453.2014.927369

Apple, M. (2005). Education, markets, and an audit culture. Critical Quarterly, 47(1–2), 11–29. https://doi.org/10.1111/j.0011-1562.2005.00611

Armenakis, A., Bernerth, J., Pitts, J., & Walker, H. (2007).
Organizational change recipients’ beliefs scale. The Journal of Applied Behavioral Science, 43(4), 481–505. https://doi.org/10.1177/0021886307303654

Armenakis, A., & Harris, S. (2009). Reflections: Our journey in organizational change research and practice. Journal of Change Management, 9(2), 127–142. https://doi.org/10.1080/14697010902879079

Armenakis, A., Harris, S., Cole, M., Fillmer, L., & Self, D. (2007). A top management team’s reactions to organizational transformation: The diagnostic benefits of five key change sentiments. Journal of Change Management, 7(3–4), 273–290. https://doi.org/10.1080/14697010701771014

Armstrong, J. (1982). The value of formal planning for strategic decisions: Review of empirical research. Strategic Management Journal, 3, 197–211.

Ball, S., & Bowe, R. (1992). Subject departments and the ‘implementation’ of National Curriculum policy: An overview of the issues. Journal of Curriculum Studies, 24(2), 97–115. https://doi.org/10.1080/0022027920240201

Barber, M. (2005). The virtue of accountability: System redesign, inspection, and incentives in the era of informed professionalism. Journal of Education, 185(1), 7–38. https://doi.org/10.1177/002205740518500102

Baxter, J. A. (2013). Professional inspector or inspecting professional? Teachers as inspectors in a new regulatory regime for education in England. Cambridge Journal of Education, 43(4), 467–485. https://doi.org/10.1080/0305764X.2013.819069

Behnke, K., & Steins, G. (2017). Principals’ reactions to feedback received by school inspection: A longitudinal study. Journal of Educational Change, 18(1), 77–106. https://doi.org/10.1007/s10833-016-9275-7

Bengston, D., & Xu, Z. (1995). Changing national forest values: A content analysis (Research Paper NC-323). http://www.nrs.fs.fed.us/pubs/rp/rp_nc323.pdf

Berry, F. S., & Wechsler, B. (1995). State agencies’ experience with strategic planning: Findings from a national survey. Public Administration Review, 55(2), 159. https://doi.org/10.2307/977181

Bitan, K., Haep, A., & Steins, G. (2014). School inspections still in dispute – an exploratory study of school principals’ perceptions of school inspections. International Journal of Leadership in Education, 18(4), 1–22. https://doi.org/10.1080/13603124.2014.958199

Bloem, S. (2015). The OECD Directorate for Education as an independent knowledge producer through PISA. In H. G. Kotthoff & E. Klerides (Eds.), Governing Educational Spaces (pp. 169–185). SensePublishers. https://doi.org/10.1007/978-94-6300-265-3_10

Brier, A., & Hopp, B. (2011). Computer assisted text analysis in the social sciences. Quality & Quantity, 45(1), 103–128. https://doi.org/10.1007/s11135-010-9350-8

Chabbott, C., & Elliott, E. J. (2003). Understanding others, educating ourselves: Getting more from international comparative studies in education. In Social Sciences. https://doi.org/10.17226/10622

Chun, Y. H., & Rainey, H. G. (2005). Goal ambiguity and organizational performance in U.S. federal agencies. Journal of Public Administration Research and Theory, 15(4), 529–557. https://doi.org/10.1093/jopart/mui030

Clarke, J., & Ozga, J. (2011). Governing by inspection? Comparing school inspection in Scotland and England. Social Policy Association Conference, 25.

Coburn, C. (2001). Beyond decoupling: Rethinking the relationship between the institutional environment and the classroom. Sociology of Education, 77, 211–244. https://doi.org/10.1177/003804070407700302

Coburn, C. (2005).
REFERENCES

Ahuvia, A. (2001). Traditional, interpretive, and reception based content analyses: Improving the ability of content analysis to address issues of pragmatic and theoretical concern. Social Indicators Research, 54, 139–172. https://doi.org/10.1023/A:1011087813505
Allen, R., & Burgess, S. (2012). How should we treat under-performing schools? A regression discontinuity analysis of school inspections in England (No. 12; 87).
Altrichter, H., & Kemethofer, D. (2015). Does accountability pressure through school inspections promote school improvement? School Effectiveness and School Improvement, 26(1), 32–56. https://doi.org/10.1080/09243453.2014.927369
Apple, M. (2005). Education, markets, and an audit culture. Critical Quarterly, 47(1–2), 11–29. https://doi.org/10.1111/j.0011-1562.2005.00611
Armenakis, A., Bernerth, J., Pitts, J., & Walker, H. (2007). Organizational Change Recipients' Beliefs Scale. The Journal of Applied Behavioral Science, 43(4), 481–505. https://doi.org/10.1177/0021886307303654
Armenakis, A., & Harris, S. (2009). Reflections: Our journey in organizational change research and practice. Journal of Change Management, 9(2), 127–142. https://doi.org/10.1080/14697010902879079
Armenakis, A., Harris, S., Cole, M., Fillmer, L., & Self, D. (2007). A top management team's reactions to organizational transformation: The diagnostic benefits of five key change sentiments. Journal of Change Management, 7(3–4), 273–290. https://doi.org/10.1080/14697010701771014
Armstrong, J. (1982). The value of formal planning for strategic decisions: Review of empirical research. Strategic Management Journal, 3, 197–211.
Ball, S., & Bowe, R. (1992). Subject departments and the 'implementation' of National Curriculum policy: An overview of the issues. Journal of Curriculum Studies, 24(2), 97–115. https://doi.org/10.1080/0022027920240201
Barber, M. (2005). The virtue of accountability: System redesign, inspection, and incentives in the era of informed professionalism. Journal of Education, 185(1), 7–38. https://doi.org/10.1177/002205740518500102
Baxter, J. A. (2013). Professional inspector or inspecting professional? Teachers as inspectors in a new regulatory regime for education in England. Cambridge Journal of Education, 43(4), 467–485. https://doi.org/10.1080/0305764X.2013.819069
Behnke, K., & Steins, G. (2017). Principals' reactions to feedback received by school inspection: A longitudinal study. Journal of Educational Change, 18(1), 77–106. https://doi.org/10.1007/s10833-016-9275-7
Bengston, D., & Xu, Z. (1995). Changing national forest values: A content analysis (Research Paper NC-323). http://www.nrs.fs.fed.us/pubs/rp/rp_nc323.pdf
Berry, F. S., & Wechsler, B. (1995). State agencies' experience with strategic planning: Findings from a national survey. Public Administration Review, 55(2), 159. https://doi.org/10.2307/977181
Bitan, K., Haep, A., & Steins, G. (2014). School inspections still in dispute – an exploratory study of school principals' perceptions of school inspections. International Journal of Leadership in Education, 18(4), 1–22. https://doi.org/10.1080/13603124.2014.958199
Bloem, S. (2015). The OECD Directorate for Education as an independent knowledge producer through PISA. In H. G. Kotthoff & E. Klerides (Eds.), Governing Educational Spaces (pp. 169–185). SensePublishers. https://doi.org/10.1007/978-94-6300-265-3_10
Brier, A., & Hopp, B. (2011). Computer assisted text analysis in the social sciences. Quality & Quantity, 45(1), 103–128. https://doi.org/10.1007/s11135-010-9350-8
Chabbott, C., & Elliott, E. J. (2003). Understanding others, educating ourselves: Getting more from international comparative studies in education. National Academies Press. https://doi.org/10.17226/10622
Chun, Y. H., & Rainey, H. G. (2005). Goal ambiguity and organizational performance in U.S. federal agencies. Journal of Public Administration Research and Theory, 15(4), 529–557. https://doi.org/10.1093/jopart/mui030
Clarke, J., & Ozga, J. (2011). Governing by inspection? Comparing school inspection in Scotland and England. Social Policy Association Conference, 25.
Coburn, C. (2004). Beyond decoupling: Rethinking the relationship between the institutional environment and the classroom. Sociology of Education, 77(3), 211–244. https://doi.org/10.1177/003804070407700302
Coburn, C. (2005). Shaping teacher sensemaking: School leaders and the enactment of reading policy. Educational Policy, 19(3), 476–509. https://doi.org/10.1177/0895904805276143
Cole, M. S., Harris, S., & Bernerth, J. B. (2006). Exploring the implications of vision, appropriateness, and execution of organizational change. Leadership & Organization Development Journal, 27(5), 352–367. https://doi.org/10.1108/01437730610677963
Concurso de Supervisores Río Negro, Resolución del Consejo Provincial de Educación de Río Negro N° 1053 (2013).
Conway, M. (2006). The subjective precision of computers: A methodological comparison with human coding in content analysis. Journalism & Mass Communication Quarterly, 83(1), 186–200. https://doi.org/10.1177/107769900608300112
Cuckle, P., Hodgson, J., & Broadhead, P. (1998). Investigating the relationship between OFSTED inspections and school development planning. School Leadership & Management, 18(2), 271–283. https://doi.org/10.1080/13632439869691
Darling-Hammond, L., Bae, S., Cook-Harvey, C. M., Lam, L., Mercer, C., Podolsky, A., & Stosich, E. L. (2016). Pathways to new accountability through the Every Student Succeeds Act. Learning Policy Institute. http://learningpolicyinstitute.org/our-work/publications-resources/pathways-new-accountability-every-student-succeeds-act
De Vries, H., Elliott, M. N., Kanouse, D. E., & Teleki, S. S. (2008). Using pooled kappa to summarize interrater agreement across many items. Field Methods, 20(3), 272–282. https://doi.org/10.1177/1525822X08317166
de Wolf, I., & Janssens, F. (2007). Effects and side effects of inspections and accountability in education: An overview of empirical studies. Oxford Review of Education, 33(3), 379–396. https://doi.org/10.1080/03054980701366207
Dedering, K., & Müller, S. (2011). School improvement through inspections? First empirical insights from Germany. Journal of Educational Change, 12(3), 301–322. https://doi.org/10.1007/s10833-010-9151-9
Dedering, K., & Sowada, M. G. (2017). Reaching a conclusion—procedures and processes of judgement formation in school inspection teams. Educational Assessment, Evaluation and Accountability, 29(1), 5–22. https://doi.org/10.1007/s11092-016-9246-9
Deng, Q., Hine, M., Ji, S., & Sur, S. (2019). Inside the black box of dictionary building for text analytics: A design science approach. Journal of International Technology and Information Management, 27(3), 119–159.
Doud, J. (1995). Planning for school improvement: A curriculum model for school based evaluation. Peabody Journal of Education, 70, 175–187.
Edgerton, A. K. (2019). The essence of ESSA: More control at the district level? Phi Delta Kappan, 101(2), 14–17. https://doi.org/10.1177/0031721719879148
Education Inspectorate, Ministry of Education, Culture and Science. (2010). Risk-based inspection as of 2009: Primary and secondary education.
Education Inspectorate, Ministry of Education, Culture and Science. (2017a). Inspection framework primary education.
Education Inspectorate, Ministry of Education, Culture and Science. (2017b). Inspection framework secondary education.
Ehren, M. (Ed.). (2016a). Methods and modalities of effective school inspections. Springer International Publishing. https://doi.org/10.1007/978-3-319-31003-9
Ehren, M. (2016b). Methods and modalities of effective school inspections.
Springer International Publishing. https://doi.org/10.1007/978-3-319-31003-9
Ehren, M., Altrichter, H., McNamara, G., & O'Hara, J. (2013). Impact of school inspections on improvement of schools—describing assumptions on causal mechanisms in six European countries. Educational Assessment, Evaluation and Accountability, 25, 3–43. https://doi.org/10.1007/s11092-012-9156-4
Ehren, M., Gustafsson, J.-E., Altrichter, H., Skedsmo, G., Kemethofer, D., & Huber, S. (2015). Comparing effects and side effects of different school inspection systems across Europe. Comparative Education, 51(3), 375–400. https://doi.org/10.1080/03050068.2015.1045769
Ehren, M., Perryman, J., & Shackleton, N. (2015b). Setting expectations for good education: How Dutch school inspections drive improvement. School Effectiveness and School Improvement, 26(2), 296–327. https://doi.org/10.1080/09243453.2014.936472
Ehren, M., & Shackleton, N. (2016). Risk-based school inspections: Impact of targeted inspection approaches on Dutch secondary schools. Educational Assessment, Evaluation and Accountability, 28(4), 299–321. https://doi.org/10.1007/s11092-016-9242-0
Ehren, M., & Visscher, A. (2006). Towards a theory on the impact of school inspections. British Journal of Educational Studies, 54(1), 51–72. https://doi.org/10.1111/j.1467-8527.2006.00333.x
Ehren, M., & Visscher, A. (2008). The relationships between school inspections, school characteristics and school improvement. British Journal of Educational Studies, 56(2), 205–227. https://doi.org/10.1111/j.1467-8527.2008.00400.x
Fernandez, K. E. (2011). Evaluating school improvement plans and their affect on academic performance. Educational Policy, 25(2), 338–367. https://doi.org/10.1177/0895904809351693
Figlio, D., & Loeb, S. (2011). School accountability. In Handbook of the Economics of Education (pp. 383–421).
Fitchett, P., & Heafner, T. (2010). A national perspective on the effects of high-stakes testing and standardization on elementary social studies marginalization. Theory & Research in Social Education, 38(1), 114–130. https://doi.org/10.1080/00933104.2010.10473418
Gagnon, D. J., & Schneider, J. (2019). Holistic school quality measurement and the future of accountability: Pilot-test results. Educational Policy, 33(5), 734–760. https://doi.org/10.1177/0895904817736631
Gilroy, P., & Wilcox, B. (1997). OFSTED, criteria and the nature of social understanding: A Wittgensteinian critique of the practice of educational judgement. British Journal of Educational Studies, 45(1), 22–38. https://doi.org/10.1111/1467-8527.00034
Gioia, D., Thomas, J., Clark, S., & Chittipeddi, K. (1994). Symbolism and strategic change in academia: The dynamics of sensemaking and influence. Organization Science, 5(3), 363–383. https://doi.org/10.1287/orsc.5.3.363
Glazerman, S. (2016). The false dichotomy of school inspections. Mathematica Policy Research. https://www.mathematica-mpr.com/commentary/the-false-dichotomy-of-school-inspections
Gray, C., & Gardner, J. (1999). The impact of school inspections. Oxford Review of Education, 25(4), 455–468. https://doi.org/10.1080/030549899103928
Gray, J., & Wilcox, B. (1995). In the aftermath of inspection: The nature and fate of inspection report recommendations. Research Papers in Education, 10(1), 1–18. https://doi.org/10.1080/0267152950100102
Greene, J. C., Caracelli, V. J., & Graham, W. F. (1989). Toward a conceptual framework for mixed-method evaluation designs. Educational Evaluation and Policy Analysis, 11(3), 255–274.
Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3), 267–297. https://doi.org/10.1093/pan/mps028
Grimolizzi-Jensen, C. J. (2018). Organizational change: Effect of motivational interviewing on readiness to change. Journal of Change Management, 18(1), 54–69. https://doi.org/10.1080/14697017.2017.1349162
Gustafsson, J.-E., Ehren, M., Conyngham, G., McNamara, G., Altrichter, H., & O'Hara, J. (2015). From inspection to quality: Ways in which school inspection influences change in schools. Studies in Educational Evaluation, 47, 47–57. https://doi.org/10.1016/j.stueduc.2015.07.002
Halverson, R., Kelley, C., & Kimball, S. (2004). Implementing teacher evaluation systems: How principals make sense of complex artifacts to shape local instructional practice. Educational Administration, Policy, and Reform: Research and Measurement, 3, 153–188.
Hanushek, E. A., & Raymond, M. E. (2005). Does school accountability lead to improved student performance? Journal of Policy Analysis and Management, 24(2), 297–327. https://doi.org/10.1002/pam.20091
Herscovitch, L., & Meyer, J. P. (2002). Commitment to organizational change: Extension of a three-component model. Journal of Applied Psychology, 87(3), 474–487. https://doi.org/10.1037/0021-9010.87.3.474
Hill, H. (2001). Policy is not enough: Language and the interpretation of state standards. American Educational Research Journal, 38(2), 289–318. https://doi.org/10.3102/00028312038002289
Hines, R. T. (2017). An exploration of the effects of school improvement planning and feedback systems: School performance in North Carolina.
Holt, D., Armenakis, A., Feild, H., & Harris, S. (2007). Readiness for organizational change. The Journal of Applied Behavioral Science, 43(2), 232–255. https://doi.org/10.1177/0021886306295295
Husfeldt, V. (2011). Wirkungen und Wirksamkeit der externen Schulevaluation: Überblick zum Stand der Forschung [The impact of school inspection - Does it really work? State of research]. Zeitschrift für Erziehungswissenschaft, 14(2), 259–282. https://doi.org/10.1007/s11618-011-0204-5
Hussain, I. (2015). Subjective performance evaluation in the public sector: Evidence from school inspections. The Journal of Human Resources, 50(1), 189–221.
Jacob, B. (2005). Accountability, incentives and behavior: The impact of high-stakes testing in the Chicago Public Schools. Journal of Public Economics, 89(5–6), 761–796. https://doi.org/10.1016/j.jpubeco.2004.08.004
Jones, K., & Tymms, P. (2014). Ofsted's role in promoting school improvement: The mechanisms of the school inspection system in England. Oxford Review of Education, 40(3), 315–330.
Jones, K., Tymms, P., Kemethofer, D., O'Hara, J., McNamara, G., Huber, S., Myrberg, E., Skedsmo, G., & Greger, D. (2017). The unintended consequences of school inspection: The prevalence of inspection side-effects in Austria, the Czech Republic, England, Ireland, the Netherlands, Sweden, and Switzerland. Oxford Review of Education, 43(6), 805–822. https://doi.org/10.1080/03054985.2017.1352499
Kaplan, S., & Orlikowski, W. J. (2013). Temporal work in strategy making. Organization Science, 24(4), 965–995. https://doi.org/10.1287/orsc.1120.0792
Klein, A. (2016). School inspections offer a diagnostic look at quality. Education Week. https://www.edweek.org/ew/articles/2016/09/28/school-inspections-offer-a-diagnostic-look-at.html
Klerks, M. (2012). The effect of school inspections: A systematic review. http://janbri.nl/wp-content/uploads/2014/12/ORD-paper-2012-Review-Effect-School-Inspections-MKLERKS.pdf
Kluger, A. N., & DeNisi, A. (1996). The effects of feedback interventions on performance: A historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychological Bulletin, 119(2), 254–284. https://doi.org/10.1037/0033-2909.119.2.254
Koretz, D. (2008). Measuring up. Harvard University Press.
Krippendorff, K. (2013). Content analysis: An introduction to its methodology (3rd ed.). SAGE Publications.
Ladd, H. F. (2016). Now is the time to experiment with inspections for school accountability. Brookings. https://www.brookings.edu/blog/brown-center-chalkboard/2016/05/26/now-is-the-time-to-experiment-with-inspections-for-school-accountability/
Ladd, H. F. (2017). NCLB: Response to Jacob. Journal of Policy Analysis and Management, 36(2), 477–480. https://doi.org/10.1002/pam.21979
Ladd, H. F., & Figlio, D. (2008). School accountability and student achievement. In Handbook of research in education finance and policy (pp. 166–182).
Lee, J., & Fitz, J. (1997). HMI and OFSTED: Evolution or revolution in school inspection. British Journal of Educational Studies, 45(1), 39–52. https://doi.org/10.1111/1467-8527.00035
Lewin, A. Y., & Minton, J. W. (1986). Determining organizational effectiveness: Another look, and an agenda for research. Management Science, 32(5), 514–538. https://doi.org/10.1287/mnsc.32.5.514
Lindgren, J. (2015). The front and back stages of Swedish school inspection: Opening the black box of judgment. Scandinavian Journal of Educational Research, 59(1), 58–76. https://doi.org/10.1080/00313831.2013.838803
Luginbuhl, R., Webbink, D., & de Wolf, I. (2009). Do inspections improve primary school performance? Educational Evaluation and Policy Analysis, 31(3), 221–237. https://doi.org/10.3102/0162373709338315
Maitlis, S. (2005). The social processes of organizational sensemaking. The Academy of Management Journal, 48(1), 21–49. https://doi.org/10.2307/20159639
Maitlis, S., & Christianson, M. (2014). Sensemaking in organizations: Taking stock and moving forward. The Academy of Management Annals, 8(1), 57–125. https://doi.org/10.1080/19416520.2014.873177
March, J. G., & Olsen, J. P. (2011). The logic of appropriateness. In R. E. Goodin (Ed.), The Oxford Handbook of Political Science (pp. 1–22). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199604456.013.0024
Mathis, W., & Trujillo, T. (2016). Lessons from NCLB for the Every Student Succeeds Act. National Education Policy Center. http://nepc.colorado.edu/publication/lessons-from-NCLB
Matthews, P., & Sammons, P. (2004). Improvement through inspection: An evaluation of the impact of Ofsted's work. Ofsted.
Matthews, P., Holmes, J. R., Vickers, P., & Corporaal, B. (1998). Aspects of the reliability and validity of school inspection judgements of teaching quality. Educational Research and Evaluation, 4(2), 167–188. https://doi.org/10.1076/edre.4.2.167.6959
McDonnell, L. (2008). The politics of educational accountability: Can the clock be turned back? In K. E. Ryan & L. A. Shepard (Eds.), The future of test-based educational accountability. Routledge.
McDonnell, L. (2013). Educational accountability and policy feedback. Educational Policy, 27(2), 170–189. https://doi.org/10.1177/0895904812465119
Meyers, C. V., & VanGronigen, B. A. (2019). A lack of authentic school improvement plan development. Journal of Educational Administration, 57(3), 261–278. https://doi.org/10.1108/JEA-09-2018-0154
Miles, M. B., Huberman, A. M., & Saldaña, J. (2014). Qualitative data analysis: A methods sourcebook (3rd ed.). SAGE Publications.
Millett, A., & Johnson, D. C. (1998). Expertise or "baggage"? What helps inspectors to inspect primary mathematics? British Educational Research Journal, 24(5), 503–518. https://doi.org/10.1080/0141192980240502
Mintrop, H., MacLellan, A. M., & Quintero, M. F. (2001). School improvement plans in schools on probation: A comparative content analysis across three accountability systems. Educational Administration Quarterly, 37(2), 197–218. https://doi.org/10.1177/00131610121969299
Morse, J. (2010). Procedures and practice of mixed method design: Maintaining control, rigor, and complexity. In A. M. Tashakkori & C. B. Teddlie (Eds.), Handbook of mixed methods in social & behavioral research (pp. 339–352). SAGE Publications.
Neuendorf, K. A. (2017). The content analysis guidebook. SAGE Publications. https://doi.org/10.4135/9781071802878
Nusche, D., Braun, H., Halász, G., & Santiago, P. (2014). OECD Reviews of Evaluation and Assessment in Education: Netherlands 2014. OECD. https://doi.org/10.1787/9789264211940-en
OECD. (2015). Education at a glance 2015: OECD indicators. https://doi.org/10.1787/19991487
Ouston, J., Fidler, B., & Earley, P. (1997). What do schools do after OFSTED school inspections - or before? School Leadership & Management, 17(1), 95–104. https://doi.org/10.1080/13632439770195
Penninckx, M., & Vanhoof, J. (2015). Insights gained by schools and emotional consequences of school inspections: A review of evidence. School Leadership & Management, 35(5), 477–501. https://doi.org/10.1080/13632434.2015.1107036
Penninckx, M., Vanhoof, J., De Maeyer, S., & Van Petegem, P. (2014). Exploring and explaining the effects of being inspected. Educational Studies, 40(4), 456–472. https://doi.org/10.1080/03055698.2014.930343
Penninckx, M., Vanhoof, J., De Maeyer, S., & Van Petegem, P. (2015). Effects and side effects of Flemish school inspection. Educational Management Administration & Leadership. https://doi.org/10.1177/1741143215570305
Perryman, J. (2007). Inspection and emotion. Cambridge Journal of Education, 37(2), 173–190. https://doi.org/10.1080/03057640701372418
Perryman, J. (2009). Inspection and the fabrication of professional and performative processes. Journal of Education Policy, 24(5), 611–631.
Phillips, D., & Schweisfurth, M. (2014). Comparative and international education: An introduction to theory, methods, and practice (2nd ed.). Continuum International Publishing Group.
Piderit, S. K. (2000). Rethinking resistance and recognizing ambivalence: A multidimensional view of attitudes toward an organizational change. The Academy of Management Review, 25(4), 783–794. https://doi.org/10.2307/259206
Pond, S., Armenakis, A., & Green, S. (1984). The importance of employee expectations in organizational diagnosis. The Journal of Applied Behavioral Science, 20(2), 167–180. https://doi.org/10.1177/002188638402000207
Porac, J. F., Thomas, H., & Baden-Fuller, C. (1989). Competitive groups as cognitive communities: The case of Scottish knitwear manufacturers. Journal of Management Studies, 26(4), 397–416. https://doi.org/10.1111/j.1467-6486.1989.tb00736.x
Portz, J., & Beauchamp, N. (2020). Educational accountability and state ESSA plans. Educational Policy. Advance online publication.
https://doi.org/10.1177/0895904820917364
Ravitch, D. (2016). The death and life of the great American school system: How testing and choice are undermining education. Basic Books.
Redding, C., & Searby, L. (2020). The map is not the territory: Considering the role of school improvement plans in turnaround schools. Journal of Cases in Educational Leadership, 23(3), 63–75. https://doi.org/10.1177/1555458920938854
Riffe, D., Lacy, S., & Fico, F. (2014). Analyzing media messages: Using quantitative content analysis in research. Routledge.
Rigby, J. G. (2015). Principals' sensemaking and enactment of teacher evaluation. Journal of Educational Administration, 53(3), 374–392. https://doi.org/10.1108/JEA-04-2014-0051
Rosenthal, L. (2004). Do school inspections improve school quality? Ofsted inspections and school examination results in the UK. Economics of Education Review, 23, 143–151.
Rothstein, R., Jacobsen, R., & Wilder, T. (2008). Grading education: Getting accountability right. Economic Policy Institute and Teachers College Press.
Rouleau, L. (2005). Micro-practices of strategic sensemaking and sensegiving: How middle managers interpret and sell change every day. Journal of Management Studies, 42(7), 1413–1441.
Rutz, S., Mathew, D., Robben, P., & Bont, A. (2017). Enhancing responsiveness and consistency: Comparing the collective use of discretion and discretionary room at inspectorates in England and the Netherlands. Regulation & Governance, 11(1), 81–94. https://doi.org/10.1111/rego.12101
Ryan, K., Gandha, T., & Ahn, J. (2013). School self-evaluation and inspection for improving U.S. schools? National Education Policy Center. http://nepc.colorado.edu/publication/school-self-evaluation
Sandberg, J., & Tsoukas, H. (2015). Making sense of the sensemaking perspective: Its constituents, limitations, and opportunities for further development. Journal of Organizational Behavior, 36(S1), S6–S32. https://doi.org/10.1002/job.1937
Scheerens, J., Ehren, M., Sleegers, P., & de Leeuw, R. (2012). OECD Review on Evaluation and Assessment Frameworks for Improving School Outcomes.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin.
Shaw, I., Newton, D. P., Aitkin, M., & Darnell, R. (2003). Do OFSTED inspections of secondary schools make a difference to GCSE results? British Educational Research Journal, 29(1), 63–75.
Spillane, J. P. (1999). External reform initiatives and teachers' efforts to reconstruct their practice: The mediating role of teachers' zones of enactment. Journal of Curriculum Studies, 31(2), 1–33. https://doi.org/10.1080/002202799183205
Spillane, J. P., Parise, L. M., & Sherer, J. Z. (2011). Organizational routines as coupling mechanisms. American Educational Research Journal, 48(3), 586–619. https://doi.org/10.3102/0002831210385102
Spillane, J. P., Reiser, B. J., & Gomez, L. M. (2006). Policy implementation and cognition: The role of human, social, and distributed cognition in framing policy implementation. In M. I. Honig (Ed.), New directions in education policy implementation (pp. 47–64). State University of New York Press.
Spillane, J. P., Reiser, B. J., & Reimer, T. (2002). Policy implementation and cognition: Reframing and refocusing implementation research. Review of Educational Research, 72(3), 387–431. https://doi.org/10.3102/00346543072003387
Stiglitz, J. (2000). Economics of the public sector (3rd ed.). Norton.
Strunk, K. O., Marsh, J. A., Bush-Mecenas, S., & Duque, M. R. (2016). The best laid plans. Educational Administration Quarterly, 52(2), 259–309.
https://doi.org/10.1177/0013161X15616864
Teddlie, C., & Tashakkori, A. (2009). Foundations of mixed methods research: Integrating qualitative and quantitative approaches in the social and behavioral sciences. SAGE.
Teddlie, C., & Yu, F. (2007). Mixed methods sampling: A typology with examples. Journal of Mixed Methods Research, 1(1), 77–100. https://doi.org/10.1177/1558689806292430
UNESCO. (2017). Global Education Monitoring Report - Accountability in education: Meeting our commitments.
van Bruggen, J. C. (2010). Inspectorates of education in Europe: Some comparative remarks about their tasks and work.
van der Sluis, M. E., Reezigt, G. J., & Borghans, L. (2017). Implementing New Public Management in educational policy. Educational Policy, 31(3), 303–329.
Vavrus, F. K., & Bartlett, L. (2016). Rethinking case study research: A comparative approach (1st ed.). Routledge.
Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Using school performance feedback: Perceptions of primary school principals. School Effectiveness and School Improvement, 21(2), 167–188. https://doi.org/10.1080/09243450903396005
Visscher, A. J., & Coe, R. (2003). School performance feedback systems: Conceptualisation, analysis, and reflection. School Effectiveness and School Improvement, 14(3), 321–349. https://doi.org/10.1076/sesi.14.3.321.15842
Weick, K. E. (1995). Sensemaking in organizations. SAGE Publications.
Weick, K. E., Sutcliffe, K. M., & Obstfeld, D. (2005). Organizing and the process of sensemaking. Organization Science, 16(4), 409–421. https://doi.org/10.1287/orsc.1050.0133
Weiner, B. J. (2009). A theory of organizational readiness for change. Implementation Science, 4(1), 67. https://doi.org/10.1186/1748-5908-4-67
Woods, P., & Jeffrey, B. (1998). Choosing positions: Living the contradictions of OFSTED. British Journal of Sociology of Education, 19(4), 547–570. https://doi.org/10.1080/0142569980190406