MATHEMATICS TEACHERS' AND PRINCIPALS' RESPONSES TO THE USE OF STUDENT GROWTH DATA ON TEACHER EVALUATION INSTRUMENTS IN THE STATE OF MICHIGAN

By

Michael Henry Morissette

2020

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Mathematics Education—Doctor of Philosophy

ABSTRACT

MATHEMATICS TEACHERS' AND PRINCIPALS' RESPONSES TO THE USE OF STUDENT GROWTH DATA ON TEACHER EVALUATION INSTRUMENTS IN THE STATE OF MICHIGAN

By

Michael Henry Morissette

In 2011, the State of Michigan, along with other states, passed legislation that mandated standardized test data be used on teacher evaluations starting in the 2013-2014 academic year. Subsequent legislation altered the specific weights for the data on the evaluations and when schools would have to implement changes to the evaluations. In some states, value-added models are used to determine growth on the evaluations, and it was recommended those models be used in Michigan as well. The state elected not to use them and left it up to individual schools to figure out how to measure growth. In this study, I interviewed five mathematics teachers and four principals to ascertain what barriers existed as they attempted to implement the changes to teacher evaluations in Michigan. Specifically, I looked for evidence of the following barriers: disagreement, money, knowledge, other people, materials, time, apathy, and stress. After analyzing my interview data, I found evidence of all of the barriers, with the bulk of the interview data coded as disagreement. Principals and teachers disagreed with how teachers were being evaluated. Neither group felt standardized test data should be used in a high-stakes manner. Principals felt the current system required too much of their time and that teachers focused too much on their scores. The teachers indicated they did not find the data from their evaluations to be useful to them. As a result, they did not use their evaluations to inform their practice. Instead of the current system, principals advocated for a simpler system where they could just have conversations with teachers about their practice instead of talking about a rubric. Teachers asked that, regardless of what system is in place, they be evaluated fairly and that the data from their evaluations be meaningful to them.

Copyright by
MICHAEL HENRY MORISSETTE
2020

This dissertation is dedicated to my wife, Nicole Pfeifer. Thank you for all the support during this process. I couldn't have done it without you. I love you.

ACKNOWLEDGMENTS

Nobody can get through this PhD journey alone. If it were not for my peers in PRIME and the professors I had at Michigan State University, I would not have made it. The first person I would like to single out and thank is Kevin Lawrence. Kevin put up with me as a roommate for four years at The Pines and helped me grow as a person more than he would ever know. He was also there for me when I hit rock bottom mentally during my first year in the program, and I owe him my life. Also, thanks for being the best man at my wedding. It meant a lot that you were there that day in Hell. The second person on the list is my wife, Nicole Pfeifer. She has been by my side since September 2014 and has been extremely supportive emotionally. She is also a good reminder that there are more important things in life than what I choose to do for a living. Before anything else, my number one priority is to be a good husband to her. Person number three is Dan Clark.
He helped me get involved in the GEU, was on my practicum committee, and helped code data for this dissertation. More importantly, he has been a good friend since I moved to the Lansing area. I even had the pleasure of standing in his wedding. As for the professors I had at MSU, three come to mind who deserve to be thanked. Beth Herbel-Eisenmann, Corey Drake, and Mike Steele have all played important parts in getting me through this program. Some of that support was academic, as they are all great at giving feedback on my writing, but their emotional support was infinitely more important. I definitely would have walked away during my first year if it were not for them.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER 1: INTRODUCTION
    What is mathematics education to me?
    Why should mathematics educators care?
    Background Information
    School-level accountability
    Unintended consequences of NCLB accountability legislation
    Teacher accountability
    Value-added models
    Summary
    Overview of dissertation
CHAPTER 2: TEACHER EVALUATION REFORM
    Standardized observation protocols
    Types of test-based models in education
    Concerns regarding the use of test-based models
    Possible effects of the use of test-based models on teacher evaluations
    Summary
CHAPTER 3: CONCEPTUAL FRAMEWORK
    Defining barriers
    Considering strategic responses
    Relating various barriers to implementation
    Examples of responses to barriers
    Application of framework to other reforms
    Indifference/Apathy
    Human resource: Evidence of attention to knowledge
    Material resource: Evidence of attention to money
    Material resource: Evidence of insufficient time for implementation
    Human resource: Evidence of other people
    Barriers: Evidence of disagreement
    Barriers: Evidence of stress
    Summary
CHAPTER 4: METHOD
    Research Design
    Method
    Description of the case
    Data collection instruments
    Data collection (surveys)
    Data analysis (surveys)
    Data collection (interviews)
    Data analysis (interviews)
    Summary
CHAPTER 5: FINDINGS
    Disagreement
    Money
    Materials
    Time
    Other people
    Knowledge
    Apathy
    Stress
    Revisiting research questions
    Summary
CHAPTER 6: DISCUSSION
    Summary of dissertation
    Revisiting literature
    Addressing barriers
    Potential options for teacher evaluations in Michigan
    What can university-level mathematics educators do?
    Positionality statement
    Lessons learned
    Limitations
    Suggestions for future research
APPENDICES
    APPENDIX A: Principal Survey
    APPENDIX B: Mathematics Teacher Survey
    APPENDIX C: Principal Interview Protocol
    APPENDIX D: Mathematics Teacher Interview Protocol
    APPENDIX E: Survey Data
REFERENCES

LIST OF TABLES

Table 3.1: Definitions of barriers
Table 4.1: Mapping of research sub-questions to survey questions
Table 4.2: Example of coding interview text regarding barriers
Table 4.3: Example of coding interview text regarding responses to barriers
Table 5.1: Barrier codes (frequency by individual)
Table 5.2: Responses to barrier codes (frequency by individual)
Table E.1: Amount of assistance the principal provides with using standardized test scores to improve mathematics instruction
Table E.2: Does your school district use standardized tests (e.g. ACT, SAT, MEAP, MME, MSTEP) to help measure student growth on teacher evaluations?
Table E.3: Does your school use a status model or growth model when using student growth data on teacher evaluations?
Table E.4: Emphasis principals put on making sure standardized test scores are high
Table E.5: To what extent do you encourage your teachers to use standardized test results to inform their practice?
Table E.6: To what extent do you believe standardized tests capture your students' knowledge of mathematics?
Table E.7: How good of an indicator do you believe standardized test results are of the quality of your teaching?
Table E.8: Since the beginning of your career, how often have you attended various professional development activities/sessions/conferences etc. NOT provided by your school district?
Table E.9: Purpose of evaluation (Teacher responses)
Table E.10: Purpose of evaluation (Principal responses)

LIST OF FIGURES

Figure 3.1: Implementation Barriers

CHAPTER 1: INTRODUCTION

What is mathematics education to me?

Prior to beginning my doctoral studies, I really did not give the question much thought, as I believed (and still believe) that anything that affects the teaching and/or learning of mathematics is fair game. It was not until I read about the debate over what should and should not be counted as mathematics education research in my proseminar classes at Michigan State University (see Lester & Lambdin's (2003) piece for a good overview of the development of the mathematics education research community), and until I had my own experience with someone pushing back against my idea of what mathematics education is, that I realized there are various opinions regarding what mathematics education is. I was shocked when I found out there are some who believe that if the mathematics is not the primary focus of the research, then it does not count as mathematics education research. If you are in that camp, then this dissertation probably will not be to your liking because I focus on recent teacher evaluation reforms. If you have a broader interpretation of what mathematics education research is, like me, then feel free to proceed.

Why should mathematics educators care?

In recent years, the use of student growth data has become more prevalent on teacher evaluations, among other reforms. One reason to care about these reforms to teacher evaluations is that mathematics teachers, and other core teachers, have the potential to be evaluated differently and more harshly than teachers in non-core content areas, as student growth data are used on most evaluations now (this will be discussed in more detail later) despite the fact that not all content areas are tested with standardized tests. One could argue that it is easier to show growth on teacher-generated pre- and post-tests than it is on a standardized test such as the SAT. As a result, it would be easier for non-core teachers to show growth and keep their jobs than core teachers. This could potentially have negative repercussions for the supply of mathematics teachers, as some who really want to teach may choose to teach art, physical education, industrial arts, etc. to avoid the pressure of having to make gains on standardized tests. Those who insist on having a mathematics career may steer clear of teaching mathematics and choose a different route.
Mathematics teacher educators need to think of ways to support and recruit prospective mathematics teachers to ensure that school staffing needs are met and that those teachers are able to respond effectively to the added pressure that comes with teaching mathematics. This study helps in this area, as it identifies specific issues or barriers that currently exist for mathematics teachers, and knowing these issues is necessary in order to address them. In addition to supporting our prospective mathematics teachers, we also need to be concerned with supporting our practicing mathematics teachers, as they likely have different needs now than they had in the past regarding continuing education and professional development opportunities. With the added pressure on mathematics teachers to show student growth as measured by standardized tests, there is a greater need now for mathematics educators at the university level to foster relationships with practicing K-12 teachers and figure out what aspects of instruction they find to be challenging. With this information, we may need to offer different graduate classes and make a larger effort to offer a variety of professional development activities in schools and at conferences to meet those needs than we currently might be doing. We should not stop there, though. I argue that I, and my future colleagues, must do more to influence educational policy at the state and national level than we currently are. Practicing teachers need allies in their corner fighting for them to improve the working conditions and overall climate in education so we can reverse the current trend of fewer people choosing education as a career. It is way too easy to throw your hands up into the air and think this is not worth the effort because policymakers will not listen anyway. In order to be better informed so that we can be good allies, in this dissertation I discuss some changes mathematics teachers and principals would like to see in the policy world to help them do their jobs better.

Background information

"We need, first of all, for there to be accountability, for there to be somebody who is responsible for enforcing standards and holding people's feet to the fire" (Granholm, 2003).
-Jennifer Granholm, Former Governor of the State of Michigan

Most people have likely heard something along these lines from a variety of politicians on television, in print media, in social media, or on the radio any time public monies are spent. In these messages, the concern is that we get the "most bang for our buck" and that taxpayer money is spent wisely and efficiently. In public education, standardized tests have been used as a way to hold schools and teachers accountable, as some claim that student performance on these tests is an indicator of school and teacher quality. Based on students' performance on national (e.g. NAEP, SAT, ACT) and international standardized tests (e.g. PISA), policy makers have argued that schools and teachers have not been doing their jobs to the best of their abilities, as student performance lags behind that of students in other countries around the world. In response to this perception, they have made efforts to hold schools' and teachers' feet to the fire, as Governor Granholm stated, by attaching consequences to not showing improvement in student standardized test results (Anagnostopoulos, Rutledge, & Jacobsen, 2013; Dee & Jacob, 2011; Hanushek & Raymond, 2005; State of Michigan, 2011a; State of Michigan, 2011b).
What follows is a brief discussion of some work that has been done regarding school and teacher accountability. In the next chapter, I provide more detail regarding how standardized tests have been used to hold teachers accountable.

School-level accountability

The idea of holding schools and teachers accountable for what students learn is not a new one. Individual states (e.g. Connecticut and North Carolina) started implementing accountability systems more than twenty years ago (Hanushek & Raymond, 2005). In these early accountability systems, the states would typically create a set of standards for what students should learn and require standardized tests linked to these standards to assess how much learning took place. From the test results, the state would create a rating system for the schools within the state; typically the schools would get a letter grade (Hanushek & Raymond, 2005). Some states, for example, Connecticut, North Carolina, and Texas, attached consequences, both positive (e.g. teacher bonuses and vouchers for parents) and negative (e.g. closing schools), to these ratings. Other states like Mississippi, Indiana, and Kansas, however, did not. To assess the impact of accountability, with and without consequences, on achievement, Hanushek and Raymond (2005) conducted a study using National Assessment of Educational Progress (NAEP) mathematics data. To isolate the effect of accountability on achievement, these authors looked at the growth in performance within states between fourth and eighth grade and also used growth models with state fixed effects to help control for the effects of other policies within each state. After performing their analysis of the NAEP data, they found a positive and statistically significant impact on student performance in the states that attached consequences to the ratings, but found no effect on student performance in states in which schools received a letter grade with no accompanying consequences.

Other studies have reported findings similar to Hanushek and Raymond's (2005). Jacob (2004), for example, found that mathematics and reading achievement significantly increased following the introduction of a consequential accountability policy in Chicago Public Schools. Additionally, Winters and Cowen (2012) studied New York City Public Schools and found that receiving a letter grade other than an F had no significant effect on student achievement. The results of these studies suggest that if policy makers want to effect change in schools via accountability, then any system put in place should have some consequences (e.g. teacher bonuses, vouchers for parents, closing schools) associated with it. This is all assuming, however, that the change we are looking for is better results on standardized tests. Do better results on standardized tests equate to better teaching and students learning the material?

One of the better-known, large-scale, consequential school accountability systems was put in place at the federal level with the reauthorization of the Elementary and Secondary Education Act in 2001, better known as the No Child Left Behind (NCLB) Act (U.S. Department of Education, 2014). Analyzing and discussing all that NCLB did is beyond the scope of this study, but one relevant aspect of the law is the emphasis on ensuring that all students were proficient in reading and mathematics by 2013-2014. To achieve this target of 100% proficiency, schools had to reach specific target proficiency levels each year, known as adequate yearly progress (AYP). To illustrate, consider a school in the 2002-2003 school year in which 40% of its students were identified as proficient based on standardized tests the previous year. In order to reach 100% proficiency by 2013-2014, 45% of the students would need to be proficient in mathematics by the end of the 2002-2003 school year, 50% by the end of the 2003-2004 school year, and so on until the school reached 100% in 2013-2014. If the school did not reach the indicated target score by the end of the year, then it failed AYP for that year.
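To make the arithmetic of this example concrete, here is a minimal sketch of the straight-line AYP trajectory just described. It is illustrative only: actual state AYP schedules and rules were more complicated, and the function names here are my own.

```python
# A minimal sketch, assuming the simple linear path to 100% proficiency
# described in the example above (40% baseline before 2002-2003).

def ayp_targets(baseline_pct, first_year=2003, final_year=2014):
    """Return {year a school year ends: required % proficient} on a linear path to 100%."""
    n_years = final_year - first_year + 1        # years available to reach 100%
    step = (100 - baseline_pct) / n_years        # equal increase required each year
    return {year: baseline_pct + step * (year - first_year + 1)
            for year in range(first_year, final_year + 1)}

def made_ayp(observed_pct, year, targets):
    """A school makes AYP for a year if it meets that year's target."""
    return observed_pct >= targets[year]

targets = ayp_targets(40)
print(targets[2003])                 # 45.0, needed by end of 2002-2003
print(targets[2004])                 # 50.0, needed by end of 2003-2004
print(targets[2014])                 # 100.0, needed by 2013-2014
print(made_ayp(43, 2003, targets))   # False: this school fails AYP that year
```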
If the school failed AYP for two or more years, then there were consequences for that school. These consequences included allowing parents to transfer their children to non-failing schools, being forced to supply tutoring to students, replacing staff, implementing new curriculum, having outside experts come in to advise administrators at the school, extending the school day and/or year, and restructuring the internal organization of the school (U.S. Department of Education, 2014). With this added pressure from NCLB, it may not be surprising to hear that schools tried a variety of strategies to avoid consequences.

Unintended consequences of NCLB accountability legislation

Even though NCLB had an admirable goal of having all students proficient in mathematics and reading by the 2013-2014 school year, the law had some unintended consequences. One of the unintended consequences was that, in order to make AYP each year, some schools would focus instructional time on those students who were just below the proficiency threshold, whom some literature has referred to as the "bubble kids" (Dee & Jacob, 2011). The idea behind focusing on these students was that, with just some added attention, it would be relatively easy to get them above the proficiency threshold and, thus, make AYP for that year. The additional time that was spent on these students had to come from somewhere. Usually it came from instructional time that was originally devoted to the high-performing students and the lowest-performing students (Dee & Jacob, 2011). Even though it could be looked at as a bad practice, allocating focus away from high-performing students made sense, as high-performing students would likely remain above the proficiency threshold anyway. In this system, it may also make sense that a school would not allocate time and resources to put a lot of effort into the lowest-performing students, as they would not likely breach the proficiency threshold even with the added attention.

In addition to focusing on this group of students on the border of testing at proficient, some schools shifted their stronger teachers to grades that were tested and weaker ones to grades that were not tested (Grissom, Kalogrides, & Loeb, 2013). This is a practice that I will discuss in more detail in my discussion of strategic response in Chapter 3. Other schools would literally "feed towards the test," increasing glucose in the school lunches during testing windows with the hope that it would improve student performance (Figlio & Winicki, 2005). Even with this gaming of the system, schools did not reach 100% proficiency. Given this, policy makers have turned their attention to teachers, as it has been known for decades that teachers have the largest effect of any in-school factor on student standardized test scores (Coleman, 1966).
Teacher accountability

In the almost two decades since NCLB was implemented, the quantity of data that can be collected and analyzed has increased as a result of advances in technology. As data collection and analysis capabilities have improved, the focus of accountability in the United States has been transitioning from schools to individual teachers. Until the mid-1990s there really was not an effective means of isolating a given teacher's effect on student achievement, but with the advent of value-added models, some argue that it now can be done (Sanders & Rivers, 1996). Several states have recently passed legislation that ties student achievement and growth data to teacher evaluations, while others are already reversing course and stripping the mandate that standardized tests be used in the evaluation (Holloway-Libell & Collins, 2014; Maine Education Association, 2019). Some of the reasons for focusing on student growth are to limit the practice of focusing on students who are close to scoring proficient and to avoid penalizing teachers for the varying levels of students that come into their classrooms. By attempting to address students' varying levels and backgrounds, one could argue that focusing on students' growth is a fairer practice than getting a certain percentage of students above a certain proficiency threshold, as this focus rewards progress instead of reaching a particular target, but growth models are not without fault.

Value-added models

To address the issues with simple growth models, the studies that have been done recently (since about 1996) regarding tying student achievement data to individual teachers have used value-added models (NRC & NAE, 2010; Sanders & Rivers, 1996). Value-added models (VAMs) are statistical models that try to isolate a given teacher's contributions to student achievement and growth by controlling for variables over which the teacher has no control, such as a student's socioeconomic status or prior achievement (NRC & NAE, 2010). The way these variables are typically controlled for is by using multiple years of test data for each student (though some VAMs only make use of one year of data). Using these past data, an expected score for a given standardized test is created and compared to the actual data from that test for each student that a given teacher is responsible for. These comparisons are then aggregated in some way, and a value-added rating is assigned to the teacher. Even though it can be argued that VAMs are fairer than simple growth models, it should be noted that several concerns about their use have been identified regarding the standardized tests themselves, measurement error and validity, data analysis, equity, and generalizability, among other things, all of which will be discussed in more detail in the next chapter (Cohen & Moffitt, 2009; Jackson, 2012; MET Project, 2013; NRC & NAE, 2010).

Summary

In sum, the federal government, along with state and local governments, has implemented various reforms in recent decades in response to a perceived crisis in American education at the K-12 level. One reform was to hold schools and teachers accountable for student learning. A method that was implemented to hold schools accountable for student learning was to assign letter grades to the schools, but researchers found that assigning letter grades did not have an effect on student performance on standardized tests if the grades were not accompanied by consequences.
Regarding consequences, with the passage of NCLB at the federal level, schools and states were under more pressure to ensure that all students learn than at any other point in American history. Failure to ensure that all students learn, as measured by AYP, was accompanied by a variety of sanctions. In response to the added pressure, schools shifted resources to concentrate on what some literature has called "bubble kids," among other strategic responses, in order to attain AYP and avoid sanctions. Recently the focus of accountability has transitioned to individual teachers. Several states have now passed laws that tie student achievement and growth data to teacher evaluations. Currently, most of the research done regarding teacher accountability focuses on value-added models. Little is known regarding how people at the ground level have implemented these laws, especially in contexts that do not use value-added models.

Overview of dissertation

In this chapter I briefly discussed my beliefs regarding mathematics education research, established why we should care about teacher evaluation reform, and provided a brief overview of recent efforts to use policy to improve schools in the United States. In the next chapter, I go more in-depth regarding teacher evaluation reform legislation, with an emphasis on test-based models for measuring student growth. In Chapter 3, I detail a conceptual framework that I created over the past few years. The purpose of the framework is to be able to apply it to a given policy, not just teacher evaluation policy, and identify barriers to implementation during the implementation stage. Based on the barriers that are present and the responses to those barriers, I can make recommendations to policymakers for the next iteration of a policy. In Chapter 4, I describe my methods for conducting my study. In Chapter 5, I share my results, which are organized based on my conceptual framework. In my final chapter, I discuss the results and provide recommendations for the next iteration of teacher evaluation policy.

CHAPTER 2: TEACHER EVALUATION REFORM

Prior to the teacher evaluation reform legislation that was passed earlier this decade, it was mostly up to individual school districts to generate the instruments they would use for the purpose of evaluating their teachers. This was usually done in consultation with local teacher unions as part of the collective bargaining process. The new legislation that was passed in the State of Michigan, and elsewhere, attempted to standardize the instruments used. To do this more easily, legislation was passed that made the evaluation instruments a prohibited subject of bargaining (State of Michigan, 2011c). This legislation would, in theory, allow a state to impose an evaluation format on the school districts. These imposed formats required schools to use a standardized observation protocol, and several states also instituted a mandate that student growth data be tied to the evaluations as well. The reasoning for using student growth data is largely to introduce variation in the outcomes of evaluation, as previously most teachers received positive evaluations even when principals knew some of their teachers had job performance issues (Weisberg et al., 2009).
Standardized observation protocols

Regarding standardized observation protocols, most states rely on protocols such as Charlotte Danielson's (2014) Framework for Teaching, Marzano's (2014) Teacher Evaluation Model, The Thoughtful Classroom (Silver, Strong, & Associates, 2014), or 5 Dimensions of Teaching and Learning (Center for Educational Leadership, 2014). The State of Michigan, for example, leaves it up to individual districts to choose one of these four protocols to use. These protocols are intended to measure aspects of quality teaching that are not captured by standardized tests. For example, the Marzano (2014) model evaluates teachers' performance in four domains: "Classroom Strategies and Behaviors", "Planning and Preparing", "Reflecting on Teaching", and "Collegiality and Professionalism". Similarly, the Danielson Framework (2014) evaluates teachers in the following four domains: "Planning and Preparation", "The Classroom Environment", "Instruction", and "Professional Responsibilities". I do not plan to critique how well these protocols measure what they say they measure, as the focus of this dissertation is on the student growth aspect of teacher evaluation reform and not the observation part. In the remainder of this chapter, I describe a range of test-based methods for tying student growth data to teacher evaluations.

Types of test-based models in education

In order to measure a particular teacher's effect on a student, test-based evaluation models are typically used. These models fall within one of two primary categories: status models or growth models (Ladd & Lauen, 2010). Status models look at student performance at a given point in time and compare that performance to a target score. For example, they are used when a school or state assesses what percentage of its students score at a proficient level in a given year, such as what was done when determining AYP under NCLB. Growth models measure student achievement by tracking the improvement or decline in test scores of students within a given school year or from one year to the next. This can be done with the same group of students (e.g. comparing scores of a group of students at the beginning of a school year and at the end) or by looking at a different group of students (e.g. comparing scores of this year's 11th graders to last year's 11th graders). Probably the most common example of growth models is when groups of students take pre- and post-tests to measure how much they learned over the course of a specified unit of time. A problem identified with basic growth models is that they do not control for factors that affect student achievement which schools and teachers have no ability to influence, such as a student's family background or socioeconomic status (NRC & NAE, 2010). For example, a student could show a large gain from the pre-test to the post-test over a unit on quadratic functions. One might assume this was due to the instruction the student received in class, but it could very well be due to outside tutoring that this student had access to because the student's parents had the financial means to provide their child a tutor. To address this issue with basic growth models, value-added models, a more complex type of growth model, can be used. Value-added models are statistical models that attempt to isolate a school's or teacher's effect on student achievement by controlling for all the variables that affect student achievement, but that are out of the control of the school and teacher (NRC & NAE, 2010).
This is normally done by taking at least two years of students' test scores and controlling for various student and school-level variables. A value-added estimate for a teacher is then created by comparing observed student data to expected student data. Thus, if the observed value is greater than the expected value, then one could make an argument that the teacher was an effective teacher. If the two are close, then the teacher could be classified as an average teacher. If the observed value is less than the expected value, then the teacher could be classified as an ineffective teacher.
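As a rough illustration of this observed-versus-expected logic, consider the sketch below. It is a toy example of my own, not the model used by any state or vendor: the data are invented, the only controls are a prior-year score and a single socioeconomic indicator, and the aggregation rule (a simple mean of each teacher's residuals) is chosen for clarity rather than realism.

```python
# A minimal sketch of a value-added estimate, under the assumptions named
# above. Expected scores come from an ordinary least squares fit; a teacher's
# estimate is the mean observed-minus-expected difference for their students.
import numpy as np

# One entry per student: prior-year score, SES proxy, current score, teacher.
prior   = np.array([410., 455., 520., 480., 390., 505.])
ses     = np.array([  0.,   1.,   1.,   0.,   0.,   1.])  # 1 = higher-SES proxy
score   = np.array([440., 470., 555., 500., 400., 545.])
teacher = np.array(["A", "A", "A", "B", "B", "B"])

# Expected score = b0 + b1*prior + b2*ses, fit across all students.
X = np.column_stack([np.ones_like(prior), prior, ses])
coef, *_ = np.linalg.lstsq(X, score, rcond=None)
expected = X @ coef

# Aggregate observed - expected by teacher; a positive value would be read
# as "effective," near zero as "average," negative as "ineffective."
for t in np.unique(teacher):
    print(t, round(float(np.mean((score - expected)[teacher == t])), 1))
```

A real VAM would use more years of data, many more covariates, and a far more careful statistical model; the point here is only the shape of the computation: predict, compare, and aggregate per teacher.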
Concerns regarding the use of test-based models

Several concerns about using test-based models to evaluate teacher performance can be found in current literature. Some concerns regard using test scores to evaluate teachers at all (NRC & NAE, 2010; Cohen & Moffitt, 2009). If teachers are to be evaluated based on student test scores, and if the test covers only a small percentage of what a given teacher teaches, then some question how the score can be an accurate measure of how well the teacher performed. Maybe the students would have performed significantly better if a different subset of the content were tested instead? Also, according to Cohen & Moffitt (2009), in most studies dealing with teacher quality there are significant differences between expert observers' opinions of what good teaching is and what test results state. For example, a teacher could have several cram sessions with their students emphasizing procedures right before the standardized test is given, and these sessions could have the short-term effect of boosting test scores, but this teaching does not emphasize deep, conceptual understanding and likely will not translate into long-term learning. Another concern is related to the fact that standardized tests generally do not measure important non-content-related aspects of teaching such as fostering intellectual curiosity, self-esteem, student motivation, persistence in tackling difficult tasks, or the ability to collaborate well with others (NRC & NAE, 2010; Jackson, 2012).

Other concerns relate to measurement error and validity. Given that test items are a subset of the entire set of possible relevant questions that could be asked and the test is given at one particular time, a student may perform slightly better or worse if they took the test on a different day for a variety of reasons (NRC & NAE, 2010; Cohen & Moffitt, 2009). Also, if growth models are used that rely on longitudinal data, as is normally done with value-added models, the sample used to generate a score for a given teacher may be rather small because of missing data due to students moving in or out of a given district, student absences, or imperfect record matching. Having a small sample size increases the error and decreases the precision of these estimates. Another problem that occurs when longitudinal data are used happens often at the high school level, where students are not normally tested annually. No Child Left Behind only requires students to be tested once during their high school career (Goldhaber, Goldschmidt, & Tseng, 2013). Given this, it would be hard to assign a score to an Algebra 2 teacher, for example, as there would most likely be a two- to three-year gap in time between standardized tests.

Regarding validity, Cohen & Moffitt (2009) argue that results obtained by using growth models are somewhat invalid because students are not randomly assigned to teachers. Some teachers typically get higher-achieving students (as defined by standardized test scores) and some typically get lower-achieving students. For example, more-senior mathematics teachers normally get to teach higher-level mathematics classes and less-senior teachers normally teach remedial-level mathematics. Thus, the less-senior teacher may appear to be worse than the more-senior teacher, not because of their ability to teach, but because of the students each teacher has. On the other hand, teachers who teach only advanced classes may appear to be worse than they are if their students do not show significant growth. One could argue that it is hard to show improvement if one consistently teaches students who are in the 95th percentile or higher.

In addition to the lack of random assignment, Goldhaber et al. (2013) found that issues can arise even between very similar growth models. For example, different types of value-added models exist. When comparing results from a "Student Fixed-Effects Model" and a "Student Fixed-Effects with Lagged Score Model," the authors found significant variations in teacher quality as reported by the models. Some teachers that were reported as being in the lowest quintile in one model were reported as being in the highest quintile using the other model, and vice versa. These results are troubling if data from these models are used in high-stakes decisions, as good teachers could possibly lose their jobs after being mislabeled as poor ones and poor teachers could keep their jobs after being mislabeled as good ones.

The concerns to this point deal primarily with the instrument used to acquire data, but other concerns deal with analyzing the data collected. Some researchers question how causal inferences can be made given that teachers and students are not randomly assigned, even after controlling for prior student achievement (NRC & NAE, 2010). Others question how other factors that affect student achievement in a particular class can be teased out (Jackson, 2012; NRC & NAE, 2010). For example, if a student is taking a mathematics course and a physics course at the same time, what the student learns in physics is going to have an effect on the student's knowledge of mathematics. When that student takes a standardized test that assesses their knowledge of mathematics, the student's mathematics teacher will get all of the credit for what that student knows. The student's mathematics teacher would also get all of the credit if the student has a tutor, receives help from another teacher, or attends private test preparation sessions. If these things cannot be teased out, then scores assigned to a particular teacher will not be accurate.

Equity issues are another concern regarding the use of test-based models. More specifically, some researchers have expressed concerns about using test-based models given the gap in resources that exists between low-poverty and high-poverty schools. Students and teachers in high-poverty schools typically have fewer educational resources, larger class sizes, weaker leadership, and more student and teacher mobility (Cohen & Moffitt, 2009). As Cohen & Moffitt (2009) argue, this inequality "could reduce the value that teachers in poorly resourced schools add to students' scores" and "teachers would be penalized for conditions beyond their control, thus erroneously reducing the quality scores and eroding the scheme's legitimacy and political and legal standing" (p. 203).
Another concern regarding the use of test-based models deals with the assumption that results of studies done at the elementary school level are generalizable to the middle and high school levels. C. Kirabo Jackson (2012) addressed this issue in his study entitled "Teacher Quality at the High School Level: The Importance of Accounting for Tracks." Jackson (2012) stated, "because elementary-school students are typically exposed to one teacher and all are on the same academic track, while secondary-school students are exposed to several teachers and placed into different tracks, methodologies designed for elementary-school teachers may be inappropriate for measuring teacher quality in other grades" (p. 2). This study found that, if tracking effects are not controlled for, the importance of high school teachers may be overstated by approximately 50%. When controlling for tracking effects, he found that "a one standard deviation increase in algebra teacher quality is associated [with] 0.08σ higher test scores" (Jackson, 2012, p. 24). This finding suggests that teacher quality at the high school level has very little effect on student standardized test scores, as a 0.08 standard deviation increase in test scores is virtually no change at all.

Possible effects of the use of test-based models on teacher evaluations

The practice of tying student achievement data to teacher evaluations can affect, both positively and negatively, teachers and teacher education in a variety of ways. With this increased emphasis on accountability, one could hypothesize that teachers' stress levels would increase significantly. Some may say that this is a good thing and that the increased stress would cause teachers to try harder; these people are assuming teachers are not giving 100% effort now. This increase in effort, this line of thinking purports, would then lead to an increase in student achievement. I would argue, on the other hand, that this increased stress could be detrimental to teachers' performance in the classroom. I base this belief on my own experience as a classroom teacher and also on a study done by Drake and Sherin (2006).

Drake and Sherin (2006) investigated two teachers' implementation of a standards-based mathematics curriculum. As part of that study, the authors interviewed the two participants, Beth and Linda. When interviewed, Beth said, "Weaknesses? That when I'm overstressed I go back to the way I always did, that's it. I lose sight of where I'm going and I just deal with the here and now" (p. 165). In other words, when Beth felt overstressed, she resorted to teaching in a more traditional way rather than teaching in a way that was more aligned with the ideals of the mathematics education reform movement. Like Beth, I also resorted to teaching in a more traditional way when I was stressed. Extra stress caused me to go into a survival mode where I would do just enough preparation to get through the next day. If other teachers react similarly to stress, then the added stress that comes from the increased emphasis on accountability could negatively affect what happens in the classroom. In addition to affecting what happens in one's classroom, tying student achievement data to teacher evaluations can also affect a teacher's interactions with their colleagues and their colleagues' students.
If teachers are to be evaluated using value-added models, as the Michigan Council for Educator Effectiveness (2013) recommends for the State of Michigan, then why would a teacher want to help their colleagues become better teachers or assist their colleagues' students in any way? This very thing came up as an issue in Guenther's (2019) dissertation: "findings reveal, at the very least, this current evaluation system does not encourage teachers to work together to improve their practice. At its most consequential, it appears to be encouraging isolationism and creating adversarial relationships among some teachers" (p. 6). Personally, this finding resonated with me, as when I taught I was rather close with a lot of my students, and I always told my students that I would help them if they needed it, even if I was no longer their teacher. Several students took me up on the offer during my career. They would come in after school to get help on their mathematics homework because their teacher would not or could not stay after school on a given day (my schedule was more flexible given that I had no family obligations). If I were still teaching and were evaluated using test-based models, I am not sure I would continue this practice in an environment where competition is fostered. It would be against my rational self-interest to do so, as my actions could actually cause me to lose my job.

In order to address this possible issue and to foster collaboration, the MCEE (2013), when making recommendations regarding how to implement the changes in teacher evaluation policy, stated that school-level value-added models (e.g., a mathematics department would get a value-added score for the group) may be used as part of a teacher's evaluation. They also suggested, however, that this score can only comprise 10% of the teacher's evaluation. Thus, taking the 2015-2016 school year as an example, 40% of the teacher's evaluation would still be determined by their individual score. One may question whether the 10% generated from school-level data is large enough to foster collaboration given that 40% of the evaluation would still be generated from one's own practice.
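To see why one might question this, the toy computation below compares how much the school-level component and the individual component can move a composite evaluation score. The 40% individual and 10% school-level figures come from the 2015-2016 example above; the assumption that the remaining 50% comes from observation-protocol ratings is mine, made only so the weights sum to one, and all scores are invented.

```python
# A toy composite under the assumed weights described above (0-100 scales).
weights = {"individual_growth": 0.40, "school_growth": 0.10, "observation": 0.50}
assert abs(sum(weights.values()) - 1.0) < 1e-9

def evaluation_score(scores, weights):
    """Weighted composite of a teacher's component scores."""
    return sum(weights[k] * scores[k] for k in weights)

teacher = {"individual_growth": 55.0, "school_growth": 90.0, "observation": 80.0}
print(evaluation_score(teacher, weights))  # 71.0

# A 10-point swing in the department's (school_growth) score moves the
# composite by 1 point; the same swing in one's own growth score moves it
# by 4 points, so the individual incentive still dominates.
```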
A possible effect on teacher education programs that needs to be addressed if test-based models are used on teacher evaluations regards the placement of student teachers/interns in classrooms. If a teacher is going to be evaluated using value-added models, then would it not be against their rational self-interest to allow a novice to teach their students? That student teacher is likely going to affect the supervising teacher's score on their evaluation. Given this, and the likelihood that it will be impossible to tease out the student teacher's effect on the students, more teachers may be unwilling to accept student teachers/interns in their classrooms. That is, unless the state agrees to suspend the requirement that test-based models be used for any teacher who agrees to have a student teacher. This, however, would create its own set of problems, as then every teacher would likely want to have a student teacher and there could be some hostility between colleagues in a school. Those who do not get a student teacher would think that they were being treated unfairly, especially if the same teachers always had student teachers placed in their classrooms. To this point, I have addressed only possible negative effects of using test-based models. There are, however, possible positive effects as well.

The use of test-based models could require teachers to reflect on their practice and consider ways to improve it. One possible change that teachers might make is related to the textbooks they use, since "mathematics has a long history of being driven by the textbook" (Remillard, 2005, p. 214). If teachers wish to keep their jobs, they may feel that it is necessary to find and use written curriculum materials that have been correlated with increases in student achievement. Standards-based curricula have been shown to do just that (Thompson & Senk, 2001).

In addition to changing the textbooks they use, teachers may decide to carefully analyze standardized test data to look for areas where students struggle and then make changes to their practice based on what they find. The use of these data is something Cavanna (2016) investigated in her dissertation. She found that teachers did take part in data digs as a result of the changes in teacher evaluation policy, but the teachers did not find standardized test data to be useful in day-to-day lesson planning and teaching. Instead, the teachers reported that the data they had a hand in collecting themselves (e.g. formative assessments, video recordings of their teaching, and student work) were more useful for these purposes than standardized tests. Cavanna also reported that the teachers did not get a lot of support from administration regarding what to do with the standardized test data during the data digs, so one might conjecture that teachers would find those data more useful if they had more support in understanding how that information could inform their practice. Alternatively, different kinds of data that are closer to teachers' everyday practice, like those collected when doing action research, may be more relevant and useful.

Another viable alternative could be to have teachers look to other successful schools and teachers for ideas on how to improve their practice. Kitchen, DePree, Celedon-Pattichis, and Brinkerhoff (2007) share the characteristics of nine successful schools serving students who are poor, along with specific teachers' practices that led to high student achievement as measured by standardized tests. As for the specific characteristics and practices, the authors identified three major themes: "high expectations and sustained support for academic excellence" (p. 33), "challenging mathematical content and high-level instruction" (p. 77), and "the importance of building relationships" (p. 115). Regarding the first theme, the authors found that all of the schools made teaching and learning the priorities over everything else. Administrators and teachers believed that all of their students were capable of learning. Failure was not an option for them. Also, administrators were extremely supportive of their teachers. Some specific examples of what was done in the schools included:

• Administrators handled most of the paperwork and problems with students and parents so their teachers would only have to worry about teaching.
• Students were provided with additional instructional time beyond their normal mathematics classes for remediation and also to challenge them.
• Teachers were given cell phones by the school so their students could call them at night for extra help.
• Schools provided a late bus so students could get extra help after school from their teachers (some of these teachers were paid extra to stay after school to tutor students).
• Teachers had access to a variety of different teaching resources. If the teachers wanted something for their classroom, all they needed to do was ask.
• Teachers were encouraged to, and did, take part in sustained professional development activities. In many cases, the schools would pay for their teachers to attend conferences and workshops.

As for the second theme, Kitchen et al. (2007) found that the schools in the study had a clear focus on developing students' problem-solving and critical thinking skills in addition to ensuring the students improved their proficiency with basic skills. There was also a focus on ensuring the students did well on standardized tests. Some practices that were common in the teachers' classrooms with these foci in mind included:

• Teachers had students work in groups in order to facilitate communication about mathematical ideas.
• Teachers did not let their textbook define what was going to be taught in their classrooms. The teachers viewed curriculum design and implementation as "an ongoing, dynamic process that should sustain high student expectations" (p. 83). Given this belief, several of the teachers would supplement their textbook and continuously plan, reflect, and alter what they did in their classrooms.
• Teachers incorporated tasks that were meaningful to their students.
• Teachers would analyze students' standardized test scores and engage in "backward curriculum planning" (p. 89). That is, the teachers would look at the scores, see where their students needed to improve, and then vertically align their middle school and high school mathematics curriculum to address the areas that were shown to be weak.

The final theme the authors identified addressed the importance of building relationships with fellow teachers as well as with students. When interviewed, the teachers stressed the value of being able to collaboratively plan lessons with their colleagues, creating a support system amongst the faculty, and holding each other accountable. Regarding the practice of collaboratively planning lessons, Kitchen et al. (2007) stated, "for teachers searching for one magic bullet in this study that they could implement to impact their students' learning and achievement, the collaborations that existed among participating faculties may be it" (p. 128). As for the relationships with the students, the teachers stated that it was important to show the students that they care about them and to show the students that they are human. If the students know that the teachers care about them, one could argue that the students might try harder in class so they do not disappoint the teacher.

Summary

In this chapter, I discussed changes in teacher evaluation policy. Specifically, several states now use standardized observation protocols and also use student growth data as a means of introducing variation in evaluation outcomes, as there was not a lot of variation before. I discussed how test-based models can be used on teacher evaluations and also some issues identified with their use. This was followed by a discussion of some possible effects, both positive and negative, of using test-based models on teacher evaluations. In the next chapter, I discuss a conceptual framework for analyzing the implementation of a given policy that I have been developing for the past few years. Its roots can be traced back to a concept called strategic response that came from literature regarding the implementation of No Child Left Behind.
My reason for developing this framework is that I wanted something I could use to analyze various educational policies with the hope of identifying specific implementation issues. Using this framework, implementation issues can be classified into a handful of categories that can then be used, depending on what implementation issues are found, to advise the next iteration of a given policy so that it is more successful.

CHAPTER 3: CONCEPTUAL FRAMEWORK

In this chapter I unpack a conceptual framework that helps us understand the barriers and related responses to policy implementation. Specifically, my framework describes various barriers that educational policy and mathematics education researchers have identified as commonly causing teachers not to implement a policy in ways intended by policymakers. When used as a lens to analyze the data I collected, this framework allowed me to identify specific areas where teacher evaluation reform in the State of Michigan broke down during the implementation phase. Having identified these barriers and teachers' responses to those barriers, I can now tailor recommendations to policymakers to make the next iteration of teacher evaluation reform policy more successful in improving the quality of instruction in our schools. What follows is a synthesis of the literature that informs my framework, with a focus on the literature regarding policy implementation and a concept called strategic response. In this synthesis of literature, I discuss what others have found to be barriers to implementation and also how educators responded to various barriers that were present in other reforms. Following this discussion, I then apply the framework to literature on various reforms where the author(s)' focus was not implementation issues, but where evidence of implementation issues existed.

Defining barriers

After a policy is passed by legislators, some group or groups of people are charged with implementing it. Quite often, though, the policy is not implemented as the policymakers had intended due to a variety of factors. In this dissertation I call those factors barriers to intended implementation. Previous studies have identified several barriers to implementation that cause a policy not to be implemented as policymakers had intended. These barriers include: implementers' indifference or apathy toward the policy, disagreement about how to achieve results, stress, and lack of resources (Drake & Sherin, 2006; Hope, 2002). For clarity, I provide my definitions for each of those barriers in Table 3.1 below. Although the table provides very short definitions for each of these terms, I next provide a longer description of "resources," as my definition includes terms that are not commonly understood.

Table 3.1: Definitions of barriers

Indifference/Apathy: The state of being completely uninterested in and unconcerned with a given policy.
Disagreement: Those charged with implementing a policy agree with the primary goal of the policy, but have different opinions than the policymakers regarding how to achieve the goal.
Stress: Long-term distress that causes internal psychological and/or physical tension, which can lead to depression, anxiety, and other mental health issues. For the purposes of this study, I am ignoring eustress.
Resources: Any material, human, social, and/or cultural asset that one uses to function.
Most people, when asked what educational resources are, would probably state they are items such as calculators, books, computers, manipulatives, money, etc. Some researchers have classified these as material resources (Adler, 2000) or economic capital (Spillane, Hallett, & Diamond, 2003). A teacher's knowledge should also be considered a resource because it can affect how the teacher responds to a given policy; that is, one cannot do what one does not know how to do. Adler (2000) would categorize one's knowledge as a human resource. In addition to material and human resources, time can be considered a resource, as Adler (2000) classified it as a social and cultural resource. Other social resources could include trust and collaboration as well.

Considering strategic responses

The presence of barriers to implementation is only half of what we need to pay attention to when studying how a given policy is implemented. The other half relates to how those who are responsible for implementing a given policy respond to those barriers. One area of literature that informs my framework concerns a concept called strategic response. Strategic response, as it has been used recently in the educational policy literature, has an almost nefarious connotation associated with it. For example, the term has been used to describe school and teacher responses to the mandates that came with No Child Left Behind (USDOE, 2014). Specifically, the phrase "strategic response" has been used in literature related to the following practices: 1) focusing on students just below the proficiency threshold because of limited human and material resources (Dee & Jacob, 2011), 2) moving stronger teachers to tested grades at the elementary level to boost standardized test scores due to human resource barriers (Grissom, Kalogrides, & Loeb, 2013), and 3) increasing calorie and glucose levels in school lunches around testing periods, possibly due to disagreement or resource issues, to boost students' short-term cognitive ability (Figlio & Winicki, 2005). My definition would include these practices under the umbrella of strategic response, but it is broader: I argue that all responses by implementers should be considered strategic responses, as there is some thinking/strategizing involved in deciding if and how to deploy one's resources to implement a policy.

Relating various barriers to implementation

Figure 3.1 below is a representation of how I see the relationships among the various barriers to implementation. In the previous section I explained the barriers, so here I focus on explaining the relationships in the figure. Looking at the entire figure, all of the bubbles are connected to each other because there is not a neat, ordered, linear relationship that exists among them. In previous iterations of this figure I tried to impose some order by having arrows pointing in various directions, but I elected to omit specific directional relationships from the figure as I came to realize, after a lot of thought, that some of these relationships between barriers can vary from person to person. That is, the relationship between two barriers could vary directly for Person A, but inversely for Person B. For example, let us consider the relationship between one's knowledge and time.
If we were able to quantify knowledge regarding a specific topic, intervention, etc., then one might expect the amount of time a person would need to learn something to increase as the amount of knowledge one has decreases. That may be true only to a certain point, though. If one has no knowledge of how to do something, then one may devote no time at all to learning about it. This could also be due to some apathy/indifference, which highlights the complexity that exists among all of these barriers. In addition to the relationships between two barriers varying directly or inversely, a third barrier might act as a mediating or moderating variable. In the example in the previous paragraph, apathy/indifference could be considered a mediating variable: the person's complete lack of knowledge caused them to feel apathetic, which, in turn, caused the person to not devote any time at all to learning anything. An example of a barrier serving as a moderating variable between two others would be non-work-related stress. By definition, it is not caused by factors at work, but it can have an effect on the relationship between two other variables. If someone has a lot of stress because of a chaotic situation at home, it may affect the relationship between money and materials, for example. They may be so stressed that they cannot even think of how to spend available funds on available materials to implement a given policy.

Overall, what I would like readers to keep in mind from this figure, and from considering the barriers and the relationships between them, is that implementation at the ground level can get messy. It is important to think about how the various barriers (and the relationships among those barriers) could be affecting the successful implementation of a policy, as it is highly unlikely that only one barrier is the cause of all of the implementation issues. When offering recommendations for further iterations of a policy, it is important to devote the time and effort to consider the various ways multiple barriers could be affecting implementation and not just zero in on the one that comes to your attention first.

Examples of responses to barriers

Identifying the barriers that exist is important, but responses to those barriers are also important, as knowing them can help in offering recommendations for future iterations of a policy. For example, lack of money could be identified as a barrier to implementation, but it may not be the primary issue causing implementation breakdown. There could be a toxic climate created by the people in an organization, the media, policymakers, etc. that is a bigger issue than the money. Throwing more money at the problem might help some, but it will not help as much as fixing the climate. Regarding potential responses to the barriers in Figure 3.1, it would be impossible to list all possible responses that could happen with a given barrier, as policies are different and the people charged with implementing the policies are all different. What I will do here is give a couple of possible responses to each of the main barriers I discussed earlier. Recommendations regarding how to address the barriers and responses would differ depending on what barriers exist and how one triages them.

If indifference or apathy toward the policy exists, educators might respond by ignoring the policy and not making any changes in their practice as a result.
This might occur because the implementers believe the policy will be short-lived and that the time and energy required to implement it would be a waste. An alternate response could be choosing to ignore the policy while constantly complaining about how pointless it is to everyone they encounter. This complaining could wear down others to the point where they decide not to bother trying to implement the given policy either.

If lack of resources exists, educators might respond by trying to implement the policy to the best of their ability using what resources they do have, but fail to implement the policy as intended because the available resources are insufficient to get the job done. For example, a mathematics teacher could respond to a mandate that they teach different content in their Algebra 2 class the following year by finding a variety of materials on the internet. The teacher may not, however, have the knowledge or time necessary to judge the quality of these activities or how to properly sequence them. Another potential response could be that an educator decides not to bother trying to implement a policy at all if they deem the lack of resources insurmountable. Thus, it may look like apathy at first glance, but the apathy is actually caused by the lack of resources.

If disagreement about how to achieve results exists, educators could respond by changing their practice to achieve the goals of the given policy, but doing so in a manner that is inconsistent with what policymakers or others had in mind. For example, a middle school principal could mandate the use of a particular curriculum to boost standardized test scores, but a mathematics teacher might think that the given curriculum does not give their students enough practice and decide to have students do more practice problems from random worksheets found online. Another potential response to disagreement could be that the educator bypasses their administrator and speaks with the school board president because the educator does not believe that what their administrator has planned will achieve the goal of a given policy.

If stress exists, either work-related or not, educators may put in the minimum effort to just get by, even though they may know of a better way to do something. For example, a mathematics teacher may know how to facilitate meaningful discussions in their classroom, but choose to lecture instead because lecturing takes less planning, and less planning might be all they can manage given the stressors that exist. Another potential response to stress could be that the educator decides to resign and pursue a different career.

In sum, depending on the barrier that exists, educators can respond in a variety of ways: choosing to ignore a policy; rationing available resources and attempting to implement a policy to the best of their ability; responding to a policy by doing something that the implementer believes achieves its goal, but that is inconsistent with the substance of the policy itself; or putting in minimal effort due to stress. What follows in the next section is my first attempt to use my framework with some qualitative data from journal articles.
Application of framework to other reforms

Ideally, I would have liked to code some interview transcripts to test and further clarify my framework, but I decided that attempting to code qualitative data from published articles would be sufficient for my first attempt. Thus, I searched for studies of various reforms by inputting keywords into Google Scholar, accessed the articles using ProQuest through MSU's library, and looked for evidence of the barriers and responses to barriers listed above in the text of the articles. Specifically, I searched for and applied my framework to articles regarding inclusion of special education students in the general education classroom, Algebra for All, the Common Core State Standards, and teacher evaluation reform. I did not do an exhaustive search for articles regarding these reforms, however. I just sampled a few so I could have some data to work with for a trial run. Results from this search and subsequent coding follow.

Indifference/Apathy

As I read the articles, I was not able to find evidence of every barrier in Figure 3.1. This did not surprise me, as implementation barriers were not the focus the various authors had in mind when they wrote their respective articles. Additionally, I was not surprised that I was unable to find evidence suggesting teachers were apathetic or indifferent towards any of the reforms, given that Spillane and Zeuli's (1999) study, for example, found most teachers try to implement policies even when they may disagree with them. This, however, was not the case for every person, as I report later in this dissertation regarding administrators' choices about tying student achievement data to teacher evaluations.

Human resource: Evidence of attention to knowledge

Regarding the other barriers, the bulk of the evidence I found in the articles related to lack of resources. Of the various resources I discussed earlier, the primary issue that resulted in implementation challenges in three of the reforms I looked at was gaps in knowledge, specifically pedagogical knowledge. Largely due to inadequate professional development opportunities and lack of attention in undergraduate and graduate classes, mathematics teachers struggled to teach the more heterogeneous classes that resulted from inclusion and Algebra for All policies (Allensworth, Nomi, Montgomery, & Lee, 2009; Desimone & Parmar, 2006a; Desimone & Parmar, 2006b; Gamoran & Hannigan, 2000; Loveless, 2008). Regarding the implementation of the Common Core State Standards (CCSS, 2010), administrators reported concerns regarding teachers' content knowledge and "the teachers' ability to support the deeper learning that CCSS aims to encourage, especially in mathematics" (McLaughlin, Glaab, & Carrasco, 2014, p. 7). Teachers also had concerns regarding their content and pedagogical knowledge, as evidenced in the following excerpt from Porter, Fusarelli, and Fusarelli's (2015) study:

Mr. Harner acknowledged the complexities involved in making the shift. When asked about the potential challenges, he noted as follows: The biggest challenge is interpreting what that means in terms of teacher actions.
You hear a lot of people say, "Oh, you're going to have to change the way you teach, you have to change the way you do things." Well, that's great to stand up and say, "Well, change the way you teach, change the way you do things," but we need to define exactly what that means in the classroom and we actually have to help teachers understand exactly what kinds of things does that mean and how that impacts practice (p. 123).

Administrators in the study also went into detail as to why teachers had concerns regarding their pedagogical knowledge, with one respondent explaining, "Teachers having to change how they always taught. To really teach these new standards well, you have to teach them differently and that is hard for teachers" (p. 127). Given the existence of pedagogical knowledge gaps due, in part, to insufficient professional development and college coursework, mathematics teachers assumed that special education students and low-achieving students without a specific learning disability could be taught in similar ways, as there was practically no difference between the two groups of students (Desimone & Parmar, 2006a). Based on this assumption, mathematics teachers would often target their instruction toward the hypothetical middle student, hoping that this would give the students at the bottom of the achievement distribution, as defined by standardized test results, a chance to succeed (Allensworth, Nomi, Montgomery, & Lee, 2009). Other mathematics teachers, those that devoted more attention to the students at the bottom of the achievement distribution, would slow the pace of instruction, skip difficult topics, focus on following procedures rather than on problem solving, and put fewer questions on assessments, assuming these accommodations would help these students succeed (Stein, Kaufman, Sherman, & Hillen, 2012).

As for how mathematics teachers responded to their pedagogical issues when implementing the Common Core State Standards, I was unable to find any descriptions of what teachers did in their classrooms. I did find the following quote, which leads me to believe that teachers did not change their teaching habits at all in response to the Common Core State Standards, as they did not have the education or professional development to do so: "It's just overwhelming too much at one time and not enough resources or training done in advance—not while you're trying to implement!" (Porter, Fusarelli, & Fusarelli, 2015, p. 129). This quote also touches on another barrier, stress. One could argue that finding something overwhelming would cause one to also feel stressed.

Material resource: Evidence of attention to money

In addition to the knowledge gaps, I was able to find evidence of money issues, a material resource. Evidence of money issues could be found in discussions regarding large class sizes, lack of paraprofessionals to assist teachers in the classrooms, lack of up-to-date technology, and lack of quality classroom materials (Desimone & Parmar, 2006a; McLaughlin, Glaab, & Carrasco, 2014). As for how the teachers responded to the lack of money, there was little direct evidence in the articles. Most of the discussion regarding money issues was at the district level rather than focused on teachers, and I do not have access to interview transcripts to look for teacher responses myself.
Material resource: Evidence of insufficient time for implementation

Time arose as a significant barrier in the research related to the inclusion of special education students and in the implementation of the Common Core State Standards, but not in the articles on Algebra for All or teacher evaluation reform. Regarding inclusion, time was an issue when it came to co-planning lessons with special education teachers (Desimone & Parmar, 2006a). Quite often, the mathematics teachers and special education teachers did not have a common planning period, which made collaborating on lessons with other people, another resource, difficult. In response, mathematics teachers generally planned their lessons by themselves with no help from the special education teacher. This effectively cut off a valuable resource for the mathematics teachers, as the special education teachers knew more about strategies that work with students with varying learning disabilities.

As for the Common Core State Standards, time appeared to be the largest barrier to effective implementation. Mathematics teachers felt unprepared and extremely stressed because of how fast they were expected to implement the CCSS. Two excerpts that touch on teachers' feelings as they began to implement the CCSS follow:

Sayer principal Carlene Yeadon's comments pointed to how her teachers were feeling about starting the process: "As they begin the year. . . they are a little apprehensive. I think they're excited and nervous at the same time. They're building the plane while they're flying and they're trying to get the wheels on there so they won't crash" (Porter, Fusarelli, & Fusarelli, 2015, p. 122).

This quote seemed to indicate that teachers did not feel well-prepared to implement the CCSS, as appropriate time was not devoted to professional development beforehand. The following quote touched on this hurried implementation as well:

On the one hand, practitioners say that all aspects of CCSS implementation have been hampered by a lack of time. They have too little time to provide professional development, too little time to work on developing new curricula and instructional materials, and too little time to communicate with teachers, parents, and school board members. As one said: "Time, or lack thereof, appears to be the common enemy." (McLaughlin, Glaab, & Carrasco, 2014, p. 5).

In addition to feeling unprepared and stressed, teachers mentioned how difficult it was to evaluate the many curriculum materials that claimed to be aligned with the CCSS, an issue that deals with both time and knowledge (McLaughlin, Glaab, & Carrasco, 2014). As for how the mathematics teachers responded to their issues with time, the articles did not discuss particular responses. It has been shown, however, that stress can negatively impact what happens in a classroom, as teachers can go into survival mode and just do enough to get by, which is what happened to a teacher in a study by Drake and Sherin (2006).

Human resource: Evidence of other people

Other people as a resource was touched on briefly in the previous section, but it came up more when I looked at the articles regarding teacher evaluation reform. In evaluation systems that made use of value-added models, teachers became more isolated and did not want to help other teachers, as helping other teachers was not in their rational self-interest (Darling-Hammond, 2015; Guenther, 2019; Johnson, 2015).
It is not in their self-interest because helping others could raise other teachers' value-added scores in comparison to the person doing the helping. If that help increases the score of the teacher being assisted enough, a potential result could be that the teacher doing the helping loses his or her job.

Barriers: Evidence of disagreement

In the articles regarding inclusion, Algebra for All, and the Common Core, I was largely unable to find evidence of disagreement. The only exception was that some teachers felt special education students would be better served being taught in resource rooms (Desimone & Parmar, 2006a). In the articles regarding teacher evaluation reform, I was able to find more evidence of disagreement. Specifically, I found evidence that teachers believed their evaluations focused too much on growth, that standardized test data did not adequately measure what students know, and that there were several issues affecting students attending challenging schools (e.g., underfunding and a high percentage of low-socioeconomic-status students) that teachers cannot possibly help or control (Darling-Hammond, 2015; NRC & NAE, 2010). I was unable to find evidence of what teachers did as a result of these beliefs, though.

Barriers: Evidence of stress

Stress was mentioned briefly a few times earlier as a barrier. Regarding teaching tested grades, one teacher said, "I'm scared I might lose my job if I teach in a transition grade level, because…my scores are going to drop" (Darling-Hammond, 2015, p. 134). Fearing for one's job is clear evidence of work-related stress. As a response to the stress created by using value-added models (VAMs) on their evaluations, it was reported that teachers would try to avoid tested grades at the elementary level, avoid assignments (or even schools) where a large portion of the students have had low standardized test scores, and sometimes leave the profession entirely (Darling-Hammond, 2015; Johnson, 2015).

Summary

In this chapter I unpacked a framework for making sense of policy implementation issues. Specifically, I discussed various barriers to implementation and possible responses to those barriers. On my first attempt to use the framework, I was able to find evidence of most of the barriers in journal articles pertaining to various reforms. Finding responses to these barriers in the articles was significantly harder, but this is likely because the authors did not focus on barriers and responses in their articles. I imagine I would have had an easier time finding responses to the barriers if I had access to interview transcripts. In the following chapter, I discuss the design of my study in detail, along with the specific research questions I wished to answer regarding teacher evaluation reform.

CHAPTER 4: METHOD

As the focus of accountability has transitioned from the school level to the individual teacher level and the use of student growth data has become more common in assessing teacher quality, it is of interest to know how the use of these data is affecting practice. Specifically, this study aims to answer the following research questions:

1. What evidence of implementation barriers exists as mathematics teachers and principals in the State of Michigan attempt to tie student growth data to teacher evaluations?
a. What do mathematics teachers and principals believe the purpose(s) of evaluating teachers is(are)?
b. What is mathematics teachers' understanding of how standardized test data and other student growth data are used on their evaluations?
c. What do mathematics teachers and principals identify as pros and cons of how student growth is measured on their evaluation?
2. How have mathematics teachers and principals responded to the implementation barriers that do exist?
a. How do mathematics teachers analyze standardized test and other student growth data?
b. What do mathematics teachers do to address students' weaknesses as identified by standardized test and other student growth data?
c. What steps do teachers take to improve their teaching as a result of the standardized test and other student growth data?

Method

Research Design

To answer the research questions above, I used case study methods. Depending on the study, a case can be an individual, group of people, institution, neighborhood, program, culture, region, nation-state, or even a stage in a person's life (Glesne, 2011; Patton, 2002). In this study, the case was one state in the United States of America, the State of Michigan, which has mandated that student growth data be used to evaluate teachers. Typically, in a case study, data are acquired via a variety of procedures such as observations, interviews, and the collection of documents (Creswell, 2014; Glesne, 2011; Patton, 2002). Data from various methods are collected, compared, and contrasted, a process known as triangulation, in order to improve the trustworthiness of a given study (Glesne, 2011). For this study, I began with surveys because, with a large enough sample size, they could allow me to say meaningful things about the population and make claims with a degree of confidence. I then followed the surveys with interviews because they allowed me to gather very detailed data and better understand the relationships among variables. I decided against collecting documents because I had done that in an earlier study in which I analyzed teacher evaluation instruments in the State of Michigan (Morissette, 2014).

Description of the case

On July 19, 2011, Governor Rick Snyder of Michigan signed a package of four bills (Public Act 100, Public Act 101, Public Act 102, and Public Act 103) into law that were designed to reform education in the State of Michigan. These new laws made significant changes to teacher tenure (State of Michigan, 2011a; State of Michigan, 2011b), teacher evaluations (State of Michigan, 2011d), and collective bargaining (State of Michigan, 2011c). Addressing all of the changes these laws made in one study would be a daunting task, although they all should be addressed at some point given the potential issues for teachers and students that may arise as a result of these changes. For this study, I focused on the practice of tying student achievement and growth data to teacher evaluations, which was part of PA 102. PA 102, Section 1249 3(b) states,

For the annual year-end evaluation for the 2013-2014 school year, at least 25% of the annual year-end evaluation shall be based on student growth and assessment data. For the annual year-end evaluation for the 2014-2015 school year, at least 40% of the annual year-end evaluation shall be based on student growth and assessment data. Beginning with the annual year-end evaluation for the 2015-2016 school year, at least 50% of the annual year-end evaluation shall be based on student growth and assessment data.
The student growth and assessment data to be used for the school administrator annual year-end evaluation are the aggregate student growth and assessment data that are used in teacher annual year-end evaluations in each school in which the school administrator works as an administrator or, for a central-office level school administrator, for the entire school district or intermediate school district. ("Enrolled House Bill No. 4627", 2011, p. 4)

After this piece of legislation was passed, the Michigan Council for Educator Effectiveness (MCEE) (2013) published a document entitled Building an Improvement-Focused System of Educator Evaluation in Michigan: Final Recommendations. This document detailed how incorporating student growth data into teacher evaluations should occur. According to the MCEE, data regarding teachers' practices should be collected using an observation instrument adopted by the state: either Charlotte Danielson's (2014) Framework for Teaching, Marzano's (2014) Teacher Evaluation Model, The Thoughtful Classroom (Silver, Strong & Associates, 2014), or 5 Dimensions of Teaching and Learning (Center for Educational Leadership, 2014). The MCEE suggested that data regarding student growth be a combination of value-added scores for teachers (for the core content areas) and changes in students' achievement, with value-added scores constituting at least half of a given teacher's student growth component on their evaluation.

When implementing PA 102, the State of Michigan adopted some of the MCEE's recommendations and not others. One recommendation the state did not adopt was the use of value-added models to measure student growth on standardized tests. Instead, the state left it up to individual school districts to figure out how to measure student growth. To get an idea of how school districts measured growth, I conducted a study in which I collected and analyzed evaluation instruments from around the state (Morissette, 2014). I found large variation in the evidence used to measure growth. Some districts used standardized tests with basic growth models. Others used student attendance rate, homework completion rate, semester exams, course grades, pass/fail rate, and graduation rates, among other things, to measure student growth. No school district took it upon itself to use value-added models, which is not surprising given the difficulty of generating value-added scores for an individual teacher.
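Putting the statutory weights and this local discretion together: a district is free to choose its growth measures, but the growth component must carry at least the minimum weights quoted above. The sketch below is a minimal, hypothetical illustration of that weighting; the 0-100 component scores and the simple weighted-average formula are my own assumptions, as PA 102 fixes only the minimum share of the evaluation based on growth data, not how districts must combine components.

```python
# Illustrative sketch of PA 102's phased-in minimum growth weights.
# The weighted-average formula and 0-100 component scores are hypothetical;
# the statute does not prescribe how districts combine components.
GROWTH_WEIGHT_BY_YEAR = {
    "2013-2014": 0.25,
    "2014-2015": 0.40,
    "2015-2016": 0.50,  # and every year thereafter
}

def year_end_rating(growth_score, observation_score, school_year):
    """Combine a growth score and an observation score (both on a 0-100 scale)."""
    w = GROWTH_WEIGHT_BY_YEAR[school_year]
    return w * growth_score + (1 - w) * observation_score

# The same component scores yield a lower rating as the growth weight rises:
print(year_end_rating(60, 90, "2013-2014"))  # 0.25*60 + 0.75*90 = 82.5
print(year_end_rating(60, 90, "2015-2016"))  # 0.50*60 + 0.50*90 = 75.0
```

Under this reading, the 2015-2016 rule simply means that, at minimum, the growth component carries as much weight as everything else on the evaluation combined.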
The questions I created on the survey were more directly related to my research questions. Once I had rough drafts of my mathematics teacher and principal surveys, I sent them to OSR for feedback. A person at OSR sent me my documents back about a week later with comments regarding recommended wording changes in questions and in my Likert scales. Once I received feedback from them, I made the changes they recommended and then sent those documents to my co-advisors for more feedback. After making edits based on their feedback, I then created the surveys in Qualtrics and these were the surveys that I sent to mathematics teachers and principals throughout the State of Michigan (see Appendices A and B). 45 After the surveys were completed by the participants, I created semi-structured interview protocols based primarily on my research questions and partially on some of the data generated by my surveys (see Appendices C and D). For example, many (12 out of 17) of the principals surveyed did not have a background in mathematics and I was curious how that affected the evaluation process in the eyes of both the teachers and the principals. Once my protocols were completed, I then sent them to my co-advisors for feedback. Prior to conducting my interviews, I piloted the mathematics teacher protocol with a fellow graduate student at MSU that had recent secondary mathematics teaching experience. Based on their feedback, I generated some probing questions that I could use in my interviews. These questions were specific to each of the contexts where my subjects taught as I tried to do some research about the schools and/or the individuals before the interviews. Data collection (surveys) Beginning in the second week of September 2018, I employed a stratified random sampling strategy to select schools to include in the study. Specifically, I used the Michigan High School Athletic Association’s 2018-2019 enrollment list where high schools throughout the state are split into four different classes based on their size (MHSAA, 2018). I did this to ensure representation from different sized schools as schools of different sizes could have different challenges while implementing the changes to teacher evaluations. For example, administrators from smaller schools could have issues evaluating all of their teachers because they typically have other duties to perform that principals at larger schools do not have. On the other hand, administrators from large schools could have issues evaluating all of their teachers because of the large number of teachers they likely have on staff. From the list of schools, I 46 randomly sampled seven Class A, seven Class B, seven Class C, and seven Class D schools. If a school with “Catholic” or “Christian” in the name was sampled, I threw it out and sampled again as these schools do not have to follow the same evaluation rules as the traditional public schools. Once I had my sample, I looked up contact data on the schools’ websites and sent the principals an email to ask if they would be willing to participate in my study. In the email I specified that participation would entail taking a survey that would take approximately 15 minutes to complete, included links to both the principal and mathematics teacher survey on Qualtrics, and offered an option to participate in a follow-up interview if they wished to do so. I also wrote that all participants that did the survey and the interview had the choice of a $25 Amazon or Meijer gift card as compensation for their time. 
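To make the sampling procedure described above concrete, here is a minimal sketch of the stratified draw with the throw-out-and-redraw rule for religious schools. The data structure and function names are my own placeholders; no such script was used in the study, which worked from the MHSAA enrollment list directly.

```python
import random

# Minimal sketch of the stratified sampling described above: seven schools per
# MHSAA class, throwing out and redrawing any "Catholic" or "Christian" school.
# The inputs and names are placeholders, not artifacts from the study itself.
def sample_schools(schools_by_class, per_class=7, seed=None):
    rng = random.Random(seed)
    sample = []
    for mhsaa_class, schools in schools_by_class.items():
        pool = list(schools)  # draw without replacement within each stratum
        picked = []
        while len(picked) < per_class and pool:
            school = pool.pop(rng.randrange(len(pool)))
            if "Catholic" in school or "Christian" in school:
                continue  # thrown out: different evaluation rules apply
            picked.append(school)
        sample.extend(picked)
    return sample

# Hypothetical usage, one list of school names per enrollment class:
# sample = sample_schools({"A": class_a_names, "B": class_b_names,
#                          "C": class_c_names, "D": class_d_names})
```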
Data analysis (surveys)

After the survey data were collected, the original plan was to see if there were any significant differences in responses between principals and teachers for similar questions on the surveys by comparing means with an independent samples t-test, and also to see if there were any significant differences among principals and teachers based on school size. It was hypothesized that schools could face different challenges implementing the changes to teacher evaluation policy based on their size. One such issue was raised above, but I also hypothesized that smaller schools would have fewer professional development opportunities than larger schools, as they have less money (funding is tied to student enrollment), and that smaller schools would have fewer human and material resources than larger schools. If true, this would mean smaller schools would have a harder time implementing the changes in teacher evaluations and would likely need more support in order to implement the changes effectively. Given the low number of people that completed the surveys (15 teachers and 19 principals), however, I was unable to determine if there were any significant differences among the schools based on their size. I did not want to completely omit the work I did with the surveys, so I decided to calculate descriptive statistics for what I had regarding my research questions. These results are not reported in the body of this dissertation, as I elected to focus only on my interview data, but they can be found in Appendix E. To provide an idea of what survey questions I wanted to use to partially answer my research questions, a brief breakdown is included in Table 4.1 below.

Table 4.1: Mapping of research sub-questions to survey questions

1(a) (Math teachers' and principals' beliefs regarding the purpose of evaluation): T15, P14 (about purpose of evaluation)
1(b) (Math teachers' understanding of how ST test data and other growth data are used on their evaluations): T7, T8, P7, P8 (regarding use of standardized tests on the evaluations and asking if they use a status or growth model)
1(c) (Pros and cons of how student growth is measured at their school): T26, P9 (about advantages and disadvantages of how they measure growth)
2(a) (How do math teachers analyze ST and other growth data): T14, P15 (both regarding assistance principals give teachers in using standardized test results to inform instruction)
2(b) (What math teachers do about students' weaknesses as identified by the data): T19 (beliefs regarding how well ST measure what students know)
2(c) (What math teachers do to improve their practice as a result of ST and other growth data): T20, T22 (beliefs regarding how well ST reflect teacher ability and if they use ST data to inform PD)

Note: P# refers to a specific question on the principal survey and T# refers to a specific question on the mathematics teacher survey.

Several of the questions on the surveys were not used in answering the research questions, as I found a lot of the survey data, especially the demographic data, to be not very helpful due to the small sample size. I had wanted to be able to make claims using school size, education of administrators, length of time worked in the school, etc. as ways to sort and compare the survey data, but the numbers just were not there to do any of it. In the future, I think I need to offer a larger incentive to get those numbers and also be more purposeful and realistic about the questions I ask in the survey.

Data collection (interviews)

Even though I was not able to do any t-tests with my survey data, the survey did serve well as an interview recruitment tool, although most of my volunteers were from Class A schools. It was a challenge to get people from smaller schools to talk to me, which I found a bit discouraging, as I thought this would limit my ability to compare responses from small and large schools. From my list of volunteers, I set up and conducted semi-structured interviews in November and December 2018 with two Class A principals, one Class B principal, and one Class C principal. No Class D principals agreed to be interviewed. Two of the principals I spoke to have a PhD (Principal A2 and Principal B) and one was a former mathematics teacher (Principal B). Regarding mathematics teachers, I conducted interviews with two Class A, one Class B, one Class C, and one Class D teacher. Both Class A teachers and the Class C teacher were veteran teachers with over 10 years of experience each. Teacher A2 also happened to be the mathematics department head at her school. Teachers B and D had less teaching experience, but both were tenured. Also, one of the Class A principal-teacher pairs, the Class B pair, and the Class C pair each worked at the same school.

Given that I was not able to really use my survey data, and given who my interview participants ended up being, I do not feel I can make statistically significant claims regarding how school size affects the implementation of this teacher evaluation policy, as I am unable to triangulate the data. I do, however, feel I can discuss how some schools responded to the policy and the barriers they encountered while doing so. This still has value, as it highlights things policymakers should be paying attention to in order to make this reform more successful in improving education in the State of Michigan.

Data analysis (interviews)

Once the interviews were completed, I immediately uploaded all of the audio files to an auto-transcription website (Temi.com) in order to cut down on the time required to transcribe them. Once I received the transcripts from the website, I listened to each interview and corrected any transcription errors I noticed, as the website had some issues with how I pronounced particular words. Each transcript took approximately two hours to edit for precision, trying to capture, as closely as possible, what participants said.
Once the editing was done, I coded each transcript based on the conceptual framework I discussed in the previous chapter. Specifically, I used the following codes for excerpts of text that exhibited evidence of the various barriers discussed previously: money, materials, time, other people, knowledge, apathy, disagreement, and stress. If I noticed an excerpt exhibited evidence of a response to one of the barriers, I used the same codes, but appended "-R" to the code. Coded excerpts varied in length from a single sentence to an entire speaking turn, depending on whether one or multiple ideas were discussed in a turn. Based on the recommendation of my committee, I had a colleague code the transcripts with the same coding scheme for inter-rater reliability. After my colleague coded each transcript, he sent it back to me for feedback. We generally agreed on most of the codes (we disagreed on only 16 of the 403 codes, an agreement rate of roughly 96%), and the agreement increased as my colleague coded more transcripts. Most of the disagreement at the beginning came from having slightly different ideas regarding what the code "other people" meant, but this was cleared up through discussion. We came to a consensus that we would code as "other people" any excerpt that indicated a desire to work with colleagues (other teachers or administrators) or university professors. My colleague thought my other codes were rather self-explanatory. Any time money, time, or stress was specifically mentioned as a barrier, that excerpt got the corresponding code. If traditional classroom materials were mentioned, the excerpt got a materials code, although, after discussing it with my colleague, we broadened this to include the standardized tests themselves and the standardized test data. For the knowledge code, we looked for instances where someone mentioned they "don't know" or "don't understand" something. As for disagreement, we used that code any time someone indicated they "don't think" or "don't believe" something. We also had to infer disagreement from what was said, as the interviewees sometimes exhibited disagreement without using those exact phrases. For example, when referring to measuring good teaching, one of the participants stated, "so pretend that's a thing". By saying this, it seems the person does not believe you can measure what good teaching is. As for apathy, we looked for phrases or words such as "I don't care", "I'm not going to", "waste", "joke", basically anything that indicated the person was not going to expend a lot of effort in implementing the policy. Some specific examples of what we coded for each barrier and for responses to barriers are given in Tables 4.2 and 4.3, respectively.

Table 4.2: Example of coding of interview text regarding barriers

Money: "I mean there's a lot of money that the state puts into those ISDs and I, you know, it doesn't funnel out to the schools very much" -Teacher A1

Materials: "I found that it becomes cumbersome in a way that at least the MSTEP and the SAT data that we get, isn't as specific as you would like it to be." -Teacher D

Time: "it is almost like a hoop to jump through that, you know, waste 20 minutes of a pretest day" -Teacher A1
Other People: "One thing that I believe in is that collaboration is the key component to making quality teachers and quality classrooms. And so anything that makes you compete to be better with other teachers is counterproductive." -Teacher A1

Knowledge: "You can check all the boxes you want, but in order to be effective you can't just check a box without explaining to me why because I'm not going to grow from it. I don't know what any of those little things mean." -Teacher C

Apathy: "It just has to be just the right number so that it doesn't get us to highly effective status in some of the numbers. Where they're change? You're like, oh that's just bullcrap. You know. So right now, the effective system is nothing but a huge joke." -Teacher C

Disagreement: "If we could measure what good teaching is, right? Like it. So pretend that that's a thing. If we could measure that. If a good teacher is a good teacher." -Teacher A1

Stress: "I mean, their workload is insane. Our workload is insane." -Teacher A2

Table 4.3: Example of coding of interview text regarding responses to barriers

Money-R: "Uh, I mean I write our Title II (A) proposals, so anything that I know that's upcoming, that is a need for me or one of my teachers, I'm going to write it in there so that there's funding for it." -Teacher A2

Materials-R: "We do like a half day every month where we do a data dig collaborating with our coworkers to try and see if we can create some things (tasks and materials)" -Teacher D (this excerpt was also coded as Time-R and Other People-R)

Time-R: "Originally, what I did was I pre- and post-tested the crap out of everything. I found I couldn't do it on every chapter because I was wasting too much time so I would do basic ideas instead on every unit or maybe two to three chapters." -Teacher C

Other People-R: "That's another thing is that was one of the marks I've found with really good teachers. They are harder on themselves than I would ever be. When I get someone like that, I try and provide as many resources as I can and just stay out of their way." -Principal C

Knowledge-R: "I'm also offered free professional development through the College Board for pretty much everything I want. Um, and I take advantage of that regularly." -Teacher A2

Apathy-R: "I don't know. I take my PTO days off then every time." -Teacher C

Disagreement-R: "We talk a lot about that at our local level of how do we take over the story about our schools, how do we take over the story in the press, on social media, or whatever to continue to push out all the amazing things that go on every day." -Principal A2

Stress-R: "We started doing 10-minute walk throughs." -Principal C (in response to losing all of his assistant principals)

After the interview data were coded, I analyzed all of the excerpts and looked for common themes. Specifically, I looked to see if multiple people mentioned the same barriers or responses to barriers. I felt this was an appropriate next step, as my overall goal is to provide some recommendations to policymakers to make the next iteration of this policy better. Finding common barriers or responses to barriers among my participants would provide me with data to back up these recommendations.

Summary

In this chapter I discussed the method I used to answer the following research questions:

1. What evidence of implementation barriers exists as mathematics teachers and principals in the State of Michigan attempt to tie student growth data to teacher evaluations?
a) What do mathematics teachers and principals believe the purpose(s) of evaluating teachers is(are)?
b) What is mathematics teachers' understanding of how standardized test data and other student growth data are used on their evaluations?
c) What do mathematics teachers and principals identify as pros and cons of how student growth is measured on their evaluation?
2. How have mathematics teachers and principals responded to the implementation barriers that do exist?
a) How do mathematics teachers analyze standardized test and other student growth data?
b) What do mathematics teachers do to address students' weaknesses as identified by standardized test and other student growth data?
c) What steps do teachers take to improve their teaching as a result of the standardized test and other student growth data?

I did two rounds of data collection and analysis in which I surveyed and interviewed principals and teachers throughout the State of Michigan. Given my low sample size, I elected not to do independent samples t-tests, so my surveys were not as informative as I had hoped they would be. Thus, I elected not to include most of the data I collected from the surveys. I did, however, get a lot of rich interview data, and I report my findings from these interviews in the next chapter, using my codes in Tables 4.2 and 4.3 as a framework to organize the chapter.

CHAPTER 5: FINDINGS

As I stated in the previous chapter, since I did not get enough people to respond to my surveys to say anything meaningful from those data, I elected not to include the survey results in the body of this dissertation. Some of the survey data, however, are presented in Appendix E. In this chapter, I focus on my interview data. First, I discuss the frequency of codes. This section is followed by my findings pertaining to each of the barriers to implementation that I discussed in Chapters 3 and 4, and then a discussion of my research questions.

Tables 5.1 and 5.2 present the frequencies with which each barrier and each response to a barrier was coded for each of my interview participants. For me, it is not surprising that the quantity of codes is greater for the teachers than for the principals, as my interviews with the teachers were longer than the interviews with the principals. Therefore, I focus instead on the relative frequencies of the codes (a short sketch of this normalization follows below). For example, disagreement was the most common barrier coded for both principals and teachers. I share some specific quotes later in this chapter, but the disagreement that the principals and teachers reported was mostly about using standardized test data on teacher evaluations and having an evaluation format imposed on the schools. As for the responses to barriers listed in Table 5.2, Knowledge-R was coded more than the others, especially for principals. This finding might seem odd given that knowledge was not coded that often as a barrier for principals. One explanation for Knowledge-R being coded more than Knowledge is that the principals did not discuss what they personally struggled with, and I did not push them for this information during the interviews. Instead, principals focused on what they were doing to address their teachers' knowledge issues via professional development and working with others in the building. So, these actions on the principals' part were responses to teachers' knowledge issues, not their own knowledge issues.
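To show what I mean by relative rather than raw frequencies, the sketch below normalizes one participant's barrier counts by their total. The counts used here are hypothetical; only the normalization itself matters, and it mirrors how the columns of Table 5.1 should be read across participants whose interviews differed in length.

```python
# Hypothetical counts for a single participant; only the normalization matters.
counts = {"disagreement": 12, "time": 4, "money": 2, "stress": 2}
total = sum(counts.values())  # 20
relative = {code: n / total for code, n in counts.items()}
print(relative)
# {'disagreement': 0.6, 'time': 0.2, 'money': 0.1, 'stress': 0.1}
```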
Table 5.1: Barrier codes (frequency by individual)

Code          Princ A1  Princ A2  Princ B  Princ C  Teach A1  Teach A2  Teach B  Teach C  Teach D
Money             0         0        0        2         9         2        0        4        9
Materials         0         2        0        0         6         0        0        4        5
Time              3         1        2        4         4         9        2        8        7
Other People      2         1        1        1         7         5        2        10       8
Knowledge         1         1        1        2         4         1        1        3        3
Apathy            0         2        0        1         8         2        4        6        4
Disagree          7         6        5        6         27        16       8        21       12
Stress            5         1        2        2         1         12       0        1        2

Table 5.2: Responses to barriers codes (frequency by individual)

Code            Princ A1  Princ A2  Princ B  Princ C  Teach A1  Teach A2  Teach B  Teach C  Teach D
Money-R             1         0        0        2         0         1        1        0        0
Materials-R         2         0        1        3         0         0        2        1        3
Time-R              0         1        1        0         1         3        0        2        3
Other People-R      2         5        2        3         1         2        1        2        1
Knowledge-R         11        4        6        6         1         4        2        1        8
Apathy-R            0         0        0        0         0         0        0        2        1
Disagree-R          0         1        0        0         5         2        1        1        0
Stress-R            1         0        0        3         0         0        0        0        0

What follows now is my interview data pertaining to each of the barriers, starting with disagreement. These are barriers that exist that make implementing teacher evaluation reform difficult, not necessarily barriers caused by the reform. In each of these sections, I incorporate responses to the given barrier if evidence of a response existed in the data. Some of the barriers listed did not have responses, as I did not follow up with my participants as I should have in the moment; this is something I would focus on if I were to do another round of interviews.

Disagreement

As was mentioned in the previous section of this chapter, the majority of the interview data that was coded related to disagreement. My participants found the idea of using standardized tests in a high-stakes manner on teacher evaluations troublesome. They also had issues with the observation rubrics that were used, how student growth was measured, and the idea that good teaching can be measured. Regarding the use of standardized tests in a high-stakes manner, every principal and teacher I interviewed was against this idea because of the variety of factors that affect students' standardized test scores that are beyond any teacher's control. For example, Teacher D stated, "It's hard to have them take one test, one snapshot, on one day. Who knows what they had breakfast? Who knows where they slept last night?" In addition to mentioning factors that are outside of teachers' control, others had issues with using the tests because they did not believe standardized tests actually measured what a student did and did not know. Principal C, for example, said, "It's a sound bite and it doesn't really measure what we're asking it to measure," and Teacher A1 said, "So I don't believe that it (standardized tests) tells us how much understanding or knowledge a student has about math. I have a really hard time believing that students who score high or low on that (standardized tests) match with students who really get things, who really understand big ideas and things like that." If principals and teachers do not believe that standardized tests should be used on teacher evaluations, it would be hard to argue that teachers will actually use the data generated from those tests to inform their practice, and it is also doubtful that principals will push teachers to use data from standardized tests. As for the observation rubrics, I go into more detail about these in the section regarding knowledge, where I mention a quote from Teacher C saying she did not understand what all the little check boxes meant, and in the section regarding time, where principals state that the rubrics were time-consuming to fill out.
Specific to disagreement, the principals all thought the current evaluation process was too complicated and should be simplified. For example, Principal A1 stated,

Part of me wishes that the labels didn't exist, that it would be satisfactory or unsatisfactory as opposed to what we have now because teachers are not that different than students. They have a mindset where they want the gold star, they want to be the A, so I find that instead of having a conversation with a teacher, if they're getting effective versus highly effective, the conversation becomes why didn't I get highly effective as opposed to how can I improve my teaching?

Principal A2, also advocating for a simpler system, said,

I think if we were given a little more latitude so that if all I had to do was have conversations with the teachers about how they establish purpose for learning in their classroom and it was just one ranking instead of five subcategories with rankings thing around how they do student engagement instead of five subcategories for student engagement, I feel like you can be a little more authentic with, with teachers.

Essentially, these principals seem to be advocating for a system more like what we had in Michigan prior to our current evaluation system. Regarding how growth is currently measured, Principal A2 said his district tries to "Insulate our teachers by and large from that measurement as we feel there is more value in the observation piece of eval process," and the others reported they incorporated a variety of sources of data (e.g., SAT, NWEA, course grades, pre- and post-tests) to measure growth so that their teachers were not affected much by any one measure of growth. As for the teachers, they all had issues with how their districts measured growth. Most of them reported their districts were comparing different groups of students to measure growth by using last year's SAT data and comparing it to this year's data. Teacher A1 was the only one who did not have an issue with how his district used the SAT to help measure growth, but he said his district did not use the data from that test at all, and not using those data was consistent with his beliefs regarding standardized testing. In addition to using the SAT, the teachers all stated their schools relied on them to create pre- and post-tests to measure growth, but most found this to be a meaningless exercise. For example, Teacher A2 said, "So they (her students) did really bad the first time. Then the second time they did really well. And that means I'm a great teacher. Um, no," and Teacher C stated, "But I mean, in all honesty, if you're looking at growth, no crap, they're going to grow. They didn't know anything about it before and now I taught them. So they're going to know it. So how meaningful that data is, you know, it's, it's kind of a farce."

As for being able to measure what good teaching was, two of my participants, Principal A1 and Teacher A1, questioned whether this could even be done. Principal A1 said,

They tried to make it data based, which is very difficult to do in something that is almost more of an art than a science like teaching. So instead of being able to just go in and rely on an administrator's best practice understanding of education and then having a good conversation. It is script, code, notes, gathered data in those areas.
Teacher A1 also explicitly stated, "I also don't believe that you could actually clinically measure what an effective teacher is." Given the evidence that both principals and teachers questioned whether good teaching can be measured, it is questionable how usable the data generated from the evaluations were, as both parties might be just going through the process, thinking of it as a compliance task for the state. If this is the case, it is doubtful that teachers use evaluation data to inform their practice. As far as my participants go, that seemed to be the case, as all of the teachers said, in one way or another, that the purpose of evaluation was just accountability and not to make teachers better.

Money

After analyzing my interview data, money arose as a barrier at the individual, school, local, and state levels. At the individual level, all of the teachers mentioned how poorly paid they are and some potential ramifications this has for their development as teachers. Teacher A2, for example, said, "I know my tax returns went backwards from 2008 until 2016…my tax guy was like, what the hell are you doing?" Teacher D, reflecting on his pay and how it affects his development as a teacher, said, "I sit back and I look at the state of things and look at the benefits that I gain from getting a Master's degree and the only thing I see is that it's going to take me 16 years to pay off the debt…if I have a Master's degree, that makes me less desirable somehow than someone who's brand new because they don't have to pay me as much." Thus, Teacher D did not think it was worth getting a Master's degree because he could not afford the cost of it on his current salary and, if he were to lose his current job, it would be hard for him to find employment elsewhere, as his experience and education would make him a less-desirable candidate given the way teacher salary schedules are structured. By not getting his Master's degree, one could argue, Teacher D is working at odds with the goal of teacher evaluation reform, which is to improve teacher quality.

At the school level, Teacher A1 specifically mentioned how funding issues resulted in increased class sizes. Larger classes can be more challenging to teach than smaller classes, which could have a negative effect on student standardized test scores and teachers' evaluations. Teachers C and D mentioned how their schools have no money available for teachers to attend external professional development opportunities. These professional development activities would be important to attend if we expect teachers to improve their practice as a result of their evaluations. Teacher C said, "They cut off all of our funding for anything like that. So, if we go we have to pay on our own or write our own grants or whatever." Thus, she elects not to attend any external professional development. Teacher D, on the other hand, said he finds going to conferences important, so he ends up paying for it out of his own pocket. Teachers A1, A2, and B said their schools did make funding available for external professional development, so it could be a school-size issue; larger schools sometimes have larger budgets, since school funding is tied to the number of pupils in attendance. Another potential school-level funding issue was raised by Principal A1, who said her district has been funneling more resources to the elementary schools recently to try to address low reading standardized test results at the 3rd-grade level.
Since the district has a finite budget, that money has to come from somewhere, so the middle schools and high schools have to make do with less.

At the local level, Teacher D said, "We're trying and trying to get the community to help to pay for some improvements and there was no money in the community, so the community won't vote for it." Since the community would not support a sinking fund millage for repairs and maintenance at the school, the money needed for these repairs and maintenance has to come from the school's general fund. As a result, there was less money to pay for teachers and materials for the classroom.

As for the state level, Teacher A1 and Principal C both had issues with the Intermediate School District (ISD) model in the state. Principal C stated,

A lot of our money flows through ISDs in Michigan. Is that the best use of our money to have that be the conduit to come to us? Um, I don't know. I think our ISD in particular has the second highest budget of any educational institution in our area, yet our ISD building has very few students at it. I'm not poking at them. I'm just wondering if that model hasn't passed, isn't passed its time and more of that money should come directly to the districts.

In addition to questioning the ISD model, Principal C had issues with the amount of money that flows out of the state for standardized tests. He said,

The amount of money that our state is sending down south to the Alabama area for standardized testing. I think some of the numbers I heard is over the last eight years it's like $38 million that leaves Michigan and goes to another state because we can't figure out testing. So, we're gonna buy testing from you guys. I think that is one of the most ridiculous things I've ever heard.

Principal C argued that if we could figure out standardized testing in this state, instead of contracting it out to another state, we could save money that could be spent elsewhere in education. For example, that money could be used for teacher professional development, hiring more teachers in order to decrease class sizes, or purchasing new classroom materials. All of these things could help teachers become more effective at teaching mathematics.

Materials

The evidence I found regarding materials was related to standardized tests and curriculum materials. Regarding standardized tests, several of the principals and teachers mentioned that the lack of an annual standardized test at the high school level makes showing growth using standardized test data difficult. Principal A2, for example, said,

Technically there really is no year-to-year standardized tests they would take like there are in the lower levels in the elementary school. We have a MSTEP test at the 11th grade, well that's great, but we can't compare it to where you were at the end of tenth grade or at the end of ninth grade, so for us to use standardized test data, which is something we're struggling with right now.

This lack of year-to-year standardized test data makes it difficult for high school principals to measure growth on teachers' evaluations using standardized test data, as there is no baseline to compare results to. The only practical response, if schools are mandated to use standardized test data, is to compare scores on the MSTEP across different groups of students (e.g. last year's juniors vs this year's juniors).
Another response to this lack of consistent standardized test data, mentioned by all of my participants, was to use some type of pre-test/post-test structure in which scores were compared before and after units on teacher-created pre- and post-tests. In addition to these tests, my participants from the smaller schools (Teachers C and D and Principal C) reported they also used a computer-based standardized test from the Northwest Evaluation Association (NWEA) to help measure growth, given the lack of year-to-year data from the MSTEP. These participants had positive things to say about the NWEA test. For example, Teacher C said, "I like it because it breaks it down by, I can see students' strengths and weakness and that's way more helpful to me as a whole. They struggle at this. I can work on that more." By being able to see where individual students struggle, teachers can theoretically target interventions to address individual student needs and thus increase standardized test results.

As for curriculum issues, Teacher A1 and Teacher D both said they did not have any mathematics textbooks. Everything they did, they pieced together from a variety of print and online sources. Doing this could have negative effects on student achievement and teachers' evaluations, as key topics may not be taught, or not taught in time, before students are assessed on standardized tests. Teacher A1 did not see this as a big issue, though, as his school had a long history of working with mathematics education faculty at the local university and he feels his students are prepared. Teacher D, on the other hand, was not as confident and would welcome some textbooks, as he is responsible for several different mathematics and science classes at his school.

Time

The evidence I found related to time concerns the lack of time to devote to evaluations due to multiple responsibilities, the amount of time required to do each evaluation, the amount of class time used to administer standardized tests, the timing of when standardized test data become available, and the lack of time to collaborate with other teachers and/or administrators. Regarding lack of time to devote to evaluations, all of the principals mentioned how time-consuming evaluation was. If the principal did not have adequate help, it got even more difficult to do all that needed to be done regarding evaluations. Principal A1 stated,

I had been in a situation where our assistant principal took a different position halfway through the year and I had an interim who was not allowed to do evaluations because he hadn't been trained. Suddenly I had a greater increase in the observations and, as much as I tried to give the same level of feedback that I had before, I just didn't have the time to do so.

Principal A2, who had adequate help to do all his evaluations, said, "I would say that on average I spend four to five hours a week working through something eval related, whether it's in the classroom watching teachers or it's meeting with teachers, you know, those types of things." So, even with adequate help, he still devoted roughly an eighth of his work week to evaluations. Principal C said he started doing shorter 10-minute walkthroughs instead of observing a whole class because he had no help and this allowed him to get to all of the teachers. Teacher C, a teacher at Principal C's school, indicated this may not be true, though.
She said, "I haven't had a classroom visit except for once when he came into a math lab for like 10 minutes, which the math lab, it's not, you know, you're not going to see how I teach. Um, other than that, I haven't seen him in my class in several years." If principals are not doing the required observations, as Teacher C claims her principal is not, then one has to question how accurate the data on the evaluations are and how useful those data can be for teachers.

Regarding the time required to do the evaluations, the principals I spoke with all indicated how much more time-consuming the new evaluation requirements were. They had to do more evaluations of teachers than they did before, and the instruments they had to fill out were longer than in the past. For example, Principal A1 stated, "it's very time consuming to be in a classroom and have to write down everything that they say and then go through a complicated coding system which may have 25 to 30 different areas of coding." All of the principals advocated for a simpler evaluation, in part, to save time given all of the other aspects of their jobs that needed attention.

As for the standardized tests themselves, the principals and teachers indicated that a significant amount of class time was lost to administering and preparing for standardized tests. Several days of instructional time were lost because of the MSTEP, SAT, ACT WorkKeys, and teacher-generated pre- and post-tests. For those schools that also administered the NWEA, even more time was required. Teacher D stated,

It's (the NWEA math test) 52 questions. A lot of times they'll get to question 40, which means that I've got to pick it up on the next day, which means that I've got a class and a half each. Each time that is time wasted and they get so burnt out on it that it's very hard to find a way to motivate them to take it in a way that truly is representative of their math abilities because they're just not feeling it that day.

One could argue the time devoted to standardized testing might be better used actually teaching students, as teachers reported the data from the MSTEP and SAT did not even get back in time to do anything meaningful with them, and the data from the pre- and post-tests were "largely a joke" (Teacher C).

The final time-related barrier I found evidence of related to collaboration time, something that would be important to have if we want teachers to work on the weak areas identified in their evaluations. For example, teachers could observe how a colleague teaches a particular lesson that they may have had difficulty teaching in the past and then debrief the lesson afterwards. As for the evidence I found, Teachers C and D indicated they did not have any time in the school day to collaborate with their peers. Teacher A2, on the other hand, indicated she did have time to collaborate with her peers, but that was mostly because she is the department head and was given time out of the school day to work with other teachers. She also reported that her school had something called "late-start Wednesdays," where teachers have built-in meeting and professional development time each Wednesday morning and the students come in later in the day.

Other people

The barriers I found regarding other people largely related to collaboration issues for teachers and not enough help for principals. Of all the barriers, this seemed to be the one where school size appeared to have an effect.
Teachers at the large schools (A1, A2, and B) said they had opportunities to observe their colleagues teach, whereas the teachers at the smaller schools (C and D) did not. The reasons given for not being able to collaborate had to do with a lack of common planning time, which was mentioned in the previous section, and also the lack of other mathematics teachers to collaborate with, as smaller schools often have only one or two mathematics teachers in the building. In addition to not being able to collaborate with other math teachers, Teachers C and D said they did not have a mathematics coach to collaborate with either. Teachers A1, A2, and B also did not have a mathematics coach, but they at least had other teachers to observe and work with. Teacher A1 also said his school has had a history of collaborating with mathematics education faculty at a nearby university, whereas the others have never worked with university mathematics education faculty. Regarding collaboration and the new evaluation system, Teacher A1 said,

One thing that I believe in is that collaboration is the key component to making quality teachers and quality classrooms. And so anything that makes you compete to be better with other teachers, I feel is counterproductive. Right? Like if I feel like I have to be better than these five other classrooms that I'm not so sure I want to share my sweet project with them and things like that. Right. I want my scores to look better.

Essentially, Teacher A1 was saying the new evaluation system disincentivizes collaboration, which is at odds with what he believes makes quality teachers and with the goal of changing evaluation policy, assuming that goal is to improve the quality of instruction.

As for help for principals, Principal C stated,

We've made a conscious decision in this district that the money we were going to spend is going to be on our teachers. We were going to try and keep our teacher to student ratio the best we possibly could. So we had to make some sacrifices. When I started this job, I had two counselors and an assistant principal. Now I'm down to one counselor and no assistant principal. That can make this job really difficult.

This lack of help did not allow this principal to devote the time he would have liked to teacher evaluations, given all of the other aspects of his job that needed attention. It is possible that this would also be the case at even smaller schools, given that many principals at class D schools in Michigan also happen to be the superintendent of the district. As I reported in Chapter 4, however, I was unable to interview a principal at a class D school, so I cannot confirm this.

Knowledge

The evidence I found regarding knowledge as a barrier related to principals' knowledge of mathematics, teachers' knowledge of the evaluation rubrics, teachers' pedagogical knowledge, teachers' understanding of standardized test data, and legislators' knowledge regarding education. As for principals' knowledge of mathematics, the principals all felt they did not need a background in mathematics to be able to evaluate whether the mathematics teachers were effective. For example, Principal B stated, "Good teaching is good teaching whether it's calculus or world history or gym." Principal A2 stated,

In some respects, I feel like I can gauge the quality of the effectiveness of a lesson in an area where maybe I am not as comfortable with the content easier than an area where I know the content.
So I think the reason for that is if it's not content that I'm entirely comfortable with and I can go into a classroom and I can observe what's going on and I can see learning targets and look at performance tasks and the success criteria together and I can actually walk out feeling like I understand it better than I did when I went in. That tells me the teacher probably was doing a pretty darn good job.

Principals C and A1 said much the same, downplaying the importance of knowing the content in order to be able to evaluate their teachers effectively. The teachers, on the other hand, all felt having a content background was important. For example, Teacher B stated, "My administration, as much as I like them, they were not math teachers so their instructional feedback can only really be behavior management type stuff, which I don't usually have a problem with." Even though principals and teachers disagreed about whether a background in mathematics was important for evaluating mathematics teachers, the principals at the larger schools did respond to their lack of content knowledge by trying to divide the administrative staff based on their backgrounds. For example, Principal A2 stated,

We go by department by and large. Part of that is our admin staff has a pretty good variety of teaching background and while I don't think you have to have content knowledge or experience teaching a specific content to be an evaluator of that type of teaching, I think it does bring some credibility to the conversation with teachers.

This response indicates that principals feel, at least to some degree, that a background in the content matters if teachers are going to take the results of their evaluations seriously.

Regarding teachers' knowledge of the evaluation rubrics, Teacher C stated, "You can check all the boxes you want, but in order to be effective you can't just check a box without explaining to me why because I'm not going to grow from it. I don't know what any of those little things mean." This quotation suggests that, without some professional development time devoted to the evaluation rubrics, teachers may not use the data from their evaluations to inform their practice because they do not understand the rubric.

As for teachers' pedagogical knowledge, Principal B stated that "Some of the veteran teachers think that the newer teachers get better scores than them sometimes because you know, they're trained in it." To address this perception, Principal B said he encourages his teachers to attend external professional development opportunities, such as the MCTM and NCTM conferences. The school also helped pay for these conferences. As discussed in the section on money, however, the teachers at the smaller schools did not get funding assistance to attend such conferences.

Regarding teachers' knowledge of standardized test data, Teachers A1, A2, C, and D all said their administrators responded to other teachers' lack of knowledge of quantitative data, and possibly their own, by having the mathematics teachers help make sense of standardized test data for the rest of the staff. The mathematics teachers said they had to work with other teachers when their districts devoted internal professional development time to data digs. Teacher C indicated that helping the other teachers make sense of the data was difficult, though. She said,
We tried. We had a meeting with those of us at the high school that received this (standardized test) information. I was the only math teacher in there and somebody else was in there too that, that at least understood the basics and so. But the two of us trying to help all of them (other teachers) was a little bit tricky. Some of them are old dogs and not wanting to learn any new tricks.

The other mathematics teachers did not indicate that it was difficult working with their colleagues, but they also did not say it was easy. The last piece of evidence regarding knowledge concerned legislators' knowledge of best practices in education. The only participant who did not talk about this much was Teacher B, as she said she tended to ignore politics. The others all questioned whether legislators actually knew what was best for the students in the state and whether they actually consulted with experts in education. Principal C was the most vocal about legislators, stating that they had completely stepped over the line by imposing an evaluation system upon all the schools in the state. He claimed, "They don't know what's important in our area" and, as a result, should leave it up to the individual districts to figure out how to best evaluate their teachers.

Apathy

The evidence I found regarding apathy dealt with various aspects of standardized testing and also the teacher evaluation process. Regarding the standardized tests, the principals and teachers had concerns about students taking the tests seriously. For example, the NWEA test had no stakes for students, so there was no incentive for them to take it seriously. Likewise, not every student in high school was planning on going to college, so using a test like the SAT was problematic because it may not matter to this subset of the population how well they performed on it. Principal A2 addressed this when he stated, "I know darn well that 200 of those kids could care less about how well they do on the SAT because it's not meaningful to them. They know it's not meaningful to them in the sense that that's not their focus in their mindset and their drive or their view of success for themselves." If students did not take the tests seriously, then it becomes problematic to use data from the tests to evaluate teachers.

In addition to the students not taking the standardized tests seriously, with the exception of Teacher A2, the rest of my participants thought that the bigger standardized tests like the SAT were a "waste of time" (Principal C), had "little faith in their results" (Teacher A1), and largely "do not use it (SAT data) for anything" (Teacher A1). These points were somewhat at odds with what the participants reported elsewhere in the interviews, where they claimed they spent professional development time on data digs; that is, unless nothing was really done with the results of the data digs and they were only being done to appease some directive from their superintendents and/or school boards.

As far as the evaluations went, none of the teachers I interviewed found them helpful for becoming a better teacher. The evaluations felt more like a "hoop to jump through" (Teacher D) and just "documentation for them (principals) to use to find your (teachers) way out the door" (Teacher A2). There really was not any reason to use them to get better, especially when some teachers were told they would never be rated as highly effective.
For example, Teacher C stated, "All I know is I know I'm not going to be rated highly effective. I'm going to be rated as effective because that's what the superintendent said we can be. So you can be the best teacher in the building or the worst teacher and we are only able to be effective. Nobody's allowed to be highly effective in our district." One can only imagine that being told this would be completely demoralizing; there would be no incentive to try harder, as the effort would not result in any change to one's evaluation.

Stress

All of the evidence I found regarding stress related to work. None of my participants mentioned anything regarding issues at home that were impacting how they did their jobs. I perhaps should have expected this finding, though, as it was doubtful the teachers and principals would open up about their home lives in an interview with a complete stranger. As for what was reported as causes of stress at work, my participants mentioned the lack of assistant principals, standardized tests, the current evaluation format, staff shortages, and being the mathematics department head.

Regarding the lack of assistant principals, Principal C said that, because of the lack of help, he often had to take on the responsibilities of the athletic director and counselor when those people were busy or out of the building. This reallocation of his responsibilities took valuable time away from his other duties as principal and was a source of stress for him. Teacher D also discussed the lack of assistants, as he said his principal was also the superintendent and had no help. Given all the aspects of this person's job, Teacher D said his principal often neglected his evaluation duties and ended up not having face-to-face meetings with his staff members regarding their final evaluations. Instead, he said, the principal sent the teachers an electronic file they were expected to digitally sign, and that was all.

Regarding standardized tests and the current evaluation format, Principal B stated that putting high stakes on the tests "puts unnecessary stress and pressure" on the students and teachers in his building, as students' futures and teachers' evaluations were dependent on "one score, on one test, on one day." To help alleviate this stress, Principal B said his school does a lot of standardized test preparation prior to the administration of the MSTEP and SAT and also incorporates other measures of growth on their evaluations. Likewise, Principal A1 said,

We've broken down growth into, we have a district score which is based on standardized testing. We have a building score which is based on our standardized testing for us it is the SAT. And then we also have a teacher growth piece, which for the high school is um, exam, pre, post data. So, we separate that so that we have some classroom data, some district data, and the reason why we chose to do that is we didn't want to get into a situation where growth data which could potentially impact layoff connects directly having the teachers compete.

Even though the intent of Principal A1's approach is to protect teachers in the event of layoffs, the individual data would still be what creates variation in the evaluations, as the district data would be the same for all teachers. If the individual data are determined by pre- and post-tests, this could incentivize teachers to make easy, and non-informative, post-tests in order to keep their jobs.
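To illustrate this point, consider a simple formalization (the notation and weights are mine; the district's actual formula was not part of my data). If a teacher's growth score is a weighted composite

\[ G_i = w_d D + w_b B + w_t T_i, \]

where \(D\) is the district score, \(B\) is the building score, \(T_i\) is teacher \(i\)'s own pre-/post-test result, and the \(w\)'s are weights, then for any two teachers in the same building, \(D\) and \(B\) are identical and the difference in their scores reduces to \( G_i - G_j = w_t(T_i - T_j) \). Any ranking used in a layoff would therefore rest entirely on the component teachers can most easily manipulate.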
Teacher A2 mentioned something along these lines when she said,

People choose goals that they know they are going to achieve because it is directly tied to your evaluation, which in a layoff situation, luckily we haven't been in one here, but I know most schools have. It's the bottom of the eval pile that gets laid off. It's lowest eval. They're ranking people essentially. And if you're in the bottom of the ranks, I mean that's tied to your livelihood. So am I, you know, as a smart person, am I going to pick a goal I'm likely to achieve? Absolutely. It's one less stressor.

Thus, the pre- and post-test process seemed to be a meaningless exercise for most teachers.

Regarding staff shortages and being the department head, Teacher A2 discussed how both caused her stress and affected her work. She said her school had an issue keeping teachers in the district and, as a result, she ended up teaching an overage, where she was compensated for teaching during her prep hour and also during the hour that was supposed to be set aside for her department head duties. Since she was teaching during those two hours, she ended up doing the work she would normally do during her preparation time before school, after school, and at lunch, and she found she had no time to devote to helping the newer teachers, who ended up leaving because they did not get the support they needed. Teacher A2 also mentioned that her duties as the math department head put her in awkward positions at times, as she felt she was part teacher and part administrator. Even though she would be a good person to evaluate other teachers, when asked if she did so at all she said, "No, intentionally. Absolutely not. I am a union member the same as my colleagues. So I as a teacher leader that is about the last thing I want to touch with a 20 foot pole." Teacher A2 was also the one who did the scheduling for the mathematics teachers, and it was expected that the more senior teachers would get the advanced classes and the new teachers would teach the remedial ones. When she gave the new teachers the remedial classes, though, she knew the students in those classes would not perform as well as the students in the advanced classes. Thus, the teachers of those classes could receive poor evaluations, since her school did not use methods to control for factors that affect standardized test scores and that teachers have no control over. This has the potential to increase stress and exacerbate the school's staffing problems.

Revisiting research questions

Now that I have shared the key findings regarding each of the barriers to implementation, I compile the findings together to answer my research questions. As a reminder, the research questions this study aimed to answer are as follows:

1. What evidence of implementation barriers exists as mathematics teachers and principals in the State of Michigan attempt to tie student growth data to teacher evaluations?
a. What do mathematics teachers and principals believe the purpose(s) of evaluating teachers is(are)?
b. What is mathematics teachers' understanding of how standardized test data and other student growth data are used on their evaluations?
c. What do mathematics teachers and principals identify as pros and cons of how student growth is measured on their evaluation?
2. How have mathematics teachers and principals responded to the implementation barriers that do exist?
a. How do mathematics teachers analyze standardized test and other student growth data?
b. What do mathematics teachers do to address students' weaknesses as identified by standardized test and other student growth data?
c. What steps do teachers take to improve their teaching as a result of the standardized test and other student growth data?

Regarding research question 1(a), there seemed to be a difference of opinion between principals and teachers regarding the purpose of evaluation. The principals in my study believed the purpose was to help teachers grow, but the teachers felt it was largely a means to identify the weaker teachers in order to fire them. Since the teachers felt the purpose was not to help them grow, they did not use the data generated from the evaluations to identify areas where they could grow. Even if they had wanted to, the teachers did not find the current evaluation format useful, and some did not understand what all the checkboxes on the observation rubric even meant.

Regarding research question 1(b), all of the mathematics teachers were very familiar with how their districts measured growth. The teachers said their districts used several different pieces of data to help measure growth. These data sources included national and state standardized tests such as the MSTEP and SAT, the NWEA, course grades, and teacher-generated pre- and post-tests. All of the teachers had concerns regarding the use of standardized test data to measure growth, as several of the schools were comparing SAT data from different groups of students (e.g. last year's juniors and this year's juniors) and not doing anything to control for factors that the teachers have no control over. The teachers also found pre- and post-tests to be an easy way to manipulate the growth part of their evaluations, but considered them a waste of time and a meaningless exercise. They just did it because they were required to.

Regarding question 1(c), principals and teachers both indicated that not relying on any one source of data and being able to show growth easily by using pre- and post-tests were pros. As for cons, the biggest one was mentioned in the previous paragraph: the teachers found using pre- and post-tests to be a waste of valuable class time and essentially a meaningless exercise. Overall, there were several barriers to implementation identified from the interview data, as described in this chapter. A discussion of these barriers and what might be done to address them can be found in the next chapter.

As for the second research question, I did not collect as much data regarding responses as I did regarding the actual barriers. I wove some of the responses into the discussions of the various barriers throughout the chapter, but I will answer my sub-questions under the second research question here. Regarding question 2(a), if the mathematics teachers did analyze standardized test data, it was usually done during an internal professional development day where the mathematics teachers were relied upon to help make sense of the data for the whole staff. The teachers usually did not receive any assistance from their administration or outside experts. As for other tests such as the NWEA, the mathematics teachers received printouts for each of their students regarding how they performed on the fall, winter, and spring administrations of the test, and it was up to the individual teacher to make sense of the data.
Regarding question 2(b), the data from tests such as the SAT did not get to the schools in a timely manner, so teachers generally did not make radical changes based on them. Teacher D did report that he would look at his homemade curriculum and see if he had to reorganize it at all the following year to better address student weaknesses. As for the NWEA, this was the test that seemed to have the most promise, as Teachers C and D said they do make changes in what and how they teach because they received the data from the test in a timely manner and the data were broken down by student, so they knew who needed help in what area.

Regarding question 2(c), teachers from the larger schools reported they sought out external professional development to help work on their craft, but their decision to do this was not driven by standardized test data. They did it because they wanted to be better teachers. Teacher D said the same thing regarding conferences, but it is harder for him as he has to pay the cost of attending them out of his own pocket. Teacher C elected not to do anything, as she said there really was not anything worthwhile to go to in her area (she teaches in the Upper Peninsula of Michigan), and if she did want to go to something, she would have to pay for it herself.

Summary

In this chapter I shared my analysis of interview data in relation to the eight barriers to implementation that I identified in Chapter 3. I found evidence of all of the barriers, and the largest barrier, based on the number of times excerpts were coded, was disagreement. Principals and teachers largely disagreed with how teachers are currently being evaluated in the State of Michigan and would like to see a simpler system in place. The current system is too time-consuming for the principals to execute and not informative or meaningful for teachers. Those teachers who do attempt to work on their craft do so because they feel the need on their own; their decisions are not informed or affected by standardized test or growth data.

In the next chapter I discuss my findings in more detail and offer some suggestions on what might be done if we want to improve teacher quality, assuming that is the goal of teacher evaluation reform. I also discuss some ideas for future research.

CHAPTER 6: DISCUSSION

Summary of dissertation

In this study, I interviewed five mathematics teachers and four principals to ascertain what barriers existed as they attempted to implement the changes to teacher evaluations in Michigan. Specifically, I looked for evidence of the following barriers: disagreement, money, knowledge, other people, materials, time, apathy, and stress. After analyzing my interview data, I found evidence of all of the barriers, with the bulk of the interview data coded as disagreement. Principals and mathematics teachers disagreed with how teachers were being evaluated. Neither group felt standardized test data should be used in a high-stakes manner. Principals felt the current system required too much of their time and that teachers focused too much on their scores. The teachers indicated they did not find the data from their evaluations useful to them. As a result, they did not use their evaluations to inform their practice. Instead of the current system, principals advocated for a simpler system where they could just have conversations with teachers about their practice instead of talking about a rubric.
Teachers asked that, regardless of what system is in place, they be evaluated fairly and that the data from their evaluations be meaningful to them.

In this chapter, I first revisit the literature from my first two chapters and connect it to my findings. Then I discuss what could be done to address the barriers mentioned in Chapter 5. This is followed by a discussion of some possible options for teacher evaluation policy in Michigan, including my recommendation for what my findings suggest we should do, especially if it is later determined that my findings are common throughout the state. Then I briefly discuss how my identities affected this study. This is followed by some lessons I learned as I was working on my dissertation, limitations of this study, and suggestions for future research.

Revisiting literature

In the first chapter of this dissertation, I mentioned how various reforms have been implemented in recent decades in response to a perceived crisis in American education at the K-12 level. My specific focus was on reforms that involved holding schools and teachers accountable for student learning. One method implemented to hold schools accountable for student learning was to assign letter grades to the schools, but researchers found that assigning letter grades did not have an effect on student performance on standardized tests if the grades were not accompanied by consequences (Hanushek & Raymond, 2005; Winters & Cowen, 2012). Given this finding, I hypothesized that consequences were necessary if a reform was expected to have a positive effect on student performance, but I did not investigate this hypothesis in this study as I did not have access to student standardized test data.

In recent years, several states passed laws that tied student achievement and growth data to teacher evaluations. I assumed a goal of these laws was to increase student performance by putting more pressure on teachers to improve their practice. The consequence for not improving, which would be determined by teachers' scores on their evaluations, would be loss of employment. In this study, however, I found that schools were trying to insulate their teachers from students' standardized test scores by allowing teachers to use other sources of information to show student growth. Some schools did not use standardized test data at all, even though it was mandated by the state. Others supplemented these data with scores from other standardized tests, such as the NWEA, and with teacher-generated pre- and post-tests for which teachers could set their own growth goals. It should also be noted that no teacher or principal in the study reported that any teacher they knew lost their job as a result of their evaluation since the changes to teacher evaluations in Michigan took effect. Thus, this new teacher evaluation policy did not appear to have any real consequences in practice for my participants. If consequences are necessary for student performance to increase, then my findings suggest it is unlikely this policy is positively affecting student standardized test scores given how it is being implemented in Michigan, assuming my findings are common throughout the state.

In the second chapter of this dissertation, I discussed the two big changes to teacher evaluations that have occurred as a result of recent reforms.
Specifically, several states now use standardized observation protocols and also use student growth data as a means of introducing variation in evaluation outcomes, as there was not a lot of variation before (Weisberg et al., 2009). Regarding the standardized observation protocols, the principals in my study found them to be extremely time-consuming to fill out, and the teachers did not use the data from them to inform their practice. If teachers are not using the data from the protocols, then the use of that data either needs to be seriously questioned or teachers need professional development time allotted to understanding the protocols and learning how to use them to inform their practice.

As for the student growth component of the evaluations, in the second chapter I discussed how test-based models can be used on teacher evaluations and also some issues identified with their use. The participants in my study mentioned many of the same issues that were raised in the literature. For example, one of my participants mentioned that using standardized test data on the evaluations disincentivizes collaboration, which is something Guenther (2019) raised in her dissertation. Also, several of my participants mentioned how the way their schools measure growth does not control for factors that affect student achievement, such as a student's family background, whether the student's basic needs are being met, or socioeconomic status, all of which are factors mentioned in the NRC and NAE's (2010) report. Schools and teachers, however, have no ability to affect these particular factors. The State of Michigan could have attempted to control for these factors had it followed the MCEE's (2013) recommendation to use value-added models to measure growth, but it elected to ignore this recommendation and leave it up to the individual schools to figure out how to measure growth. One could argue that the lack of direction from the state regarding how to measure growth played a large role in the policy not being implemented as policymakers may have intended at the schools in my study.

I ended the second chapter by discussing a study regarding the characteristics of successful schools that served poor students. I did this because, if a goal of the teacher evaluation policy was for teachers to improve their practice, then these teachers might look to successful schools to see what they are doing in order to boost their evaluation scores. According to Kitchen, DePree, Celedon-Pattichis, and Brinkerhoff (2007), the nine schools they studied all had some commonalities that they argue led to higher student achievement as measured by standardized tests. Those characteristics included:

• Administrators handled most of the paperwork and problems with students and parents so their teachers would only have to worry about teaching.
• Students were provided with additional instructional time beyond their normal mathematics classes for remediation and also to challenge them.
• Teachers were given cell phones by the school so their students could call them at night for extra help.
• Schools provided a late bus so students could get extra help after school from their teachers (some of these teachers were paid extra to stay after school to tutor students).
• Teachers had access to a variety of different teaching resources. If the teachers wanted something for their classroom, all they needed to do was ask.
• Teachers were encouraged to and took part in sustained professional development activities.
In many cases, the schools would pay for their teachers to attend conferences and workshops.

In addition to these characteristics, there were some common practices among the teachers at these schools:

• Teachers had students work in groups in order to facilitate communication about mathematical ideas.
• Teachers did not let their textbook define what was going to be taught in their classrooms. Several of the teachers would supplement their textbook and continuously plan, reflect, and alter what they did in their classrooms.
• Teachers incorporated tasks that were meaningful to their students.
• Teachers would analyze students' standardized test scores and engage in "backward curriculum planning" (p. 89). That is, the teachers would look at the scores, see where their students needed to improve, and then vertically align their middle school and high school mathematics curriculum to address the areas that were shown to be weak.
• Teachers worked collaboratively with their colleagues, planning lessons, creating a support system amongst the faculty, and holding each other accountable.

If schools in Michigan try to emulate what was reported in Kitchen et al.'s (2007) study, my findings regarding teacher evaluation reform suggest they may have difficulties with some of the characteristics and practices. One issue is collaboration, as I mentioned earlier. Teachers in my study, and in Guenther's (2019), indicated that using standardized test data on their evaluations disincentivizes collaboration, as helping another teacher could artificially inflate that teacher's evaluation scores. In the event of layoffs, this could result in someone unfairly keeping or losing their job. Also, even though Kitchen et al. (2007) did not argue that collaboration was the most important practice that led to the schools being successful, it was something Teacher A1 felt was extremely important if teachers are to grow as professionals, which is an assumed goal of the current evaluation policy. This also resonates with my own experience as a high school teacher. I attribute a large portion of my learning as a high school mathematics teacher to observing others teach, co-planning with others, and listening to my colleagues' advice. Without this collaboration, I likely would not have made it past my first year of teaching; any policy that discourages helpful collaboration, I would argue, needs to be seriously rethought.

Another potential difficulty schools may have concerns the use of standardized test data. Teachers in Kitchen et al.'s (2007) study seemed to use the data from the tests to inform their practice much more than the teachers in my study or Cavanna's (2016) study. This could be because the teachers in our studies did not value the standardized test data because they did not have a hand in creating it, as Cavanna (2016) suggested. This could also be because standardized tests were not meant to inform classroom instruction, but rather to inform district-level evaluation and policymaking (Shepard, Penuel, & Pellegrino, 2018). Another possible hypothesis is that it was due to differences at the administrative level at the schools, as the principals I interviewed did not seem to value the data from standardized tests and, as a result, did not encourage their teachers to use the data in any way. Likewise, Cavanna (2016) stated the administrators at one of the schools in her study did not encourage their teachers to use the data to inform their instruction at all.
Had administrators actively encouraged their teachers to use the data, more might have been done with it, instead of a data dig one day followed by little or nothing afterwards. Speaking from my experience this past year teaching at an online school, I know my administrators' perceived lack of caring about our NWEA results, a test that other teachers in my study found to be helpful, played a large role in my, and my colleagues', lack of use of the data to inform our instruction.

Another characteristic of successful schools that some of my participants had issues with was sustained professional development: some participants neither had access to such activities nor were encouraged to take part in them. As reported earlier, the teachers at the smaller schools in my study had to pay for any professional development they attended, and the teacher who taught in a very rural part of the state reported that the quality of available professional development activities was lacking. Thus, it seems being a teacher at a small and/or rural school in Michigan could be detrimental to a teacher's professional growth. For example, during the first nine years of my teaching career, I taught at a small rural school in the Upper Peninsula of Michigan, and most of my professional development activities consisted of "teacher time," where I essentially sat in my classroom and graded papers because the administration had nothing prepared for us. The only time I had the opportunity to take part in meaningful professional development was when the local university received grant money and recruited teachers to take part in a two-year lesson study project. After this project ended and I finished my Master's degree, I felt as if my professional growth had plateaued, and this feeling was one of the reasons I decided to quit my job and pursue my PhD at Michigan State University.

The final characteristics of successful schools I would like to discuss both relate to time. In the successful schools, administrators handled much of the paperwork and the problems with students and parents, allowing teachers to focus on teaching. With the increased demand on principals' time due to the new evaluation system in Michigan, it is unlikely principals will be able to handle all of this, especially at smaller schools where principals may not have any assistants or where they also wear the hat of superintendent. Teachers will likely have to shoulder more of the administrative burden, which will make devoting additional time to instruction difficult.

Overall, the new evaluation system seems to present issues that make it difficult to replicate what is done at successful schools, as identified by Kitchen et al. (2007). Assuming their findings are transferrable to other schools and contexts, something needs to change with the current teacher evaluation policy, as it seems to be fostering practices that are counterproductive to the goals of the policy.

Addressing barriers

In the previous chapter, I highlighted evidence of the various barriers from my interview data. As mentioned earlier, I found evidence of all of the following barriers: disagreement, money, knowledge, other people, materials, time, apathy, and stress. Knowing what barriers exist is helpful, but the work does not end there. It is also necessary to figure out what can be done to address the barriers.
Thus, I would like to highlight a few of the barriers that could be the biggest roadblocks to implementing the new teacher evaluation policy and offer a possible way to address each one.

The first barrier that needs to be addressed is the overall disagreement that principals and teachers had regarding the use of standardized test data on teacher evaluations. Neither group believed that standardized test data should be used on the evaluations. As a result, standardized test data were either completely ignored or combined with other data used to measure growth in order to lessen their impact. If this disagreement is common throughout the state, then policymakers need to clearly articulate how and why educators need to use standardized test data. Without clear guidance, it is unlikely the data will be used as policymakers intended, and this policy will not have the desired effect on student achievement.

Another issue raised in the previous chapter that needs to be addressed is actually a combination of multiple barriers. Principals struggled to get all of their evaluations done, in addition to all of their other duties, because they lacked the time and personnel to do so. All of the principals mentioned how time-consuming it was to conduct all of the observations, fill out the observation rubrics, and have follow-up meetings with the teachers. Some principals mentioned how their budgets have been shrinking and how they have lost assistant principals. Thus, they now must devote more time to evaluations with fewer people to share the work. An obvious solution would be to adequately fund our schools so principals can get more help and be less stressed about all of the things they need to accomplish in a day. Having fewer things to accomplish in a day could also allow time for thoughtful practice instead of checking things off a list, which could improve their performance as principals and the quality of feedback they give their teachers. If funding cannot be increased, then perhaps the frequency of observations for tenured teachers could be reduced, more in line with previous teacher evaluation policies in Michigan, under which tenured teachers were evaluated only once every three years.

A third issue that needs to be addressed relates to knowledge. There was evidence that teachers did not understand the observation rubrics. As a result, they did not use the data on the rubrics to inform their practice. If teachers are to use these data, then they need to understand "what all the little boxes mean" (Teacher C). This could be addressed by having each school devote professional development time to discussing, understanding, and using the observation rubric.

Other barriers from Chapter 5, such as student apathy, certainly affect standardized test scores and teacher evaluation scores, but are harder to address. That does not mean they should be ignored. It just means significant time, thought, and probably money need to be devoted to addressing them. If addressing the barriers to implementing this policy is not palatable to policymakers, then another option would be to change the policy itself. Possible options follow in the next section.

Potential options for teacher evaluations in Michigan

One option for teacher evaluations in Michigan is to stay with the status quo.
That is, continue using state-approved observation rubrics and leave it up to the schools to figure out how to measure growth. I would advise against this if what the teachers reported in my study is common throughout the state: they did not find the data from their evaluations useful, so they did not use the data to inform their practice at all. This would go against one of the assumed goals of teacher evaluation reform. The teachers and principals also questioned the use of standardized tests in a high-stakes manner, given that teachers cannot control many of the factors that affect students' scores on these tests, as described earlier in this chapter and in Chapter 2. If these factors are not controlled for, teachers could be unfairly retained or terminated in the event layoffs are necessary.

Another option is to leave the current policy relatively intact, but follow the MCEE's (2013) recommendation of using value-added models statewide to measure growth instead of leaving it up to the individual schools. If policymakers insist that standardized test data be used, this is likely the fairest way to do it, as value-added models attempt to control for the factors that teachers have no control over and allegedly are able to isolate a given teacher's effect. Many articles have been published in recent years, though, that raise issues with value-added models' ability to do what they claim. For example, Goldhaber et al. (2013) found that when comparing results from a "Student Fixed-Effects Model" and a "Student Fixed-Effects with Lagged Score Model" (two different value-added models), there were significant variations in teacher quality as reported by the models. Some teachers that were reported as being in the lowest quintile by one model were reported as being in the highest quintile by the other, and vice versa. Given this, it seems there would still be some work to do with value-added models if they were to be used in a high-stakes manner. It could be something to pursue, however.
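For readers unfamiliar with these models, a generic lagged-score value-added specification looks something like the following (this is a textbook-style sketch, not the exact specification Goldhaber et al. estimated):

\[ y_{it} = \lambda y_{i,t-1} + X_{it}\beta + \tau_{j(i,t)} + \varepsilon_{it}, \]

where \( y_{it} \) is student \(i\)'s test score in year \(t\), \( y_{i,t-1} \) is the prior-year score, \( X_{it} \) is a vector of observed student and classroom characteristics, \( \tau_{j(i,t)} \) is the effect attributed to the teacher who taught student \(i\) in year \(t\), and \( \varepsilon_{it} \) is an error term. A student fixed-effects variant instead absorbs each student's stable characteristics into a student-specific intercept \( \alpha_i \). The estimated \( \hat{\tau}_j \) values are what would appear on evaluations; Goldhaber et al.'s (2013) point is that these two reasonable ways of controlling for student background can sort the same teachers into very different quintiles.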
A third option could be to throw out the growth component and keep the state-approved observation rubrics. This would help eliminate the temptation to teach to the tests and allow principals and teachers to focus on what they value in education. An issue with this, though, is that the teachers in my study did not find the data from the rubrics useful, and some (e.g. Teacher C) did not understand "what all the check boxes mean" on the rubrics. This indicates that, if this option were chosen, time would need to be devoted to helping teachers understand the rubrics prior to their use. In addition to the teachers not finding the data useful, principals in my study indicated that filling out the rubrics was very time-consuming. Thus, schools would likely need more money to hire assistants so principals could carry out their evaluation duties.

A fourth option, even though it would likely be unpopular, would be to throw out the observation piece and just use standardized test data to evaluate teachers. The reason for evaluating teachers this way would be to address claims that there was little to no variation in evaluations under previous iterations of teacher evaluation policy (Weisberg et al., 2009). This would introduce variation into the evaluations and allow schools to rank their teachers from most to least effective, but, as noted above, different value-added models can generate different rankings with the same data, so this may not be the best option. This option also emphasizes an aspect of school that my participants do not value at all and completely ignores the parts they do value.

My recommended option, assuming my findings are common throughout the state, would be to do something along the lines of what my participants wanted. The principals all advocated for a simpler system where they could just focus on conversations with teachers and not have to fill out unwieldy forms. The teachers wanted something meaningful to them. For example, Teacher A1 advocated for a system where teachers have to prove they are working on their craft in some way. Thus, I would suggest allowing individual schools more freedom in which observation rubrics they use. If they really want to use the ones approved by the state, they could choose to do so. If they feel the need to create a simpler form, they could choose to do that. In addition to using these forms, the state may want to consider having teachers do some type of action research project every few years in order to demonstrate they are working on and learning from their craft. There could be other ways to prove they are working on their craft, but action research is one that I, and others, have found valuable as high school teachers (Bennett, 2004; Mitchell, Reilly, & Logue, 2009). It is also something some schools were already doing to show growth when I looked at evaluation instruments in a previous study (Morissette, 2014).

What can university-level mathematics educators do?

One thing that mathematics educators can do, assuming something along the lines of my recommended option happens, is to help practicing mathematics teachers with their professional growth beyond teaching graduate classes. At a minimum, I believe mathematics educators should be reaching out to local teachers to see what they are struggling with and what questions they have, and possibly be a guide as teachers work on their own action research projects. We can suggest articles teachers may want to read or even act as a sounding board as they think through potential solutions to whatever issue they may be trying to work out in their classrooms. Several mathematics educators already have partnerships with teachers and do work in schools, but in my ideal world, I would like to see everyone do this, as I believe more frequent interactions between academics and K-12 teachers would benefit everyone in the education field and would help eliminate any perceived chasm between academia and the "real world."

In addition to interacting with practicing teachers more, mathematics educators might consider getting involved in the political arena and trying to influence policymakers, or actually becoming one. Several of my participants were fed up with the lack of respect the teaching profession gets and with how policymakers seemed to pass legislation regarding education without getting or listening to the opinions of people with a background in education. We need to get involved in politics and be persistent if we are going to effect change. Regarding this, I need to listen to my own advice and get more involved, as I have found myself wanting to give up on the education field several times in the past few years.
I need to take what I have learned in this study and be persistently annoying until I am able to get the attention of policymakers. Once I have their attention, I want to help get a fairer and more meaningful teacher evaluation policy instituted in the state.

Positionality statement

There are a couple of aspects of my identity worth mentioning in this dissertation because they had an effect on this study. The first is that I have lived in the State of Michigan my entire life and I care very deeply about what happens in this state. My love for this state is the reason I decided to study teacher evaluation reform here as opposed to any other state. Also, for the first 32 years of my life, I lived in a very rural area in the middle of the Upper Peninsula (UP). Even though I have lived in the greater Lansing area for 8 years now, I still identify as a Yooper. Yoopers generally feel they are often ignored by politicians in Lansing, basically because they can be. Only 3% of the state's population lives in the UP, and the perception among Yoopers is that their opinions and votes largely do not matter on anything that affects the whole state. In the MCEE's (2013) report, for example, there was no representation from the UP, and I found this extremely bothersome. Since I am bothered every time the UP is ignored, I ensured I would have some representation from the UP when I selected participants to interview for my study. If I continue to do research regarding education-related policies in Michigan, I plan on continuing this practice of including at least one participant from the UP, as I strongly believe their opinions and concerns matter.

Another aspect of my identity that needs to be mentioned is that I was a secondary mathematics teacher in Michigan for nine years prior to going to graduate school, and I am currently teaching at the secondary level in Michigan again as I finish this dissertation. The people in this study are more than just subjects to me. They are my colleagues and I want to see them treated fairly. When introducing myself to my potential interviewees, I discussed my teaching background and my overall goal of seeing a fair and meaningful evaluation system put in place in Michigan. I feel that doing this and letting them know I was "one of them" helped me get more honest and thorough answers during my interviews. Honestly, I do not believe a teacher would flat out tell me their school does not use standardized tests on their evaluations if I were a policymaker, a reporter, or a graduate student with no teaching experience.

As I was working on this dissertation, where I am from and who I am were constantly on my mind. I wanted to make sure the concerns and opinions of people in a part of the state that gets ignored were heard, and I feel I was able to do that in addition to making sure people from other parts of the state were heard as well. Regarding my identity as a teacher, I consciously tried to withhold my beliefs and wishes about what changes I would like to see happen during the interviews, as I did not want to influence the teachers' and principals' thinking. A few participants asked about my beliefs, but I tabled my answers until after the interviews were over, and those conversations were not recorded. I found these off-the-record conversations to be rather cathartic for both me and my participants.
Even if this dissertation does not effect any meaningful change at the state level, I was at least able to make my participants feel heard and show them that someone cares about what they think.

Lessons learned

During this dissertation process, I came to realize two big things. The first is that I was underprepared regarding research methods. I had assumed that the research courses listed under our program requirements, along with what I learned in my research assistantship during my first two years at MSU, would be enough to prepare me to conduct my research. The problem with the required courses, though, is that they are mostly a survey of a variety of methods, and you do not come out of them knowing how to do any one of those methods well. Looking back, I should have asked to be on a different research assistantship after my first year and also taken a course or two on survey methods, as I feel I failed in that area in this dissertation. In fact, I did not get enough people to respond to my surveys and ended up throwing out most of the data. I only ended up reporting some basic descriptive statistics in an appendix. As a result, I was unable to make any claims or use quantitative findings to further support my interview results. Thus, I could not make any firm recommendations for what to do with this policy, as I really only have data from nine people. With data from only nine individuals, I doubt any policymaker would make changes based on my results. My advice for people starting out in a doctoral program is to figure out what type of study you would like to do for your dissertation as early as possible and then sit down with your guidance committee to determine what course(s) to take to give you the best chance to succeed, even if that means taking additional coursework beyond what is required. Learning more is never a bad thing.

The other thing I came to realize while working on this dissertation is that everyone seems to have a different idea of what needs to be in a dissertation, even within the same program. I had assumed that using the dissertation of one recent graduate of the program as a guide would be a good way to get me through this process. I was wrong, and it extended my time to completion by a year. If I were to do this all over again, I would have conversations with my committee about what they believe should be in a dissertation and why, and I would look for dissertations written by people who had all, or most, of the same committee members. Having these conversations early on and looking at multiple dissertations would be my advice to others as well, as extensive revisions are a stressful and time-consuming process.

Regarding the lessons I learned, there is nothing I can really do to address the second one, as that would require a time machine. The first, however, is a lesson I can do something about in the future. I plan on finding and reading books and articles regarding survey research in order to patch any holes in my understanding. Then, if I am lucky enough to get a job at a university, I plan on seeking out individuals who have more experience with surveys to see if they would be willing to work together on a study and/or paper. Hopefully, with time and effort, I can get better and be able to effect change in the policy world.
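For readers unfamiliar with the analysis I had planned, the short sketch below illustrates what an independent samples t-test comparing teacher and principal survey responses could have looked like, had my response rate been adequate. This is only a minimal illustration written in Python; the variable names and the 1-5 ratings are hypothetical and do not come from my actual survey data.

# Minimal sketch (not my actual analysis) of the planned independent
# samples t-test, using hypothetical 1-5 Likert ratings of how well
# standardized tests measure student growth. All values are made up.
from scipy import stats

teacher_ratings = [2, 1, 3, 2, 2, 1, 3, 2, 1, 2, 3, 2, 2]             # hypothetical
principal_ratings = [3, 2, 4, 3, 2, 3, 4, 2, 3, 3, 2, 4, 3, 3, 2, 3]  # hypothetical

# Two-sided test of whether the two groups' mean ratings differ.
t_stat, p_value = stats.ttest_ind(teacher_ratings, principal_ratings)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

Even with an adequate response rate, a test like this would only indicate whether the two groups' mean ratings differ; it would not, by itself, explain why they differ, which is part of why interview data would still be needed.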
Limitations

The biggest limitation I see in this study is the rather small sample of principals and teachers who participated. As a result, I was unable to perform the independent samples t-tests I had planned and did not use my survey data at all. Triangulation was therefore difficult, but I did have another person code my data to improve the trustworthiness of the study. The other limitation I can identify is that the teachers and principals I interviewed seemed to want to talk to me precisely because they did not like the current system and wanted to see changes. It is unclear how prevalent their views regarding the current evaluation system are among educators in the state. More work needs to be done to address the weaknesses in this study, but I feel I have a better idea of what to focus on in a follow-up study.

Suggestions for future research

If I were to follow up on this study, I would likely draft new surveys, using my conceptual framework and findings as guides, to see how common the barriers to implementation of this teacher evaluation policy are throughout the state. Specifically, I would be interested to know how many principals feel you do not need a background in a subject in order to evaluate it well, as all of my principals said something along the lines of "good teaching is good teaching" and that they did not need to know the subject to judge whether a teacher was performing well. I would also like to know how many mathematics teachers feel this background is important, as all of the teachers in this study thought it was necessary for someone evaluating them to have it. This disagreement between the two groups seems like it could affect teachers' use of the evaluation data to inform their practice. In addition to investigating the importance of content background, I would like to know whether other teachers throughout the state find the standardized observation protocols unhelpful. If they are not using the data on those forms to inform their practice, are the protocols really worth using? Is there anything that can be done to encourage teachers to use this observational data to inform their practice?

Another thing I would like to investigate further is principals' and mathematics teachers' beliefs regarding the purpose of teacher evaluations. The principals in my study all believed that the purpose of evaluation was to help teachers grow as professionals, but the teachers in my study all thought the primary purpose was accountability-related. That is, they thought they were being evaluated in order to identify the weaker teachers and fire them. It is reasonable to believe that if teachers think evaluation is just a sorting mechanism, they may not look at the data from their evaluations to inform their practice.

APPENDICES

APPENDIX A: Principal Survey

1. Including your current job and all previous jobs where you have worked, how many TOTAL years will you have been a principal at the end of this school year?
2. By the end of this school year, how many total years will you have been the principal at your current school?
3. How many school districts have you been employed at?
4. Are you certified to teach mathematics in the State of Michigan? If so, was mathematics your major or your minor?
5. Do you have a Master's degree? If so, what was your major field of study for the advanced degree?
6. Do you have a Doctorate?
If so, what was your major field of study for the advanced degree?
7. Does your school district use standardized tests (e.g. ACT, SAT, MEAP, MME, MSTEP) to help measure student growth on teacher evaluations (Y/N/Unsure)?
8. Which of the following models comes closest to the way your school uses standardized tests for the purpose of conducting teacher evaluations? (Note: include a brief description of a status and growth model on Qualtrics)
9. What do you see as the advantages and disadvantages of using the model you use for measuring student growth with standardized tests for the purpose of conducting teacher evaluations (only fill in the box on the survey)?
10. How familiar are you with value-added growth models (not at all familiar, slightly familiar, moderately familiar, very familiar, extremely familiar)?
11. How comfortable are you with evaluating the quality of instruction for your mathematics teachers (not at all, slightly, moderately, very, extremely)?
12. What parties (e.g. teacher union, superintendent, principal, individual teachers, ISD staff) were involved in the decision regarding how to measure student growth on evaluations?
13. How well do you think standardized tests measure student growth (not at all well, somewhat well, moderately well, very well, extremely well)?
14. To what extent do you agree or disagree with the following statements (strongly agree, somewhat agree, neither agree nor disagree, somewhat disagree, strongly disagree)?
a) The purpose of teacher evaluation is to inform personnel decisions (e.g. hiring, termination, raises, and/or promotions).
b) The purpose of teacher evaluation is to inform professional development activities as evaluation helps identify areas of weakness for a given teacher.
c) The purpose of teacher evaluation reform is to identify and reward strong teaching (e.g. merit pay).
15. In general, how much assistance would you say you personally give your school's mathematics teachers with each of the following tasks (none, a little, some, quite a bit, a great deal)?
a) Planning for instruction.
b) Acquiring materials related to mathematics instruction.
c) Establishing classroom routines and procedures (e.g. collecting homework).
d) Matching the curriculum to standards.
e) Using standardized test scores to improve instruction.
f) Identifying individuals who can share their expertise in mathematics (and/or mathematics teaching).
g) Understanding the central mathematical ideas in the curriculum.
16. To what extent do you expect your mathematics teachers to do the following things (not at all, a little, some, quite a bit, a great deal)?
a) Adhere to a prescribed pacing in their instruction.
b) Make sure that their students' test scores are high.
c) Address the state/district standards and objectives.
d) Have whole classroom discussion in which students explain how they solved tasks.
e) Have small-group discussion in which students explain how they solved tasks.
f) Use the adopted curriculum as a basis for their instruction.
g) Keep their students quiet and disciplined during classroom instruction.
h) Use challenging, problem solving tasks with their students.
i) Use students' current mathematical thinking to inform their instruction.
j) Collaborate with other mathematics teachers.
k) Observe other mathematics teachers' instruction.
l) Use yourself as a resource when instructional problems arise.
m) Make their lesson plans available for review.
n) Assist other mathematics teachers in improving their instruction.
17. To what extent do you encourage your teachers to use standardized test results to inform their practice (not at all, a little, to a moderate extent, quite a bit, a great deal)?
18. To what extent do you agree or disagree with the following statement: I have enough time to evaluate all of my school's teachers to the best of my ability (strongly agree, somewhat agree, neither agree nor disagree, somewhat disagree, strongly disagree)?
19. Have you terminated any teachers as a result of their score on their evaluation? (Y/N)
20. If you are interested in being contacted in the future for a follow-up interview, please provide your contact information below. The information will ONLY be used for my research purposes and will not be shared with anyone. (First Name, Last Name, Phone Number, E-mail Address)

APPENDIX B: Mathematics Teacher Survey

1. Including your current job and all previous jobs where you have worked, how many TOTAL years will you have been a mathematics teacher at the end of this school year?
2. By the end of this school year, how many total years will you have been a teacher at your current school?
3. How many school districts have you been employed at?
4. Are you certified to teach mathematics in the State of Michigan? If so, was mathematics your major or your minor?
5. Do you have a Master's degree? If so, what was your major field of study for the advanced degree?
6. Do you have a Doctorate? If so, what was your major field of study for the advanced degree?
7. Does your school district use standardized tests (e.g. ACT, SAT, MEAP, MME, MSTEP) to help measure student growth on teacher evaluations (Y/N/Unsure)?
8. Which of the following models comes closest to the way your school uses standardized tests for the purpose of conducting teacher evaluations? (Note: include a brief description of each in survey)
9. How often do teachers in your department do the following (never, annually, monthly, weekly, daily)?
a) Work together to develop curriculum and instructional materials.
b) Observe each other teach.
c) Offer advice or help to each other.
d) Share ideas on teaching.
e) Promote innovative teaching practices.
10. This past school year, how many times have you received meaningful feedback on your performance from an administrator (never, 1-2 times, 3-5 times, 6-10 times, more than 10 times)?
11. This past school year, how many times have you received meaningful feedback on your performance from a fellow teacher (never, 1-2 times, 3-5 times, 6-10 times, more than 10 times)?
12. Does your school have a school-based mathematics coach (Y/N)?
13. In the past year, how often have the following events occurred (never, 1-2 times, 3-5 times, 6-10 times, more than 10 times)?
a) I discussed my teaching with a school principal or assistant principal.
b) A school principal or assistant principal observed my teaching (for at least 10 minutes).
c) A school principal or assistant principal provided me with feedback to improve my instruction after observing my teaching.
d) A school principal or assistant principal reviewed my students' work with me.
14. In general, in the past year, how much assistance would you say your principal gave you with each of the following tasks (none, a little, some, quite a bit, a great deal)?
g) Planning for instruction.
h) Acquiring materials (e.g. manipulatives, textbooks, technology) related to mathematics instruction.
i) Establishing classroom routines and procedures (e.g. collecting homework).
j) Matching the curriculum to standards.
k) Using standardized test scores to improve instruction.
l) Identifying individuals who can share their expertise in mathematics (and/or mathematics teaching).
m) Understanding the central mathematical ideas in the curriculum.
15. To what extent do you agree or disagree with each of the following statements (strongly agree, somewhat agree, neither agree nor disagree, somewhat disagree, strongly disagree)?
a) The purpose of my school principal (or assistant principal) visiting my classroom is to directly assist me in improving my teaching.
b) The purpose of my school principal (or assistant principal) visiting my classroom is to evaluate my teaching in terms of job performance.
c) My principal (or assistant principal) possesses a thorough knowledge of the curriculum and related instructional materials.
16. To what extent do your principal (or assistant principal) and fellow mathematics teachers expect you to do the following things (not at all, a little, some, quite a bit, a great deal)?
o) Adhere to a prescribed pacing in my instruction.
p) Make sure that my students' test scores are high.
q) Address the state/district standards and objectives.
r) Have whole classroom discussion in which students explain how they solved tasks.
s) Have small-group discussion in which students explain how they solved tasks.
t) Use the adopted curriculum as a basis for my instruction.
u) Keep my students quiet and disciplined during classroom instruction.
v) Use challenging, problem solving tasks with my students.
w) Use students' current mathematical thinking to inform my instruction.
x) Collaborate with other mathematics teachers.
y) Observe other mathematics teachers' instruction.
z) Use him/her/them as a resource when instructional problems arise.
aa) Make my lesson plans available for review.
bb) Assist other mathematics teachers in improving their instruction.
17. To what extent have you made efforts to change your teaching based on your experience in professional development sessions (not at all, slightly, moderately, considerably, a great deal)?
18. To what extent have you made efforts to change your teaching based on your experience in courses you have taken since the beginning of your teaching career (not at all, slightly, moderately, considerably, a great deal)?
19. To what extent do you believe standardized tests capture your students' knowledge of mathematics (not at all, small, moderate, great)?
20. How good of an indicator do you believe standardized test results are of the quality of your teaching (not at all, small, moderate, great)?
21. How much of a teacher's evaluation do you think should be based on standardized test results (0%, 1-10%, 11-25%, 26-50%, >50%)?
22. How often do you attend state or national mathematics education conferences (e.g. MCTM or NCTM) (never, 1-2 times, 3-5 times, 6-10 times, more than 10 times)?
23. How often do you attend various professional development activities/sessions/conferences etc. NOT provided by your school district (never, 1-2 times, 3-5 times, 6-10 times, more than 10 times)?
24. How familiar are you with how teachers are evaluated in your district (none, a little, some, quite a bit, a great deal)?
25. Do you believe teacher evaluation reform has made the instruction in your classroom better or worse (a lot worse, a little worse, no change, a little better, a lot better)?
26. What do you see as the advantages and disadvantages of using the model you use for measuring student growth with standardized tests for the purpose of conducting teacher evaluations (only fill in the box on the survey)?
27. If you are interested in being contacted in the future for a follow-up interview, please provide your contact information below. The information will ONLY be used for my research purposes and will not be shared with anyone. (First Name, Last Name, Phone Number, E-mail Address)

APPENDIX C: Principal Interview Protocol

Beginning Script: Before we get started, I'd like to thank you for not only filling out the survey but also for taking the time to talk with me further. I'm really interested in teacher evaluation and in the relationship between teacher evaluation and standardized test scores of students. Would it be okay if I recorded our conversation? This is mainly for my recollection and so that I can listen carefully now rather than have to write things down at this point in time. I will assign you a pseudonym so that this data cannot be traced back to you. And, I am the only person who will listen to and transcribe this.

Interview Questions (may ask others depending on how conversation goes):
1. What have your experiences with the teacher evaluation process been like? Have you noticed any changes in the process throughout your career? If so, what are those changes?
2. When you conduct formal observations of your mathematics teachers, what are some aspects of their teaching you focus on?
3. How do principals evaluate content-specific aspects of teaching if the subject was not their major or minor in college?
4. In your opinion, what is(are) the purpose(s) for evaluating teachers? (fire vs. inform teaching)
5. What is your understanding of how student growth data are used on teacher evaluations? How do you show growth? Is it different for different subjects? What do you think the advantages and disadvantages are of measuring growth the way your district does?
6. What changes would you like to see happen regarding how teachers are evaluated? (student growth data, number of observations, who observes teachers)
7. What are your views regarding standardized testing? (frequency, how well it measures student knowledge, indicator of teacher effectiveness)
8. When your school gets their standardized test results from the state, what do you do with them? (nothing, instruct teachers to formulate a plan to address them, etc.)
9. If standardized test data indicate a specific area is a weakness for your students, how do you respond?
10. How would you like your math teachers to use standardized test data?
11. How does standardized test data affect PD decisions (e.g. what you provide for staff)?
12. What questions do you have for me?

APPENDIX D: Mathematics Teacher Interview Protocol

Beginning Script: Before we get started, I'd like to thank you for not only filling out the survey but also for taking the time to talk with me further. I'm really interested in teacher evaluation and in the relationship between teacher evaluation and standardized test scores of students. Would it be okay if I recorded our conversation? This is mainly for my recollection and so that I can listen carefully now rather than have to write things down at this point in time.
I will assign you a pseudonym so that this data cannot be traced back to you. And, I am the only person who will listen to and transcribe this.

Interview Questions (may ask others depending on how conversation goes):
1. What have your experiences with the teacher evaluation process been like? Have you noticed any changes in the process throughout your career? If so, what are those changes?
2. In your opinion, what is(are) the purpose(s) for evaluating teachers? (fire vs. inform teaching)
3. What is your understanding of how student growth data are used on teacher evaluations? How do you show growth? What do you think the advantages and disadvantages are of measuring growth the way your district does?
4. What changes would you like to see happen regarding how teachers are evaluated? (student growth data, number of observations, who observes you)
5. What are your views regarding standardized testing? (frequency, how well it measures student knowledge, indicator of teacher effectiveness)
6. When your school gets their standardized test results from the state, what do you do with them? (nothing, study with others, etc.)
7. If standardized test data indicate a specific area is a weakness for your students, how do you respond?
8. How do you decide WHAT you are going to teach on a given day? (Do ST affect?)
9. How do you determine HOW you are going to teach on a given day? (Do ST affect?)
10. How does standardized test data affect PD decisions (what you seek out/what you provide for staff)? (e.g. geometry issue: seek out info regarding this)
11. Is there anything you would like to see your principal do to help you that he/she isn't already doing? (in evaluations, PD, dealing with parents or students)
12. What questions do you have for me?

APPENDIX E: Survey Data

Table E.1: Amount of assistance the principal provides with using standardized test scores to improve mathematics instruction

Response     None  A little  Some  Quite a bit  A great deal  Total
Teachers     8     3         0     2            0             13
Principals   1     6         6     5            0             18

Table E.2: Does your school district use standardized tests (e.g. ACT, SAT, MEAP, MME, MSTEP) to help measure student growth on teacher evaluations?

Response   Yes  No  Unsure  Total
Teachers   7    2   3       12

Table E.3: Does your school use a status model or growth model when using student growth data on teacher evaluations?

Response   Status  Growth  Unsure  Total
Teachers   1       9       3       13

Table E.4: Emphasis principals put on making sure standardized test scores are high

Response     None  A little  Some  Quite a bit  A great deal  Total
Principals   4     2         1     8            1             16

Table E.5: To what extent do you encourage your teachers to use standardized test results to inform their practice?

Response     None  A little  Some  Quite a bit  A great deal  Total
Principals   3     0         8     5            0             16

Table E.6: To what extent do you believe standardized tests capture your students' knowledge of mathematics?

Response   Not at all  A little  Some  Quite a bit  A great deal  Total
Teachers   3           4         1     5            0             13

Table E.7: How good of an indicator do you believe standardized test results are of the quality of your teaching?

Response   Extremely good  Somewhat good  Neither good nor bad  Somewhat bad  Extremely bad  Total
Teachers   0               4              5                     1             3              13

Table E.8: Since the beginning of your career, how often have you attended various professional development activities/sessions/conferences etc. NOT provided by your school district?
Response   Never  1-2 times  3-5 times  6-10 times  More than 10 times  Total
Teachers   0      4          3          2           4                   13

Table E.9: Purpose of evaluation (Teacher responses)

Response          Strongly agree  Somewhat agree  Neither agree nor disagree  Somewhat disagree  Strongly disagree  Total
Improve Teaching  2               7               2                           2                  0                  13
Accountability    8               4               0                           1                  0                  13

Table E.10: Purpose of evaluation (Principal responses)

Response          Strongly agree  Somewhat agree  Neither agree nor disagree  Somewhat disagree  Strongly disagree  Total
Improve Teaching  10              8               0                           0                  0                  18
Accountability    0               7               3                           4                  4                  18

REFERENCES

Adler, J. (2000). Conceptualising resources as a theme for teacher education. Journal of Mathematics Teacher Education, 3(3), 205-224.

Allensworth, E., Nomi, T., Montgomery, N., & Lee, V. E. (2009). College preparatory curriculum for all: Academic consequences of requiring algebra and English I for ninth graders in Chicago. Educational Evaluation and Policy Analysis, 31(4), 367-391.

Anagnostopoulos, D., Rutledge, S. A., & Jacobsen, R. (2013). The infrastructure of accountability: Data use and the transformation of American education. Cambridge, MA: Harvard Education Press.

Ball, D. L., Thames, M. H., & Phelps, G. (2008). Content knowledge for teaching: What makes it special? Journal of Teacher Education, 59(5), 389-407.

Bennett, M. (2004). A review of the literature on the benefits and drawbacks of participatory action research. First Peoples Child & Family Review: A Journal on Innovation and Best Practices in Aboriginal Child Welfare Administration, Research, Policy & Practice, 1(1), 19-32.

Bivona, K. N. (2002). Teacher morale: The impact of teaching experience, workplace conditions, and workload.

Cavanna, J. (2016). Mathematics teachers' data use in practice: Considering accountability, action research, and agency (Doctoral dissertation). Michigan State University, East Lansing, MI.

Center for Educational Leadership. (2014). 5 dimensions of teaching and learning. Retrieved from http://www.k-12leadership.org/services/5-dimensions

Chazan, D. (1996). Algebra for all students? Journal of Mathematical Behavior, 15, 455-477.

Clark, P., Kirk, E., & Burriss, K. G. (2000). Review of research: All-day kindergarten. Childhood Education, 76(4), 228-231.

Cohen, D. K., & Moffitt, S. L. (2009). The ordeal of equality: Did federal regulation fix the schools? Cambridge, MA: Harvard University Press.

Coleman, J. S. (1966). Equality of educational opportunity.

Common Core State Standards Initiative. (2010). Common Core State Standards for Mathematics. Washington, DC: National Governors Association Center for Best Practices and the Council of Chief State School Officers.

Danielson, C. (2014). The framework. Retrieved from http://danielsongroup.org/framework/

Darling-Hammond, L. (2015). Can value added add value to teacher evaluation? Educational Researcher, 44(2), 132-137.

Dee, T. S., & Jacob, B. (2011). The impact of No Child Left Behind on student achievement. Journal of Policy Analysis and Management, 30(3), 418-446.

DeSimone, J. R., & Parmar, R. S. (2006a). Issues and challenges for middle school mathematics teachers in inclusion classrooms. School Science and Mathematics, 106(8), 338-348.

DeSimone, J. R., & Parmar, R. S. (2006b). Middle school mathematics teachers' beliefs about inclusion of students with learning disabilities. Learning Disabilities Research & Practice, 21(2), 98-110.

Drake, C., & Sherin, M. G. (2006). Practicing change: Curriculum adaptation and teacher narrative in the context of mathematics education reform.
Curriculum Inquiry, 36(2), 153-187.

Figlio, D. N., & Winicki, J. (2005). Food for thought: The effects of school accountability plans on school nutrition. Journal of Public Economics, 89(2), 381-394.

Fitz, J., Halpin, D., & Power, S. (1994). Implementation research and education policy: Practice and prospects. British Journal of Educational Studies, 42(1), 53-69.

Fowler, W. J., & Walberg, H. J. (1991). School size, characteristics, and outcomes. Educational Evaluation and Policy Analysis, 13(2), 189-202.

Gamoran, A., & Hannigan, E. C. (2000). Algebra for everyone? Benefits of college-preparatory mathematics for students with diverse abilities in early secondary school. Educational Evaluation and Policy Analysis, 22(3), 241-254.

Granholm, J. (2003). CNN interview. Retrieved from http://edition.cnn.com/TRANSCRIPTS/0308/18/ltm.02.html

Grissom, J. A., Kalogrides, D., & Loeb, S. (2013). Strategic staffing: Examining the class assignments of teachers and students in tested and untested grades and subjects. In American Education Finance and Policy Conference, New Orleans, LA.

Guenther, A. (2019). "How is this making my instruction better at all?": Centering teachers' voices and striving for humanization in an investigation of high-stakes evaluations (Doctoral dissertation). Michigan State University, East Lansing, MI.

Hahs-Vaughn, D. L., & Lomax, R. G. (2013). An introduction to statistical concepts. Routledge.

Hanushek, E. A., & Raymond, M. E. (2005). Does school accountability lead to improved student performance? Journal of Policy Analysis and Management, 24(2), 297-327.

Herman, J., & Linn, R. (2013). On the road to assessing deeper learning: The status of Smarter Balanced and PARCC assessment consortia (CRESST Report 823). National Center for Research on Evaluation, Standards, and Student Testing (CRESST).

Holloway-Libell, J., & Collins, C. (2014). VAM-based teacher evaluation policies: Ideological foundations, policy mechanisms, and implications. InterActions: UCLA Journal of Education and Information Studies, 10(1).

Honig, M. I. (2009). What works in defining "what works" in educational improvement: Lessons from education policy implementation research, directions for future research. In Handbook of education policy research (pp. 333-347). New York: Routledge.

Hope, W. C. (2002). Implementing educational policy: Some considerations for principals. The Clearing House, 76(1), 40-43.

Jacob, B. A. (2005). Accountability, incentives and behavior: The impact of high-stakes testing in the Chicago Public Schools. Journal of Public Economics, 89(5), 761-796.

Johnson, S. M. (2015). Will VAMs reinforce the walls of the egg-crate school? Educational Researcher, 44(2), 117-126.

Kirst, M. W., & Wirt, F. M. (2009). The political dynamics of American education (4th ed.). New York: Teachers College Press.

Kitchen, R., Depree, J., Celedon-Pattichis, S., & Brinkerhoff, J. (2007). Mathematics education at highly effective schools that serve the poor: Strategies for change. New York: Lawrence Erlbaum Associates.

Lester, F., & Lambdin, D. V. (2003). From amateur to professional: The emergence and maturation of the U.S. mathematics education research community. In G. M. A. Stanic & J. Kilpatrick (Eds.), A history of school mathematics: Volume 2 (pp. 1629-1700). Reston, VA: NCTM.

Loveless, T. (2008). The misplaced math student: Lost in eighth-grade algebra. Providence, RI: Brown Center for Education Policy.

Maccini, P., & Gagnon, J. C. (2006). Mathematics instructional practices and assessment
accommodations by secondary special and general educators. Exceptional Children, 72(2), 217-234.

Maine Education Association. (2019). New law removes standardized test score requirement from teacher evaluation process. Retrieved from https://maineea.org/news/new-law-removes-standardized-test-score-requirement-from-teacher-evaluation-process/

Marzano, R. (2014). Marzano teacher evaluation. Retrieved from http://www.marzanoevaluation.com/

Matland, R. E. (1995). Synthesizing the implementation literature: The ambiguity-conflict model of policy implementation. Journal of Public Administration Research and Theory, 5(2), 145-174.

McDonnell, L. M., & Elmore, R. F. (1987). Getting the job done: Alternative policy instruments. Educational Evaluation and Policy Analysis, 9(2), 133-152.

McLaughlin, M., Glaab, L., & Carrasco, I. H. (2014). Implementing Common Core State Standards in California: A report from the field. Stanford, CA: Policy Analysis for California Education.

Michigan Council for Educator Effectiveness. (2013). Building an improvement-focused system of educator evaluation in Michigan: Final recommendations. Retrieved from http://www.mcede.org/

Michigan Department of Education. (2006). Michigan merit curriculum. Retrieved from http://www.michigan.gov/documents/mde/New_MMC_one_pager_11.15.06_183755_7.pdf

Mitchell, S. N., Reilly, R. C., & Logue, M. E. (2009). Benefits of collaborative action research for the beginning teacher. Teaching and Teacher Education, 25(2), 344-349.

Morissette, M. (2014). Using student achievement and growth data for teacher evaluations: An investigation of the implementation of Michigan PA 102 of 2011. Unpublished manuscript, Program in Mathematics Education, Michigan State University, East Lansing, MI.

Mosteller, F. (1995). The Tennessee study of class size in the early school grades. The Future of Children, 113-127.

National Research Council & National Academy of Education. (2010). Getting value out of value-added. Braun, H., Chudowsky, N., & Koenig, J. (Eds.). Center for Education, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.

Newman, J. W. (1998). America's teachers: An introduction to education (3rd ed.). Addison-Wesley Longman, Inc.

Nomi, T. (2012). The unintended consequences of an algebra-for-all policy on high-skill students: Effects on instructional organization and students' academic outcomes. Educational Evaluation and Policy Analysis, 34(4), 489-505.

Porter, R. E., Fusarelli, L. D., & Fusarelli, B. C. (2015). Implementing the Common Core: How educators interpret curriculum reform. Educational Policy, 29(1), 111-139.

Sanders, W. L., & Rivers, J. C. (1996). Cumulative and residual effects of teachers on future student academic achievement.

Shepard, L. A., Penuel, W. R., & Pellegrino, J. W. (2018). Using learning and motivation theories to coherently link formative assessment, grading practices, and large-scale assessment. Educational Measurement: Issues and Practice, 37(1), 21-34.

Shulman, L. S. (1986). Those who understand: Knowledge growth in teaching. Educational Researcher, 4-14.

Silver, Strong, et al. (2014). The thoughtful classroom. Retrieved from http://www.thoughtfulclassroom.com/

Silver, E. A. (1997). "Algebra for all": Increasing students' access to algebraic ideas, not just algebra courses. Mathematics Teaching in the Middle School, 2(4), 204-207.
Spillane, J. P., Hallett, T., & Diamond, J. B. (2003). Forms of capital and the construction of leadership: Instructional leadership in urban elementary schools. Sociology of Education, 1-17.

Spillane, J. P., Reiser, B. J., & Reimer, T. (2002). Policy implementation and cognition: Reframing and refocusing implementation research. Review of Educational Research, 72(3), 387-431.

Spillane, J. P., & Zeuli, J. S. (1999). Reform and teaching: Exploring patterns of practice in the context of national and state mathematics reforms. Educational Evaluation and Policy Analysis, 21(1), 1-27.

State of Michigan. (2011a). Enrolled House Bill No. 4625. Retrieved from http://www.legislature.mi.gov/documents/2011-2012/publicact/pdf/2011-PA-0101.pdf

State of Michigan. (2011b). Enrolled House Bill No. 4626. Retrieved from http://www.legislature.mi.gov/documents/2011-2012/publicact/pdf/2011-PA-0100.pdf

State of Michigan. (2011c). Enrolled House Bill No. 4628. Retrieved from http://www.legislature.mi.gov/documents/2011-2012/publicact/pdf/2011-PA-0103.pdf

State of Michigan. (2011d). Enrolled Senate Bill No. 7. Retrieved from http://www.legislature.mi.gov/documents/2011-2012/publicact/htm/2011-PA-0152.htm

State of Michigan. (2012). Enrolled Senate Bill No. 1040. Retrieved from http://www.legislature.mi.gov/documents/2011-2012/publicact/pdf/2012-PA-0300.pdf

Stein, M. K., Kaufman, J. H., Sherman, M., & Hillen, A. F. (2011). Algebra: A challenge at the crossroads of policy and practice. Review of Educational Research, 81(4), 453-492.

United States Department of Education. (2014). No Child Left Behind. Retrieved from http://www2.ed.gov/nclb/landing.jhtml

Vaughn, S., Bos, C. S., & Schumm, J. S. (2000). Teaching exceptional, diverse, and at-risk students in the general education classroom (2nd ed.). Boston: Allyn & Bacon.

Weisberg, D., Sexton, S., Mulhern, J., Keeling, D., Schunck, J., Palcisco, A., & Morgan, K. (2009). The widget effect: Our national failure to acknowledge and act on differences in teacher effectiveness. New Teacher Project.

Winters, M. A., & Cowen, J. M. (2012). Grading New York: Accountability and student proficiency in America's largest school district. Educational Evaluation and Policy Analysis, 34(3), 313-327.