AN INVESTIGATION OF TEST-TAKING EFFORT IN A COMPUTER-ADAPTIVE TEST OF READING

By

James Eugene Los

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

School Psychology—Doctor of Philosophy

2019

ABSTRACT

AN INVESTIGATION OF TEST-TAKING EFFORT IN A COMPUTER-ADAPTIVE TEST OF READING

By

James Eugene Los

Educators use academic testing to measure the knowledge and skills of their students, but research has shown that some students exhibit minimal test-taking effort (TTE) on low-stakes tests (Wise & Kong, 2005; Wise, 2015). Given the importance of test validity, there is a need for further research on the contexts in which low TTE occurs and possible correlates of low TTE. The goals of the current study were to measure the prevalence of low TTE in elementary and middle school contexts, identify groups for whom low TTE is particularly apparent, and examine whether motivational variables are associated with low TTE. Study I involved the analysis of item-level test data for students in grades four and eight (N = 572,847) to identify the proportion of students who submitted responses so rapidly that the responses could not be considered valid. In Study II, students in grades seven and eight (N = 675) completed an online survey that measured their expectancy and value beliefs (Wigfield & Eccles, 2000) related to taking the STAR Reading test (Renaissance Learning, 2014). Results of logistic regression analyses indicated that grade level, gender, race/ethnicity, attainment value beliefs, and cost beliefs were significantly associated with the odds of low TTE. These findings suggest that potential ways to improve student TTE may include informing students about how a test will be used to enhance their learning. Related suggestions for future research that might meaningfully extend the present findings are provided.

Copyright by JAMES EUGENE LOS 2019

Leah Jean, I like you and I love you.

ACKNOWLEDGMENTS

Thank you to all of my friends, family, and colleagues who have supported me. There are many other people who deserve my deepest gratitude. If every one of them were written down, I suppose that even the whole world would not have room for the pages that would be written.

Thank you to my family: Scotty, Wendi, Kevin, Lisa, Charissa, Tricia, Jake, Jon, Ady, Tyler, Mitch, Cara, Paul, Jeff, Cole, Jackson, Alex, grandpas, grandmas, and all the rest of you. Thank you to my crew: Aaron, Jack, Lucas, Parker, Kailie, Kali. I'm glad you're mine.

Thank you to my friends: Lee Gordo, Ben, Derek, Kelli, Jenna, Rheadon, Dana, Ali, Jeshua, Allie, Zach, Matt, Tyler, AJ, Kyle, Bryant, Peter, Donny, Bradley, Chris, Drew, Derek, Jacob, and Corey. In loving memory of my friend Dr. Adam Winstrom.

Thank you to my cohort of friends and esteemed colleagues: Addam, Ali, Allie, Becky, Courtney, Dani, Danielle, Jamie, Katie, Kiley, and Rick. Thank you to Rob, Science Mike, Vishnu, Hillary, William, Father Richard, and Rachel. Thank you to my supervisors and mentors: Kurt, Sherri, Jason, Luke, Lillian, and Trisha.

Thank you to my professors at MSU and Calvin: Dr. Aupperlee, Dr. Carlson, Dr. Fine, Dr. Oka, Dr. Rispoli, Dr. Windram, Dr. Yonker, Dr. Tellinghuisen, Dr. Riek, Dr. DeHaan, Dr. Stehouwer, et al. Thank you to Holly Boehle, Brandi-Lyn Mendham, and Calvin DeKuiper.

Finally, thank you to my adviser and dissertation committee members: Dr. Sara Witmer, Dr. Cary Roseth, Dr. Adrea Truckenmiller, and Dr. Martin Volker.
TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

CHAPTER I
INTRODUCTION
    Purpose
    Background
    Importance
    Rationale for Current Study
    Research Questions

CHAPTER II
LITERATURE REVIEW
    Theoretical Background and Conceptual Framework
    Test-Taking Effort
    Empirical Research on Student Test-Taking Effort
    Student-Level Correlates of Test-Taking Effort
    Current Study and Research Questions

CHAPTER III
METHODS OF STUDY I
    Rationale for Two Studies
    Purpose and Design of Study I
    Sampling Procedure
    Measures
    Data Analyses

CHAPTER IV
RESULTS OF STUDY I
    Data Screening and Preliminary Analyses
    Descriptive Statistics
    Comparative Analyses
    Logistic Regression Analyses
CHAPTER V
METHODS OF STUDY II
    Purpose and Design of Study II
    Sampling Procedure
    Measures
    Procedures
    Data Analyses

CHAPTER VI
RESULTS OF STUDY II
    Data Screening and Preliminary Analyses
    Descriptive Statistics
    Comparative Analyses
    Logistic Regression Analyses

CHAPTER VII
DISCUSSION
    Summary of Major Findings
    Interpretation of Results
    Implications for Theory and Research
    Implications for Practice
    Limitations
    Conclusions

APPENDICES
    APPENDIX A. TABLES AND FIGURES
    APPENDIX B. LETTER TO TEST DEVELOPERS
    APPENDIX C. RENAISSANCE LEARNING PRIVACY POLICY NOTICE
    APPENDIX D. STUDENT PERCEPTIONS OF TESTING SURVEY
    APPENDIX E. EXPECTANCY ORIGINAL AND ADAPTED ITEMS
    APPENDIX F. VALUE ORIGINAL AND ADAPTED ITEMS
    APPENDIX G. LETTER TO SCHOOL ADMINISTRATORS
    APPENDIX H. IRB EXEMPT DETERMINATION LETTER
    APPENDIX I. LETTER TO PARENTS
REFERENCES

LIST OF TABLES

Table 1. Studies Measuring Test-Taking Effort Using Response Time Effort
Table 2. Demographic Information for Sample (Study I)
Table 3. Distribution of RTE Scores for Sample (Study I)
Table 4. Mean RTE Scores by Subgroup (Study I)
Table 5. RTE Scores by Grade and Gender (Study I)
Table 6. RTE Scores by Grade and Race/Ethnicity (Study I)
Table 7. RTE Scores by Gender and Race/Ethnicity (Study I)
Table 8. Proportion Identified with Low TTE by Subgroup (Study I)
Table 9. Demographic Information for Students with Low TTE (Study I)
Table 10. Results of Logistic Regression Model (Study I)
Table 11. Demographic Information for Sample (Study II)
Table 12. Distribution of RTE Scores for Sample (Study II)
Table 13. Proportion Identified with Low TTE by Subgroup (Study II)
Table 14. Demographic Information for Students with Low TTE (Study II)
Table 15. Descriptive Statistics for SPOTS Items (Study II)
Table 16. Descriptive Statistics for SPOTS Subscales (Study II)
Table 17. Mean SPOTS Subscale Scores by Subgroup (Study II)
Table 18. Bivariate Correlation Matrix for Variables (Study II)
Table 19. Results of Multiple Logistic Regression Model (Study II)

LIST OF FIGURES

Figure 1. Conceptualization of TTE in Demands–Capacity Model of Test-Taking Effort
Figure 2. Relationships from EEVT Examined in Current Study

CHAPTER I

INTRODUCTION

Purpose

The purpose of the current study was to investigate student-level correlates of test-taking effort (TTE) on a low-stakes, computer-adaptive test (CAT) in reading. Educators regularly use testing to gather information about the knowledge and skills of students, but inferences made from test scores are appropriate only if those scores are reliable and valid indicators of the students' "true" proficiency (Salvia, Ysseldyke, & Bolt, 2013). In contemporary models of assessment, test users rely on the assumption that examinees have given appropriate effort when completing a test (Eklöf, 2010; Wise, 2015). However, research on educational testing has suggested that some test-takers show little effort during testing, as demonstrated by rapid-guessing behavior (RGB; Schnipke & Scrams, 1997) with accuracy rates comparable to chance (Setzer, Wise, van den Heuvel, & Ling, 2013; Swerdzewski, Harmes, & Finney, 2011).
This problem is especially evident in low-stakes testing contexts, in which test scores may be significant to educators but carry no personal consequences for the students (Eklöf, 2010; Wise, 2014; Wise & DeMars, 2005). In fact, researchers have documented that as many as 35% of students exhibit low TTE if scores do not affect their grades (Rios, Liu, & Bridgeman, 2014), although the reasons why some students tend to show non-effortful responding are unknown. Widespread disengagement during low-stakes educational testing could be a considerable threat to the validity of testing systems because scores from low-effort respondents yield no meaningful information about the actual proficiency of the students (Cronbach, 1960; Haladyna & Downing, 2004; Wise & DeMars, 2005). Moreover, if a large proportion of students exhibit low TTE, this can adversely affect the psychometric properties of the aggregate test data (Wise & Kong, 2005).

Even though ensuring that students exert adequate TTE is essential for appropriate testing practices, there have been surprisingly few empirical studies of TTE in K–12 academic contexts (Wise, 2014). Given the importance of test validity, there is a clear need for additional research examining both the extent to which low TTE is a problem for educational tests and the possible correlates of low TTE. With a greater understanding of the factors related to disengagement from testing, it may be possible to develop targeted strategies for promoting more effortful responding on low-stakes tests. Therefore, to help inform policies and practices that address the problem of low TTE, the primary goals of the current investigation were to contribute to the extant research on the following: a) the contexts in which low TTE occurs, b) the groups of students for whom low TTE is particularly apparent, and c) the motivational variables associated with low TTE.

Background

The appropriate use of educational testing practices has been one of the central concerns in the national discourse on education over the past two decades, and recent federal legislation in the Every Student Succeeds Act of 2015 (ESSA, 2015) further emphasized the importance of using educational testing for measuring student proficiency and informing classroom instruction. According to the Institute of Education Sciences (IES), educational testing data should be used as "part of an ongoing cycle of instructional improvement" (Hamilton et al., 2009, p. 8). Indeed, reports suggest that K–12 students have experienced a substantial increase in time spent taking state-mandated accountability tests, district-wide standardized tests, and teacher-made tests over the past few years (Hart et al., 2015). Further, the proliferation of technology in the classroom has contributed to an increase in the development and use of computer-based testing (CBT), which can have practical advantages over paper-and-pencil testing (PPT) in terms of efficiency and measurement precision (Barnard, 2015; Shapiro, Dennis, & Fu, 2015; Weiss, 2011).

In response to growing concerns from educators and policy-makers about the expanding use of educational testing programs, the U.S. Department of Education (ED) called for efforts to promote more appropriate testing practices in K–12 public schools. The Obama administration's Testing Action Plan (ED, 2015) stated that all educational tests must be high quality, supportive of fairness, worthwhile for students and teachers, and tied to improved learning.
This demand for quality testing practices reflects the consensus among educators that test scores are only useful if they convey valid information about the knowledge and skills of students (Salvia et al., 2013). According to the Standards for Educational and Psychological Testing, validity refers to "the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests" (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 2014, p. 11). That is, test developers and users need to consider the evidence that scores are reasonably free from the influence of construct irrelevance (i.e., processes extraneous to the intent of the test). In general, test validation includes the evaluation of five sources of data: 1) evidence based on test content, 2) evidence based on response processes, 3) evidence based on internal structure, 4) evidence based on relations to other variables, and 5) evidence based on the consequences of testing (AERA et al., 2014; Salvia et al., 2013). Even though it is standard practice for technical manuals of assessments to discuss particular types of validity evidence (e.g., convergent or discriminant evidence, predictive or concurrent test-criterion relationships), relatively little attention has been given to validity evidence based on response processes. This issue is concerning because unless test developers and users have data to suggest that test-takers engaged in cognitive processes consistent with the intended cognitive model of the test, it remains possible that some extraneous factors could have differentially influenced the performance of specific test-takers.

Researchers can gather data on examinee response processes by asking examinees to explain how they reached their answers, maintaining and analyzing records of their work, or tracking and recording eye movements or response times (Salvia et al., 2013; AERA et al., 2014). Indeed, a central assumption in educational testing is that students are using some meaningful cognitive processes directed toward determining the answer. For instance, if a test is purported to measure reading comprehension, a fundamental assumption is that students actually engaged in reading the content of the passage, item prompt, and response options. However, several experts have suggested that non-effortful responding (i.e., guessing) may be one construct-irrelevant response process that could pose a significant threat to test validity. That is, scholars have asserted that the validity of any inferences made from test scores is directly dependent on the amount of TTE students exerted while taking the test (Wise & DeMars, 2006; Wise & Kong, 2005). Indeed, growing research evidence has supported this notion, as several scholars have documented the adverse effects of low TTE on test performance (Cronin et al., 2005; Sundre & Kitsantas, 2004; Wise, 2006; Wise, Bhola, & Yang, 2006; Wise & DeMars, 2006). For this reason, educational researchers, test developers, and test users need to recognize that responding correctly to test items requires examinees to possess and demonstrate not only the target academic skills of the assessment but also the level of test-taking motivation necessary for appropriately engaging in the test and providing a valid response (Eklöf, 2010).
Although there is empirical research on the extent to which deficits in specific academic skills adversely influence performance on tests not designed to measure those skills, few researchers have examined whether differences in motivational factors could similarly affect how students demonstrate their academic skills during educational testing. Moreover, it is possible that low motivation for test-taking could prevent some students from exhibiting their actual knowledge and skills on low-stakes tests in a variety of subjects (when the tests demand high levels of mental effort), just as deficits in specific academic skills can prevent some students from showing their true ability in other subject areas (e.g., students with low reading skills might not be able to adequately demonstrate their underlying math problem-solving skills if a math test has high reading demands). Despite growing recognition in several countries that appropriate TTE is fundamental to valid test interpretation and use (e.g., Barry, Horst, Finney, Brown, & Kopp, 2010; Eklöf, 2010), there have been few studies on the extent to which low TTE contributes construct-irrelevant variance to test scores and thereby limits the measurement of the academic skills of some subgroups or individual students. Ultimately, this issue relates to the core principle of fairness in assessment, and the current lack of research on TTE in K–12 academic contexts points to an important area for further empirical investigation. Guided by an application of the contemporary expectancy–value theory of motivation and engagement (Wigfield & Eccles, 2000), the current investigation aimed to extend the previous research on student TTE through two empirical studies focused on the student-level correlates of TTE in K–12 educational testing contexts.

Importance

Currently, questions remain about why some students exhibit appropriate TTE whereas others disengage and respond with minimal effort (Wise, 2014). More specifically, educators, researchers, and policy-makers could benefit from additional information on three major issues related to TTE: 1) the prevalence of K–12 students who exhibit low TTE, 2) the characteristics of students who exhibit low TTE, and 3) the alterable psychological correlates of exhibiting low TTE (which could thereby help inform efforts designed to prevent low TTE from occurring).

First, it is crucial for test developers and educators to identify the extent to which low TTE might be a problem for school-age students. Each year, millions of elementary and middle school students across the nation take low-stakes CATs in reading (e.g., STAR Reading) as part of school-wide benchmark testing programs (Renaissance Learning, 2016), and educators use scores from these tests to inform curricula, instruction, and intervention (Ysseldyke et al., 2006). However, as previously stated, the resulting scores of examinees who engaged in cognitive processes extraneous to the target construct (such as RGB) might not be considered valid. That is, interpreting and using testing data containing scores from disengaged test-takers may be inappropriate due to the deleterious effects of low TTE on test scores (Sundre & Wise, 2003; Wise & DeMars, 2009). Thus, if any particular students or groups of students exhibit low TTE, their resulting scores might under-represent their actual skills in the tested domain.
Given the potential consequences of using invalid testing data, it is essential for additional research to clarify the proportion of students whose scores could potentially be invalid due to low TTE. Despite growing evidence that low TTE is a substantial problem for college students taking university-mandated tests (which often have practical utility for the institutions but no personal consequences or incentives for students), few researchers to date have studied TTE in elementary or middle school samples. In that limited body of research, the proportions of students identified as exhibiting low TTE have varied considerably, ranging from 1.4% (Wise, Kingsbury, Thomason, & Kong, 2004) to 11.6% (Wise, Ma, Kingsbury, & Hauser, 2010). These findings suggest that some elementary and middle school students indeed demonstrate high rates of RGB when taking educational CBTs in low-stakes contexts, although researchers have not reported consistent estimates of the current prevalence of low TTE on the low-stakes academic tests commonly used in K–12 schools.

For instance, Wise and colleagues (2010) documented the proportions of students in grades 3–8 who were identified with low TTE on a CAT in reading and disaggregated the results by grade level and the time of day the test was taken. The proportions of low-effort respondents ranged from 0.5% (for third-grade students taking a test at 7:00 a.m.) to 5.9% (for eighth-grade students taking a test at 2:00 p.m.). These figures suggest a notable proportion of students exhibit low TTE on educational tests under certain conditions, although replication studies are needed to determine whether similar rates occur in other contexts. If high proportions of elementary or middle school students exhibit RGB on low-stakes tests, that would suggest low TTE might be distorting the results of the aggregate test data in these contexts (Wise, 2015). Still, even if the overall rate of RGB is small, educators would need to acknowledge that the validity of individual test scores would be compromised severely for any student identified as exhibiting exceptionally low TTE. Better understanding this issue in the context of K–12 schools may help inform future testing research focused on examinee response processes. For this reason, one of the primary goals of the current study was to investigate the current proportion of elementary or middle school students who exhibit low TTE when taking a commonly used, low-stakes CAT in reading.

Second, identifying whether any subgroups of students are particularly likely to show low TTE could be useful for informing efforts to improve student TTE. Previous research on academic motivation has indicated that students from different demographic groups may vary in their general motivation (Wentzel & Brophy, 2014). In the area of reading, there are documented differences in motivation at school by age, gender, race, ethnicity, and disability status (Archambault, Eccles, & Vida, 2010; Baird, Scott, Dearing, & Hamill, 2009; Baker & Wigfield, 1999; Battle, 1979; Durik, Vida, & Eccles, 2006; Eccles, 1984; Grolnick & Ryan, 1990; Schunk, Meece, & Pintrich, 2014; Wentzel & Miele, 2016). Whether similar group differences might emerge in the specific domain of test-taking is currently unknown.
This issue warrants further investigation, as there has been growing interest in research that addresses the degree to which tests might differentially support the test-taking motivation of specific subgroups of examinees (AERA, APA, & NCME, 2014). For example, if some subgroups of students perceive the content in an assessment to be especially uninteresting, culturally irrelevant, unfamiliar, or confusing, it could differentially limit the TTE of students from those groups. In sum, it would be helpful to identify student characteristics associated with low TTE to help inform how educators might ameliorate any group disparities in TTE that exist.

Lastly, it is essential for researchers to investigate potential reasons why some students or groups of students disengage from testing. Scholars have argued that better understanding the dynamics of test-taking is critical for developing effective strategies for eliciting appropriate TTE from students on low-stakes educational tests (Wise & DeMars, 2005). The identification of malleable correlates of TTE could be particularly useful for informing how educators design and implement prevention and intervention strategies intended to improve student TTE. Indeed, there is research evidence to suggest educators can use instructional practices or targeted interventions to enhance motivation in academic domains (Guthrie et al., 2004; Wigfield & Wentzel, 2007). In fact, a recent intervention study by Liu, Rios, and Borden (2015) showed that test practitioners could proactively improve the TTE of college students by administering a brief motivational prompt before the test; students who received the prompt exhibited higher TTE and outperformed a control group by 0.63 standard deviations. Accordingly, if additional research is conducted to identify alterable motivational variables associated with low TTE, it is possible that future efforts to improve student TTE (and prevent RGB) could be strengthened by explicitly targeting the unique motivational needs of students from different groups.

Rationale for Current Study

Therefore, it is critical for researchers and practitioners to learn more about the issue of low TTE in academic contexts to ensure test validity, promote fairness, and develop strategies that support student TTE on non-consequential tests. With students taking increasing numbers of tests at school, it is essential for educators to understand how serious the problem of low TTE might currently be in K–12 schools. Relatedly, it would be helpful for researchers to investigate why some students demonstrate appropriate levels of engagement whereas others have been found to disengage and guess randomly. According to Eccles and colleagues' expectancy–value theory of motivation (EEVT; see Wigfield & Eccles, 2000), students' achievement-related behaviors (such as their effort and persistence on a task) can be explained by the students' subjective perceptions concerning 1) how successfully they expect to perform the task (expectancy beliefs) and 2) how much they value engaging in the task (value beliefs). Generally, the EEVT implies that students who believe they are unable to answer a test item correctly (low expectancy) or believe they have no meaningful reason to try (low value) would be expected to exhibit lower TTE (which could be manifested by high rates of RGB).
Recently, scholars have cited the expectancy–value theory to explain why some test-takers have exhibited high levels of RGB in previous studies, asserting that low-stakes academic tests may have exceptionally low perceived value for particular students, such that these students exert minimal TTE (Setzer et al., 2013). Wise and DeMars (2005) postulated, "For these students, the task of doing well on the test will have little attainment, intrinsic, or utility value. Moreover, these students will be aware of the costs associated with the assessment test (i.e., being denied the opportunity to engage in more valued activities). Thus, the Eccles–Wigfield model would predict low effort on low-stakes assessment tests from students with weak value beliefs" (p. 3). Still, there have been few studies to date in which researchers have directly tested the extent to which students' expectancy or value beliefs about testing relate to the likelihood they exhibit low TTE (i.e., demonstrate pervasive RGB) on low-stakes educational tests.

Guided by an application of the EEVT, the current investigation was designed to extend previous research on the dynamics of TTE to inform future research and practice. The purpose of this study was to identify the prevalence of low TTE in students in grades 4–8 and examine the extent to which student demographic characteristics, test-taking expectancy beliefs, and test-taking value beliefs are associated with the likelihood that students exhibit low TTE on a low-stakes CAT in reading. To that end, the specific research questions for the current investigation were as follows:

Research Questions

1. What proportion of students in grades 4–8 exhibit low TTE on a CAT in reading, as determined by Response Time Effort (RTE; Wise & Kong, 2005)?

2. To what extent do student demographic variables relate to the likelihood students exhibit low TTE on a CAT in reading?

3. Do students differ in test-taking expectancy and value beliefs by student demographic variables?

4. To what extent do student test-taking expectancy and value beliefs relate to the likelihood students exhibit low TTE on a CAT in reading?
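Research Questions 2 and 4 concern the likelihood, or odds, that a student exhibits low TTE given demographic and motivational predictors. For readers who want a concrete picture of what such an analysis can look like, the sketch below fits a logistic regression of a binary low-TTE flag on a set of predictor columns. It is a minimal illustration under assumed data, not the analysis code or variable coding used in the studies reported later; the file name, column names, and categorical coding are all hypothetical.

```python
# Hypothetical sketch of a logistic regression of a low-TTE flag on
# demographic and belief variables. Column names are illustrative assumptions,
# not the study's actual variables.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# One row per student: a binary low-TTE indicator plus predictors.
df = pd.read_csv("star_reading_tte.csv")  # hypothetical file

model = smf.logit(
    "low_tte ~ C(grade) + C(gender) + C(race_ethnicity)"
    " + expectancy + attainment_value + intrinsic_value + utility_value + cost",
    data=df,
).fit()

print(model.summary())
# Exponentiated coefficients give odds ratios for exhibiting low TTE.
print(np.exp(model.params))
```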
CHAPTER II

LITERATURE REVIEW

This literature review summarizes previous research relevant to the current study. First, the fundamental constructs of motivation, engagement, and effort are described. Next, the contemporary EEVT is presented as it relates to the current investigation, and a conceptual model for empirical research on TTE is described. After that, critical issues in research on TTE are discussed, the use of response time methods for measuring TTE on CBTs is explained, and previous empirical studies of TTE are reviewed. An overview of the proposed student-level correlates of TTE follows, focusing on the previous empirical research on student demographic characteristics and motivational variables associated with TTE. This literature review concludes with the rationale for the current application of the expectancy–value theory to an empirical study of TTE, and the specific research questions and hypotheses for this study are presented.

Theoretical Background and Conceptual Framework

Motivation, engagement, and effort. The current investigation was contextualized within the more general research literature on academic motivation and student engagement. Broadly construed, motivation and engagement refer to multidimensional patterns of thoughts, feelings, and behaviors that facilitate, explain, or indicate individuals' goal-directed actions. As such, academic motivation and student engagement are integral to virtually every aspect of the learning process, and these two concepts have received considerable attention over recent decades. For more comprehensive reviews of the major theoretical perspectives on academic motivation and student engagement, readers are directed to volumes by Wentzel and Miele (2016) and Christenson, Reschly, and Wiley (2012), respectively.

Currently, there are competing views about how researchers should define motivation and engagement, differentiate and measure their components, and conceptualize the relationship between the two, but the general consensus among scholars is that motivation and engagement are two distinct, yet related "metaconstructs" or organizing frameworks (Christenson et al., 2012). There are no universally accepted definitions for motivation or engagement, but experts in these areas of educational psychology research have proposed the following. Wentzel and Miele (2016) defined motivation broadly as "a set of interrelated desires, goals, needs, values, and emotions that explain the initiation, direction, intensity, persistence, and quality of behavior" (p. 1). Christenson and colleagues (2012) defined student engagement as "the student's active participation in academic and co-curricular or school-related activities, and commitment to educational goals and learning," adding, "It is a multidimensional construct that consists of behavioral (including academic), cognitive, and affective subtypes" (p. 816). Thus, for the sake of conceptual clarity in this review, motivation refers to psychological processes (i.e., thoughts and feelings) that facilitate active participation in an academic task, whereas engagement refers to observable indicators of active participation in the task. In other words, "Motivation refers to the underlying sources of energy, purpose, and durability, whereas engagement refers to their visible manifestation" (Skinner & Pitzer, 2012, p. 22). To summarize, academic motivation represents the latent psychological processes (i.e., energy and purpose) that initiate and sustain students' goal-directed actions, and student engagement refers to students' goal-directed actions themselves (characterized by effort, intensity, and persistence). As such, students with higher motivation for a task are likely to demonstrate higher engagement in the task (as indicated by the amount of effort the students exert toward completing the task).

Furthermore, because effort (the focus of the current study) is closely related to other key constructs in theories of motivation and engagement—with definitions of effort overlapping with numerous constructs in both fields—differentiating motivation and engagement is pertinent to research on TTE. There are competing perspectives on whether effort should be considered an indicator (i.e., "markers or descriptive parts inside a construct") or a facilitator (i.e., "explanatory causal factors, outside the target construct, that have the potential to influence the target") of engagement (Skinner & Pitzer, 2012, p. 25). Numerous experts have argued effort is understood best as one behavioral indicator of engagement, whereas motivational variables are the facilitators of effort (Appleton, Christenson, & Furlong, 2008; Newmann, Wehlage, & Lamborn, 1992; Skinner & Pitzer, 2012; Skinner, Kindermann, Connell, & Wellborn, 2009).
In the domain of educational test-taking, one widely accepted definition of TTE is "a student's engagement and expenditure of energy toward the goal of attaining the highest possible score on the test" (Wise & DeMars, 2005, p. 2). Accepting this definition, the primary construct of interest in the current study, TTE, can be conceptualized as one specific facet of test-taking engagement (which refers more broadly to psychological processes and observable solution-focused behaviors directed toward responding correctly to test items). Other scholars described TTE as "the extent to which an examinee gives his or her best effort to the test, with the goal being to accurately represent what one knows and can do in the content area covered by the test" (Barry et al., 2010). By contrast, test-taking motivation represents the underlying psychological processes whereby solution-focused responses to items (i.e., TTE) are instigated and sustained. Test-taking motivation has been defined as "the willingness to engage in working on test items and to invest effort and persistence in this undertaking" (Baumert & Demmrich, 2001, p. 441). To summarize, a fundamental assumption in the current study is that students' test-taking motivation directly influences the amount of TTE the students exert when engaging in the test.

Although several theories of motivation and engagement could be useful for informing empirical research on student TTE, one theoretical framework guided the development of the current study: the contemporary Eccles et al. expectancy–value theory (EEVT), derived from the work of Eccles, Wigfield, and colleagues (see Wigfield & Eccles, 2000; Wigfield, Tonks, & Klauda, 2009, 2016).

Expectancy–value theory of achievement behaviors. The EEVT proposes the existence of relationships among individuals' expectancies for success, personal values, task choices, beliefs about achievement, self-concepts of ability, goals, self-schemata, affective memories, perceptions of others' attitudes and expectations for them, and perceptions of past achievement outcomes (Eccles, 2005; Wigfield & Eccles, 2000; Wigfield et al., 2009). The primary constructs in the EEVT, expectancies and values, are considered internal, cognitive beliefs that influence the individual's observable, measurable behaviors (Schunk et al., 2014). Furthermore, a primary assumption of the EEVT is that individuals' beliefs about the following questions can explain their achievement behaviors: 1) Can I do this task? and 2) Do I want to do this task and why? (Wentzel & Brophy, 2014). According to the EEVT, expectancy and value beliefs are hypothesized to directly influence effort on achievement-related tasks (Eccles & Wang, 2012; Wigfield & Eccles, 1992, 2000). Thus, the EEVT suggests individuals would exert more effort toward initiating and completing achievement-related tasks if they believe they can succeed and that succeeding will result in a desirable outcome. Indeed, expectancy and value beliefs predict student effort, persistence, and achievement outcomes in academic and recreational activities (Bong, Cho, Ahn, & Kim, 2012; Meece, Wigfield, & Eccles, 1990; Wigfield et al., 1997).
Eccles, Wigfield, Harold, and Blumenfeld (1993) described expectancies for success as subjective evaluations about whether one can perform a task successfully, whereas task value beliefs refer to subjective evaluations about whether one has a personally meaningful reason to engage in the task. Expectancy beliefs have commonly been differentiated from more general academic ability beliefs, with expectancy beliefs referring to task-specific beliefs as opposed to representing more global perceptions of self-competence (Schunk & Pajares, 2009). Subjective value beliefs can also be general or task-specific (Higgins, 2007), and there are at least four types of task value beliefs (Conley, 2012; Wigfield & Eccles, 2000). First, attainment value (or importance) is the perceived importance of a task based on how it allows one to express an important aspect of one's self-identity. Second, intrinsic value (or interest) is enjoyment of or interest in a task. Third, utility value refers to perceptions of a task's usefulness based on how it aligns with or advances one's future aspirations or goals. Finally, relative cost is a dimension of task value representing perceptions of the alternative opportunities that are forfeited when engaging in the activity (Eccles-Parsons et al., 1983; Wigfield & Eccles, 2000).

In the context of academic test-taking, the EEVT can serve as a useful theoretical basis for explaining how student perceptions of test-taking might relate to their subsequent TTE on the test. Generally, the EEVT model would imply that the effort students exert toward responding to a test item is most proximally determined by their subjective beliefs about the following questions: 1) Can I respond to this test item successfully? and 2) Do I have a meaningful reason to try to respond to this test item successfully? (Eklöf, 2010). In the current study, the EEVT model was applied to the domain of test-taking to help inform an empirical investigation of particular relationships of interest between individual student characteristics, their expectancy and value beliefs related to test-taking, and the TTE they demonstrate while taking the test.

Demands–capacity model of test-taking effort. Over the past two decades, researchers have attempted to identify variables that might be associated with examinees' test-taking motivation and their subsequent TTE. This research has primarily focused on three categories of variables that have been hypothesized to influence examinee TTE: 1) characteristics of the test (e.g., format, content, or item features), 2) characteristics of the individual completing the test (e.g., demographic, psychosocial, or motivational variables), and 3) the context in which testing occurs (e.g., purpose of testing, test setting, or consequences associated with test performance). Based on this research, assessment researchers have suggested that a "test event" (i.e., one completion of a test by one student) can be conceptualized as a series of interactions between examinee-level variables and item-level variables, each of which occurs within a particular assessment context (Wise & Cotten, 2009; Wise & Smith, 2011; Wise, 2015). Wise and Smith (2011) proposed a conceptual model of TTE in which the TTE exhibited by an examinee on a test item is "influenced by the dynamic interplay" among these factors (p. 147).
In Wise and Smith's (2011) demands–capacity model (see Figure 1), the TTE a student exerts toward responding correctly to test items is regarded as a function of two primary model constructs: 1) resource demands (RD) and 2) effort capacity (EC). More specifically, RD is considered "an item characteristic representing the effort that must be expended by an examinee to correctly answer the item," whereas EC represents "the amount of effort the examinee is willing to devote to answering test items" (p. 147). According to Wise and Smith, RD is a fixed, item-level variable based on characteristics of the item (e.g., item length), whereas EC is a dynamic, examinee-level characteristic based on motivational differences (e.g., confidence to answer items) that can change during a single test session.

The demands–capacity model implies that when an examinee's internal EC exceeds the RD of an item (at the time of the examinee-item encounter), the examinee is expected to give an effortful response (i.e., solution-focused behavior; SB); conversely, when the item's RD exceeds the student's momentary EC, the examinee is expected to exhibit low TTE by 1) omitting the item or 2) guessing quickly (i.e., rapid-guessing behavior; RGB). That said, a third possibility is that a student responds non-effortfully but not rapidly, which would not constitute RGB. Therefore, the model accounts for only two of three possible response processes (i.e., it explains SB and RGB but not non-rapid guessing). Further, as Wise and Smith (2011) acknowledged, "There are no methods currently available to quantify EC and RD on a common scale, which precludes a literal comparison of their values" (p. 150). As such, it is currently unclear how the concept described as EC in the Wise–Smith model might relate to other constructs in the research literature on motivation and student engagement.

Still, the primary strength of the demands–capacity model is that it provides researchers with a relatively parsimonious framework for explaining the variables that research and theory suggest may be significant correlates of student TTE. In this model, TTE is assumed to be multiply determined by numerous test characteristics, individual differences, and test context factors. The empirical research support for the components of the model is described later in this chapter. Another useful aspect of the demands–capacity model is that it conceptualizes TTE as a dynamic, changing construct that can vary between students or within a single student across the duration of a testing session. As such, the demands–capacity model accounts for the examinee's initial motivation upon beginning the test, as well as changes in the student's motivation during the test. This feature is helpful because TTE is understood best as fluid, with examinees often showing variation in effort on different items, subtests, or tests within assessment batteries. Finally, the relationships between the test-level variables, test context variables, and student-level variables proposed in the Wise–Smith model can be tested through empirical research on student TTE. Indeed, several scholars have begun testing some of the hypothesized relationships among variables in the demands–capacity model, and the model serves as a useful framework for identifying which of the potential correlates of TTE warrant additional research.
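Although Wise and Smith (2011) caution that EC and RD cannot currently be quantified on a common scale, the decision rule at the heart of the model can be made concrete with a toy sketch. The following is purely illustrative and assumes, contrary to that caveat, that both quantities could be expressed as comparable numbers; the class, function, and numeric values are hypothetical.

```python
# Purely illustrative sketch of the demands-capacity decision rule.
# It assumes (counterfactually, per Wise & Smith, 2011) that effort capacity (EC)
# and resource demands (RD) could be expressed on a common numeric scale.
from dataclasses import dataclass

@dataclass
class ItemEncounter:
    item_id: int
    resource_demands: float  # fixed item property (e.g., driven by item length)

def expected_behavior(effort_capacity: float, item: ItemEncounter) -> str:
    """Return the response behavior the model would expect for this encounter."""
    if effort_capacity >= item.resource_demands:
        return "solution behavior (SB)"
    # When RD exceeds momentary EC, the model expects omission or rapid guessing;
    # it does not account for slow, non-effortful responses.
    return "omit or rapid-guessing behavior (RGB)"

# EC is dynamic and may decline within a session; RD is fixed per item.
effort_capacity = 0.8
for item in [ItemEncounter(1, 0.5), ItemEncounter(2, 0.9), ItemEncounter(3, 0.7)]:
    print(item.item_id, expected_behavior(effort_capacity, item))
    effort_capacity -= 0.1  # hypothetical within-test decline in EC
```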
Conceptual framework of current study. Given the potential value of employing the demands–capacity model for informing research on student TTE in a variety of testing contexts, the current study aimed to build on the previous work of Wise and colleagues, using the EEVT as the theoretical basis for an empirical study of the correlates of low TTE on a low-stakes CAT. The demands–capacity model assumes that TTE is multiply influenced by the dynamic relationships among 1) the characteristics of the test, 2) the characteristics of the individual examinee, and 3) the context in which testing occurs, but an investigation of all these factors would far exceed the scope and purpose of the current study. For this reason, the current investigation focused only on the student-level correlates of TTE.

The conceptual framework for the current study (see Figure 2) represents an application of the EEVT in the context of academic test-taking. The EEVT implies that achievement-related choices, effort, and persistence (which in the context of test-taking constitute a student's TTE) are influenced most directly by an examinee's expectations for success on the test and perceived value of the test. The relationships of interest in the current study are considered factors comprising the student's EC in the demands–capacity model of TTE. Wise and Smith (2011) proposed that examinees possess several individual "internal factors" (i.e., expectations about test demands; desire to please teachers or parents; citizenship; competitiveness; ego satisfaction) that are hypothesized to be determinants of their EC prior to engaging in the test (p. 149).

In summary, the contemporary EEVT is a useful theoretical framework for explaining students' engagement in academic tasks in terms of their expectancies for success on the task and the extent to which they value engaging in the task (Wigfield et al., 2009). By contextualizing the current investigation of student-level correlates of TTE within the EEVT, this study draws from a robust theoretical framework of motivation and engagement. The EEVT would suggest students' perceptions of test-taking (i.e., expectancy beliefs and value beliefs) should directly influence their subsequent test-taking engagement, which can be indicated by their TTE. This model was used to inform an empirical study of the relationships between student-level variables (i.e., grade, gender, race/ethnicity, test-taking expectancy beliefs, and test-taking value beliefs) and student TTE. In doing so, the current study extends our understanding of the demographic characteristics and internal motivational variables associated with low TTE, which may eventually help inform the development of practices and policies aimed at addressing low TTE.

Test-Taking Effort

Issues resulting from low test-taking effort. Despite the substantial variability in how scholars have defined TTE in previous empirical studies, the research on test-taking motivation and effort has consistently suggested that lower TTE is associated with poorer test performance (Haladyna & Downing, 2004; Sundre & Kitsantas, 2004; Wise & DeMars, 2005). Wise and DeMars (2005) conducted a meta-analysis of empirical research on the relationships between TTE and test scores, comparing the magnitude of the differences in test performance between groups of high-effort and low-effort test-takers. The authors found that results from 24 of the 25 reviewed studies indicated significant, positive effects of TTE on test performance.
The effect sizes (ES) ranged from –0.04 to 1.49 (mean ES g = 0.59), suggesting that high-effort test-takers scored, on average, more than one-half of a standard deviation higher than low-effort examinees. Such findings suggest TTE may contribute substantial construct-irrelevant variance to test scores. Furthermore, high rates of non-effortful responding can distort the reliability and validity of aggregate test data (Wise, 2015). Several studies have shown that test scores from examinees with low TTE can significantly influence the reliability coefficients for the test (Sundre & Wise, 2003; Wise & DeMars, 2009; Wise & Kong, 2005) as well as the correlations between test scores and external criteria (Wise, 2009).

In a seminal article on TTE, Wise and Kong (2005) hypothesized that removing the test scores of any examinees identified as exhibiting low TTE would result in a set of remaining scores that more accurately represents the actual knowledge and skills of students in the sample. Their argument assumed that these scores had been invalidated based on evidence from the examinees' response processes, and therefore the non-effortful results should not be considered valid indicators of those students' true ability. Wise and Kong postulated that removing scores from low-effort respondents would result in a) higher test scores, b) equal or greater test score reliability, and c) increased external validity (i.e., higher correlations with scores from measures expected to relate to test performance). The results of Wise and Kong's (2005) study supported their hypotheses, and their findings have been replicated in numerous studies using comparable methods (e.g., Kong, Wise, Harmes, & Yang, 2006; Swerdzewski et al., 2011). This process of removing scores from students flagged for low TTE is commonly called motivational score filtering, and several researchers have evaluated its utility as a post hoc statistical method for "data cleaning" (Sundre & Wise, 2003; Wise, 2015). Motivational filtering might be a reasonable solution for addressing problems related to low TTE, and some scholars have advocated for the use of this approach as a way to improve the validity of educational measurement (Wise, 2009).
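To make the filtering procedure concrete, the following is a minimal sketch of motivational score filtering under assumed data: the score table, the column names, and the 0.90 effort cutoff are hypothetical rather than values drawn from the studies cited above.

```python
# Minimal sketch of motivational score filtering under assumed data.
# The data frame, column names, and the 0.90 effort cutoff are hypothetical.
import pandas as pd

scores = pd.DataFrame({
    "student_id": [1, 2, 3, 4, 5],
    "scaled_score": [612, 540, 498, 655, 571],
    "effort_index": [0.98, 0.62, 0.91, 1.00, 0.84],  # proportion of effortful responses
})

EFFORT_CUTOFF = 0.90  # illustrative; any cutoff must be justified for a given test

flagged_low_effort = scores["effort_index"] < EFFORT_CUTOFF
filtered = scores[~flagged_low_effort]

print(f"Removed {flagged_low_effort.sum()} of {len(scores)} test events")
print("Mean score before filtering:", scores["scaled_score"].mean())
print("Mean score after filtering:", filtered["scaled_score"].mean())
```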
On the other hand, it is possible that excluding the scores of low-effort responders from aggregated data reports could diminish the validity of resulting test interpretations if the subset of examinees retained is not representative of the students for whom the data are used. Currently, it is unclear whether score filtering tends to differentially remove test scores from any subgroups who are more likely to exhibit low TTE. Because these possible unintended consequences of TTE-based filtering approaches are still unknown, research should address whether any groups are especially likely to exhibit low TTE. If students vary in TTE by demographic group, it might suggest score filtering has the potential to exclude scores from some subgroups. Systematically removing the scores of students from certain groups could lead to educational decision-making that is less informed by their underlying academic needs. For this reason, it is essential for researchers to address whether various subgroups of students may differ in the odds they exhibit low TTE. Furthermore, if educators could take preventative steps to improve TTE, it is possible statistical adjustments like score filtering would not be necessary. Rather than removing the test scores of students with low TTE, it would be more appropriate to find ways to increase student test-taking motivation in order to promote more effortful responses to test items. In sum, identifying the groups of students who are most likely to disengage and the malleable predictors of disengagement could inform the development of targeted strategies for promoting TTE in low-stakes contexts.

In addition to better understanding the potential consequences of low TTE for any interpretations made about aggregate test data, another critical issue concerns the effects of low TTE on the validity of interpretations about individual scores. If a student has low test-taking motivation, it might be manifested as low TTE by the student refusing to comply with the testing procedures, omitting answers, leaving entire sections of the test blank, cheating, guessing, or responding in a non-effortful pattern (Haladyna & Downing, 2004; Wise, 2015). These types of unexpected response processes are problematic for test interpretation and use, as they make it difficult to ascertain whether an incorrect response indicated a) the examinee did not have the target skill being measured, b) the examinee did have the target skill but made an error, c) the examinee had partial knowledge but made an incorrect educated guess, or d) the examinee might have had the target skill but did not exert enough effort to give a meaningful answer. Test users respond to this issue by evaluating whether evidence suggests the student's score has been invalidated due to construct-irrelevant variance associated with low TTE.

One recommendation for identifying potentially invalid scores is a process Wise (2015) termed the individual score validation (ISV) approach, characterized by the following five-step procedure:

    First, the test user identifies the construct-irrelevant factors that represent the greatest validity threat. Second, suitable indicators of each construct-irrelevant factor are chosen or developed…The third step is to establish criteria for classifying test scores as invalid, by defining procedural rules for invalidating scores. The fourth step is to apply the indicators and criteria to the set of test events in question and identify those test events whose scores are invalidated. Once invalid test scores have been identified, a final step is to decide what course(s) of action to take. (p. 246)

Wise's (2015) ISV approach gives researchers and practitioners a systematic method for evaluating whether low TTE might have invalidated a particular student's test score. To use this method, test users must select an appropriate indicator of TTE to "flag" students with low TTE. Because there is no single criterion for measuring "adequate TTE," one must define low TTE and provide evidence that the validity flag can reliably classify effortful and non-effortful responders. Therefore, an important issue pertinent to research on student TTE concerns how TTE should be measured. In the following section, commonly used methods for measuring TTE are discussed.

Issues in measuring test-taking effort. In a volume on motivation in school, Schunk and colleagues (2014) stated that effort is considered a latent variable measured through a) direct observations of behavior or b) self-report via questionnaires, interviews, or think-aloud procedures.
Most researchers have measured TTE using brief self-report measures, which are typically Likert-type questionnaires that yield general, global estimates of students’ perceived test-taking motivation before testing or their post-test perceptions of their TTE (Wise & DeMars, 2006). One commonly used measure of test-taking motivation and TTE is the Student Opinion Scale (SOS; Sundre, 1999), comprised of subscales measuring perceptions of Effort (e.g., “I gave my best effort on this test.”) and Importance (e.g., “Doing well on this test was important to me.”). Most of the research on TTE is based on self-report, but scholars have raised several unique concerns about the conclusions that can be drawn based on the TTE students report after testing. First, self-reports only yield a general estimate of self-perceived TTE, as opposed to providing more objective data about student TTE for a particular section of a test or on specific items. Second, although they are useful as global estimates of TTE, self-report measures cannot provide data about changes in students’ TTE that might occur during a single testing session. Third, researchers have questioned how truthfully or reliably students respond to self-report measures of TTE (Wise & Kong, 2005). Indeed, previous research from the attribution theory literature suggests that some individuals might tend to attribute their poor performance to a lack of effort to help themselves preserve a positive self-concept (Weiner, 1992). Lastly, it is unlikely that examinees (especially younger students) could accurately report the proportion of items on which they guessed, which suggests self-report is not useful for estimating the total prevalence of low-effort responses. Overall, there is inconclusive evidence about the extent to which self-report of TTE might be adequately reliable, valid, and useful for informing educational decisions. 23 For these reasons, several researchers have indicated the need for more objective methods for quantifying TTE. One alternative to self-report questionnaires offered by some measurement experts is the use of person-fit statistics, which compare examinees’ responses to a theoretical model for the test. According to Meijer (2003), “Person-fit statistics have been proposed that can be used to investigate whether a person answers the items according to the underlying construct the test measures or whether other answering mechanisms apply… Most statistics are formulated in the context of item response theory (IRT) models…and are sensitive to the fit of an individual score pattern to a particular IRT model” (72). As such, the person-fit approach enables test users to evaluate whether responses are improbable given the estimated item and examinee parameters. That is, when item responses deviate substantially from the estimated IRT model, the observed aberrant responses can be interpreted as likely due to guessing rather than effortful responding. However, some experts have asserted that this assumption may not always be correct. As Lord and Novick (1968) argued, identifying all responses that deviate from expected IRT models as “random” guesses would be inconsistent with our understanding of test-taking behavior. For instance, examinees who have partial knowledge about an item might be able to eliminate clearly incorrect responses and answer with the most reasonable remaining option. Others could have misconceptions about items and thereby respond in a way that does not fit the IRT model due to their misunderstanding. 
Thus, it may be problematic to interpret every item flagged as violating the person-fit model as a completely random guess; however, most of the indices for flagging non-model-fitting responses treat all aberrant responses as guesses (Weiss, 1983). In summary, both self-report and IRT-based person-fit methods for measuring TTE have notable limitations, and neither method identifies non-effortful item responses with complete precision. In response to this issue, Wise and Kong (2005) proposed an alternative method for measuring student TTE. 24 Response time effort. Over the past decade, educational measurement experts have suggested item response latency could be used as a more reliable and practical indicator of TTE (Wise, 2014), citing an observation by Schnipke and Scrams (1997) that some test-takers tend to respond very quickly when they approach the end of timed tests. This finding had suggested response times might be used to differentiate between effortful and non-effortful item responses. Wise and Kong (2005) applied this idea to develop an innovative approach for measuring TTE on CBTs by analyzing item response times. Specifically, Wise and Kong’s RTE method is used to classify responses as effortful or non-effortful using Schnipke and Scrams’ (1997) dichotomous distinction between SB and RGB by comparing response times to a prespecified threshold (for a review of research on response time thresholds, see Kong, Wise, & Bhola, 2007). Researchers derive RTE for a test event (i.e., one completion of a test by an individual student) by summing the number of items classified as SB and dividing that number by the total number of items. Hence, researchers can calculate the student’s RTE score—indicating the proportion of responses classified as SB—a value ranging from 0.0 (no responses classified as SB) to 1.0 (all responses classified as SB). Wise and Kong (2005) listed several reasons why response time data may be preferable for research on student TTE. First, collecting item response times on CBTs is unobtrusive, as researchers can measure latency without examinees realizing that the data are being recorded. Second, this indicator of TTE is not based on subjective judgments of effort; instead, scores represent examinees’ observed response behaviors during testing. Third, response time data can be collected for each item, which allows for the analysis of changes in TTE across different test items. A final practical advantage is that RTE can be used to measure TTE on a CAT without requiring the IRT-based item parameters for each of the items in a large CAT item bank. 25 In Wise and Kong’s (2005) seminal RTE study, they proposed five hypotheses about the validity of RTE as an indicator of student TTE, and a decade of subsequent research on RTE in higher education contexts has provided support for the following hypotheses (Wise, 2015). 1. RTE scores should demonstrate adequate levels of reliability. The first hypothesis was supported by the results of Wise and Kong’s (2005) study, as the observed coefficient alpha for RTE scores was 0.97 (indicating high internal consistency). Other RTE research studies in higher education contexts have consistently corroborated these findings, with the observed internal consistency reliability coefficients for the RTE index ranging from 0.81 (Wise & DeMars, 2010) to 0.99 (Kong et al., 2007; Wise & DeMars, 2006) and exceeding 0.90 in several subsequent research studies (e.g., Kong et al., 2006; Wise et al., 2006). 2. 
RTE scores should be correlated with other measures of examinee test-taking effort. The second hypothesis was that RTE should demonstrate concurrent validity. Wise and Kong’s (2005) findings supported this hypothesis, as RTE scores were significantly correlated with both self-reported TTE (r = 0.25) measured by the Effort subscale of the SOS and person-fit estimates (r = –0.42) measured by the Modified Caution Index (Harnisch & Linn, 1981). Other researchers have found similar associations between RTE and self-reported TTE, with the observed validity correlations ranging from 0.38 (Kong et al., 2007) to 0.61 (Rios et al., 2014). 3. RTE scores should not be correlated with measures of academic ability. Wise and Kong’s (2005) third hypothesis pertained to evidence for discriminant validity, as the authors hypothesized RTE scores should not be related to students’ academic ability. To test this hypothesis, the authors analyzed relationships between student RTE scores and previous scores on the Scholastic Assessment Test (SAT) and found no significant correlations between RTE and SAT–Verbal (r = 0.06) or SAT–Quantitative (r = –0.02) scores. Other scholars have found similar results (Rios et al., 2014; Wise & DeMars, 2010), suggesting academic ability does not appear to be a correlate of low TTE on low-stakes tests. These results could suggest that the students’ low test-taking motivation (rather than low academic skills) might have been a reasonable explanation for their low TTE. 4. Instances of rapid-guessing behavior should yield item scores that are correct at a rate consistent with chance. Research has also supported Wise and Kong’s (2005) fourth hypothesis that RGB should have accuracy rates comparable to chance guessing. In previous studies, scholars have found the accuracy of responses classified as RGB has not significantly exceeded the accuracy expected by random guessing, whereas the accuracy of those classified as SB has significantly exceeded chance. For instance, Wise (2006) found 25.5% of items identified as RGB were correct, but 72.0% of item responses identified as SB were correct; relatedly, Setzer and colleagues (2013) analyzed over one million item responses and found the accuracy rates were 27.9% for RGB and 51.7% for SB. 5. RTE scores should show motivation filtering effects similar to those found with other measures of examinee effort. (p. 174–175) Lastly, several empirical studies have supported Wise and Kong’s (2005) fifth hypothesis that RTE scores should be comparable to other measures of TTE for motivational score filtering (Rios et al., 2014; Swerdzewski et al., 2011; Wise & Cotten, 2009; Wise & Kong, 2005). Results from these studies have indicated that when the test scores of students identified as exhibiting low TTE (as determined by RTE score) are filtered out, average test scores increase, test score standard deviations decrease, and the magnitudes of correlations with external variables—indicative of convergent validity—increase (Kong et al., 2007; Wise, 2015; Wise & DeMars, 2010; Wise & Kong, 2005). In the next section, empirical research on Wise and Kong’s (2005) RTE approach is described.
Empirical Research on Student Test-Taking Effort
Prevalence of low test-taking effort. Empirical studies using Wise and Kong’s (2005) RTE index have consistently shown that a small proportion of test examinees exhibit low TTE when taking a CBT. However, the results from these studies have varied considerably in the observed prevalence rates of low TTE.
As shown in Table 1, existing research studies in college samples have indicated that the proportions of students identified as exhibiting low TTE (commonly defined as RTE scores below 0.90) have ranged from 0.6% (Wise & DeMars, 2010) to as high as 35.6% (Swerdzewski et al., 2011). Altogether, the observed rates of low TTE in these studies are alarming, given that the results from numerous studies have indicated that more than 10% of students exhibited low TTE (i.e., RGB on at least 10% of items) when taking educational CBTs (Kong et al., 2007; Wise et al., 2006; Wise & DeMars, 2006; Wise, Pastor, & Kong, 2009; Wise & DeMars, 2010; Swerdzewski et al., 2011; Rios et al., 2014). Even though there have been fewer studies on the RTE scores of school-age students on low-stakes educational tests, there is some research evidence to suggest that RGB does indeed occur in K–12 contexts. As displayed in Table 1, the proportions of examinees flagged as having low TTE were reported in four empirical studies, with these observed proportions ranging from 1.1% (Wise et al., 2004) to 11.9% (Wise, 2015). Of the research in school-age samples, two studies in particular (Wise et al., 2010; Wise & Ma, 2012) warrant more detailed discussion in the current literature review. Both of these large-scale studies by Wise and colleagues used data collected from students who used the Northwest Evaluation Association (NWEA) Measures of Academic Progress (MAP) assessment, a multiple-choice CAT system with mathematics and reading comprehension tests. The results of these studies provide strong support for the notion that widespread RGB may pose a significant threat to the validity of commonly used CATs. Wise and colleagues (2010) investigated the relationship between TTE and the time of day at which testing occurred. In a large-scale secondary data analysis, researchers analyzed data from all of the students in grades 3–9 who used the MAP math (n = 355,116) or reading (n = 356,715) CAT. The authors found that average RTE scores for these test events decreased as the testing time grew later in the day. For instance, results indicated the mean RTE score for MAP reading tests taken at 7:00 a.m. was 0.994, whereas the mean RTE score for reading tests taken at 2:00 p.m. was 0.974. In another related study, Wise and Ma (2012) analyzed MAP reading and math data to calculate RTE for students in grades 3–9. Comparing several methods for setting RTE time thresholds, the authors found that even the most conservative approach (a common three-second threshold) flagged a considerable proportion of students as having low TTE. In math, 4.8% of the 286,150 analyzed test events were flagged as invalid, whereas 10.6% of the 287,843 analyzed reading test events were flagged as invalid. These estimates exceeded those reported by Wise and colleagues (2010). Although the researchers did not perform comparative or correlational analyses to test the significance of group differences in RTE, results from these two studies still have important implications for future research in this area. First, researchers conducted population analyses and thus observed the true mean RTE scores. To date, no other scholars have reported the mean RTE scores for K–12 students disaggregated by grade level and gender. Next, given their large sample sizes, these studies provide clear evidence that RGB does indeed occur when K–12 students use CATs at school. Lastly, the tests in these studies are used widely in schools for measuring math and reading skills.
These types of assessments can be considered low-stakes from the perspective of the examinees because they carry no (or minimal) consequences for the students, but they are used to inform critical educational decisions. The high rates of RGB in these studies support the need for additional research on the nature of TTE in these types of assessment conditions. 29 Support for demands–capacity model. As previously stated, Wise and Smith’s (2011) demands–capacity model assumes TTE is influenced by three types of factors: item-level variables, examinee-level variables, and testing context variables. Although it is only a preliminary model for explaining why RGB occurs, the Wise–Smith model is a useful framework for describing what scholars have learned so far about the correlates of TTE based on evidence from the extant research literature. In several studies, researchers have investigated item-level correlates of TTE by testing the relationships between item characteristics and the TTE students exhibited on the items. In these studies, researchers used Response Time Fidelity (RTF), an index analogous to RTE representing the rate of RGB exhibited across all examinees for a given item (as opposed to the RGB exhibited across all items for a given examinee). The results of previous studies have consistently shown three primary characteristics are strongly associated with student TTE: item position in the test, item length, and the presence of additional reading materials (Wise, 2006; Wise et al., 2009; Setzer et al., 2013). By contrast, item difficulty has not been found to predict RTF (Wise, 2006; Setzer et al., 2013). The lack of a significant association between item difficulty and the likelihood students exhibit RGB was unexpected because the EEVT seems to imply students would have lower expectancies for success on difficult items compared to easy items (and therefore exhibit less effort). In fact, the research has suggested students who exhibit low TTE tend to respond so rapidly that they do not accurately judge the difficulty of the item; instead, they make quick decisions about the amount of “mental taxation” the item will likely cause them (Wolf, Smith, & Birnbaum, 1995). These findings seem to suggest students do have a limited amount of “mental energy” they can exert during a testing session and that the perceived RD of a test item does relate to TTE on the item. 30 Further empirical support for the Wise–Smith model is derived from research on the effects of the testing context on student TTE. Most early research studies in this area focused on one specific contextual variable: the consequences of test performance. Research has shown that offering a reward of one dollar for performance on a low-stakes test increased the performance of eighth-grade students (O’Neil, Sugrue, & Baker, 1995). Other studies focused on differences in TTE between graded and ungraded tests. Wolf and Smith (1995) found that college students reported higher levels of TTE on tests they were told would count toward a grade, though Smith and Smith (2002) later found that TTE and test performance did not improve in consequential conditions for students with high test anxiety. Collectively, these studies have provided some support for testing context factors as correlates of TTE in the demands–capacity model of TTE. In contrast to the extant research literature on the item-level correlates and context-level correlates of TTE, there are relatively few studies on the student-level correlates of TTE. 
In the demands–capacity model, Wise and Smith (2011) proposed that the following “internal factors” may be determinants of students’ EC when they begin a test: level of proficiency; amount of test preparation; expectations regarding test demands; desire to please teachers, parents, and others; citizenship; competitiveness; and ego satisfaction (p. 149). Even though there is evidence that student expectations regarding test demands do indeed relate to the amount of TTE students exert (Wise, 2006; Wolf et al., 1995), few researchers to date have empirically tested the hypothesized relationships between student “internal factors” and TTE in the Wise–Smith model. This notable gap in the extant research on TTE points to a particularly important area for further research. As Wise (2015) concluded, “research should be directed toward better understanding the dynamics of test-taking motivation” (p. 250). The next section of this literature review describes what we currently know about the psychological and motivational factors that may relate to student TTE.
Student test-taking beliefs and test-taking effort. According to the EEVT, expectancy beliefs and value beliefs are presumed to be the most proximal predictors of achievement-related behaviors, which suggests student perceptions of test-taking should relate directly to their TTE. Indeed, there is research evidence to suggest student beliefs about the value of test performance are correlates of their TTE. Those who perceive a test as affecting their grades (and thus having higher utility value) report higher levels of test-taking motivation and outperform students who are told a test will not be graded (Wolf & Smith, 1995). Even in the absence of test consequences, students are more likely to omit or respond incorrectly to test items they perceive as having greater mental taxation (i.e., higher relative cost), even if the items are not more difficult (Wolf et al., 1995). In one study, students varied in their perceptions of how relevant a test would be to their future employment (i.e., utility value), and these group differences in task value beliefs partially explained the variation in self-reported TTE and test performance (Chan, Schmitt, DeShon, Clause, & Delbridge, 1997). Cole, Bergin, and Whittaker (2008) analyzed an expectancy–value model of test-taking by measuring the TTE of 1,005 college students who took a low-stakes exam (used for institutional evaluation purposes). Mediational analyses indicated students’ test-taking value beliefs predicted their self-reported TTE, which in turn predicted their test performance. Together, these findings provide support for the proposition that student beliefs about testing may be individual-level correlates of TTE. Furthermore, one of the only studies to directly test the student-level correlates of TTE (using the RTE method) was Wise and Cotten’s (2009) investigation of a low-stakes exam taken by 802 college students. The purpose of the study was to test whether TTE was associated with “attitudinal and affective determinants” (p. 192), as suggested by the demands–capacity model. The researchers provided the following rationale for examining student beliefs about testing: The Wise–Smith model of test-taking effort would consider student conceptions of assessment as important components of the internal motivational factors that contribute to effort capacity.
In a low-stakes testing situation, in which the dominant motivational factors of test consequences are absent, the model would predict that test-taking effort should be related to the conceptions of assessment that the student brings into the testing session. (Wise & Cotten, 2009, p. 194) To empirically test this proposition, researchers gathered survey data from students about their perceptions of test-taking using four subscales from the Student Conceptions of Assessment Scale (SCoA; Brown, Irving, Peterson, & Hirschfeld, 2009): Improvement (measuring attitudes toward the use of the testing for academic improvement), Affect (measuring how positively one feels about testing), Irrelevant (measuring the degree to which one believes testing is irrelevant to learning), and Accountability (measuring the degree to which one believes testing is important for holding students and schools accountable). Using hierarchical multiple regression analyses, Wise and Cotten (2009) analyzed the extent to which scores on the four subscales of test-taking perceptions derived from the SCoA predicted student TTE, controlling for gender and SAT scores. Descriptive statistics for the model indicated that RTE was significantly associated with three of the four SCoA subscales (Improvement, Irrelevant, and Accountability), although the block of four SCoA predictors explained just 6% of the variance in RTE scores. Further, only the SCoA Improvement and Affect scales had statistically significant regression weights, with higher Improvement scores associated with an increase in RTE score and higher Affect scores associated with a decrease in RTE score. Based on these findings, the authors concluded that “student conceptions of assessment were clearly related to test-taking effort” (p. 200). 33 Still, Wise and Cotten’s (2009) study had several limitations that may weaken these conclusions. First, the authors did not provide a clear rationale for the use of the SCoA or clear conceptual definitions of the constructs its four subscales were intended to measure. As such, the practical significance of the observed relationships between RTE scores and the Improvement and Affect predictor variables remains unclear. Although the subscales in the SCoA may share some conceptual overlap with the four task value beliefs in the contemporary EEVT—items in the Improvement and Irrelevant subscales seem to measure facets of utility value, items in the Affect subscale seem to measure intrinsic value, and items in the Accountability subscale seem to measure attainment value—the major shortcoming of this study was that it lacked a strong theoretical foundation for investigating the motivational variables associated with TTE. Further, Wise and Cotten’s (2009) finding that positive affect toward testing was negatively associated with student RTE appears to be inconsistent with Cole and colleagues’ (2008) finding that personal interest in testing was positively associated with student TTE. Altogether, Wise and Cotten’s (2009) study was an influential contribution to research on the relationships between student test-taking beliefs and TTE (as proposed in the demands–capacity model), but an unclear justification for the student-level motivational variables in their investigation does limit the conclusions one should make about the psychological processes that contribute to low TTE. 
In another study focused on the relationships between student test-taking beliefs and TTE, Zilberberg, Finney, Marsh, and Anderson (2014) used a measure of test perceptions that could be considered more consistent with an expectancy–value model of test-taking than the measure used by Wise and Cotten (2009). Zilberberg and colleagues examined the extent to which first-year college students’ perceptions about K–12 accountability assessment systems predicted their test-taking motivation on a university-mandated academic achievement test. 34 Participants responded to the Students’ Attitudes towards Institutional Accountability Testing in K–12 (SAIAT-K–12; Zilberberg, Anderson, Finney, & Marsh, 2013), completed quantitative and scientific reasoning tests, and responded to the SOS. The researchers tested a hypothesized fully-mediated path model and found that “students’ attitudes toward institutional accountability tests in K–12 directly affect perceived importance of university accountability tests, which in turn directly affects test-taking effort, which in turn directly affects test performance” (p. 367). In other words, the results from this study suggested the relationship between students’ perceptions about K–12 testing and their performance on a university test could be explained by the effects of test perceptions on TTE. The specific perceptions that were associated with TTE and subsequent performance on the test were Purpose (“students’ understanding of the purpose of such tests”) and Parents (“students’ parents paid attention to test scores”); by contrast, subjective beliefs about Validity (“K–12 institutional accountability tests are adequate measures of ability”) and Disillusionment (“student dissatisfaction with these tests”) were not significant predictors in this mediation model. Thus, Zilberberg and colleagues’ (2014) findings supported Wise and Cotten’s (2009) conclusion that certain beliefs about assessment are significantly associated with the TTE students exhibit in low-stakes assessment scenarios. Altogether, these studies suggest that beliefs about the purpose of an assessment (i.e., utility value or attainment value) relate to student TTE. Summary and remaining questions. This review of empirical research on TTE has presented what we currently know and pointed toward gaps in the literature that may warrant further research. First, most students tend to exhibit (or self-report) adequate levels of TTE when completing low-stakes academic tests, but scholars have clearly documented that a small proportion of test takers disengage from testing and exhibit inappropriately low TTE. 35 The proportions of college students flagged with low TTE have ranged from less than 1% (e.g., Wise & DeMars, 2010) to greater than 20% of test takers (e.g., DeMars, 2007; Rios et al., 2014; Swerdzewski et al., 2011; Wise et al., 2009). However, despite this growing interest in TTE on low-stakes university-mandated assessments, relatively few studies to date have focused on the prevalence of low TTE in K–12 settings. As such, questions remain about whether elementary and secondary students show similar rates of disengagement on low-stakes tests. Second, the Wise–Smith (2011) model of TTE has accumulated some empirical support, but the research to date has primarily focused on correlates of TTE that are considered test-level variables or testing context variables (as opposed to student-level variables). 
The effects of consequences, incentives for performance, and test item-level characteristics on TTE have been documented clearly, but less is known about the demographic characteristics or motivational patterns of students most likely to exhibit disengagement from low-stakes testing. Finally, only a few empirical studies have addressed the extent to which motivational factors may be associated with non-effortful response processes. Moreover, these studies have not been designed in a way that explicitly draws support from a prominent theory of motivation and engagement, such as the EEVT. Further research on “internal factors” related to TTE in the Wise–Smith (2011) model is needed, as these relationships are not understood fully at this time. Student-Level Correlates of Test-Taking Effort Numerous studies based in the EEVT have demonstrated the existence of relationships between demographic characteristics, beliefs, and effort and persistence on achievement-related tasks. Also, the extant research has revealed several key group differences in the expectancy and value beliefs students endorse in various academic domains, and these relationships have indeed explained subgroup differences in school engagement and overall academic performance. 36 Although few studies have explicitly focused on relationships between student demographic characteristics and test-taking expectancy and value beliefs, there is research evidence to suggest that certain groups of students may exhibit differences in their broad expectancy or value beliefs in a given subject area (such as reading, math, or science). This section describes prior research on subgroup differences in a) TTE, b) expectancy and value beliefs in the domain of reading, and c) expectancy and value beliefs specific to test-taking. Age/grade level. Several studies have indicated that students differ in their test-taking motivation and TTE by age or grade level, although few scholars have directly tested whether age or grade are significantly associated with the likelihood students exhibit low TTE. Still, the trend in previous empirical research is that TTE appears to decrease slightly over time, with students in higher grades reporting and exhibiting lower TTE. For example, Wolf and colleagues (1995) studied TTE in a sample of students in grades 10 and 11, and results indicated that grade 11 students reported lower test-taking motivation than grade 10 students and consequently were more likely to omit items that were high in mental taxation. This finding by Wolf and colleagues suggests students may experience decreases in their TTE as they take tests multiple times over consecutive years; indeed, a handful of other studies have also pointed to grade-level differences in average RTE scores or the proportion of students with low TTE. Hauser and Kingsbury (2009) identified low-effort test respondents on a reading CAT in a large sample of students in grade 3 (n = 16,209) and grade 9 (n = 18, 705). The proportion of grade 9 students who were flagged as exhibiting low TTE (7.7%) was higher than the proportion flagged in grade 3 (6.9%), although researchers did not test whether there was a statistically significant difference by grade in the likelihood of exhibiting low TTE. 37 Relatedly, Wise and DeMars (2010) found sophomore students had a lower mean RTE score (M = 0.943) than freshman students (M = 0.996). 
In their study, 11% of second-year students were identified as exhibiting low TTE (defined as RTE below 0.90), whereas 0.6% of first-year students were flagged. Results from this study showed 43 of the 45 examinees (95.6%) removed from the sample through motivational score filtering were sophomores and indicated that the lower TTE of sophomores had significantly distorted the estimates of student growth. Even more notably, Wise and colleagues (2010) demonstrated strong evidence of a decline in RTE in their analysis of test scores from students in grades 3–9. Analyzing 356,715 test events from the MAP reading CAT, they found the following mean RTE scores: 0.995 in grade 3 (n = 138,016), 0.994 in grade 4 (n = 134,535), 0.995 in grade 5 (n = 131,818), 0.988 in grade 6 (n = 122,963), 0.984 in grade 7 (n = 121,273), 0.984 in grade 8 (n = 120,375), and 0.971 in grade 9 (n = 79,189). Their results showed a consistent decline beginning in sixth grade. In sum, previous RTE studies suggest TTE declines as students age, with older students generally having lower RTE. This trend is consistent with developmental declines in overall academic motivation in previous research (Jacobs, Lanza, Osgood, Eccles, & Wigfield, 2002; Wigfield et al., 1997). There is growing evidence that academic motivation decreases as students get older, and research has indicated that students’ general expectancy and value beliefs tend to decline as they age (Durik et al., 2006; Wigfield & Eccles, 1994). Some researchers have even documented distinct trajectories of general motivation as students age (Archambault, Janosz, Morizot, & Pagani, 2009; Baker & Wigfield, 1999; Ratelle, Guay, Larose, & Senécal, 2004). Archambault and colleagues (2010) found changes in children’s motivation in literacy using cross-sectional and longitudinal studies of students in grades 1–12, and results indicated there are several distinct motivational trajectories students might experience throughout their school years. 38 Relatedly, Eccles-Parsons and colleagues (1983) examined student beliefs about literacy at five ages throughout elementary, middle, and high school years. Results suggested students display as many as seven distinct trajectories in task value and reading ability beliefs over time. Three of these trajectories indicated decreasing expectancy and value beliefs in reading as students aged. Students in the “Early Decline Trajectory” group (7.8%) showed steep declines in their reported ability beliefs and literacy task value starting in second grade and continuing until ninth grade; students in the “Constant Decline Trajectory” group (28.1%) showed consistently decreasing expectancy and value beliefs from elementary to high school; lastly, students in the “Late Decline Trajectory” group (13.3%) showed decreases in literacy value beliefs that dropped more significantly in high school—although this group showed increases in ability beliefs until fifth grade, at which time expectancies began to fall dramatically. Altogether, nearly half of the students in this longitudinal study (N = 655) demonstrated a decrease in expectancy or value beliefs in reading. The results were consistent with previous findings that many students develop more negative expectancy and value beliefs about reading as they get older (Jacobs et al., 2002). 
Given the evidence that older students tend to exhibit lower TTE than younger students and the evidence that students tend to show developmental declines in their general expectancy and value beliefs for reading tasks, it is reasonable to consider whether students may experience similar declines in their expectancy or value beliefs specific to test-taking as they get older. Indeed, Paris, Lawton, Turner, and Roth (1991) identified developmental trajectories in student perceptions of academic testing based on survey data collected from students in grades 2-11 about attitudes toward test-taking at school. In one study, researchers found significant age differences in how students perceived standardized test systems (Paris, Turner, & Lawton, 1990). 39 Paris and colleagues found that younger students agreed tests were useful for measuring their learning, whereas older students disagreed that academic test scores were valid to them. Older students reported testing was less important, and they disagreed that standardized testing provided useful information to their families or that the purposes of tests were explained clearly. Paris and colleagues (1991) described this trend as the development of “disillusionment” with testing, as results pointed to “a native presumption of the positive value of the test among young students, as well as increasing skepticism among older students about the importance of the test” (p. 15). Other trends in their survey results included increases in both hostile attitudes toward testing and anxiety about social comparisons based on test performance. Given that previous findings have shown decreases in overall motivation in reading, these results are reasonable evidence that students may be more likely to display inappropriately low TTE as they age. Gender. Another consistent trend in the extant research literature is that female students exhibited higher TTE than male students in general. By contrast, male students have been identified with low TTE at disproportionately higher rates than females. For instance, Wise and colleagues (2004) found that 85.2% of students in grades 6–10 flagged as non-effortful respondents were male, even though their sample was balanced by gender. Similarly, Wise and DeMars (2010) studied a group that was 67% female; however, 25 of the 43 students (58%) with low TTE were male. In another study, Wise and Cotten (2009) directly tested whether college students differed by gender in mean RTE and found that male students (m = 0.92) demonstrated lower RTE than females (m = 0.96), with an ES of 0.18. Male college students showed poorer RTE scores on science and business tests (DeMars, Bashkov, & Socha, 2013), and similar gender differences were found in Sweden (Eklöf, 2007) and New Zealand (Brown & Hirschfeld, 2008). 40 Other researchers have identified distinct gender differences in expectancies for success and value beliefs at school. According to the EEVT, gender differences in ability beliefs and the valuing of certain subjects may arise from gender stereotypes that are communicated to children as they progress through school (Eccles, Wigfield, & Schiefele, 1998). Indeed, numerous studies suggest boys report higher expectancy beliefs for activities in male-stereotyped domains like math and sports, whereas girls report higher expectancies for activities in female-stereotyped domains, such as reading and social activities (Eccles et al., 1989; Eccles et al., 1993; Jacobs et al., 2002; Robinson & Lubienski, 2011; Watt, 2004). 
Still, it is unclear when and why these differences may develop and whether age might moderate the magnitude of these differences. Another critical issue requiring additional research pertains to how social, cultural, familial, or school environment variables might contribute to gender differences in expectancy beliefs at school (Bailey, 1993; Meece, Glienke, & Askew, 2009). Relatedly, researchers have also examined how subjective task value beliefs may differ between male and female students. Once again, the research has yielded mixed findings concerning the possible presence of gender differences in subjective task value beliefs across different ages. For instance, Wigfield and Eccles (1992) reported that boys valued math more than girls, whereas girls valued English more than boys. However, this pattern was not consistent across age groups. By contrast, Wigfield and Eccles (2002) found no gender differences in task value for math or computer activities but found that girls valued reading and music more than boys. Together, these findings suggest the emergence of gender differences in value beliefs is not fully understood, and additional research is needed to determine how these differences might be influenced by other student characteristics or environmental variables in the school context. 41 Little is known about whether gender differences may exist in test-taking perceptions. Still, given the evidence that female students tend to exhibit higher TTE than male students and the evidence that female students tend to endorse greater expectancy and value beliefs for reading tasks, gender differences might arise in test-taking expectancy or value beliefs. In one study, Cole and colleagues (2008) found significant relationships between gender and student perceptions of test usefulness, perceptions of test importance, and self-reported TTE, with male students showing lower test-taking motivation and TTE across four subject areas. However, it is unclear whether gender differences in test-taking value beliefs are present in younger students. Race/ethnicity. Compared to the research on age or gender, few scholars have examined whether student race or ethnicity might be related to TTE. As such, it is unclear whether students from different racial or ethnic minority groups may hold different test-taking beliefs or exhibit different levels of TTE. However, there is research evidence to suggest that students from racial and ethnic minority groups may be at-risk for maladaptive patterns of academic motivation in general (Graham & Taylor, 2002), and so it could be helpful to know whether such students are at-risk for disengagement from test-taking. It should be noted that scholars have argued that previous comparisons by race or ethnicity might have been confounded by socioeconomic status (SES) if researchers had not controlled for this variable (Graham, 1994; Pollard, 1993). Therefore, research on differences in motivation or engagement by race or ethnicity should be interpreted with the recognition that the interacting influences among multiple demographic variables on motivation are still not well understood. Although there is little empirical research that addresses relationships between race or ethnicity and TTE in low-stakes testing contexts, a few studies have examined associations between race or ethnicity and test-taking beliefs or TTE. 
In one study, Chan and colleagues (1997) investigated the performance of 210 college students on an assessment of cognitive problem-solving skills (with no consequences), as well as student perceptions about the validity of the tests and their test-taking motivation. Black students reported lower test-taking motivation and performed more poorly on two parallel forms of the test compared to White students, and mediational analyses indicated that the relationship between race and test performance was partially mediated by racial differences in perceptions of test validity. Based on these findings, the authors concluded that part of the achievement gap between Black and White students could be explained by group differences in perceptions about whether the test adequately measured their knowledge and skills, which in turn predicted variation in test-taking motivation. In another study, Brown and Hirschfeld (2008) found that ethnic minority status was related to student perceptions of assessment and test performance in a group of students in New Zealand. Notably, a recent investigation of RTE by Setzer and colleagues (2013) is one of the only studies to date in which researchers a) reported descriptive statistics for the sample disaggregated by demographic subgroups and b) tested the statistical significance of subgroup differences in RTE. Comparing the RTE scores of non-White (n = 1,646) and White (n = 6,436) students, Setzer and colleagues found a significant difference between the average RTE scores of White (m = .990) and non-White (m = .980) students (with an ES of 0.10). Nevertheless, little is known about whether similar effects may be found when analyzing the test responses of K–12 students, and so one intent of the current investigation is to add to the extant research on the demographic characteristics of students who may be most likely to exhibit low levels of TTE in low-stakes testing contexts. With that being said, the present analysis of the potential associations between student race/ethnicity and student TTE is primarily exploratory in nature, given there has been little empirical research on this topic to date.
Disability status. The final student-level demographic variable of interest in the current study is student learning disability (LD) status. This variable in the proposed model of TTE is also exploratory, given that few researchers to date have reported data about students with disabilities (SWD) in empirical studies of test-taking motivation or TTE. Still, previous research on motivation and engagement has suggested students with an LD report lower self-efficacy, make maladaptive attributions about their abilities, and show less effort and persistence on difficult tasks (Battle, 1979; Butkowsky & Willows, 1980; Chapman & Boersma, 1979; Licht & Kistner, 1986). For instance, a report from the National Joint Committee on Learning Disabilities (NJCLD, 2008) suggested that students with an LD have difficulties maintaining sufficient motivation in school. Wiest, Wong, Cervantes, Craik, and Kreil (2001) compared the self-reported intrinsic motivation of secondary students in general education, special education, and alternative education settings, and the results suggested that students in special education settings reported lower perceptions of personal competence than students in general education settings. These findings were consistent with previous research on elementary students with an LD.
For instance, Grolnick and Ryan (1990) had documented that children with an LD tended to report feeling lower levels of cognitive competence, were more likely to report that academic outcomes were out of their control and had lower levels of motivation (compared to a control group) per teacher report. To summarize, the limited knowledge about this issue points to the potential importance of further research on the motivation and engagement of SWD. Still, there is some evidence that could suggest SWD may have low expectancy and value beliefs compared to their typically developing peers. A better understanding of whether group differences may exist in these alterable motivational variables would be necessary for informing strategies for addressing the unique motivational needs of different students. 44 Current Study and Research Questions Remaining gaps in the research literature. This review of research on student TTE in educational contexts revealed three significant gaps in the research literature that warrant further investigation. First, previous studies of TTE (using multiple methods for measuring TTE) have not provided conclusive estimates of the prevalence of low TTE on low-stakes tests. Even less is known about the occurrence of low TTE for school-age students in K–12 settings. Second, scholars have tested some of the hypothesized relationships proposed in the Wise–Smith (2011) model of TTE, but few scholars have studied student-level correlates of TTE. As such, it is currently unknown whether students from any particular demographic groups may be more likely to exhibit low TTE in low-stakes testing contexts. Third, another critical gap in the extant research literature concerns student-level motivational correlates of TTE. Although there is some evidence that examinee beliefs about test-taking are related to TTE, it is clear that the “internal factors” component of the Wise–Smith model warrants further investigation. Taken together, a review of the literature on student TTE points to the need for additional research addressing 1) the prevalence of students who exhibit low TTE, 2) subgroup differences in low TTE, and 3) relationships between malleable motivational factors and the odds students exhibit low TTE. The research questions in this study were guided by an application of the EEVT to the domain of test- taking to identify variables that could be targeted to help ameliorate the problem of low TTE on low-stakes academic tests. The EEVT suggests that internal factors (i.e., students’ expectancies and values) may be particularly relevant to consider in order to understand TTE better, and previous research has suggested that student demographic characteristics may be associated with group differences in test-taking expectancy and value beliefs, which in turn may predict TTE in school-age students. The research questions and hypotheses for this study are described below. 45 Research question 1. What proportion of students in grades 4–8 exhibit low TTE on a CAT in reading, as determined by RTE (Wise & Kong, 2005)? It was hypothesized that the proportion of students in grades 4–8 identified as exhibiting low TTE would be similar to the proportions observed in previous studies of school-age students, which have ranged from 0.2% to 11.9%, depending on grade level, subject area, and the time of day (Wise et al., 2010). Research question 2. To what extent do student demographic variables relate to the likelihood students exhibit low TTE on a CAT in reading? 
It was hypothesized that student demographic characteristics (i.e., grade, gender, race/ethnicity) would be correlates of the odds that students exhibit low TTE. Specifically, based on previous research in the domains of reading and testing, it was hypothesized that a) students in grade 8 would be more likely to exhibit low TTE than students in grade 4, b) male students would be more likely to exhibit low TTE than female students, and c) non-White students would be more likely to exhibit low TTE than White students.
Research question 3. Do students differ in test-taking expectancy and value beliefs by student demographic variables? It was hypothesized that student demographic characteristics (i.e., grade, gender) would be correlates of test-taking expectancy and value beliefs. Specifically, based on previous research on reading and test-taking, it was hypothesized that a) students in grade 7 would report more positive test-taking expectancy and value beliefs than those in grade 8, and b) female students would report more positive expectancy and value beliefs than male students.
Research question 4. To what extent do student test-taking expectancy and value beliefs relate to the likelihood students exhibit low TTE on a CAT in reading? It was hypothesized that test-taking expectancy and value beliefs would be correlates of low TTE. Specifically, based on previous research, it was hypothesized that a) students with lower test-taking expectancy beliefs and b) students with lower test-taking value beliefs would be more likely to exhibit low TTE.
CHAPTER III
METHODS OF STUDY I
Rationale for Two Studies
Two studies of student TTE were carried out to address the research questions (RQs) that guided the current investigation. First, Study I addressed RQ1 and RQ2 through a large-scale, nationally representative secondary analysis of data from the STAR Reading test. Second, Study II replicated and expanded Study I through an empirical study in one school district, using both survey research and a secondary analysis of data from the STAR Reading test.
Purpose and Design of Study I
The primary purposes of Study I were to a) describe the proportion of students in grades four and eight who exhibited low TTE on a STAR Reading test and b) examine the hypothesized relationships between three stable individual student characteristics and the likelihood of being identified as a student with low TTE. The predictor (independent) variables of interest were grade level, gender, and race/ethnicity. The criterion (dependent) variable of interest, low TTE, was a binary categorical variable operationalized as RTE scores falling at or below 0.90.
Sampling Procedure
STAR assessment database. Using data from administrations of STAR Reading assessments completed during the 2014–15 school year, a targeted sample of item-level testing data was acquired from the testing company. The dataset used for the current analyses included student demographic information, test performance, and item-level testing data (including response times) for students in grades 4 and 8 who had completed a STAR Reading test in the winter of 2014–15 and for whom the following demographic information was input into the Renaissance Learning database by their school: grade, gender, and race/ethnicity. Thus, the testing data represented a subset of all students in grades 4 and 8 in the existing STAR Reading database.
It was assumed that most students who used a STAR Reading test in the winter of that school year had taken the test at least once before, so TTE was not expected to be unduly influenced by unfamiliarity with the test. To be included in the sample, the student must have completed a STAR Reading interim test such that item responses for each question were recorded and a standard score was derived from the student’s performance. Cases with missing item response data were excluded (per the data screening procedure described below). The researcher requested de-identified item-level data from the STAR Assessment database per the inclusion criteria (see Appendix B). Personally identifiable information (PII) was removed before receiving the data. Educational agencies that use the STAR Assessment system provided informed consent to the testing company prior to the storage of PII pursuant to the Application and Hosting Privacy Policy (Renaissance Learning, 2017). Parents of students who used a STAR Reading test who wished to revoke their consent for the storage of PII were given the opportunity to contact the child’s school or district to have their educational records disclosed, changed, or removed from the STAR Assessment system. As stated in the privacy policy notice, available on the Renaissance Learning website (see Appendix C), “Renaissance Learning does not use your child’s PII for any purpose other than to provide services to your child’s school. Combined information that has been stripped of PII, and therefore not traceable to any student, is used for research and development so we can continuously improve our products and accelerate learning for all students” (Renaissance Learning, 2014, “Frequently Asked Questions About Student Information in our Software Products”). Therefore, participant consent for inclusion in research like the current Study I was obtained at the time students participated in a STAR Reading test, per the conditions of the user agreement for this assessment system. 48 Measures STAR Reading test. The STAR Reading test is a CAT of reading comprehension skills completed by students in grades 1–12 using a computer or tablet device. According to the test developers (Renaissance Learning, 2016), STAR Reading tests are typically administered three times per year to all students for the purpose of universal screening of reading skills, administered monthly to monitor student reading progress and match instruction to the ability of each student, and/or administered weekly for monitoring the progress of students who receive intensive reading intervention. Each test consisted of 34 vocabulary-in-context items, which required students to read a passage and select from three choices the word that completes a sentence about the passage. The average test takes approximately 15 minutes to complete. Research on the psychometric properties of STAR Reading has indicated that the test has adequate reliability and validity for the purposes of universal screening and progress monitoring, according to the testing standards developed by the National Center on Response to Intervention (U.S. Department of Education: National Center on Response to Intervention, 2010). Internal consistency reliability was measured using a random sample of 1.2 million STAR Reading test administrations, and results indicated high reliability for each grade level (range of 0.93 – 0.95), with a reliability coefficient of 0.97 for the full sample (Renaissance Learning, 2016). 
Evidence for the concurrent and predictive validity of the STAR Reading test has been demonstrated through empirical research on the correlations between student scores on the STAR Reading test and their current or future scores on other established measures of reading skills (Renaissance Learning, 2016). The results of more than 400 research studies have suggested there is strong evidence for the concurrent and predictive validity of STAR Reading tests, given that the average correlations between STAR Reading and other reading tests ranged from 0.65-0.87. 49 Predictor variables. The following variables were proposed predictors in the multiple logistic regression model(s): grade level, gender, and race/ethnicity. Outcome variable. Dichotomously represented (low vs. not low) TTE was proposed as the criterion or outcome variable for the logistic regression models. The outcome variable was derived from the item-level test data using Wise and Kong’s (2005) RTE procedure. Wise and Kong’s (2005) RTE index is considered a proxy of TTE, and it represents the proportion of the test items on which the examinee had responded with SB (Wise, 2015). As previously described in the literature review, several replications of Wise and Kong’s initial study have provided further evidence for the reliability and validity of this index. In previous studies, coefficient alpha values for RTE scores have ranged from 0.81–0.99, usually exceeding 0.90 (e.g., Kong et al., 2006; Kong et al., 2007; Wise et al., 2006; Wise & DeMars, 2006). Likewise, several researchers have provided evidence for the concurrent validity (e.g., Rios et al., 2014; Swerdzewski et al., 2011) and the discriminant validity (e.g., Kong et al., 2007; Rios et al., 2014; Wise & DeMars, 2010; Wise et al., 2009) of RTE scores. In this study, the derived RTE scores were used to classify each examinee as either exhibiting low TTE on the STAR Reading test (RTE at or below 0.90) or not exhibiting low TTE. To calculate the RTE index, researchers must select a time threshold to represent the boundary between RGB and SB. Responses are classified as SB if the response times are higher than the threshold time and classified as RGB otherwise. Finally, the number of SB exhibited by an examinee is summed and divided by the total number of items, and the resulting proportion is the student’s RTE score. Because each administration of a STAR Reading test contained precisely 34 items, there were 35 discrete RTE scores that could be derived from an individual’s testing data, ranging from 0.0 (no SB) to 1.0 (all SB). 50 In this study, the common threshold method was used to set the time threshold for RTE. Of the four conventional methods for setting time thresholds (see Kong et al., 2007), the common threshold is most practical for calculating RTE scores for a CAT like STAR Reading (given the immense size of the item bank used to administer test items). Prior research has shown that the four methods are all comparable for identifying low TTE (Kong et al., 2007). Response times (collected automatically and rounded to the nearest second) were compared to a common three-second threshold. That is, all of the responses submitted in less than four seconds (i.e., 0–3 seconds) were classified as instances of RGB. This procedure was consistent with the common threshold method employed by Wise and colleagues (2004). After deriving the RTE index, students were identified as exhibiting low TTE if they earned RTE scores below 0.90. 
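As a concrete illustration of the scoring rule just described, the following minimal Python sketch (the response times shown are hypothetical) classifies each response as SB or RGB using the three-second common threshold, computes the RTE score for a 34-item test event, and flags the event if RTE falls below 0.90:

    import numpy as np

    THRESHOLD = 3  # common threshold: responses rounded to 0-3 seconds are RGB

    def rte_score(response_times):
        """Proportion of item responses classified as solution behavior (SB)."""
        times = np.asarray(response_times)
        solution_behavior = times > THRESHOLD  # slower than the threshold
        return solution_behavior.mean()

    # One hypothetical 34-item test event (times rounded to whole seconds).
    times = [28, 3, 41, 2, 19, 1, 25, 0] + [30] * 26  # four rapid guesses
    rte = rte_score(times)         # 30 SB / 34 items = 0.88
    low_tte = rte < 0.90           # flagged as exhibiting low TTE
    print(round(rte, 2), low_tte)  # 0.88 True

Because the test length is fixed at 34 items, any test event with four or more rapid guesses falls below the 0.90 criterion, consistent with the arithmetic noted below.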
Several scholars have suggested 0.90 is a reasonable criterion for identifying low TTE (e.g., Rios et al., 2014; Swerdzewski et al., 2011; Wise & DeMars, 2010). As Swerdzewski and colleagues (2011) stated, “an examinee with an RTE of 0.90 only exerted effort on 90% of the test items. …it is reasonable for examinees to not try on a small portion of items (i.e., 10%) and still be retained in a dataset” (p. 172). There is evidence that the 0.90 criterion meaningfully differentiates between examinees who had exhibited appropriate levels of TTE and those who had not given adequate effort. Several empirical studies (e.g., Kong et al., 2007) have shown that using a 0.90 criterion for score filtering can result in greater convergent validity correlations, and Wise (2015) found that RTE scores below 0.90 are low enough to distort scores and threaten individual score validity. On this test, RTE scores below 0.90 indicate that four or more of the responses were RGB because students exhibiting four RGB responses had RTE scores of 0.88 (30/34), whereas students exhibiting three RGB responses had RTE scores of 0.91 (31/34). 51 Data Analyses Data screening and preliminary analyses. According to Raykov and Marcoulides (2008), data screening and preliminary analyses are conducted “(a) to ensure that the data to be analyzed represent correctly the data originally obtained, (b) to search for any potentially very influential observations, and (c) to assess whether assumptions underlying the method(s) to be applied subsequently are plausible” (p. 61). The values for all predictor and outcome variables across all participants were examined to ensure that the observed values were plausible. Items with negative (i.e., impossible) or unreasonably high response times were likely representative of an instrument malfunction, computer error, or test interruption. If instrumentation malfunction or a coding error is presumed to be the reason for missing data or outliers, Raykov and Marcoulides (2008) suggest this may warrant listwise deletion. Thus, listwise deletion was used to exclude cases for which response times included missing values or for which all of the response times were 0 seconds. Note that individual instances of response times coded as 0 were not treated as missing data, as these represented cases in which some responses were submitted in less than 0.5 seconds and rounded down to 0. Descriptive statistics. To address RQ1, the proportion of students in grades 4 and 8 identified as exhibiting low TTE on a STAR Reading test (i.e., RTE scores below 0.90) was examined, and descriptive statistics for the sample were reported and disaggregated by group. Comparative analyses. To address RQ2, potential mean RTE score differences by grade, gender, and race/ethnicity were tested using independent samples t-tests. Next, potential differences in the proportion of students flagged with low TTE by grade, gender, and race/ethnicity were tested using chi-squared tests. 52 Logistic regression analyses. To address RQ2 further, multiple logistic regression was performed to investigate the hypothesized relationships between the student characteristics and the probability of a student being flagged as exhibiting low TTE (described in terms of log odds). The test for significance in a logistic regression model is the Wald test, which tests the null hypothesis that a predictor does not affect the likelihood that the criterion variable is equal to one (Agresti & Finlay, 2009). 
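As an illustration of this analytic approach (a hedged sketch only, not the analysis code or software actually used in this study), the fragment below shows how a logistic regression of the low-TTE flag on dummy-coded predictors could be fit with the Python statsmodels library, how Wald tests for individual coefficients are obtained, and how exponentiated coefficients yield odds ratios; the simulated data frame and column names are hypothetical placeholders.

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical analysis file: one row per examinee, with dummy-coded
# predictors and the binary low-TTE flag (1 = flagged, 0 = not flagged).
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "grade8":  rng.binomial(1, 0.40, 5000),
    "male":    rng.binomial(1, 0.50, 5000),
    "low_tte": rng.binomial(1, 0.03, 5000),
})

X = sm.add_constant(df[["grade8", "male"]])
model = sm.Logit(df["low_tte"], X).fit(disp=False)

# Wald z tests for each coefficient (null hypothesis: the predictor does
# not change the log odds of being flagged with low TTE).
print(model.summary())

# Likelihood-ratio chi-square test of the fitted model against an
# intercept-only model, analogous to the overall model test reported here.
print("LR chi-square =", round(model.llr, 3), "p =", round(model.llr_pvalue, 4))

# Exponentiating a coefficient converts the change in log odds into an odds
# ratio; for example, exp(beta) for the `male` dummy gives how many times as
# likely male examinees are to be flagged, holding the other predictors constant.
print(np.exp(model.params))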
In multiple logistic regression models, the likelihood-ratio chi-square test of model significance indicates whether the full model significantly improves the explanatory power of a restricted model. In a statistically significant model, any variables with significant regression weights are significantly associated with the outcome, controlling for all other predictors. If the parameter β for a given predictor variable is significant, it would indicate the increase in the log odds of being identified as exhibiting low TTE for each one-unit increase in the value of the predictor variable, holding all other predictors constant. 53 CHAPTER IV RESULTS OF STUDY I Data Screening and Preliminary Analyses Testing records were obtained for 572,847 administrations of the STAR Reading test completed in the winter of the 2014–15 academic year. Each test event included 34 items, which resulted in 19,476,798 observed item responses. Preliminary analysis of the item response times for the sample revealed that a small number of responses were coded as being submitted after the 90-second time limit. This issue was presumed to be indicative of a computer error, and so those response times were treated as missing data. For the remaining 19,307,311 item responses, the average submission time was 32.68 seconds. As expected, there was a subset of responses that were submitted rapidly (i.e., in three seconds or less). Specifically, 1,421 response times were rounded to zero seconds, 23,546 response times were rounded to one second, 67,356 response times were rounded to two seconds, and 79,455 response times were rounded to three seconds. To support the appropriateness of using the RTE index, the effectiveness of the selected common threshold was evaluated. Results indicated that the overall accuracy for all responses was 68.36%. However, the accuracy of responses submitted in 0–3 seconds was much lower, ranging from 32.44% to 34.73%. Given there were three response options, the low accuracy rates (i.e., roughly one correct response for every two incorrect responses) for items submitted in three seconds or less were nearly equivalent to chance guessing (33.3%). This finding supported the assumption that there was a meaningful difference between responses classified as SB and RGB. Therefore, a three-second common threshold was accepted, meaning responses rounded to 0–3 seconds were considered RGB and responses rounded to 4–89 seconds were considered SB. Responses classified as RGB were submitted in less than 10% of the average time (32.68 seconds). 54 Results indicated that 19,135,533 responses (99.11%) were classified as SB, whereas 171,778 responses (0.89%) were classified as RGB. The accuracy of RGB was 34.00%, whereas the accuracy of SB was 68.67% (i.e., consistent with the overall average for all responses). Descriptive Statistics Demographic information for the sample. Demographic data for the full sample of students (N = 572,847) are provided in Table 2. The sample included more fourth-grade students (60.40%) than eighth-grade students, and the sample was balanced by student gender. The race/ethnicity of students in the sample was listed most often as White (36.93%), Hispanic (16.98%), Black (16.2%), or Unknown (23.45%). An unexpectedly low proportion of students (0.09%) had a value of 1 (“Yes”) endorsed for the dichotomous variable representing LD status, whereas the remaining students (99.91%) had a default value of 0.
Correspondence from the testing company indicated that it was not possible to ascertain whether values of 0 indicated that educators had entered a value of 0 (“No”) or that the data were missing (because educators had provided no information about LD status). RTE scores. The distribution of RTE scores for the sample (N = 571,386) is provided in Table 3. 1,461 students (0.26%) had missing item response times and were therefore excluded from the analyses. The mean RTE score for each subgroup of students is provided in Table 4. The overall mean RTE score was 0.99 (SD = 0.03). Most students (89.50%) had RTE scores equal to 1.0, indicating all 34 responses were classified as SB, whereas the other 10.50% of students had RTE scores falling below 1.0. In total, 96.91% of students had RTE scores above the 0.90 criterion, meaning they exhibited RGB on fewer than four test items. Mean RTE scores for subgroups of students in this sample are disaggregated in Table 5, Table 6, and Table 7. 55 Students with low TTE. Notably, there was a total of 16,250 students (2.84%) identified as exhibiting low TTE on the STAR Reading test. RTE scores for students flagged for low TTE ranged from 0.38–0.88, meaning some examinees exhibited RGB on as many as 21 of the 34 test items. The proportion of students with low TTE in each subgroup is provided in Table 8, and demographic information for the 16,250 students identified with low TTE is provided in Table 9. Comparative Analyses Mean RTE scores by subgroup. The mean RTE score for female students (M = 0.9943) was higher than the mean for males (M = 0.9880), t(557,612) = 66.138, p < 0.001, d = 0.178. The mean RTE score for grade 4 students (M = 0.9922) was higher than grade 8 (M = 0.9897), t(444,090.47) = 24.624, p < 0.001, d = 0.069. The mean RTE score for Asian/Pacific Islander students was highest (M = 0.9961), followed by White (M = 0.9928), Hispanic (M = 0.9905), and Black students (M = 0.9869). A one-way ANOVA indicated that the differences in mean RTE scores of students in different race/ethnicity subgroups were statistically significant, F(437,361) = 442.031, p < 0.001. The mean RTE score for students with an LD (M = 0.9836) was lower than the mean for students whose LD status was unknown (M = 0.9912), t(486.434) = 3.387, p = 0.01, d = 0.176. Grade and low TTE. The proportion of students in grade 4 (N = 344,945) identified with low TTE on this test was 2.54 percent, whereas the proportion in grade 8 (N = 226,441) with low TTE was 3.31 percent. The chi-squared test indicated that the difference between the proportions of students with low TTE (0.77%) was statistically significant, χ2(1) = 298.090, p < 0.001. Students in grade 8 comprised 39.6% of the sample, yet they represented 46.2% of the 16,250 students with low TTE. 56 Gender and low TTE. The proportion of male students (N = 278,872) identified with low TTE on this test was 3.95%, whereas the proportion of females (N = 278,742) was 1.79%. The chi-squared test indicated this difference in proportions of students with low TTE (2.16%) was statistically significant, χ2(1) = 2,388.262, p < 0.001. Although the sample was balanced in regard to gender, male students represented 67.8% of the 16,250 students flagged with low TTE. Race/ethnicity and low TTE. The chi-squared test indicated statistically significant differences among the proportions of students flagged with low TTE in the six race/ethnicity subgroups, χ2(1) = 32.767, p < 0.001. 
The proportion of Black students (N = 92,682) identified with low TTE on this test was highest at 4.33%, the proportion of Hispanic students (N = 96,957) with low TTE was 3.08%, the proportion of students with race/ethnicity endorsed as “Other” (N = 10,117) with low TTE was 2.80%, the proportion of American Indian/Alaskan Native students (N = 5,798) with low TTE was 2.71%, the proportion of White students (N = 211,070) with low TTE was 2.27%, and the proportion of Asian/Pacific Islander students (N = 20,738) with low TTE was lowest at 1.22%. Logistic Regression Analyses Multiple regression model for low TTE. Finally, multiple logistic regression was performed to test the relationships between three predictors (grade, gender, race/ethnicity) and the odds of low TTE. Note that due to group sizes, only White, Hispanic, Black, and Asian/Pacific Islander students were included in the logistic regression model, using binary dummy variables for the subgroups. Because the LD status of most students (99.91%) could not be confirmed, the LD variable was not included in the model. The logistic regression model regressed low TTE on grade, gender, and race/ethnicity, and the full model was statistically significant, χ2(5) = 3,292.642, p < 0.001, explaining 3.4% of variance in the odds of low TTE. 57 Results of the full logistic regression model are provided in Table 10. Each of the five predictors in the model was significantly associated with the odds a student was identified as exhibiting low TTE on a STAR Reading test (p < 0.001). As such, the coefficients for the five significant predictor variables were interpreted. Results for race/ethnicity were more complex because this variable had more than two categories and required dummy coding. Within the model: (a) Male students were 2.303 times as likely to exhibit low TTE as female students, controlling for all other factors. (b) Eighth-grade students were 1.361 times as likely to exhibit low TTE as fourth-grade students, controlling for all other factors. (c) Hispanic students were 1.378 times as likely to exhibit low TTE as White students, controlling for all other factors. Black students were 1.976 times as likely to exhibit low TTE as White students, controlling for all other factors. Asian/Pacific Islander students were 0.533 times as likely to exhibit low TTE as White students, controlling for all other factors; stated another way, White students were 1.876 times as likely to exhibit low TTE as Asian/Pacific Islander students, controlling for all other factors. 58 CHAPTER V METHODS OF STUDY II Purpose and Design of Study II The purpose of Study II was to replicate and extend the findings from Study I in order to investigate a) the extent to which students might differ in their test-taking expectancy and value beliefs, and b) the extent to which test-taking expectancy and value beliefs might relate to the likelihood students exhibit low TTE on a STAR Reading test. In Study II, an online survey protocol was used to measure individual student characteristics and test-taking expectancy and value beliefs, and low TTE was measured as described in Study I, using item response time data derived from administrations of the STAR Reading test. The rationale for Study II was that it allowed for the collection of data on test-taking beliefs, which were not available in Study I. In doing so, Study II addressed hypothesized relationships in the demands–capacity model of TTE.
Sampling Procedure Participants were recruited using a convenience sample of students from a school district that used the STAR Reading test. Two middle schools were targeted for inclusion based on the following characteristics: geographic region, grade levels present, and administrator support for district-wide participation in the study. Recruiting participants from schools already using the STAR Reading test was desired because it allowed the researcher to replicate the analyses from Study I without requiring the participating students to complete any additional testing. Initial recruitment was completed through informal meetings with district administrators. According to student data from the Michigan Department of Education (2018; see www.mischooldata.org), students in the targeted district were mostly White (75%) and Hispanic (14%), not economically disadvantaged (75%), and proficient on the statewide English language arts test (60–70%). 59 According to the district’s curriculum and instruction website, the test was used for the following purposes: complying with state legislation requiring the regular assessment of reading skills, screening students for reading difficulties, grouping students for intervention classes and differentiated instruction, and monitoring reading progress over time. At the middle school level, test scores were used as an indicator of the proportion of students with grade-level proficiency in reading and used to measure progress toward goals outlined in their school improvement plans. Test data were shared with parents during conferences and included in student report cards, and parents were directed to contact their child’s language arts teacher for additional information. Measures Student Perceptions of Testing Survey. Data on demographic characteristics, test-taking expectancy beliefs, and test-taking value beliefs were derived from participants’ responses to a brief online survey (Appendix D) called the “Student Perceptions of Testing Survey” (SPOTS). This 23-item survey included two items measuring student demographic information, two practice items, and nineteen items measuring student beliefs related to the STAR Reading test. Seven predictor variables were coded using the following procedures. (Note that test-taking beliefs were both predictor and outcome variables, depending on the research question being addressed by a given analysis.) Grade level. One item measured student grade level (coded grade seven or grade eight). Gender. One item measured student gender (coded male or female). Test-taking expectancy beliefs. Test-taking expectancy beliefs were measured using five items adapted from the Academic Efficacy subscale of the Patterns of Adaptive Learning Scales (PALS; Midgley et al., 2000), a set of well-established measures of motivation (Senko, 2016). 60 The PALS Academic Efficacy subscale consists of five items measuring an individual’s perceived competence to do school work, and items are similar to the expectancy belief measures used by Wigfield and Eccles (2000), which have been studied extensively (Wigfield et al., 2016). PALS items are rated using five-point Likert-type scales, anchored at 1 (“Not at all true for me”), 3 (“Somewhat true for me”), and 5 (“Very true for me”). Internal consistency for this subscale has been shown to be adequate (α = 0.78; Midgley et al., 2000).
In the SPOTS, five items from the Academic Efficacy scale were adapted to more specifically measure expectancies for success when taking the STAR Reading test (see Appendix E). Test-taking expectancy beliefs subscale scores were derived from the average ratings of the five items and ranged from 1.0–5.0. Test-taking value beliefs. In Study II, test-taking value beliefs were measured using items adapted from Conley’s (2012) four subjective task value subscales, which were based on the work of Eccles, Wigfield, and colleagues (e.g., Wigfield & Eccles, 2000). In the SPOTS, 14 items from Conley’s value subscales were adapted to more specifically measure value beliefs related to the STAR Reading test. The value scales were shortened to reduce the overall length of the SPOTS (see Appendix F), and items selected for inclusion in the SPOTS were those judged by the researcher to be most pertinent to test-taking, most appropriate for use with middle school students, and most reliable according to Conley (2012). Each test-taking value subscale score was derived from the average rating of the items in the adapted scale (ranging from 1.0–5.0). The first type of task value in the EEVT is attainment value, which refers to perceptions of how a task relates to important aspects of an individual’s identity (Wigfield & Eccles, 2000). Conley’s (2012) original Attainment Value scale (α = 0.85) consisted of six Likert-type items (e.g., “Being someone who is good at math is important to me”), and four of the items were selected for administration in the current study and adapted to reference the STAR Reading test. 61 The second type is intrinsic value, which refers to one’s enjoyment or interest in a task (Wigfield & Eccles, 2000). Conley’s original Interest Value scale (α = 0.96) consisted of six items (e.g., “I like math.”), and four of the items were selected for administration in the current study and adapted to address the intrinsic value of STAR Reading. The third is utility value, the usefulness of a task for reaching one’s goals (Wigfield & Eccles, 2000). Conley’s Utility Value scale (α = 0.80) consisted of four items (e.g., “Math will be useful for me later in life.”), and all four items were selected and adapted to reference STAR Reading. The last is relative cost value, which represents the belief that one will lose other desired opportunities (Wigfield & Eccles, 2000). Conley’s (2012) Cost Value scale (α = 0.70) consisted of two items (e.g., “I have to give up a lot to do well in math.”), and both items were adapted. Low test-taking effort. The outcome variable of interest in Study II, low TTE on a STAR Reading test, was measured using Wise and Kong’s (2005) RTE method, as described in the previous chapter. The method for computing the outcome variable in Study II was the same as the method in Study I (i.e., RTE scores below 0.90 were flagged as indicators of low TTE). Procedures In Study II, RTE scores were derived from secondary analyses of data from the existing STAR Assessment database following the procedures for data retrieval described in Study I. As previously stated, districts using STAR Assessments provide informed consent to Renaissance Learning per the Application and Hosting Privacy Policy (Renaissance Learning, 2017). The researcher requested written permission from administrators in the targeted district to receive test data from the STAR Reading database for the purpose of learning more about the use of the test in the district (see Appendix G).
62 The STAR Reading test data for participants in Study II were fully anonymized (i.e., school ID numbers were removed) before the researcher received the data. Instead, the only identifier associated with the STAR Reading test data was a randomly generated Renaissance Learning ID number, which the researcher could not connect to students. For this reason, the Michigan State University institutional review board classified Study II as exempt from review for the protection of human subjects (see Appendix H). In addition to requesting express written consent from district administrators for students in the targeted school district to participate in the proposed study, the researcher also shared an information letter with all parents of children in the district to inform them about the purpose and scope of the research study (see Appendix I). Parents were informed that a) their child’s participation in the online survey was entirely voluntary, b) their child could elect to opt out of the survey at any time, c) information gathered through the research study would not affect their child’s grades, instruction, or eligibility for any educational services or supports, d) PII would not be gathered through the study, and e) the researcher would not disclose the survey responses or performance of any individual or class. After consent for district-wide participation was obtained from the superintendent of the participating district, students were invited to take the online survey during the spring of the 2017–2018 school year (following the spring administration of the STAR Reading test). In their language arts classrooms, students were invited to open and complete the SPOTS. Information about the study and the survey instructions were read aloud to students by the teachers using a standardized protocol (Appendix D). Students were given informed consent documentation and instructed to indicate their assent to participate in the survey by entering their Renaissance ID number (shared with them by their teacher). 63 After the participating students had provided assent, teachers read each item of the SPOTS aloud, including two practice items (to ensure students knew how to use the scale) and nineteen items measuring their expectancy and value beliefs related to the STAR Reading test. Following the completion of the survey, all students in the participating schools received a gift certificate to a local ice cream store (regardless of whether they had completed the survey). School staff were invited to attend a presentation by the researcher related to the study findings. Data Analyses In general, the analyses for Study II followed the same methods as Study I. That is, data screening, analyzing item response times, handling missing data, analyzing the accuracy of the response time threshold, calculating RTE scores, and identifying students with low TTE were completed in the same manner as previously described in the Methods chapter for Study I. Descriptive statistics. As in Study I, demographic information for the sample is presented, and descriptive statistics for RTE scores and the outcome variable (low TTE) are presented and disaggregated by subgroups (grade and gender). To inform RQ1 further, mean RTE scores were reported, and the proportions of students in grades 7–8 who were flagged as exhibiting low TTE on a STAR Reading test (i.e., RTE scores below 0.90) were described. Principal component analysis of SPOTS.
Principal component analysis (PCA) was performed using the data from the SPOTS survey to iteratively refine the subscales used in subsequent analyses. As previously stated, the survey measuring test-taking expectancy and test-taking value beliefs was an adaptation of two established measures of academic efficacy and task value beliefs. Given that the SPOTS included five subscales designed to measure test-taking expectancy beliefs and the four types of test-taking value beliefs, it was essential to examine whether the SPOTS responses fit the anticipated factor structure. 64 The PCA was conducted following the procedure used by Brown and Hirschfeld (2008) for refining the SCoA. The goodness-of-fit indices for the resulting factor structure were reported, and the descriptive statistics for each identified factor in the model were described. Based on the results of the PCA, the psychometric properties of the resulting subscales were analyzed, and mean responses for items and subscales were provided. Comparative analyses. To address RQ3, a series of one-way multivariate analyses of variance (MANOVA) was used to examine whether students differed by grade or gender in their test-taking expectancy or test-taking value beliefs. Logistic regression analyses. To address RQ2 further and to address RQ4, hierarchical logistic regression analyses were performed using the binary variable low TTE as the criterion variable. First, the student demographic variables (grade, gender) were entered into the model. Next, the student test-taking belief (expectancy, attainment, intrinsic, utility, and cost) subscales were entered into the multiple logistic regression model. If results of the likelihood-ratio chi-square test indicated model significance, then the coefficient of determination for the full model would represent the overall proportion of variance in the log odds of being identified as exhibiting low TTE that is explained by all of the predictor variables in the model. Furthermore, if the full model was statistically significant, then any of the predictor variables with significant regression weights would be significantly associated with the outcome variable, controlling for all other predictors. If the parameter β was found to be significant for a predictor variable, then the estimated parameter would indicate the increase in the log odds of being identified as exhibiting low TTE for each one-unit increase in the value of the predictor variable, holding all other predictors constant. 65 CHAPTER VI RESULTS OF STUDY II Data Screening and Preliminary Analyses Testing records were obtained for 826 administrations of the STAR Reading test that were completed in the spring of the 2017–18 school year by students in grades seven and eight from one Midwestern school district. Item response times were reviewed, and any items with response times higher than 90 seconds were removed, which resulted in 28,736 observed item response times. The same three-second common threshold was applied to the data, and the results indicated that there were indeed observable instances of RGB in this dataset. In Study II, there were 471 responses (1.64%) submitted in three seconds or less. Of those rapid responses, only 157 (33.33%) were correct. This accuracy rate was similar to the rate observed in Study I. Additional data screening revealed that 20 test records (2.36%) did not contain precisely 34 items, and those cases were excluded from the sample for all further student-level analyses.
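For readers who wish to see the screening pipeline laid out explicitly, the following pandas sketch mirrors the steps just described (removing response times beyond the 90-second limit, applying the three-second threshold, checking the accuracy of rapid responses against the 33.3% chance rate, excluding records without exactly 34 items, and deriving RTE scores). It is an illustration under assumed column names (student_id, response_time, correct), not the code actually used in this study.

import pandas as pd

# Hypothetical item-level file: one row per item response.
items = pd.read_csv("star_item_responses.csv")

# 1. Drop responses recorded beyond the 90-second item time limit,
#    which were treated as computer errors.
items = items[items["response_time"] <= 90]

# 2. Apply the three-second common threshold to classify rapid-guessing
#    behavior (RGB) versus solution behavior (SB).
items["rgb"] = items["response_time"] <= 3

# 3. Check that accuracy for rapid responses is near the 33.3% chance rate
#    expected for three-option items, supporting the chosen threshold.
print(items.loc[items["rgb"], "correct"].mean())

# 4. Exclude test records that do not contain exactly 34 item responses.
counts = items.groupby("student_id")["response_time"].count()
items = items[items["student_id"].isin(counts[counts == 34].index)]

# 5. Derive each student's RTE score (proportion of SB responses) and flag
#    low TTE when RTE falls below the 0.90 criterion.
rte = 1 - items.groupby("student_id")["rgb"].mean()
print((rte < 0.90).mean())   # proportion of students flagged with low TTE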
To address RQ3 and RQ4, students were only included in analyses if they a) completed the online survey with their Renaissance ID number and b) completed the STAR Reading test such that item-level data were available for matching and secondary analysis. Descriptive Statistics Demographic data for the participating students who were included in the final sample for Study II (N = 675) are provided in Table 11. The sample included more seventh-grade students than eighth-grade students and included more female students than male students. RTE scores for sample. The distribution of RTE scores is provided in Table 12. Results indicated that 557 students (82.52%) had RTE scores equal to 1.0, meaning all 34 of their responses were submitted after the three-second threshold and thereby classified as SB. 66 The remaining 17.48% of students exhibited at least one RGB and had RTE scores below 1.0. In total, 93.8% of students in this sample had RTE scores that fell above the 0.90 criterion. Furthermore, a noteworthy total of 42 students (6.22%) in the Study II sample were identified as exhibiting low TTE on the STAR Reading test. The RTE scores for students flagged with low TTE ranged from 0.41–0.88, meaning some examinees exhibited RGB on as many as 20 of the 34 test items (and therefore showed SB on only 14 of the 34 items). The proportion of students flagged with low TTE in each subgroup is provided in Table 13, and the demographic information for the 42 students flagged with low TTE is provided in Table 14. Student perceptions of testing survey. A total of 748 students completed the online survey, which indicates that approximately 98 students opted out of taking the survey or did not complete it. Note that this total is greater than the number of students whose survey responses could be linked to their STAR Reading data. Specifically, 35 students entered an ID number that did not correspond with a test ID, 29 students entered an ID that was not unique, and 20 students entered no ID. No students or parents contacted the lead researcher for additional information about the study. Descriptive statistics for each of the 19 items in the SPOTS are provided in Table 15. A PCA was performed on the 19-item SPOTS. The suitability of the PCA was evaluated prior to analysis, and inspection of the correlation matrix showed that all variables had at least one correlation that was greater than 0.3. The Kaiser-Meyer-Olkin measure of sampling adequacy was 0.877, and Bartlett’s Test of Sphericity was statistically significant (p < 0.001), indicating the data were suitable for component analysis. The PCA revealed five principal components that had eigenvalues greater than 1.0, and the components explained 16.54%, 15.95%, 15.48%, 14.77%, and 8.12% of the total variance in the SPOTS, respectively. Eigenvalues for the principal components ranged from 1.22 to 6.56. 67 The five-component solution explained 70.87% of the total variance, and inspection of the scree plot supported a five-component solution. Varimax orthogonal rotation was employed to aid interpretability, and the factor structure met the interpretability criterion. The respective items for each subscale of the SPOTS loaded on the anticipated factor, which indicated that the survey was consistent with the previously validated expectancy and value subscales from which it was derived. Thus, the five factors were retained, and scale scores for each factor were computed.
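The component-retention logic reported above can be sketched in a few lines of Python. The sketch below is illustrative only: the random matrix stands in for the actual 675 × 19 SPOTS response matrix, the Kaiser (eigenvalue > 1.0) rule is applied to the inter-item correlation matrix, and the varimax rotation used in the reported analysis is noted but not implemented here.

import numpy as np

# Stand-in response matrix: one row per student, one column per SPOTS item
# (19 belief items rated 1-5). Replace with the actual survey data.
rng = np.random.default_rng(0)
responses = rng.integers(1, 6, size=(675, 19)).astype(float)

# Eigen-decompose the inter-item correlation matrix.
corr = np.corrcoef(responses, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]   # largest first

# Kaiser criterion: retain components with eigenvalues greater than 1.0.
n_retained = int(np.sum(eigenvalues > 1.0))
explained = eigenvalues / eigenvalues.sum()

print("components retained:", n_retained)
print("cumulative variance explained:", round(explained[:n_retained].sum(), 3))

# In the reported analysis, the retained component loadings were then
# rotated with varimax before interpreting the component structure.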
Descriptive statistics for the resulting subscales of the SPOTS, disaggregated by subgroup, are provided in Table 16 and Table 17. Reliability coefficient alpha exceeded 0.80 for all subscales but cost (α = 0.67). In general, participants reported perceiving the STAR Reading test as low in attainment value (M = 2.91), interest value (M = 1.76), and relative cost (M = 1.67). Participants reported moderate ratings for utility value (M = 3.02). Students in this sample reported relatively high test-taking expectancy beliefs (M = 3.25) when compared to their test-taking value beliefs. Comparative Analyses Grade level and test-taking beliefs. A one-way MANOVA was performed to examine the relationships between grade and test-taking expectancy and value beliefs. The results of the MANOVA indicated there was no statistically significant difference between grade levels on the combined dependent variables, F(5, 669) = 1.878, p = .096; Wilks’ Λ = .986; partial η2 = .014. However, post-hoc univariate one-way ANOVA analyses indicated that there were statistically significant differences by grade for utility value, F(1, 673) = 5.199, p = .023, and attainment value, F(1, 673) = 3.944, p = .047. Specifically, students in grade 7 reported higher ratings for utility and attainment value (M = 3.11, SD = 1.04; and M = 2.99, SD = 0.98, respectively) than those in grade 8 (M = 2.93, SD = 1.02; and M = 2.84, SD = 1.01, respectively). Results indicated that differences by grade for expectancy beliefs, interest, and cost were non-significant. 68 Gender and test-taking beliefs. Next, a one-way MANOVA was performed to examine the relationships between gender and test-taking expectancy and value beliefs. The results of the MANOVA indicated that there was a statistically significant difference by gender on the combined dependent variables, F(5, 669) = 3.899, p = 0.002, partial η2 = .028. Further, post-hoc univariate one-way ANOVA analyses indicated that there was a statistically significant gender difference for cost, F(1, 673) = 11.228, p = .047. Specifically, females reported lower ratings for cost value (M = 1.57, SD = 0.75) than males (M = 1.78, SD = 0.92). Results indicated that differences by gender for expectancy beliefs, interest, utility, and attainment were non-significant. Logistic Regression Analyses A hierarchical logistic regression analysis was performed to investigate whether the demographic variables (grade, gender) and motivational variables (expectancy and value beliefs) were associated with the likelihood students exhibited low TTE. The bivariate correlation matrix describing the relationships between predictor and outcome variables is provided in Table 18. The linearity of the continuous variables with respect to the logit of the criterion variable was assessed via the Box-Tidwell (1962) procedure. A Bonferroni correction was applied using all ten terms in the model, resulting in statistical significance being accepted when p < .005 (Tabachnick & Fidell, 2019). Based on these preliminary assessments, all of the continuous independent variables were found to be linearly related to the logit of the criterion variable. The first restricted model regressed the criterion variable (low TTE) on grade and gender. The model explained 5.5% of the variance in the criterion variable, and the model was statistically significant, χ2(1) = 13.967, p = .001.
The full model regressed all seven predictor variables (grade level, gender, expectancy, intrinsic, attainment, utility, and cost) on the criterion variable of low TTE, and the full model was statistically significant, χ2(7) = 26.224, p < .001. 69 The full logistic regression model (see Table 19) explained 10.2% of the variance in the criterion variable and correctly classified 93.8% of cases. In the full model, four of the predictors (gender, grade level, attainment value, and cost value) had statistically significant associations with the criterion variable, whereas expectancy, interest value, and utility value did not. As such, the coefficients for the four significant predictor variables were interpreted. Within the model: (a) Male students were 2.675 times as likely to exhibit low TTE as female students, controlling for all other factors. (b) Students in grade 7 were 2.070 times as likely to exhibit low TTE as grade 8 students, controlling for all other factors. (c) For each one-point increase in cost value score (greater perceived cost), students were 1.562 times as likely to exhibit low TTE, controlling for all other factors. (d) For each one-point increase in attainment value score, students were 0.684 times as likely to exhibit low TTE, controlling for all other factors; stated another way, each one-point decrease in attainment value score was associated with being 1.462 times as likely to exhibit low TTE, controlling for all other factors. 70 CHAPTER VII DISCUSSION The purpose of the present investigation was to examine the prevalence of low TTE and to identify student-level correlates of low TTE. This discussion summarizes the major findings from two quantitative studies, notes general limitations, and describes implications for theory, research, and practice in this topic area. The research questions for this study were as follows: (1) What proportion of students in grades 4–8 exhibit low TTE on a CAT in reading, as determined by RTE (Wise & Kong, 2005)? (2) To what extent do demographic variables relate to the likelihood students exhibit low TTE on a CAT in reading? (3) To what extent do students differ in test-taking expectancy and value beliefs by student demographic variables? (4) To what extent do student test-taking expectancy and value beliefs relate to the likelihood that students exhibit low TTE on a CAT in reading? Summary of Major Findings The current findings extended the previous research literature on the prevalence of low TTE and the student-level correlates of low TTE. Specifically, this study made four primary contributions to the existing literature. First, this was one of the first studies to explore TTE using item response time data from K–8 test-takers. This large-scale investigation of student TTE was relatively unique, as few previous studies had measured TTE using item response time data (a feasible and non-obtrusive approach) rather than self-report (Wise, Ma, & Theaker, 2014). Results from Study I revealed that the proportion of test-takers with low TTE was 2.84%, which supported the research hypothesis that the observed prevalence would be in the range of 1–12%. 71 Second, the study replicated earlier research on the student-level correlates of TTE, which has shown that certain demographic variables are associated with RTE. Specifically, as found in Wise and Cotton (2009), female students had significantly higher RTE scores than male students. 
The results of Study I supported the hypothesis that grade, gender, and race/ethnicity would be significantly associated with the odds a student was flagged as exhibiting low TTE. Third, this study advanced knowledge about motivational variables that might inform why certain students might be more likely to show low TTE. Results from Study II supported the hypothesis that certain test-taking beliefs would differ by grade and gender. More specifically, younger students reported higher utility and attainment, and female students reported lower cost. Fourth, much of the work on student TTE has been contextualized within the EEVT, but only a few empirical studies (e.g., Cole et al., 2008) had directly tested whether motivational variables from the EEVT are associated with student TTE. Results of Study II provided support for the hypothesis that certain test-taking beliefs would be associated with the odds of being identified with low TTE. Specifically, higher cost value beliefs were positively associated with low TTE, whereas higher attainment value beliefs were negatively associated with low TTE. Interpretation of Results Prevalence of low TTE (RQ1). A primary goal of this study was to measure the prevalence of low TTE in K–12 schools using response time data. There is evidence to suggest that this study effectively replicated previous research using the RTE approach. Consistent with previous work by Wise and colleagues (e.g., 2004; 2005; 2012), the accuracy rates of items classified as RGB were comparable to those expected by chance guessing, which suggests that item responses classified as RGB were almost certainly invalid responses. In this study, the accuracy rate of items classified as RGB was approximately 33% (for items with three choices). 72 Results from Study I indicated that 2.84% of students were identified with low TTE, and subgroup analyses showed that prevalence rates for various demographic groups ranged from 1.22–4.33%. Indeed, a majority of students in Study I (97.16%) had RTE scores above 0.90, a generally accepted indicator of adequate effort. Still, results indicated that 8,749 fourth-grade and 7,501 eighth-grade students exhibited low TTE on a STAR Reading benchmark assessment. These figures might seem negligible compared to the total number of fourth-grade and eighth- grade test-takers (which exceeded 572,000), but those statistics represent 16,250 individual students whose test scores were likely not valid due to low TTE. Notably, results from Study II revealed an even higher rate of low TTE, with 6.22% of seventh-grade and eighth-grade students from the participating school district having RTE scores that were below 0.90. In general, the results of Study I were consistent with findings from previous studies by Wise and colleagues (e.g., 2004, 2010, 2012, 2016), which had applied Wise and Kong’s (2005) RTE method in K–12 contexts. For instance, Wise and colleagues (2010) reported average RTE scores by grade for a MAP test in reading; the mean RTE score for fourth-grade was 0.994, and the mean RTE for eighth-grade was 0.983. This study yielded similar results, with fourth-grade and eighth-grade students having mean RTE scores of 0.992 and 0.990, respectively. The total prevalence of low TTE in Study I was also relatively consistent with prior studies using different measures of TTE, such as Nering, Bay, and Meijer’s (2002) finding that 1.1% of K–12 students could be identified with low TTE using pattern-based (e.g., ABCDABCD) person-fit statistics. 
By contrast, the observed prevalence of low TTE in this study was considerably lower than those found in some other RTE studies with K–12 and college-age samples. For instance, Wise (2015) found over 11.5% of students in ninth grade were flagged with low TTE on a MAP reading test, which was much higher than the 3.3% of students in eighth grade with low TTE in Study I. 73 One noteworthy difference between Wise’s (2015) investigation and the present study was the method used for measuring the criterion variable. Whereas the current study employed a common three-second threshold, Wise (2015) used the normative threshold method introduced by Wise and Ma (2012), which the authors claimed might be more appropriate for classifying items as SB or RGB. The normative threshold method could not be applied in this study, and therefore it is possible the prevalence of low TTE would have been higher using another method. As expected, the prevalence of low TTE was much lower than those in college studies. However, the reason for this discrepancy is unclear, and no studies have directly compared the TTE of K–12 and college students. One explanation is that college students have more control over how they spend their time (and less oversight over their behavior), which could suggest the opportunity cost of testing might be greater for college students. Another possibility is that the design of the test or testing context could result in higher perceived RD for college students. Previous studies (e.g., Swerdzewski et al., 2011) have involved assessment batteries consisting of multiple tests taken over several hours. By contrast, tests like STAR Reading typically are completed in fewer than thirty minutes. As such, it is possible that the higher rates of RGB in college contexts could be explained by mental fatigue that students experienced on exceptionally lengthy tests. A final explanation could be that many of the previous research studies of college students were based on student responses on an optional test, which many students might have viewed as particularly unimportant. On the contrary, school-age students are typically required to take tests like STAR Reading, which was the situation for the testing data used in the current study. Even though the reasons for such high rates of RGB in college settings is currently uncertain, the present study was important because it helped to further clarify the prevalence of low TTE in K–12 settings. Findings were strengthened by the use of a large, national dataset. 74 Student-level correlates of low TTE (RQ2). The second contribution of this study was that it addressed the relationships between student demographic variables and the odds a student was identified with low TTE. Notably, gender was the strongest student-level predictor of low TTE in Study I, with male students being 2.28 times as likely to exhibit low TTE as female students. Because the majority of students in the sample had RTE scores of 1.0, results indicated a small effect size when comparing mean RTE scores by gender (d = 0.178). Still, the results of logistic regression analyses did reveal a significant gender difference in the log odds of being flagged with low TTE. Specifically, 68% of the students flagged with low TTE in Study I were male, even though the large sample was balanced by gender. Results suggested that male students were more than twice as likely to exhibit low TTE, controlling for grade and race/ethnicity. 
The results from Study II were similar, indicating that 71% of students with low TTE were male, despite the sample also being balanced by gender. These findings were convergent with previous research on gender and TTE, which has shown lower TTE in male students (Eklöf, 2007; Wise & DeMars, 2010; Wise et al., 2004). This investigation extended previous research because it was the first study to statistically test mean differences in the RTE scores of male and female students in elementary or middle school, whereas previous studies (e.g., Wise et al., 2010) had reported only descriptive statistics by group. Regarding student grade level, the predicted relationship between grade and TTE was supported, with the log odds of exhibiting low TTE being 1.317 times as high for eighth-grade students compared to fourth-grade students. Again, the effect size was small for differences in mean RTE score by grade (d = 0.069), but the proportion of students flagged with low TTE was significantly higher for eighth-grade students than fourth-grade students. These findings were consistent with prior research that has shown a decrease in TTE as students go through school. 75 The observed proportion of students with low TTE in grades four (2.54%) and eight (3.31%) in Study I were lower than those observed by Hauser and Kingsbury (2009) for grades three (6.9%) and nine (7.7%), although those authors had used a different flag for identifying low TTE. Wise and colleagues (2010) had also observed higher rates of low TTE in eighth-grade students compared to fourth-grade students. When considered together, the results of the present study seem to support previous findings that student TTE tends to decline as students age, which would be consistent with previously documented declines in general academic motivation over time (Archambault et al., 2010; Durik et al., 2006; Jacobs et al., 2002; Wigfield et al., 1997). It could be that elementary teachers more consistently communicate the importance of giving one’s best effort on a test, whereas middle school teachers might assume that their students have already internalized the rationale for doing one’s best on the test. Another possible explanation could be that elementary teachers monitor their students more closely while they take the tests. Furthermore, there were also significant differences by race/ethnicity in the proportion of students who were found to exhibit low TTE. The demographic subgroups in Study I with the highest proportion of students exhibiting low TTE were Black students (4.33%) and Hispanic students (3.08%). This trend was consistent with prior research findings that Black and Hispanic students may be at higher risk for academic disengagement when compared to White students (Graham & Taylor, 2002). Conversely, Asian/Pacific Islander students had the highest average RTE score and were the least likely to be identified as exhibiting low TTE. Previous research has suggested Asian American/Pacific Islander students tend to report greater overall educational expectations from their families compared to European American students (Mello, 2009), and so it is possible the students from that subgroup may internalize those high expectancy beliefs. 76 Taken together, the results of this study revealed several subgroup differences in RTE score and the likelihood of being identified with low TTE on a STAR Reading test, and all of the significant relationships were in the predicted directions. 
Students with the highest likelihood of exhibiting low TTE were male, Black, and eighth-grade students, and students with the lowest log odds were female, Asian/Pacific Islander, and fourth-grade students. These results supported the research hypothesis that different subgroups of students would show varying levels of TTE on a low-stakes reading test. Future research should continue to address why younger, female, White, and Asian/Pacific Islander students tend to show higher TTE than other student groups. Psychological correlates of low TTE (RQ3 and RQ4). As previously noted, several findings from Study II represented meaningful contributions to the previous research literature on student beliefs about academic test-taking and student TTE. First, the current findings provided support for the hypothesis that some test-taking value beliefs would differ by demographic group. Results suggested that two subtypes of value beliefs (utility and attainment) varied by grade, and one type of value belief (cost) varied by gender. Specifically, seventh-grade students in this sample reported perceiving the test as more useful (higher utility value) and more relevant to their identities (higher attainment value) on average than eighth-grade students; comparisons by gender indicated male students reported perceiving the test as a greater loss of other opportunities (higher cost value) on average than females. The significant relationships between grade and test-taking value beliefs were consistent with survey research by Paris and colleagues (1991, 2000), who found that older students were less likely to describe achievement tests as valuable to them. However, neither interest value nor cost value differed significantly by grade. The relationship between grade and interest value may have been nonsignificant because students reported low interest in the test overall. 77 Further, one type of value belief (cost) was significantly associated with gender, and this relationship had the largest effect size of all comparative analyses (d = 0.257). This result could suggest cost value beliefs might be particularly crucial for explaining differences in TTE. Still, it remains unclear why male students reported higher cost value beliefs in regard to taking this test. Second, results did not support the hypothesis that test-taking expectancy beliefs would differ significantly by grade or by gender. This finding was unexpected given previous research showing age and gender differences in academic self-efficacy. Instead, the majority of students reported moderate expectancies for success the next time they would take a STAR Reading test. One possible explanation for the lack of variability in test-taking expectancy beliefs could be related to the adaptive nature of the test. That is, the item-selection algorithm for the STAR Reading test is designed such that all students receive items at a level that is determined to be of moderate difficulty given their earlier individual performance results, which might explain why the students reported neutral test-taking expectancy beliefs overall. On the other hand, it is also possible that the nonsignificant differences in expectancy beliefs point to a meaningful finding about TTE in school-age students.
Specifically, the results could suggest that test-taking value beliefs are more likely to vary among students than test-taking expectancy beliefs about low-stakes test, which would potentially make the value component of the EEVT particularly critical for a better understanding of test-taking perceptions and student TTE in general. Further research on expectancy beliefs in low-stakes testing settings could help address this question. Third, the results of Study II provided mixed support for the hypothesis that test-taking value beliefs would be significantly associated with low TTE. Specifically, the current findings suggested that two types of value beliefs (attainment and cost) were significantly associated with the likelihood of being identified with low TTE (whereas interest and utility were not). 78 The results of Study II suggested that perceptions that taking a STAR Reading test had a higher relative cost (i.e., limited them from participation in some other preferred activities) were associated with poorer TTE. Controlling for other factors, a one-point increase in relative cost beliefs was associated with being 1.55 times as likely to exhibit low TTE. Likewise, a one-point decrease in attainment value was associated with students being 1.435 times as likely to exhibit low TTE, which was consistent with previous research on relationships between value beliefs and TTE. Cole and colleagues (2008) gave a self-reported value belief scale to college students who had taken general education tests, and they found that two distinct types of test-taking value beliefs (perceived utility value and attainment value) were significant predictors of self-reported TTE. This study provided further evidence that test-taking attainment value beliefs are related to TTE, but results did not support the relationship between test-taking utility value and TTE. Fourth, results did not support the hypothesis that test-taking expectancy beliefs would be significantly associated with low TTE. This unexpected finding is especially noteworthy because it suggests that the test-taking value component of the current model might be particularly crucial for our understanding of low TTE on this test, whereas the test-taking expectancy component of the model might not have the same explanatory power. Previous studies (e.g., Wise et al., 2009) have shown that item difficulty is unrelated to the likelihood a student displays RGB or SB, and so these findings might suggest student perceptions of their ability to pass a test item have less influence on TTE than their perceptions about whether it would be valuable to try their best. Altogether, this study advanced our understanding of TTE, but researchers will need to continue investigating the phenomenon of TTE to extend our understanding of the reasons why low TTE occurs. Given the remaining questions in this topic area, the present findings have shed light on some essential lines of inquiry for researchers to address through future empirical work. 79 Implications for Theory and Research The current findings provided evidence that demographic characteristics and motivational variables were significant correlates of low TTE, which has important implications for Wise and Smith’s (2011) demands–capacity model of TTE. Surprisingly, results suggested value beliefs, but not expectancy beliefs, predicted the likelihood a student exhibited low TTE. 
However, the students in the current sample reported limited variability in their expectancy beliefs related to the STAR Reading test, with students reporting moderate to positive expectancy for success. Future research with a nationally representative and randomly selected sample is needed to corroborate the nonsignificant relationship between expectancy beliefs and the odds of low TTE. Conversely, this finding could highlight the relative importance of the value component of an EEVT-informed model of TTE and suggest that theoretical models of TTE should account for the specific subtypes of task value. As such, future researchers could further investigate how attainment value and cost value relate to TTE. An essential next step might be to use more advanced statistical modeling to examine whether changes in attainment value over time might be a factor that explains the observed decline in TTE as students age (e.g., Wise et al., 2010). It also seems critical to examine further why male students reported perceiving this sort of reading test as having a higher cost compared to female students. Relatedly, the current investigation demonstrated distinct differences in the TTE of students from different racial/ethnic groups, but there remain questions about why these differences may exist. Given the significantly elevated risk of disengagement from test-taking for Black and Hispanic students, it seems especially important for future researchers to explore the motivational patterns of students from different racial/ethnic groups and uncover why these disparities exist. Doing so could be an important first step toward developing differentiated strategies to support the TTE of these student subgroups. 80 Indeed, evolving models of TTE should reflect the complex nature of engagement, which has behavioral, emotional, and cognitive components (Fredricks, Blumenfeld, & Paris, 2004). Wise and Smith’s (2011) model suggests that TTE results from interactions among student-level, test-level, and contextual variables, and previous research on school engagement suggests that engagement is influenced by “family, community, culture, and educational context” (Fredricks et al., 2004, p. 73). To extend upon the current findings, it might be critical for future research to explore the contextual variables that might be malleable antecedents of low TTE. As Wigfield and colleagues (2016) stated, “experiences with different activities, parent and teacher feedback about the importance and usefulness of different activities, and children’s comparison of interest in different activities to those of their peers may all influence children’s valuing of activities” (p. 62). As such, future research might focus on specific behaviors by parents and teachers that could affect student perceptions about test-taking. For instance, teacher recognition when students give their best effort has been found to support their motivation and engagement (Schunk & Pajares, 2009). It is possible that students receive little to no feedback on their effort when they are taking CBTs, which could diminish the attainment value of the test. Although teachers report communicating the importance of learning (Brophy, 1999), it is unclear whether they convey the value of doing one’s best on tests that don’t count toward a grade. Therefore, future research could explore how teachers present the purpose and importance of TTE before administering the test. 
Newmann and colleagues (1992) asserted that authentic academic tasks are “meaningful, valuable, significant, and worthy of one’s effort, in contrast to those considered nonsensical, useless, contrived, trivial, and therefore unworthy of effort” (p. 23). Knowing more about how teachers connect the testing process to the “real world” could shed light on the mechanism by which students develop positive or negative perceptions about a test. Furthermore, future research should address contextual factors that may help explain the subgroup differences in TTE observed in the current study. Previous research has revealed numerous factors that could explain differences in school engagement by students from minority groups, but the reason for group differences in TTE remains unclear. It is possible that some students from certain minority groups may demonstrate behavioral disengagement as a conscious response to an education system they view as unjust (Ogbu, 1992). Others may underperform in school because they fear that giving their best effort could distance them from their peer group (Fordham & Ogbu, 1986). Other scholars have found that stress from stereotypes about the underachievement of a minority group results in poorer performance for students from that group (Osborne, 1997; Steele, 1997). Parental expectations for children’s achievement are associated with student engagement among students from racial and ethnic minority groups (Murray, 2009), so it might be helpful for researchers to explore whether parental beliefs about the importance of test-taking are correlates of student TTE. In sum, future research can build on the current study by expanding what is known about how cultural, familial, and environmental variables affect the development of test-taking beliefs, which would be expected to proximally influence TTE. Next, the current findings pointed to the importance of further studying student cost beliefs associated with test-taking. Recently, scholars have argued for further consideration of cost in models of motivation and achievement (e.g., Jiang, Rosenzweig, & Gaspard, 2018), given that cost beliefs have been a “historically neglected” component of the EEVT in prior research (Flake, Barron, Hulleman, McCoach, & Welsh, 2015, p. 232). A defining feature of cost beliefs in the EEVT is that perceived cost is considered a negative component of motivation, whereas intrinsic value, utility value, and attainment value are each considered positive indicators. As such, some scholars have suggested that cost might need to be conceptualized as a separate construct, which has resulted in the recent development of expectancy–value–cost models (Jiang et al., 2018). Indeed, empirical research suggests cost is more complex than previously thought, and scholars have proposed multi-factor conceptual models of cost value (e.g., Battle & Wigfield, 2003; Perez, Cromley, & Kaplan, 2014). For example, Flake and colleagues (2015) defined four distinct subtypes of cost: task effort cost (“negative appraisals of time, effort, or amount of work put forth to engage in the task”), outside effort cost (“negative appraisals of time, effort, or amount of work put forth for task other than the task of interest”), loss of valued alternatives cost (“a negative appraisal of what is given up as a result of engaging in the task of interest”), and emotional cost (“negative appraisals of a psychological state that results from exerting effort for the task”) (p. 237).
If cost perceptions are truly multi-dimensional as posited by Flake and colleagues (2015), it is possible that the measure of cost in the current study did not adequately capture this construct. The two cost items derived from Conley (2012) most closely align with the loss of valued alternatives cost dimension, but it is possible that students vary more in task effort cost or emotional cost perceptions. The demands–capacity model suggests the RD of a test item is a primary determinant of TTE, and research has shown that the perceived mental taxation of an item is negatively associated with the amount of TTE students exhibit (Wolf et al., 1995). Thus, researchers should consider exploring the relationships between individual subtypes of cost beliefs and student TTE. It is possible that an expectancy–value–cost model of TTE could have even greater explanatory power than an expectancy–value model, given that cost has been found to predict more variance in school disengagement than expectancies and values alone (e.g., Perez et al., 2014). Because low TTE is an example of a maladaptive academic outcome, a greater focus on cost (which has a negative valence) might be an important next step. Further research on the emotional costs of testing would be consistent with Wise and Smith’s (2011) proposition that test anxiety might be a key determinant of a student’s EC during test-taking. Indeed, we know that test anxiety is a multi-faceted constellation of maladaptive affective, behavioral, and cognitive responses to test-taking (Pekrun & Stephens, 2015), and it could be critical for future researchers to consider whether disengagement from test-taking could be an example of avoidance-oriented coping (see Spangler, Pekrun, Kramer, & Hofmann, 2002) resulting from an individual’s inability to regulate negative emotions experienced during a test (i.e., emotional cost). Finally, another critically important area for future empirical work would be applied research on the potential effectiveness of motivational interventions that are designed to promote student TTE and prevent low TTE from occurring on low-stakes assessments. Given the current finding that value beliefs were the only motivational variables significantly associated with low TTE, further research may need to focus on the effects of value-oriented interventions specific to academic test-taking. In fact, several empirical studies have documented the effectiveness of targeted motivational interventions that are rooted in the contemporary EEVT (see Lazowski & Hulleman, 2016). Recently, motivational interventions have been designed to specifically target the various subtypes of perceived task value (e.g., Canning et al., 2018; Gaspard et al., 2015). Value-based interventions have been found to reduce achievement gaps between different racial/ethnic groups (Harackiewicz, Canning, Tibbetts, Priniski, & Hyde, 2016) and differentially support student achievement according to gender (Rozek, Hyde, Svoboda, Hulleman, & Harackiewicz, 2015). Given the clear need for strategies that support student TTE, researchers should study whether motivationally supportive interventions could be effective for reducing the proportion of K–12 students who exhibit low TTE. Research has shown that brief psychosocial interventions can change student beliefs about themselves as learners and can have long-term benefits for their academic and social-emotional wellbeing (Walton & Cohen, 2011).
These value interventions have included reflections about the personal relevance of course content and brief essay-writing assignments (e.g., Harackiewicz et al., 2016), prompts to generate a list of a few successful peers (e.g., Walton & Cohen, 2007), videos and course discussions related to academic motivation and performance (e.g., Struthers & Perry, 1996), direct communication about the utility value of an activity (e.g., Durik & Harackiewicz, 2007), and sharing online resources designed to normalize feelings of anxiety about beginning a new academic program (e.g., Walton & Cohen, 2007). Experimental research has demonstrated the efficacy of such value interventions for improving the self-reported intrinsic value, utility value, and attainment value of college participants, and there is evidence to suggest these interventions are especially helpful for students from lower SES families and students from racial/ethnic minority groups (Harackiewicz et al., 2016). Existing interventions have focused on general academic motivation, but the findings from this study point to a critical need for instructional strategies and motivational interventions that are specifically designed to target test-taking motivation in order to reduce disparities in TTE. Implications for Practice From a practical perspective, there are two critical implications for test developers and educators: 1) prevention of low TTE and 2) identification of and response to low TTE. The current findings suggested that middle school students generally find reading tests like STAR Reading to be uninteresting, unenjoyable, moderately useful, and costly compared to other activities. In light of these findings, practitioners must consider ways to support students’ beliefs about the value of test-taking. First, test developers would benefit from considering the issue of student TTE more carefully when designing and evaluating new tests. Although best practice in educational testing indicates the importance of validity evidence based on response patterns (AERA et al., 2014), this type of validity evidence is rarely reported by test developers. The current study further demonstrated how Wise and Kong’s (2005) RTE approach could be applied as an efficient and non-obtrusive method for evaluating whether a CBT is vulnerable to frequent RGB by test-takers. Accordingly, future educational assessments could include features intended to help prevent students from exhibiting low TTE. Wise and colleagues (2006) showed that a brief warning message triggered by multiple instances of RGB could successfully deter students from rapid guessing on subsequent items. This effort-monitoring feature could be incorporated into other CBTs as a strategy for supporting TTE while students are still completing the test. Similarly, tests could be designed to immediately flag individual test scores that are potentially invalid. Much as validity indices are calculated automatically for computer-based rating scales for emotional and behavioral disorders, RTE could reasonably be used as a validity indicator for CBTs. Doing so could help teachers determine whether additional testing might be needed in order to gather more accurate data. In fact, one test system has adopted this approach and created guidelines for how teachers can respond in the moment to disengaged students (NWEA, 2018).
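As an illustration of how such an effort-monitoring feature might operate, the sketch below flags rapid guesses as responses arrive and issues a single warning after repeated instances. It is a minimal sketch under assumed parameters: the three-second response-time threshold and the three-rapid-guess trigger are illustrative choices, not the specific rule used by Wise and colleagues (2006) or by any particular test publisher.

```python
from typing import Optional

# Illustrative sketch of an in-test effort monitor (assumed parameters).
# A response submitted in RAPID_GUESS_SECONDS or less is treated as
# rapid-guessing behavior (RGB); after WARNING_TRIGGER rapid guesses,
# a single warning message is shown to the student.

RAPID_GUESS_SECONDS = 3.0   # assumed RGB threshold, in seconds
WARNING_TRIGGER = 3         # assumed number of rapid guesses before warning


class EffortMonitor:
    def __init__(self) -> None:
        self.rapid_guess_count = 0
        self.warned = False

    def record_response(self, response_time_seconds: float) -> Optional[str]:
        """Record one item's response time; return a warning message if triggered."""
        if response_time_seconds <= RAPID_GUESS_SECONDS:
            self.rapid_guess_count += 1
        if not self.warned and self.rapid_guess_count >= WARNING_TRIGGER:
            self.warned = True
            return ("Please slow down and read each question carefully. "
                    "Your answers help your teacher plan your instruction.")
        return None


if __name__ == "__main__":
    monitor = EffortMonitor()
    for rt in [12.4, 2.1, 1.8, 9.7, 2.5]:   # example response times in seconds
        message = monitor.record_response(rt)
        if message:
            print(message)
```

In practice, the warning text and the trigger would need to be tuned so that legitimately fast but accurate responders are not interrupted unnecessarily.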
Second, the current findings suggest that students might benefit from having the utility value of testing be made more salient to them, given that students reported low utility value beliefs in general. Previous research has shown that modifying test instructions to emphasize the importance of demonstrating good effort on the test results in higher scores and lower rates of low TTE (Brown & Walberg, 1993; Liu et al., 2015). 86 Sessoms and Finney (2015) recommended that students should be made aware of the usage and value of a test in the weeks before the actual test administration, and so providing motivationally supportive instructions might be a simple preventative strategy. Relatedly, there may be a need for teachers to provide direct “motivational instruction” (Liu et al., 2015) to support the test-taking motivation and subsequent performance of students, particularly in the case of low-stakes testing. Research suggests many college students do not report understanding the purpose of standardized tests they took during their K–12 education (Zilberberg et al., 2014), and so it is questionable whether teachers are making their students aware of how testing is a useful part of their education. As previously noted, it is possible that motivational interventions could potentially help to increase the perceived value of test-taking. Teachers might explicitly share how the test will be used to help students with their academic growth, invite students to reflect on the personal relevance of the tested skills for their future academic and professional goals, include students in personal goal-setting related to their test scores, and encourage students to articulate the importance of giving their best effort in school. Finally, this study highlights the importance of Wise’s (2015) ISV process. Given the observed prevalence of RGB on low-stakes academic testing, educators must pay close attention to the potential adverse effects of low TTE on test validity. ISV is a systematic approach for identifying and responding to low TTE, and practitioners should consider applying Wise and Kong’s (2005) RTE approach when using CBT data to make important educational decisions. Research has shown that practical applications of RTE can inform data-based decisions related to program evaluation (Wise & DeMars, 2010), test fraud (Wise, Ma, & Theaker, 2014), teacher evaluation (Wise et al., 2012), and estimation of growth scores (Wise, 2015). 87 Given that educators do not always make accurate inferences about the reasons why students achieve low test scores (VanDerHeyden, Witt, & Naquin, 2003), practitioners should consider Wise’s (2015) assertion that addressing TTE is an issue of professional ethics. Critical evaluation of the distinction between “can’t do” and “won’t do” problems can help educators to make better decisions based on student testing data (VanDerHeyden, Witt, & Gilbertson, 2007). This study suggests that the ISV process might be a reasonable way to inform such decisions. Limitations Several limitations to the current investigation must be acknowledged, among which are issues concerning the research design, external validity, measurement, and statistical analyses in the present empirical study. It will be necessary for future research to address these limitations. First, the correlational nature of this research study precludes making inferences about causality. 
Specifically, the significant associations among student demographic characteristics, test-taking beliefs, and the likelihood of exhibiting low TTE may have been confounded by other variables that were not examined in this study. Indeed, the EEVT would imply that several additional factors might contribute to the presence of low TTE, including cultural and family characteristics, behaviors of key socializers, individual goals and self-schemata, and affective reactions to previous achievement outcomes. In particular, no variable was readily available as a control for general academic ability or previous reading achievement. As such, it is possible that group differences in achievement could have mediated or moderated relationships between demographic variables and low TTE. Specifically, the smaller group size for students in grade 8 could potentially suggest that many of the eighth-grade students who took this test were lower performing students who were receiving supplemental reading support. Thus, the correlational findings must be interpreted with caution, recognizing the potential for confounding variables. 88 Second, questions remain about the external validity of the current investigation. Due to the limited number of empirical studies of TTE in K–12 schools, it is unclear how the current findings might generalize to different grade levels or different assessments. Additionally, the current operationalization of the outcome variable of interest, low TTE, was selected based on the previous literature, but there is currently no empirical justification for using the 0.90 criterion as a flag for low TTE. Thus, one must exercise caution when comparing the current results with those of other studies of student TTE that operationalized low TTE another way. Also, Study II used a convenience sample of students from a specific geographical region, rather than a random, nationally representative sample. Therefore, the findings of Study II might not reflect the beliefs reported by students from different schools or with different demographic characteristics. Third, measurement limitations must be acknowledged. One potential issue was the use of self-report in Study II. It is possible students demonstrated social desirability bias when taking the survey, and it is also reasonable to question whether students who showed low TTE on the STAR Reading test might have also answered quickly or carelessly when completing the SPOTS (which also had no consequences for them). Another concern is that the sample size for Study II was limited by the number of survey responses that could not be linked to a corresponding STAR Reading test ID and were excluded from subsequent analyses. Because all PII had been removed and teachers were not contacted for their input, the researcher could not further investigate why some of the participants entered ID numbers that were not unique or entered no ID number. It is possible these students entered their ID number incorrectly or that some students indeed had an identical ID number. Regardless, it would be reasonable to question whether some of the students whose data could not be matched had shown low TTE, in which case the subset of participants retained in the dataset might not have been representative of the whole sample. 89 Another concern related to measurement was that the items in the survey were adapted from the original measures, and so the validity of the SPOTS has yet to be proven. 
Specifically, the measure of test-taking expectancy beliefs was adapted from an academic efficacy scale. Given that expectancy beliefs and general academic self-efficacy are distinct constructs, it is unclear whether the SPOTS items measured test-taking perceptions specific to the STAR Reading test (as opposed to general perceptions of academic ability or reading skill). A related concern is whether the scale measured current expectancy beliefs rather than attributions for past successes or failures. It is uncertain whether students were truly thinking about taking the test in the future. Further empirical work focused on the measurement of test-taking perceptions is warranted. Next, the current study was limited by the statistical procedures that were employed. The significance threshold of 0.05 was selected because it is the conventional alpha level used in the extant research literature and recommended by experts in statistical methods for the social sciences (Agresti & Finlay, 2009). However, other scholars have recently argued that the default significance threshold should be lowered (e.g., Benjamin, Berger, Johannesson, Nosek, Wagenmakers, et al., 2018). As such, a more stringent criterion for statistical significance would have yielded nonsignificant results for some of the relationships investigated in Study II. Additionally, the statistical analyses used in Study II did not allow for a complete investigation of the hypothesized relationships between student-level characteristics, test-taking beliefs, and low TTE. That is, this study did not address potential mediation effects or employ structural equation modeling to analyze the extent to which the predictor variables might be directly or indirectly related to the outcome variable of interest. Thus, further exploration of these relationships would add to our understanding of how the components described in the Wise–Smith (2011) model of TTE relate to one another. A final limitation of this investigation warrants further discussion. Specifically, the data suggested that one of the primary independent variables of interest, LD status, was not measured reliably in the data provided by the test company. According to the National Center for Learning Disabilities (Cortiella & Horowitz, 2014), about 2.4 million students in the United States (5%) have an identified LD. However, the results from Study I indicated that fewer than 0.2% of students in the sample had the LD status variable endorsed as 1 (“Yes”). The LD status of the remaining students (N = 570,899) could not be determined with adequate certainty. Accordingly, there are several reasons why the findings related to LD status and TTE should be interpreted with caution. First, the LD status variable in the dataset did not specify whether students had a specific LD in the area of reading (as opposed to writing or mathematics). The research hypothesis that students with LD would be more likely to exhibit low TTE on the STAR Reading test was predicated on the notion that students with identified difficulties in reading would be more likely to hold maladaptive motivational beliefs related to reading tests, resulting in lower TTE. However, it is possible that some of the students in the LD group had a specific LD in another academic area, in which case reading may actually have been an area of strength for those students. This is merely speculation, given that additional information about the specific educational disabilities of students in this group could not be accessed.
For this reason, it is unclear to what extent the present findings about LD status might be generalizable to the larger population of students with LD in elementary and middle schools across the country. Next, the estimated prevalence of low TTE for students with an LD was derived from a relatively small sample of students (N = 487) when compared to the estimates for other groups. If data on LD status had been provided for a higher number of students, then there would have been stronger evidence that the observed prevalence (5.13%) accurately represented this subgroup of students. 91 Finally, the LD status variable was excluded from all of the logistic regression analyses because more than 99% of cases were treated as having missing data for the LD status variable. As such, the present study did not directly test whether LD status is significantly associated with the odds of exhibiting low TTE when controlling for other demographic variables. If students with LD indeed have a higher likelihood of being identified with low TTE, then it might be especially critical to consider TTE when making educational decisions based on the testing data from SWD. As future research continues to inform the Wise–Smith (2011) demands–capacity model of TTE, researchers are encouraged to consider disability status as a potentially important student-level correlate of TTE. Additional research on this issue will be necessary. Conclusion Cronbach (1946) cautioned test users to consider what the response sets of test-takers might suggest about the validity of their scores, and this issue remains true seven decades later. With this in mind, researchers and practitioners will need to be innovative about how to promote student TTE and optimally use test data. The current study extended previous research on TTE by examining the relationships among student-level variables, test-taking beliefs, and TTE on a low-stakes test. As expected, students from various subgroups showed differences in their TTE, and certain malleable motivational variables were significantly related to the odds of exhibiting low TTE. Overall, results pointed to the potential importance of theoretically-driven motivational strategies focused on increasing the perceived value of testing and reducing the perceived cost. 92 APPENDICES 93 APPENDIX A. TABLES AND FIGURES. Table 1. Studies Measuring Test-Taking Effort Using Response Time Effort Study N Grade M (RTE) Low TTEa Studies in higher education contexts College College College — — 0.94 College 0.52–0.71b College 0.85–1.00 7.4 11.2 12.2 — — College 0.93–0.95 9.4–11.9 College College 0.94 0.90 9.0 24.1 College 0.943–0.996 0.6–11.0 College — 35.6 2.9 23.5 472 435 524 714 981 488 802 386 706 303 Wise & Kong (2005) Wise, Bhola, & Yang (2006) Wise & DeMars (2006) Kong, Wise, Harmes, & Yang (2006) DeMars (2007) Kong, Wise, & Bhola (2007) Wise & Cotten (2009) Wise, Pastor, & Kong (2009) Wise & DeMars (2010) Swerdzewski, Harmes, & Finney (2011) Setzer, Wise, van den Heuvel, & Ling (2013) Wise, Kingsbury, Thomason, & Kong (2004) Wise, Ma, Kingsbury, & Hauser (2010) Wise & Ma (2012) Wise, Ma, & Theaker (2014) Wise (2015) Rios, Liu, & Bridgeman (2014) 132 College — 10,004 College 0.987 Studies in K–12 contexts 2382 6–10 0.996 1.1 711,831 573,951 26,879 18,039 3–9 3–9 3–8 9 0.971–0.998 0.2–11.9 0.960–0.993 — 0.987–0.997 1.4–7.3 — 0.99 11.5–11.9 1.95 Wise & Kingsbury (2016) 285,230 2–12 Note. Mean RTE represents the total proportion of solution behavior across all item responses. 
aLow TTE represents the percentage of examinees with RTE scores at or below 0.90. bA range of values indicates results were disaggregated by subgroup, assessment, or RTE threshold. 94 n 572,847 345,971 226,876 279,608 279,443 13,796 211,548 97,261 92,952 20,803 5,806 10,147 Percent 100.00 60.40 39.60 48.81 48.78 2.41 36.93 16.98 16.23 3.63 1.01 1.77 134,330 23.45 491 572,356 0.09 99.91 Table 2. Demographic Information for Sample (Study I) Variable Full Sample Grade 4th 8th Gender Male Female Unknown Race/Ethnicity White Hispanic Black Asian/Pacific Islander American Indian/Alaskan Native Other Race/Ethnicity Unknown Learning Disability Status Student with Learning Disability Unknown 95 Table 3. Distribution of RTE Scores for Sample (Study I) RTE Score Number of SB RTE Score n 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 < 12 Total Missing RTE > 0.90 RTE < 0.90 572,847 1,461 555,136 512,688 24,047 11,314 7,087 16,250 4,723 3,273 2,430 1,749 1,226 887 663 442 309 210 137 89 42 46 15 6 1 2 0 – 1.0 0.97 0.94 0.91 0.88 0.85 0.82 0.79 0.76 0.74 0.71 0.68 0.65 0.62 0.59 0.56 0.53 0.50 0.47 0.44 0.41 0.38 0.35 – 96 Percent 100.00 0.26 96.91 89.50 4.20 1.98 1.24 2.84 0.82 0.57 0.42 0.31 0.21 0.15 0.12 0.08 0.05 0.04 0.02 0.02 0.01 0.01 0.00 0.00 0.00 0.00 0.00 – Mean 0.9912 0.9922 0.9897 0.9880 0.9943 0.9928 0.9905 0.9869 0.9961 0.9922 0.9912 0.9836 0.9912 SD 0.0357 0.0340 0.0382 0.0419 0.0280 0.0317 0.3743 0.0442 0.0224 0.0327 0.0356 0.0494 0.0357 Table 4. Mean RTE Scores by Subgroup (Study I) Variable Full Sample Grade 4th 8th Gender Male Female Race/Ethnicity White Hispanic Black Asian/Pacific Islander American Indian/Alaskan Native Other Race/Ethnicity Learning Disability Status Student with Learning Disability Unknown n 572,847 345,971 226,876 279,608 279,443 211,548 97,261 92,952 20,803 5,806 10,147 487 570,899 97 Table 5. RTE Scores by Grade and Gender (Study I) Variable Full Sample 4th Grade Male Female 8th Grade Male Female Table 6. n 572,847 126,432 125,379 82,127 82,813 RTE Scores by Grade and Race/Ethnicity (Study I) Variable Full Sample 4th Grade White Hispanic Black Asian/Pacific Islander 8th Grade White Hispanic Black Asian/Pacific Islander n 572,847 125,500 59,887 56,202 13,088 85,500 36,982 36,370 7,647 98 Mean 0.9912 0.9897 0.9947 0.9853 0.9938 Mean 0.9912 0.9940 0.9917 0.9878 0.9966 0.9911 0.9886 0.9856 0.9953 SD 0.0357 0.0393 0.0274 0.0458 0.0290 SD 0.0357 0.0293 0.0349 0.0432 0.0208 0.0347 0.0411 0.0455 0.0248 Table 7. RTE Scores by Gender and Race/Ethnicity (Study I) Variable Full Sample Male White Hispanic Black Asian/Pacific Islander Female White Hispanic Black Asian/Pacific Islander n 572,847 104,439 48,304 45,447 10,369 102,908 48,234 46,759 20,660 Mean 0.9912 0.9902 0.9868 0.9825 0.9946 0.9955 0.9942 0.9911 0.9977 SD 0.0357 0.0374 0.0443 0.0510 0.0265 0.0244 0.0286 0.0358 0.0171 99 Table 8. Proportion Identified with Low TTE by Subgroup (Study I) Variable n RTE < 0.90 Percent Full Sample Grade 4th 8th Gender Male Female Race/Ethnicity White Hispanic Black Asian/Pacific Islander American Indian/Alaskan Native Other Race/Ethnicity Learning Disability Status Student with Learning Disability 571,386 16,250 344,945 226,441 278,872 278,742 211,070 96,957 92,682 20,738 5,798 10,117 487 8,749 7,501 11,016 4,930 4,784 2,982 4,012 252 157 283 25 Unknown 570,899 16,225 2.84 2.54 3.31 3.95 1.79 2.27 3.08 4.33 1.22 2.71 2.80 5.13 2.84 100 Percent 100.00 53.84 46.16 67.79 30.34 1.87 29.44 18.35 24.69 1.55 0.97 1.74 23.26 0.15 99.85 Table 9. 
Demographic Information for Students with Low TTE (Study I) Variable Full Sample Grade 4th 8th Gender Male Female Missing Race/Ethnicity White Hispanic Black Asian/Pacific Islander American Indian/Alaskan Native Other Race/Ethnicity Missing Disability Status Student with Learning Disability Unknown n 16,250 8,749 7,501 11,016 4,930 304 4,784 2,982 4,012 252 157 283 3,780 25 16,225 101 Table 10. Results of Multiple Logistic Regression Model (Study I) Β SE Wald df Sig. Exp(β) 95% CI Constant -4.396 0.022 39,248.388 1 0.000 0.012 – Gender (Male) 0.834 0.020 1,724.773 Race/Ethnicity (Hispanic) Race/Ethnicity (Black) Race/Ethnicity (Asian/Pacific Islander) 0.321 0.024 181.347 0.681 0.022 963.624 -0.628 0.065 92.693 Grade (8th) 0.309 0.019 272.183 1 1 1 1 1 0.000 2.303 2.214–2.395 0.000 1.378 1.315–1.444 0.000 1.976 1.893–2.063 0.000 0.533 0.469–0.606 0.000 1.361 1.312–1.412 Table 11. Demographic Information for Sample (Study II) Variable Full Sample Grade 7th 8th Gender Male Female Percent 100.00 48.89 51.11 47.26 52.74 n 675 330 345 319 356 102 Table 12. Distribution of RTE Scores for Sample (Study II) RTE Score Number of SB RTE Score 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 < 14 Total Missing RTE > 0.90 RTE < 0.90 1.0 0.97 0.94 0.91 0.88 0.85 0.82 0.79 0.76 0.74 0.71 0.68 0.65 0.62 0.59 0.56 0.53 0.50 0.47 0.44 0.41 0.38 103 n 675 633 557 41 17 18 42 15 7 3 4 2 1 2 2 1 0 2 0 0 1 0 0 2 0 Percent 100.00 93.78 82.52 6.07 2.52 2.67 6.22 2.22 1.04 0.44 0.59 0.30 0.15 0.30 0.30 0.15 0.00 0.30 0.00 0.00 0.15 0.00 0.00 0.30 0.00 RTE < 0.90 Percent 6.22 9.40 11.69 7.27 3.37 4.55 2.22 42 30 18 12 12 8 4 Percent 100 61.90 38.10 71.43 28.57 Table 13. Proportion Identified with Low TTE by Subgroup (Study II) Variable Full Sample Male 7th 8th Female 7th 8th Table 14. n 675 319 154 165 356 176 180 Demographic Information for Students with Low TTE (Study II) Variable Full Sample Grade 7th 8th Gender Male Female n 42 26 16 30 12 104 Table 15. Descriptive Statistics for SPOTS Items (Study II) Student Perceptions of Testing Survey Item M SD Expectancy Item 1. I’m certain I can answer the questions correctly next time when I take the STAR Reading test. 6. I’m certain I can figure out how to answer the most difficult questions next time I take a STAR Reading test. 11. I can answer almost all the questions next time I take the STAR Reading test. 17. Even if the questions are hard when I take the STAR Reading test, I can answer them correctly. 3.24 2.93 3.78 2.97 19. I can answer even the hardest questions when I take the STAR Reading test if I try. 3.35 Interest Value Item 2. I like taking the STAR Reading test. 7. Taking the STAR Reading test is really exciting to me. 12. I enjoy taking the STAR Reading test. 15. I enjoy taking tests like the STAR Reading test. Attainment Value Item 3. Being someone who does well on the STAR Reading test is important to me. 8. Being good at tests like the STAR reading test is an important part of who I am. 13. It is important for me to be someone who is good at taking tests like the STAR Reading test. 16. Doing well on tests like the STAR Reading test is an important part of who I am. Utility Value Item 4. Being good at taking I graduate tests like the STAR Reading test will be useful for what I want to do after and go to work. 9. Taking tests like the STAR Reading test will be useful for me later in life. 14. Taking tests like the STAR Reading test is valuable because it will help me in the future. 18. 
Being good at taking tests like the STAR Reading test will be important when I get a job or go to college. Relative Cost Item 5. I have to give up a lot of things I like to do when I take the STAR Reading test. 10. Success on the STAR Reading test requires that I give up other activities I enjoy. 2.02 1.57 1.78 1.75 3.53 2.55 3.05 2.54 3.14 2.88 2.92 3.07 1.72 1.63 0.92 1.05 1.10 0.97 1.14 1.03 0.90 1.02 1.00 1.11 1.26 1.20 1.25 1.23 1.16 1.19 1.21 1.00 0.95 105 Table 16. Descriptive Statistics for SPOTS Subscales (Study II) SPOTS Subscale Coefficient α Test-Taking Expectancy Beliefs Test-taking Interest Value Test-Taking Attainment Value Test-Taking Utility Value Test-Taking Relative Cost Table 17. 0.808 0.909 0.865 0.891 0.668 Mean SPOTS Subscale Scores by Subgroup (Study II) M 3.25 1.76 2.91 3.02 1.67 SD 0.75 0.86 1.00 1.03 0.84 Variable Expectancy Interest Utility Attainment Cost M SD M SD M SD M SD M SD Grade 7th 8th Gender Male 3.27 0.74 1.80 0.85 3.11 1.04 2.99 0.98 1.72 0.82 3.23 0.77 1.73 0.87 2.93 1.02 2.84 1.01 1.62 0.85 3.27 0.78 1.80 0.91 3.06 1.74 2.96 0.98 1.78 0.92 Female 3.24 0.72 1.73 0.82 2.99 0.99 2.86 1.02 1.57 0.75 106 Table 18. Bivariate Correlation Matrix for Variables (Study (II) Low TTE Efficacy Interest Utility Attainment Cost – 0.355 (p < 0.001) – 0.295 0.421 (p < 0.001) (p < 0.001) – 0.346 0.374 0.569 (p < 0.001) (p < 0.001) (p < 0.001) – -0.047 (ns) 0.009 (ns) 0.054 (ns) 0.084 (p = 0.029) – Low TTE – 0-.014 (ns) 0.027 (ns) -0.040 (ns) -0.070 (ns) 0.105 (p = 0.006) Efficacy Interest Utility Attainment Cost Table 19. Results of Multiple Logistic Regression Model (Study II) Β SE Wald df Sig. Exp(β) 95% CI Constant -3.492 0.617 32.081 Gender (Male) 0.985 0.357 7.661 Cost Value 0.441 0.170 6.726 Attainment Value -0.361 0.169 4.539 Grade (7th) 0.725 0.338 4.599 1 1 1 1 1 0.000 0.030 0.006 2.677 1.330–5.387 0.009 1.554 1.114–2.169 0.033 0.697 0.500–0.972 0.032 2.064 1.064–4.004 107 Figure 1. Conceptualization of TTE in Demands–Capacity Model of Test-Taking Effort (Resource Demands) Effort (Effort Capacity) Test Item Characteristics Test-Taking Figure 1. Conceptualization of TTE in “Demands-capacity model of test-taking effort.” Adapted from “A Model of Examinee Test-Taking Effort” by S. L. Wise and L. F. Smith, 2011, in “High- Stakes Testing in Education: Science and Practice in K–12 Settings” by J. A. Bovaird, K. F. Geisinger, and C. W. Buckendahl (Eds.), p. 149. Testing Context Characteristics Individual Student Characteristics 108 Figure 2. Relationships from EEVT Examined in Current Study. Test-Taking Expectancy Beliefs Test-Taking Effort Cultural and Family Characteristics (Beliefs, Expectations, Attitudes, Behaviors) Beliefs and Behaviors of Socializers Student’s Perception of Cultural and Family Characteristics Individual’s Goals, Explicit Motivations, and Self-Schemata Stable Individual Characteristics 1. Grade level 2. Gender 3. Race/ethnicity Figure 2. Relationships from EEVT Examined in Current Study. Informed by the Eccles et al. expectancy–value model from “Part I Commentary: So What Is Student Engagement Anyway?” by J. Eccles and M.-T. Wang, 2012, in “Handbook of Research on Student Engagement” by S. L. Christenson, A. L. Reschly, and C. Wylie (Eds.), p. 143. Test-Taking Value Beliefs 1. Intrinsic value 2. Attainment value 3. Utility value 4. Relative cost Individual’s Interpretations Individual’s Affective Reactions and Memories Previous Achievement-Related Experiences 109 [Date] APPENDIX B. LETTER TO TEST DEVELOPERS. James Los, M.A. 
Michigan State University 620 Farm Ln., Rm. 447 East Lansing, MI 48824 Dear [Addressees], My name is James Los, and I am a doctoral student in the School Psychology program at Michigan State University. I am completing a doctoral dissertation, and I am writing to request your permission to conduct a secondary data analysis using STAR Reading assessment data collected during the 2016–2017 and 2017–2018 school years. The purpose of my dissertation research study is to investigate the prevalence and correlates of low test-taking effort (TTE) on a computerized-adaptive reading test. In doing so, I hope to contribute to what we currently know about 1) the extent to which low TTE might be apparent in low-stakes academic testing systems, and 2) the demographic variables and internal motivational variables associated with low TTE. With a better understanding of which students are most likely to show low TTE and the factors related to why students exhibit low TTE, it may be possible to develop targeted strategies for promoting more effortful responding during low-stakes testing in schools. Therefore, the primary goals of my doctoral dissertation study are 1) to identify the prevalence of low TTE in students in grades 3–8, 2) to examine whether any demographic subgroups of students are particularly likely to show low TTE, and 3) to investigate the extent to which motivational variables (i.e., test-taking expectancy beliefs and value beliefs) relate to low TTE. The specific research questions for my study of TTE on STAR Reading tests are as follows: 1. What proportion of students in grades 3–8 are identified as exhibiting low TTE on a computer-adaptive reading test, as determined by Response Time Effort (RTE) score? 2. To what extent do student demographic variables (grade, gender, race/ethnicity) relate to the likelihood that students are identified as exhibiting low TTE on a computer-adaptive reading test (as determined by RTE score)? 3. Do students differ in their test-taking expectancy and/or value beliefs based on student demographic variables (grade, gender)? 4. To what extent do student test-taking expectancy and/or value beliefs relate to the likelihood that students are identified as exhibiting low TTE on a computer-adaptive reading test (as determined by RTE score)? Study I Purpose and Research Design The purpose of the first stage of my research study will be to describe the proportion of students in grades 3–8 who exhibit low TTE on a STAR Reading assessment and to examine the relationships between demographic characteristics and the likelihood of exhibiting low TTE. The predictor variables are grade level, gender, and race/ethnicity, derived from demographic information entered by educators using the STAR Assessments Renaissance Data Integrator (RDI) service. The dependent variable, low TTE, will be a binary categorical variable defined as a Response Time Effort (RTE) score, derived from a STAR Reading assessment, falling at or below 0.90. RTE is an index that represents the proportion of test items on which the examinee did not exhibit “rapid-guessing behavior” (RGB) by submitting a response extremely quickly (i.e., in three seconds or less). An RTE score of 0.0 would mean all of the student’s responses were classified as RGB, whereas an RTE score of 1.0 would mean none of the student’s responses were classified as RGB. Therefore, “low TTE” (RTE at or below 0.90) is operationally defined as 10% or more of the student’s responses being classified as RGB.
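To make this operational definition concrete, the following is a minimal sketch of how an RTE score and the low-TTE flag could be computed from a student’s item response times, assuming the three-second rapid-guessing threshold described above. The function names and threshold handling are illustrative rather than Renaissance Learning’s actual scoring procedure.

```python
from typing import List

RAPID_GUESS_SECONDS = 3.0   # assumed rapid-guessing threshold (see text)
LOW_TTE_CUTOFF = 0.90       # RTE at or below 0.90 is flagged as low TTE


def response_time_effort(item_response_times: List[float]) -> float:
    """Proportion of items answered with solution behavior (i.e., not rapid guesses)."""
    if not item_response_times:
        raise ValueError("At least one item response time is required.")
    solution_behavior = sum(
        1 for rt in item_response_times if rt > RAPID_GUESS_SECONDS
    )
    return solution_behavior / len(item_response_times)


def is_low_tte(item_response_times: List[float]) -> bool:
    """Flag a test record as low TTE when RTE falls at or below the cutoff."""
    return response_time_effort(item_response_times) <= LOW_TTE_CUTOFF


if __name__ == "__main__":
    # Example: a 34-item test with 4 responses submitted in under three seconds
    times = [2.0] * 4 + [15.0] * 30
    print(round(response_time_effort(times), 3), is_low_tte(times))
    # -> 0.882 True
```

Under this definition, for example, a 34-item test would be flagged as low TTE once four or more responses were classified as RGB (31/34 ≈ 0.91 remains above the cutoff, whereas 30/34 ≈ 0.88 falls below it).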
Requested Data for Study I Using data from administrations of STAR Reading assessments during the 2016–2017 school year, a targeted sample of item-level testing data is requested for inclusion in this study. More specifically, data are requested for a randomly selected sample of 2,500 students in each of grades 3–8 (N = 15,000) who took a STAR Reading interim test in the winter of the 2016–2017 school year and for whom the following demographic information and testing data are available: • Grade level; Gender; Race/ethnicity • STAR Reading data from a winter screening assessment that includes both item response times and the associated item-level scores (i.e., correct vs. incorrect), and overall score • Student identification number (non-personally identifiable information) Study II Purpose and Research Design The purpose of the second stage of my research study will be to extend Study I by applying the Eccles et al. expectancy–value theory (EEVT) to an empirical study of TTE. The general assumption of the EEVT is that students’ achievement-related behaviors are guided by their expectancies for success on a task and the extent to which they value the task. This theory will guide a quantitative study testing the relationships between student demographic characteristics, expectancy beliefs, value beliefs, and TTE. Specifically, the second stage of the study is designed to 1) examine the extent to which students differ in their test-taking expectancy and value beliefs by grade and gender, and 2) examine the extent to which test-taking expectancy and value beliefs relate to the likelihood that students exhibit low TTE on a STAR Reading test. In my research study, an online survey protocol will be used to gather information on the independent and dependent variables (grade, gender, expectancy beliefs, and value beliefs). The survey will be administered to students from a school district that administers the STAR Reading assessment to students three times per year as part of a district-wide MTSS initiative. In Study II, I will replicate the method for measuring the dependent variable, low TTE, that I will use in Study I (as described above). As in the first stage of my investigation, TTE will be measured using RTE scores derived from item response times on a STAR Reading assessment. The primary rationale for the second stage of my research study is that it will allow for the collection of data on student perceptions of test-taking. In doing so, this study may help educators (as well as the test developers) to better understand motivational factors that might relate to the odds that a student disengages from low-stakes testing. Requested Data for Study II Using data from administrations of STAR Reading assessments during the 2017–2018 school year, a targeted sample of item-level testing data is requested for inclusion in this study. More specifically, data are requested for all students in grades 3–8 in the “City” Public School District in “City,” Michigan who complete a STAR Reading interim assessment in the winter of the 2017–2018 school year and for whom the following testing data are available: • STAR Reading data from a winter screening assessment that includes both item response times and the associated item-level scores (i.e., correct vs.
incorrect), and overall score • Student identification number (non-personally identifiable information) To allow for my proposed data-matching without disclosing any personally identifiable student information to the researchers, I will request that students’ unique school ID numbers be included as variable in the de-identified dataset. This will allow me to match the student survey data with their STAR Reading data by having students enter their unique school ID number prior to completing the online survey about their test-taking perceptions. Data Retrieval and Storage Data will be downloaded and stored using high performance computer storage at MSU’s Institute for Cyber-Enabled Research (iCER) High Performance Computing Center (HPCC). If these arrangements meet with your approval, please sign the letter where indicated below and return it to me in the enclosed return envelope. Thank you very much. Sincerely, James Los, M.A. School Psychology Program Michigan State University PERMISSION GRANTED FOR THE USE REQUESTED ABOVE: [Name of addressee] Date: 112 APPENDIX C. RENAISSANCE LEARNING (2014) PRIVACY POLICY NOTICE. 113 APPENDIX D. STUDENT PERCEPTIONS OF TESTING SURVEY. Part I. Student Assent Purpose of Research You are being asked to participate in an online survey of what students think about the STAR Reading test you take at your school. Your school was selected as possible participants in this study because it is one of the schools in Michigan that uses STAR Reading. From this study, the researchers hope to learn about student beliefs about this type of reading test. Your participation in this study will take you about ten minutes. What You Will Do What you will do to participate in this study is complete a survey on the computer. You do not need to complete any other testing or school work to participate in this study. Your Rights to Participate, Say No, or Withdraw Participation in this research project is completely voluntary. You have the right to say no. You may change your mind at any time and withdraw. You may choose not to answer specific questions or to stop participating at any time. Whether you choose to participate or not will have no effect on your grade or evaluation. Costs and Compensation for Being in the Study To thank you for participating in the survey, your class will receive a $50 gift card for a free lunch from the researchers. Contact Information for Questions and Concerns If you have concerns or questions about this study, please contact the lead researcher James Los (Address: 620 Farm Ln, East Lansing, MI, 48824; Email: losjames@msu.edu). If you have questions or concerns about your role and rights as a research participant, would like to obtain information or offer input, or would like to register a complaint about this study, you may contact, anonymously if you wish, the Michigan State University’s Human Research Protection Program at 517-355-2180, Fax 517-432-4503, or email irb@msu.edu or regular mail at 4000 Collins Rd, Suite 136, Lansing, MI, 48910. Part II. Documentation of Assent By entering your student ID number below, you voluntarily agree to participate in this survey. 114 Part III. Questions About STAR Reading Test First, please answer a few questions about yourself. Please select your grade. o 7th Grade o 8th Grade Please select your gender. o Female o Male o Prefer not to say Now we will ask you some questions. For each one, you will answer how true it is for you, using a scale from 1 to 5. 1 means “Not at all true” for you. 
3 means “Somewhat true” for you. 5 means “Very true” for you. Let’s practice a few questions. 1 Not at all true I like eating pizza. Playing basketball is fun. I can travel to the moon after school today. Now we will ask you some questions about the STAR Reading test. For each one, you will answer how true it is for you, using the same scale from 1 to 5. 1 means “Not at all true” for you. 3 means “Somewhat true” for you. 5 means “Very true” for you. Remember to think about the STAR Reading test when you answer each of these questions. 1 Not at all true Somewhat true 5 Very true 3 Somewhat true 5 Very true 2 2 4 4 3 1. I’m certain I can answer the questions correctly next time I take the STAR Reading test. 2. I like taking the STAR Reading test. 3. Being someone who does well on the STAR Reading test is important to me. 115 4. Being good at taking tests like the STAR Reading test will be useful for what I want to do after I graduate and go to work. 5. I have to give up a lot of things I like to do when I take the STAR Reading test. 6. I’m certain I can figure out how to answer the most difficult questions next time I take the STAR Reading test. 7. Taking the STAR Reading test is exciting to me. 8. Being good at tests like the STAR Reading test is an important part of who I am. 9. Taking tests like the STAR Reading test will be useful for me later in life. 10. Success on the STAR Reading test requires that I give up other activities I enjoy. 11. I can answer almost all the questions next time I take the STAR Reading test if I don’t give up. 12. I enjoy taking the STAR Reading test. 13. It is important for me to be someone who is good at taking tests like the STAR Reading test. 14. Taking tests like the STAR Reading test is valuable because it will help me in the future. 15. I enjoy taking tests like the STAR Reading test. 16. Doing well on tests like the STAR Reading is an important part of who I am. 17. Even if the questions are hard when I take the STAR Reading test, I can answer them correctly. 18. Being good at taking tests like the STAR Reading test will be important when I get a job or go to college. 19. I can answer even the hardest questions when I take the STAR Reading test if I try. 116 APPENDIX E. EXPECTANCY BELIEFS ORIGINAL AND ADAPTED ITEMS. Patterns of Adaptive Learning Scales (PALS) Academic Efficacy (Midgley et al., 2000) (Original items) (1. I’m certain I can master the skills taught in class this year.) (11. I’m certain I can figure out how to do the most difficult class work.) (52. I can do almost all the work in class if I don’t give up.) (56. Even if the work is hard, I can learn it.) (58. I can do even the hardest work in this class if I try.) Adapted items (5) 1. I’m certain I can answer the questions correctly next time I take the STAR Reading test. 6. I’m certain I can figure out how to answer the most difficult questions next time I take STAR Reading test. 11. I can answer almost all the questions next time I take the STAR Reading test if I don’t give up. 17. Even if the questions are hard when I take the STAR Reading test, I can answer them correctly. 19. I can answer even the hardest questions when I take the STAR Reading test if I try. 117 APPENDIX F. VALUE BELIEFS ORIGINAL AND ADAPTED ITEMS. Subjective Task Value Scales (Conley, 2012) Interest Value (Original items) (How much do you like doing math?) (I like math.) (Math is exciting to me.) (I am fascinated by math.) (I enjoy doing math.) (I enjoy the subject of math.) 
Attainment Value (Original items) (Being someone who is good at math is important to me.) (I feel that, to me, being good at solving problems which involve math or reasoning mathematically is (not at all important to very important). (Being good at math is an important part of who I am.) (It is important for me to be someone who is good at solving problems that involve math.) (It is important for me to be a person who reasons mathematically.) (Thinking mathematically is an important part of who I am.) Utility Value Adapted items (4) — 2. I like taking the STAR Reading test. 7. Taking the STAR Reading test is exciting to me. — 12. I enjoy taking the STAR Reading test. 15. I enjoy taking tests like the STAR Reading test. Adapted items (4) 3. Being someone who does well on the STAR Reading test is important to me. — 8. Being good at tests like the STAR Reading test is an important part of who I am. 13. It is important for me to be someone who is good at taking tests like the STAR Reading test. — 16. Doing well on tests like the STAR Reading test is an important part of who I am. (Original items) Adapted items (4) (How useful is learning math for what you want to do after you graduate and go to work?) (Math will be useful to me later in life.) (Math concepts are valuable because they will help me in the future.) (Being good at math will be important when I get a job or go to college.) Cost Value (Original items) (I have to give up a lot to do well in math.) (Success in math requires that I give up other activities I enjoy.) 4. Being good at taking tests like the STAR Reading test will be useful for what I want to do after I graduate and go to work. 9. Taking tests like the STAR Reading test will be useful for me later in life. 14. Taking tests like the STAR Reading test is valuable because it will help me in the future. 18. Being good at taking tests like the STAR Reading test will be important when I get a job or go to college. Adapted items (2) 5. I have to give up a lot of things I like to do when I take the STAR Reading test. 10. Success on the STAR Reading test requires that I give up other activities I enjoy. 118 APPENDIX G. LETTER TO SCHOOL ADMINISTRATORS. [Date] James Los, M.A. Michigan State University 620 Farm Ln., Rm. 447 East Lansing, MI 48824 Dear [Addressees] My name is James Los, and I am a doctoral student in the School Psychology program at Michigan State University. I am completing a doctoral dissertation, and I am writing to request your permission to conduct a secondary data analysis using STAR Reading assessment data collected during the 2017–2018 school years. The purpose of my dissertation research study is to investigate the prevalence and correlates of low test-taking effort (TTE) on a computerized- adaptive reading test. In doing so, I hope to contribute to what we currently know about 1) the extent to which low TTE might be apparent in low-stakes academic testing systems, and 2) the demographic variables and internal motivational variables that are associated with low TTE. In my proposed study, I intend to collect data about student test-taking beliefs, focusing on student perceptions of their ability to be successful on the test, as well as the extent to which they value succeeding on the test. To measure student TTE, I plan to analyze item response time data from the STAR Reading assessment. Previous research on tests like this has shown us that a small subset of test takers tends to respond with extremely low TTE by submitting their answers rapidly. 
However, we currently don’t know how many students show this type of disengagement from tests they take on the computer at school. This is an important question for educators and researchers alike, as we know that test scores are only meaningful if they reflect students’ actual skills or knowledge. For this reason, my goal is to learn more about student test-taking behaviors in assessment contexts that may be perceived as “low stakes” (i.e., no personal consequences) for students. In addition to learning more about how frequently this issue might occur on tests like the STAR Reading, my study is designed to investigate two more questions: 1) which students are most likely to show inappropriate effort during testing, and 2) what may be the reasons why these students disengage? Answering these questions may help to inform the development of targeted strategies that are designed to prevent low TTE from occurring during academic testing. Requested Data for Study With your participation in this study, I would request that the developers of the STAR Assessments (Renaissance Learning) provide me a dataset of test data for all students in grades 6–8 in “City” Public Schools who take a STAR Reading test in the winter of the 2017–2018 school year and for whom the following demographic information and testing data are available: 119 • STAR Reading data from a winter screening assessment that includes both item response times and the associated item-level scores (i.e., correct vs. incorrect), and overall score • Randomly generated identification number (non-personally identifiable information) In addition to requesting the STAR Reading test data, I would also request that I could come to classrooms in grades 6–8 to administer the online survey to students. For me to match student survey data with their STAR Reading test data (to test my research questions) without any personally identifiable information being disclosed, I would request that students enter a unique student ID number prior to their completion of the survey. This way, I could match the de-identified STAR Reading data to student survey responses by linking the datasets with the ID. Data Retrieval and Storage Data will be downloaded and stored using high performance computer storage at MSU’s Institute for Cyber-Enabled Research (iCER) High Performance Computing Center (HPCC). Student Perceptions of Testing In order to examine why some students may show quick, low-effort responses on this test, a survey of student perceptions of testing will be administered to all students in the district who complete this assessment. Understanding why students may show behavioral disengagement during testing is essential for researchers to optimally inform improvements to testing systems. An online survey will be administered to students on the computer in October, after completion of the STAR Reading™ Enterprise fall assessment and before the winter assessment. The items on the survey will measure student perceptions about their perceived interest in testing, perceived importance of testing, perceived usefulness of testing, perceived relative cost of testing, and their expectations for success. The survey will take students approximately ten minutes to complete. Risks of Participation in Research There are minimal foreseeable risks associated with participation in this research study. 
Because no personally identifiable student information (i.e., names, school ID numbers) will be associated with student data, it is anticipated that this study will be approved for an expedited review by the Michigan State University (MSU) Social Science / Behavioral / Educational Institutional Review Board (SIRB). Student participation would be completely voluntary, and student assent would be gathered prior to the students participating in the online survey. Compensation and Benefits of Participation in Research For participating in the study, each class will be compensated with a $50 gift card for a class lunch. There are no direct benefits to students associated with participating in this study. Additionally, the results of this study will be a valuable contribution to the existing research on academic testing. Currently, few studies have directly measured student engagement during testing at the elementary or middle school levels, so this study will be important for informing future research on assessment validity. Key findings from this study would be presented by the lead researcher to any interested administrators, support staff, and/or classroom teachers as part of an informal meeting or a formal professional development session. This presentation would include the following topics: • Descriptive statistics about student behavioral engagement during testing overall, as well as data on student engagement during testing at the school, grade, and classroom levels • Information about student perceptions associated with disengagement during testing • Descriptive statistics about student perceptions of testing • Strategies for identifying disengaged test examinees • Strategies for encouraging engagement during testing • Strategies for promoting the valid use of academic testing systems If these arrangements meet with your approval, please sign the letter where indicated below and return it to me in the enclosed return envelope. Thank you very much. Sincerely, James Los, M.A. PERMISSION GRANTED FOR THE USE REQUESTED ABOVE: [Name of addressee] Date: APPENDIX H. IRB EXEMPT DETERMINATION LETTER. EXEMPT DETERMINATION March 28, 2018 To: Sara Elizabeth Witmer Re: MSU Study ID: STUDY00000086 Principal Investigator: Sara Elizabeth Witmer Category: Exempt 1 Exempt Determination Date: 3/28/2018 Title: An Investigation of Test-Taking Effort in a Computer-Adaptive Test of Reading This project has been determined to be exempt under 45 CFR 46.101(b) 1. Principal Investigator Responsibilities: The Principal Investigator assumes the responsibilities for the protection of human subjects in this project as outlined in Human Research Protection Program (HRPP) Manual Section 8-1, Exemptions. Continuing Review: Exempt projects do not need to be renewed. Modifications: In general, investigators are not required to submit changes to the Michigan State University (MSU) Institutional Review Board (IRB) once a research study is designated as exempt as long as those changes do not affect the exempt category or criteria for exempt determination (changing from exempt status to expedited or full review, changing exempt category) or that may substantially change the focus of the research study such as a change in hypothesis or study design. See HRPP Manual Section 8-1, Exemptions, for examples. If the project is modified to add additional sites for the research, please note that you may not begin the research at those sites until you receive the appropriate approvals/permissions from the sites.
Change in Funding: If new external funding is obtained for an active human research project that had been determined exempt, a new initial IRB submission will be required, with limited exceptions.

Reportable Events: If issues should arise during the conduct of the research, such as unanticipated problems that may involve risks to subjects or others, or any problem that may increase the risk to the human subjects and change the category of review, notify the IRB office promptly. Any complaints from participants that may change the level of review from exempt to expedited or full review must be reported to the IRB. Please report new information through the project's workspace and contact the IRB office with any urgent events. Please visit the Human Research Protection Program (HRPP) website to obtain more information, including reporting timelines.

Office of Regulatory Affairs
Human Research Protection Program
4000 Collins Road, Suite 136
Lansing, MI 48910
517-355-2180
Fax: 517-432-4503
Email: irb@msu.edu
www.hrpp.msu.edu

Personnel Changes: After determination of the exempt status, the PI is responsible for maintaining records of personnel changes and appropriate training. The PI is not required to notify the IRB of personnel changes on exempt research. However, he or she may wish to submit personnel changes to the IRB for recordkeeping purposes (e.g., communication with the Graduate School) and may submit such requests by submitting a Modification request. If there is a change in PI, the new PI must confirm acceptance of the PI Assurance form and the previous PI must submit the Supplemental Form to Change the Principal Investigator with the Modification request (http://hrpp.msu.edu/forms).

Closure: Investigators are not required to notify the IRB when the research study is complete. However, the PI can choose to notify the IRB when the project is complete, which is especially recommended when the PI leaves the university.

For More Information: See HRPP Manual, including Section 8-1, Exemptions (available at https://hrpp.msu.edu/msu-hrpp-manual-table-contents-expanded).

Contact Information: If we can be of further assistance or if you have questions, please contact us at 517-355-2180 or via email at IRB@ora.msu.edu. Please visit hrpp.msu.edu to access the HRPP Manual, templates, etc.

Exemption Category. This project has qualified for Exempt Category (ies) 1. Please see the appropriate research category below from 45 CFR 46.101(b) for the full regulatory text.

Exempt 1. Research conducted in established or commonly accepted educational settings, involving normal educational practices, such as (i) research on regular and special education instructional strategies, or (ii) research on the effectiveness of or the comparison among instructional techniques, curricula, or classroom management methods.

Exempt 2. Research involving the use of educational tests (cognitive, diagnostic, aptitude, achievement), survey procedures, interview procedures or observation of public behavior, unless: (i) information obtained is recorded in such a manner that human subjects can be identified, directly or through identifiers linked to the subjects; and (ii) any disclosure of the human subjects' responses outside the research could reasonably place the subjects at risk of criminal or civil liability or be damaging to the subjects' financial standing, employability, or reputation.
Exempt 3. Research involving the use of educational tests (cognitive, diagnostic, aptitude, achievement), survey procedures, interview procedures, or observation of public behavior that is not exempt under paragraph (b)(2) of this section, if: (i) the human subjects are elected or appointed public officials or candidates for public office; or (ii) federal statute(s) require(s) without exception that the confidentiality of the personally identifiable information will be maintained throughout the research and thereafter.

Exempt 4. Research involving the collection or study of existing data, documents, records, pathological specimens, or diagnostic specimens, if these sources are publicly available or if the information is recorded by the investigator in such a manner that subjects cannot be identified, directly or through identifiers linked to the subjects.

Exempt 5. Research and demonstration projects which are conducted by or subject to the approval of department or agency heads, and which are designed to study, evaluate, or otherwise examine: (i) Public benefit or service programs; (ii) procedures for obtaining benefits or services under those programs; (iii) possible changes in or alternatives to those programs or procedures; or (iv) possible changes in methods or levels of payment for benefits or services under those programs.

Exempt 6. Taste and food quality evaluation and consumer acceptance studies, (i) if wholesome foods without additives are consumed or (ii) if a food is consumed that contains a food ingredient at or below the level and for a use found to be safe, or agricultural chemical or environmental contaminant at or below the level found to be safe, by the Food and Drug Administration or approved by the Environmental Protection Agency or the Food Safety and Inspection Service of the U.S. Department of Agriculture.

1 Exempt categories (1), (2), (3), (4), and (5) cannot be applied to activities that are FDA-regulated.
2 Exemptions do not apply to research involving prisoners.
3 Exempt 2 for research involving survey or interview procedures or observation of public behavior does not apply to research with children, except for research involving observations of public behavior when the investigator(s) do not participate in the activities being observed.

APPENDIX I. LETTER TO PARENTS.

[Date]
James Los, M.A.
Michigan State University
620 Farm Ln., Rm. 447
East Lansing, MI 48824

Dear parents,

Students in your child's classroom have been invited to participate in a research study being conducted by a doctoral student from Michigan State University's College of Education. Researchers are interested in learning more about how schools use the STAR Reading test and how students in elementary and middle schools view these tests. The purpose of studying this topic is to help test developers and educators understand how best to use reading tests in schools.

Because your school district already requires students to complete this assessment, no additional testing will be necessary. Instead, the researchers have requested permission from administrators to collect and analyze student testing data (with no identifying information) from Renaissance Learning™, the test developers. Again, no personally identifiable information will be disclosed.

The second part of this study involves inviting students to complete a brief, online survey about their perceptions of the STAR Reading test. Participation is voluntary, and the survey is completely anonymous.
Students will be asked to enter their individual school ID number before they complete the survey, and the researchers will never be able to connect this number to your child's identifying information. Students' responses to these questions will not be shared with anyone except the researchers who are gathering this information. If there are any questions that students do not want to answer, they may choose not to respond. Again, students are not required to complete this survey, and they can choose to withdraw from the survey at any time. To thank students for participating in the survey, your child's class will receive a free lunch from the researchers.

On the following page, please find attached a copy of the assent form that will be read to your child before the survey is administered. If you have any questions about the study, please contact the lead researcher, James Los, at the email address provided below. Thank you very much!

Sincerely,
James Los, M.A.
losjames@msu.edu

Purpose of Research
You are being asked to participate in an online survey about what students think of the STAR Reading test you take at your school. Your school was selected as a possible participant in this study because it is one of the schools in Michigan that uses STAR Reading. From this study, the researchers hope to learn about student beliefs about this type of reading test. Your participation in this study will take about ten minutes.

What You Will Do
To participate in this study, you will complete a survey on the computer. You do not need to complete any other testing or school work to participate in this study.

Your Rights to Participate, Say No, or Withdraw
Participation in this research project is completely voluntary. You have the right to say no. You may change your mind at any time and withdraw. You may choose not to answer specific questions or to stop participating at any time. Whether you choose to participate or not will have no effect on your grade or evaluation.

Costs and Compensation for Being in the Study
To thank you for participating in the survey, your class will receive a $50 gift card for a free lunch from the researchers.

Contact Information for Questions and Concerns
If you have concerns or questions about this study, please contact the lead researcher, James Los (Address: 620 Farm Ln, East Lansing, MI 48824; Email: losjames@msu.edu). If you have questions or concerns about your role and rights as a research participant, would like to obtain information or offer input, or would like to register a complaint about this study, you may contact, anonymously if you wish, the Michigan State University Human Research Protection Program at 517-355-2180, by fax at 517-432-4503, by email at irb@msu.edu, or by regular mail at 4000 Collins Rd, Suite 136, Lansing, MI 48910.