STRIKES VERSUS SCORES: A COMPARISON OF TWO VERSIONS OF THE GOOD BEHAVIOR GAME AND THE IMPACT OF INTERVENTION ON GENERAL EDUCATION STUDENTS AND STUDENTS WITH SPECIAL EDUCATION NEEDS

By

Travis Lunsford

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Applied Behavior Analysis – Master of Arts

2025

ABSTRACT

Disruption in classrooms presents challenges to both teaching and learning. A common and well-researched Tier 1 support to address disruption is the Good Behavior Game (GBG); however, little research exists on the impact of this intervention on students with special learning needs in a mainstream general education setting. This study utilized a single-case research design replicated across four students, two of whom had autism spectrum disorder (ASD), to determine the differential effects of two versions of the GBG. Visual analysis of intervention results was used to draw conclusions about differences in effectiveness in increasing on-task student behavior. Results suggest little difference in effectiveness for the two general education target students and mixed results for the two target students with ASD. This study suggests that more research is needed on the effects of universal interventions on students with special learning needs to understand what modifications may need to be made to improve effectiveness. Implications for practice and other suggestions for future research are also discussed.

Keywords: Good Behavior Game, Caught Being Good Game, Universal Supports, Neurodivergence, Autism, Classroom Management

TABLE OF CONTENTS

LIST OF SYMBOLS AND ABBREVIATIONS
Introduction
Method
Results
Discussion
REFERENCES
APPENDIX

LIST OF SYMBOLS AND ABBREVIATIONS

AIR® – American Institute of Research
ASD – Autism Spectrum Disorder
CBGG – Caught Being Good Game
ELA – English Language Arts
GBG – Good Behavior Game
PAX – Peace, Productivity, Health, and Happiness
PBIS – Positive Behavior Interventions and Supports

Introduction

Challenging, disruptive, and off-task behavior can have detrimental impacts on both teaching and learning (Cameron et al., 2008). The time spent addressing disruptive behaviors is often to the detriment of academic instruction (Oliver et al., 2011). Students who engage in disruptive behaviors often have lower academic achievement and score lower on standardized testing (Oliver et al., 2011; Shinn et al., 1987). Recently, there has been a move towards inclusion in education, meaning students who engage in challenging, disruptive, or off-task behaviors may be in general education settings due to legislation requiring that students be placed in the least restrictive environment (Individuals with Disabilities Education Improvement Act, 2004).
This means that general education classroom teachers may be required to address a range of behavioral needs in addition to providing academic instruction. Given the multitude of behaviors that might be observed across the school day, a comprehensive framework or strategy for preventing and addressing challenging behaviors is necessary (Flower et al., 2014). Many schools address disruptive behaviors by adopting a school-wide framework called Positive Behavior Interventions and Supports (PBIS; Center on PBIS, 2025). PBIS uses a three-tiered approach to provide support and services and promote positive behaviors using positive behavioral interventions. Tier 1 consists of interventions and strategies that are provided to all students and are often the only behavioral supports approximately 80% of students need to be successful. These are often referred to as universal supports and include strategies such as posting visuals that promote school rule following or the implementation of school- or class-wide reinforcement systems (Center on PBIS, 2025; Noltemeyer et al., 2018). Tier 2 is meant to address the needs of the next 10-15% of students, who need more intensive support than what they received in Tier 1. Interventions at Tier 2 include additional explicit teaching of social, behavioral, or academic skills, as well as supports such as individualized token economy systems, self-management strategies, and other individualized visual supports (Center on PBIS, 2025; Noltemeyer et al., 2018). Tier 3 supports are intended for the remaining 1-5% of students and are implemented after Tier 1 and Tier 2 supports have been shown not to be effective (Center on PBIS, 2025; Noltemeyer et al., 2018). Supports at Tier 3 include multi-disciplinary teams, behavioral expertise, and the collection and analysis of behavioral data (Center on PBIS, 2025).
Regardless of which level of support a student receives, in a PBIS school, all students in the general education setting receive Tier 1 universal supports and class-wide interventions. One example of a universal support is the Good Behavior Game (GBG), a classroom management strategy with over 50 years of research support (Jornevald et al., 2024). The GBG utilizes an interdependent group contingency in which students are divided into evenly sized teams. Rules are established and shared with the class, and depending on the version of the game being played, the teacher awards or removes points for rule violations or rule following. A point threshold is determined and, depending on the version, any team scoring above or below that threshold may win a daily or weekly prize. In 1969, Barrish and colleagues first introduced the GBG in an effort to reduce disruptive behavior, including out-of-seat behavior and talking-out behavior, in a fourth-grade classroom (Barrish et al., 1969). The class was split into teams, and points were recorded against a team when individual students on that team violated the classroom rules. If a team had fewer than five points at the conclusion of the game, they could earn classroom privileges, such as lining up early or free time at the end of the day. The game was successful in that it substantially reduced both targeted behaviors. The game used a differential-reinforcement-of-low-rates approach to behavior change, whereby low rates of the target behavior were reinforced and higher rates were not.
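The interdependent group contingency described above amounts to a simple threshold rule. The following minimal sketch illustrates it; the team names, mark counts, and use of Python are illustrative only and not part of the original study:

```python
# Minimal sketch of the original GBG winning contingency (Barrish et al., 1969):
# marks are recorded against a team for rule violations, and every team that
# stays below the threshold earns the classroom privilege. Because the
# contingency is interdependent, any number of teams (including all of them)
# can win. Team names and mark counts below are hypothetical.

def winning_teams(marks_per_team, threshold=5):
    """Return the teams whose violation marks fall below the threshold."""
    return [team for team, marks in marks_per_team.items() if marks < threshold]

example_marks = {"Team A": 2, "Team B": 7, "Team C": 4}
print(winning_teams(example_marks))  # Teams A and C stay under five marks and win
```

Note that teams are not competing against one another for a single prize; each team's outcome depends only on its own count relative to the threshold.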
Since its introduction, the GBG, or some variation of it, has been studied to address a multitude of behaviors, including vocal disruptions such as talking out of turn (Barrish et al., 1969; Sharpe & Joslyn, 2021; Wahl et al., 2016; Wiskow et al., 2021; Wright & McCurdy, 2011), classroom rule following and rule breaking (Tanol et al., 2010), on-task behavior (Sharpe & Joslyn, 2021; Wright & McCurdy, 2011), work completion and work accuracy (Sharpe & Joslyn, 2021), the delivery of teacher praise and teacher reprimands or corrective statements (Tanol et al., 2010; Wahl et al., 2016; Wiskow et al., 2021), student peer prompting, and positive and negative student interactions (Wiskow et al., 2021). Some versions of the GBG have been formalized. The PAX Good Behavior Game® and the American Institute of Research® (AIR) model are two common manualized versions of the GBG with very different proprietary approaches. In PAX, the GBG is paired with evidence-based practices taught and used throughout the school day, such as written praise notes, the use of timeout, and other antecedent and consequence manipulations (Embry, 2011). The game is played for very short periods at first (approximately 3 minutes), with sessions lengthening as students demonstrate success with the game. Additionally, the reward in the PAX version is often the ability to engage in a classroom activity that would not otherwise be allowed, such as a short dance party, rather than a tangible reward (Johansson et al., 2020). The AIR® model follows the original GBG proposed by Barrish and colleagues more closely, using predetermined rules established by the teacher, and limits teacher interaction to times when corrective feedback is given, promoting peer accountability among the students themselves (Jornevald et al., 2024). Several other variations of the GBG have also been studied in peer-reviewed research.
Tanol and colleagues (2010) examined two versions of the GBG: one that utilized response cost, during which students in a kindergarten classroom could lose stars for violating the rules, and another that utilized positive reinforcement, during which students earned stars for following the rules. Rule violations were monitored continuously, whereas rule following was monitored using momentary time sampling on a schedule set by the teacher. The study found that both versions were effective at reducing rule violations; however, a slightly larger reduction was observed in the reinforcement version of the game (Tanol et al., 2010). Wright and McCurdy (2011) were the first to directly compare the traditional GBG to a version that utilized positive reinforcement, which they called the Caught Being Good Game (CBGG). Their study was conducted in both a kindergarten and a fourth-grade general education classroom. Their results indicated that both game versions were effective at reducing classroom disruptions, but neither intervention demonstrated a functional relation to on-task behavior. More recently, Sharpe and Joslyn (2021) conducted a study similar to Wright and McCurdy (2011), comparing the GBG to the CBGG. Their study was conducted with at-risk middle school students in the process of transitioning from an alternative education setting back to mainstream education. They referred to their game versions as Strikes (GBG) and Scores (CBGG) in order to differentiate between the two conditions. Their results indicated a slightly higher reduction in disruption in the Strikes condition, a slightly higher rate of on-task behavior in the Scores condition, and variable results for work completion and accuracy.
Despite the large body of literature in support of the GBG, according to a recent literature review conducted by Jornevald and colleagues (2024), there is a significant lack of research on the effects of the GBG for students with special education needs in general education classrooms. Most studies of the GBG have been conducted either with general education students only or with special education students in special education settings. They recommend that future research include participants with disabilities such as autism spectrum disorder (ASD) or attention deficit hyperactivity disorder, since it is becoming more common for students with these disabilities to be taught in general education classrooms. Jornevald and colleagues (2024) argue that schools should prioritize the utilization of universal interventions such as the GBG but also recognize that universal interventions may need modification to benefit all students, including the highest-risk students. Therefore, the purpose of the current study was to extend the study conducted by Sharpe and Joslyn (2021) by including students with special education needs in a general education setting. In this study, the differential effects on on-task behavior were assessed for two versions of the GBG: Strikes, which utilized a positive punishment procedure to differentially reinforce low rates of the target behavior, and Scores, which utilized a positive reinforcement procedure with praise provided for rule-following behavior. The research question addressed in this study is twofold: (1) what impact does each version, Strikes or Scores, have on on-task behavior, and (2) what impact does each version have for students with ASD in the general education setting?
Method

Participants and Setting

Participants in the study included a fourth-grade general education classroom teacher and four fourth-grade students, three males and one female, who were reported by the teacher to have been frequently off task, disruptive, and not always paying attention during classroom lessons. Preliminary observations were also conducted in the classroom prior to the study to confirm that the recommended students displayed frequent disruptive and off-task behaviors. Justin was a 9-year-old White male with an Individualized Education Program (IEP) under the eligibility category of autism spectrum disorder (ASD) and received Tier 3 supports. His IEP included Speech, Social Work, and Occupational Therapy services, a behavior intervention plan, one-on-one paraprofessional support, and accommodated assignments, which included a reduction in the number of academic demands. His paraprofessional was responsible for assisting the teacher in accommodating assignments, serving as a scribe when asked, supporting him during breaks, and redirecting him back to tasks as needed. During the study, his paraprofessional refrained from redirecting him back to the task; however, assignments were still accommodated, and he had the option to request a scribe or a break if needed. Isabel was a 9-year-old Hispanic female with a Section 504 plan due to a medical diagnosis of ASD and received Tier 2 supports. Her accommodations included scheduled and as-needed movement breaks and additional one-on-one instruction with the teacher. Hector and Louis were both 9-year-old White male general education students receiving only Tier 1 supports. The teacher was a White female with over 30 years of experience teaching elementary education. She had experience implementing classroom-wide reward systems in the past but had not yet done so with this class, nor had she implemented any variation of the GBG.
This study was conducted in a fourth-grade general education classroom at a suburban elementary school within a district near a Midwest capital city. The classroom had 26 students (12 males and 14 females). The classroom was rectangular, with individual student desks arranged side by side in three columns of four rows, with the first column including a fifth row. The desks faced a wall-sized whiteboard on which the teacher projected lesson materials, videos, reading passages, or math problems required for the lesson. The teacher mostly taught from the whiteboard but would occasionally move around the room to assist students or check student engagement. The researcher and second observer were positioned in the back of the room, opposite the whiteboard, behind the middle column of desks, where a clear line of sight could be established for each participant. For sessions 1 through 8, the study was conducted during English Language Arts (ELA); however, due to a change in instruction style necessary for the ELA curriculum, the study took place during Math instruction and work time for the remaining sessions. The study was not conducted if students were taking an assessment, a planned interruption such as a fire drill occurred, a substitute was present, or more than half of the target students were absent.

Materials

Materials for this study included two Android tablets, two pairs of wireless Bluetooth earbuds, an interval-timer application with customizable intervals, and writing utensils. Tickets for the school-wide reinforcement system were used as the prize for winning the game; each ticket was a quarter-sheet-sized colorful piece of paper with the school's PBIS rules printed on it and the teacher's signature.
The tickets were saved and could be redeemed at the school's monthly prize store for rewards including novelty toys, school supplies, pizza party passes, or gift cards for local restaurants and activities.

Dependent Variables and Measurement

The researcher worked with the classroom teacher to define the behaviors targeted in the study. On-task was defined as engaging in behaviors that directly support participation in the assigned lesson or task. This included behaviors such as maintaining eye contact with instructional materials (the teacher, their textbook, a worksheet, the whiteboard, etc.), writing or typing as part of the assignment, asking or answering relevant questions with permission to talk, and following teacher instructions. Non-examples involved engaging in activities not related to the assignment (such as reading an unrelated book), looking around the room, turning around in their seat, talking to peers about topics not related to the assignment, or interacting with objects not relevant to the assignment. Off-task was defined as engaging in behaviors that do not directly support participation in the assigned lesson or task. This included behaviors such as gazing off, putting their head down, or otherwise not being oriented towards the lesson materials or instructor; drawing, writing, or reading materials that are not a part of the assigned lesson; talking without permission or about a topic not related to the lesson; and/or being out of their seat unless required by the lesson. Non-examples included engaging in appropriate choice-time activities if instructed and approved by the teacher. Target behavior was measured using a 10-s momentary time sampling system, with the 10-min game divided into 60 10-s intervals.
Using the interval-timer application on one of the tablets and an earbud, the researcher would, when the 10-s timer went off, scan the room in a predetermined order (Justin, Isabel, Hector, then Louis), determine whether each target student was on task or off task, and mark it on the data sheet.

Experimental Design

This study utilized an ABCBCA design. A baseline condition was followed by a condition called Strikes 1 (B), then Scores 1 (C). Since the study intended to compare the effects of the two interventions against each other, the Strikes condition was then reintroduced (Strikes 2), followed by Scores 2, and the study ended with a final baseline phase to assess the impact of intervention overall. Each phase continued until a stable trend was observed in the data for at least two of the target students. Like the Sharpe and Joslyn (2021) study, this study attempted to demonstrate an applied comparison of the interventions. While the rules in each game version remained the same across the study, the criteria for winning the game were not yoked, meaning some elements of each intervention, such as the number of points needed to win or the frequency of scanning for rule following or violations, were not necessarily consistent from phase to phase.

Procedure

Pre-Teaching

The first author and teacher met before school hours a total of five times before implementation of the first intervention for pre-teaching. During the first meeting, the teacher was provided with an overview of the study and general information about the format of Strikes and Scores, and a video overview of the Good Behavior Game was shown (SciShow Psych, 2019). During subsequent meetings, the teacher was given opportunities to practice each step of implementation without students present, and feedback was provided until the teacher demonstrated proficiency with each version of the intervention.

Baseline

During baseline, the teacher was instructed to provide instruction and manage the class as she typically would.
She was told to redirect students as needed and had the option to provide reward tickets for positive classroom behavior, although this was never observed. Baseline 1 occurred during ELA instruction and Baseline 2 occurred during Math instruction. The session began when the teacher began instruction and lasted for 10 minutes, even if students began to work independently during that time. Data were collected on the target behaviors in the manner described above.

General Procedures

Across all versions of the game, the teacher divided the classroom into five teams of four and one team of six. In an effort to continue the teacher's regular classroom practices, after each phase the teacher changed the composition and seating arrangements of the teams; however, to avoid one team having an advantage over another, only one target student was on any team at any time. At the beginning of the game, the teacher would remind the students of the rules of the game, which remained consistent throughout the study. Rule one was to raise their hand before talking and rule two was to remain facing forward in their seat. The rules were written on the classroom whiteboard, along with numbers that corresponded to the teams, with space below each number to keep a tally score. The point threshold needed to win was also written on the whiteboard. The students were reminded that if their team won, each student on the team would earn a ticket for the school-wide reinforcement system.

Strikes 1

During the Strikes phases, the teacher was instructed to provide academic instruction as she typically would while also playing Strikes with the students. During the first Strikes phase, the teacher wore an earbud connected to the second Android tablet, whose interval timer beeped every 30 s, prompting the teacher to look for instances of rule-breaking behavior.
If a student broke a rule while she scanned the room, the teacher would record a "strike" by adding a tally next to the corresponding team and say something like, "Someone on team 1 wasn't raising their hand. That's a strike," before returning to the lesson. In order to win, a team needed to have fewer than four "strikes," a threshold that was determined by averaging the data collected in baseline. If no team was below the limit, the team with the lowest number of strikes would win the prize. It was also possible for all teams to win.

Strikes 2

Strikes 2 was implemented similarly to Strikes 1, except that the earbud was not used. Instead, the teacher was instructed to continuously scan for and record any instance of a rule violation. This change was made because during Strikes 1, each team won every game, indicating that the teacher may have missed rule-breaking behavior between the 30-s intervals. However, reducing the length of the interval below 30 s would have been too disruptive to the flow of instruction, which is why the teacher was instructed to record any instance of rule-breaking behavior instead. The rules, prize, and point threshold remained the same as in Strikes 1.

Scores 1

During the Scores phases, the teacher was instructed to provide academic instruction as she typically would while also playing Scores with the students. In addition to reviewing the rules and the prize for winning, the students were told the threshold number of points needed to win, which was 16. This threshold was determined by calculating the total number of opportunities to score, which was 20, and subtracting the average number of rule violations observed during baseline, which was four. The teacher wore an earbud connected to the second Android tablet and an interval timer that prompted her every 30 s with a short beep. When the interval timer went off, the teacher would scan the room to identify students who were following the rules.
Teams with all members following the rules at the time the teacher checked received a "score" and a positive callout from the teacher, such as, "Nice job following the rules, team 2! That's a score." Any team with at least 16 points won the game, meaning it was possible for all teams to win. If no team hit the winning threshold, the team with the highest score won.

Scores 2

Scores 2 was implemented similarly to Scores 1, except that the interval length for the teacher's momentary time sampling was changed to 45 s. This was done to address concerns presented by the teacher during Scores 1 that the 30-s interval interfered with the pace and flow of her instruction. Because the interval length was changed but the game length was not, the number of opportunities to earn points decreased, as did the threshold of points needed to win. The threshold of points needed to win in this phase was 10. It was calculated by determining the number of opportunities to earn points during the game, rounded to the next whole number, which was 14, and subtracting the average number of rule violations observed during baseline, which was four.

Interobserver Agreement and Procedural Fidelity

Interobserver agreement (IOA) was assessed by having a second observer separately and independently score at least 40% of sessions in person during each phase of the study. IOA was measured on an interval-by-interval basis. Agreement was calculated by adding the total number of agreements, dividing by the total number of possible agreements, and multiplying by 100. IOA was recorded for 40 to 75% of sessions across all phases for all participants. For Justin, the average IOA across all sessions was 97% (range 93 to 100%). Isabel's average IOA across all sessions was 95% (range 89 to 100%). Average IOA across all sessions was 97% (range 89 to 100%) for Hector and 97% (range 88 to 100%) for Louis.
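The interval-by-interval agreement calculation described above can be expressed as a short sketch. The observer records below are hypothetical illustrations, not data from the study:

```python
# Interval-by-interval interobserver agreement (IOA): the number of intervals
# on which both observers agree, divided by the total number of intervals,
# multiplied by 100. The observer records below are hypothetical.

def interval_ioa(observer_1, observer_2):
    """Percent agreement between two paired interval records ('on'/'off')."""
    if len(observer_1) != len(observer_2):
        raise ValueError("Observers must score the same number of intervals")
    agreements = sum(a == b for a, b in zip(observer_1, observer_2))
    return agreements / len(observer_1) * 100

obs_1 = ["on", "on", "off", "on", "off", "on"]
obs_2 = ["on", "off", "off", "on", "off", "on"]
print(round(interval_ioa(obs_1, obs_2), 1))  # 83.3 (5 of 6 intervals agree)
```

A real session in this study would involve 60 intervals per target student rather than the six shown here, but the computation is identical.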
Procedural fidelity was measured utilizing a procedural integrity checklist adapted from the Sharpe and Joslyn (2021) study during at least 33% of sessions in the Strikes and Scores phases. There were 12 steps on each checklist (see Table 1). Each item was scored as "yes" if it was observed or "no" if it was not. A percentage was obtained by dividing the number of correctly implemented steps by the total number of steps possible. The mean procedural fidelity was 96% (range 83 to 100%) during Strikes 1, 100% during Scores 1, 100% during Strikes 2, and 96% (range 92 to 100%) during Scores 2.

Social Validity

Social validity was assessed through a post-study survey sent to the teacher and through an informal raise-of-hands survey of the classroom students. The class (both target students and non-target students) was asked if they liked playing Strikes or Scores and which they liked better. A total of 46% of the class indicated a preference for Scores, whereas 38% indicated a preference for Strikes, with approximately 15% of students either not expressing a preference or being absent at the time the poll occurred. Among the target students, two (Isabel and Louis) indicated a preference for Strikes and one (Hector) preferred Scores; Justin chose not to participate in the poll. The teacher was also asked which version of the game she liked better and why. She indicated a preference for Scores due to its positive reinforcement aspect, though she noted that Strikes was easier to implement. The teacher was also asked if she was likely to continue implementing either version of the game, and she indicated that she would likely continue using Scores in some way.
On a social validity survey administered via Microsoft Forms (see Table 2), the teacher was asked to rate ease of implementation, level of student engagement, perceived effect on behavior, and manageability on a Likert scale from 1 to 5, with 1 indicating a low score or more difficult rating and 5 indicating a high score or easier rating. The teacher rated both interventions as equally high with regard to ease of implementation and effectiveness. She rated Strikes as slightly more manageable than Scores and reported slightly more student engagement with Scores than with Strikes. When asked whether she had anything to share about the experience, she shared that she enjoyed participating and found the data and data collection process interesting.

Results

Results suggest that the functional relation between baseline and intervention was idiosyncratic, with little difference in effectiveness between the interventions for three of the four target students. Figure 1 displays the mean rates of on-task behavior for all participants across phases. Justin's data (Figure 2) were variable across all phases but trended downward as the study progressed. In Baseline 1, Justin's on-task behavior trended downward with a mean rate of 54% (range 45 to 100%). When Strikes 1 was introduced, there was an initial increase in on-task behavior, with an overall mean rate of 62% (range 48 to 87%); however, as the phase progressed, his on-task behavior declined. Justin's mean rate of on-task behavior decreased by 29 percentage points to a mean of 33% (range 5 to 65%) when Scores 1 was introduced, with a slight upward trend across the phase. When Strikes 2 was introduced, Justin's on-task behavior decreased another 8 percentage points to a mean rate of 25% (range 13 to 55%). When Scores 2 was introduced, Justin's on-task behavior was similar to Scores 1, with a mean rate of 28% (range 3 to 47%). During the return to baseline, his mean rate of on-task behavior decreased to 15% (range 8 to 23%).
Visual analysis of Justin's data indicated that neither intervention had an impact on increasing his on-task behavior. Isabel's data are depicted in Figure 3. During Baseline 1, her mean rate of on-task behavior was 85% (range 73 to 93%). When Strikes 1 was introduced, her on-task behavior maintained at a similar rate of 86% (range 57 to 98%). Isabel's mean rate remained constant at 86% (range 72 to 95%) when Scores 1 was introduced, with an upward trend observed across the phase. When Strikes 2 was introduced, average on-task behavior dropped 9 percentage points to a mean rate of 77% (range 65 to 93%), and a slight downward trend was observed across the phase. When Scores 2 was introduced, on-task behavior increased to a mean rate of 84% (range 67 to 93%). In Baseline 2, Isabel's mean rate of on-task behavior decreased to 63% (range 28 to 82%). Relatively little change was observed from phase to phase in Isabel's data, indicating that neither intervention was more effective than the other at increasing her on-task behavior. Hector's data are shown in Figure 4. In Baseline 1, Hector's mean rate of on-task behavior was 70% (range 43 to 98%) and trended downward across the phase. When Strikes 1 was introduced, Hector's mean rate of on-task behavior increased to 92% (range 75 to 100%). When Scores 1 was introduced, Hector's mean rate increased slightly to 98% (range 92 to 100%). Hector's mean rate remained constant at 98% (range 95 to 100%) when Strikes 2 was introduced, with a flat trend across the phase. When Scores 2 was introduced, Hector's mean rate of on-task behavior maintained at a similar rate of 96% (range 92 to 100%) with a flat trend across the phase. When the intervention was removed and conditions returned to baseline, Hector's on-task behavior decreased by 42 percentage points to a mean rate of 54% (range 37 to 62%) with a downward trend observed across the phase.
There was a substantial change in responding when the intervention was first introduced and again when it was completely removed, but relatively little change in on-task behavior was observed between interventions (Strikes or Scores), indicating no difference in effectiveness between them. Louis' data are shown in Figure 5. In Baseline 1, Louis' mean rate of on-task behavior was 68% (range 27 to 95%) and trended upward across the phase. When Strikes 1 was introduced, Louis' mean rate of on-task behavior increased slightly to 72% (range 23 to 98%) and overall trended upward across the phase with some variability observed. In Scores 1, Louis' on-task behavior increased by 20 percentage points to a mean rate of 92% (range 77 to 98%). Louis was only present for two sessions in the Strikes 2 phase, with a mean rate of on-task behavior of 96% (range 95 to 100%). When Scores 2 was introduced, Louis' mean rate of on-task behavior was 94% (range 85 to 100%). When the intervention was removed and conditions returned to baseline, Louis' mean rate of on-task behavior decreased to 51% (range 30 to 70%). Given Louis' absences and the lack of sufficient data points across phases, a determination on a functional relation is inconclusive.

Discussion

Given the large body of research supporting the use of the GBG, the study did not explicitly investigate the overall effectiveness of the intervention. Instead, the purpose of the current study was to compare two versions of the GBG, Strikes and Scores, to see if a punishment-based version of the game (Strikes) or a positive-reinforcement-based version of the game (Scores) was more effective than the other. The results of the study did not demonstrate that one intervention was more effective at increasing on-task behavior than the other. However, there were slightly higher rates of on-task behavior observed during Scores than Strikes for three of the four participants.
Although it is difficult to say with certainty given the research design, it appears that for most participants, some version of the GBG was more effective at improving on-task behavior than no intervention at all. However, this was not the case for one participant, whose behavior progressively worsened as the study progressed. Overall, the results of the study are consistent with previous research suggesting that a positive-reinforcement-based version of the game may improve on-task behavior (Sharpe & Joslyn, 2021; Wiskow et al., 2021) and align with other research suggesting that multiple versions of the game are equally effective at reducing classroom disruptions (Wahl et al., 2016; Wright & McCurdy, 2012). The results of this study also support the claims by Jornevald and colleagues (2024) that the GBG may not be effective without additional supports for some students. For all target students, the mean percentage of intervals on-task increased when the intervention was first introduced and decreased once conditions returned to baseline. This supports previous research demonstrating that universal supports, such as the GBG, can be an effective classroom management strategy (Jornevald et al., 2024). In addition, slightly higher rates of on-task behavior were observed during Scores for Isabel, Hector, and Louis, which is similar to the findings of Sharpe and Joslyn (2021), who found that the CBGG, or Scores, may be slightly more effective at increasing on-task behavior, at least for most students. However, as stated by Tanol and colleagues (2010), this conclusion should be interpreted with caution given the significant amount of overlap in results between phases. Two of the target students in the study had diagnoses of ASD and received additional educational support through a Section 504 plan or IEP. While Isabel’s on-task behavior was generally high throughout the study, Justin’s on-task behavior decreased substantially across the study.
These findings align with Wright and McCurdy (2012) and Jornevald and colleagues (2024), who found that while the GBG can benefit most students with special education needs in the general education setting, some students with more severe difficulties may need additional support to benefit from the intervention, such as one-to-one support or more frequent student-teacher interaction. Justin’s IEP included one-to-one paraprofessional support, which, as part of his behavior intervention plan, sometimes included redirection back to the task. During the study, however, his paraprofessional was asked to refrain from redirection. This was done to eliminate potentially confounding variables, as we wanted to see the impact of the GBG on his on-task behavior without additional support. However, given the downward trend in his behavior over the course of the study and the student’s documented need for support, it may have been beneficial to include the one-to-one support to see whether that would address issues with the game and improve on-task behavior. One potential reason his behavior did not improve during the GBG is the generalized reinforcer that was used as a reward for winning the game. Previous research has suggested that the choice of rewards may have an impact on the magnitude of effect (Flower et al., 2014). This suggests that a student’s preferences and reinforcers, such as allowing students to choose the rewards and the timing of rewards, should be considered when making decisions about which intervention to implement. The rules of the game themselves may also have an impact on effectiveness, especially for students with ASD. During every session, every team won, including whatever team Justin was on, which indicates that while Justin was following the rules, his rule following did not correspond to being on-task. Careful consideration should be given to the rules selected for the GBG to ensure they align with the goals of the intervention.
Given the similarities in on-task behavior across both versions of the GBG, there are several other factors to consider when deciding which version of the game to implement. The method used to promote behavior change varied depending on which version of the game was being played. During Strikes, a positive punishment contingency for rule violations was established by using corrective feedback when issuing “strikes.” Differential reinforcement of low rates of behavior provided reinforcement for fewer rule violations, similar to the original study by Barrish and colleagues (1969). Conversely, Scores relied on a positive reinforcement contingency delivered through praise and awarding “scores” for rule following. Despite these differences, there was little difference in impact on on-task behavior. This implies that when deciding which version to implement, consideration should be given to the procedure used and its alignment with PBIS goals, which are to create positive learning environments for all students by teaching and reinforcing positive behaviors (Center on PBIS, 2025). Additionally, ease of implementation should be considered. In this study, the teacher rated both interventions equally on the Likert scale with regard to ease of implementation but indicated that Strikes was slightly more manageable, owing to how often the teacher must scan for rule violations or rule following. This suggests that the frequency with which a teacher has to stop providing academic instruction in order to check for rule violations or rule following may influence a teacher’s preference for one intervention over the other. In fact, Tanol and colleagues (2010) found that slightly better rule following occurred during the reinforcement version in their study, during which the teacher was responsible for determining the number and pace of checks. In addition, Jornevald and colleagues (2024) found that implementation fidelity had an impact on the effectiveness of the intervention.
Given that teachers are more likely to implement an intervention that is easy and manageable, it may make sense for teachers to choose the version of the game that they are more likely to implement with high fidelity. Another consideration for implementation is teacher and student preference for the game. Social validity data indicate that both interventions are acceptable to the students who participate in the game and the teachers who implement it. This finding is supported by past comparison studies (Sharpe & Joslyn, 2021; Wahl et al., 2016; Wiskow et al., 2021; Wright & McCurdy, 2012). The mixed preferences among the wider class, the target students, and the teacher indicate that preferences between interventions are idiosyncratic and likely depend on the classroom context. This notion is supported by past research as well. Wiskow and colleagues (2021) found that three of the four participating teachers preferred the GBG version with corrective feedback to the CBGG, which utilized positive reinforcement, each identifying different reasons for their preference, including the immediacy of impact the teachers observed, how natural the intervention felt, and how convenient the monitoring system was in one version (CBGG) versus the other. Sharpe and Joslyn (2021) found a teacher preference for the GBG (Strikes) and a slight student preference for the CBGG (Scores), which was further supported in the students’ free responses. They also found that some students felt like they were “in trouble” when a strike was earned, even though they only received corrective feedback. Extra care should be taken by implementers to explain that earning a “strike” is not getting in trouble (Sharpe & Joslyn, 2021). Decisions on which intervention to implement may also largely depend on the context.
The teacher indicated that she would likely continue implementing the Scores version; however, she noted that one barrier to continued implementation was identifying a better system to prompt her to check for rule following. Continued use of the private audio interval timer was offered, but the teacher did not wish to continue using an earpiece. Other suggestions to prompt regular checking for rule following include visual reminders at the back of the room, the use of haptic feedback through a smart watch or MotivAider timekeeping device, or setting a specific number of checks to complete prior to playing without adhering to a set interval (Tanol et al., 2010). Given the minimal difference in intervention effects on on-task behavior between Strikes and Scores, this study supports the notion that teacher preference, classroom context, and alignment with PBIS goals should be the primary considerations when deciding which variation of the GBG to implement. There were several potential limitations to this study. First, some aspects of the game, such as team composition and student seating arrangements, did not remain constant across phases, making it more difficult to draw conclusions regarding the effectiveness of the GBG on the targeted behavior. Based on observation, it appeared that a participant’s proximity to the teacher may have had an impact on on-task behavior, meaning students who were closer to the teacher often appeared on-task more frequently. Another potential confound was that minor changes were made to both versions of the GBG regarding the frequency with which the teacher scanned the room across phases. However, these differences did not appear to impact the results of this study, although they could limit direct comparisons between phases.
Another limitation was the failure to collect proficiency data on the teacher during training; doing so would have provided comparison data on how quickly the teacher was able to learn each intervention and an objective measure of readiness. There was also a significant change in the time of day the intervention was implemented and in the material being taught across the study, due to the applied nature of the study. There were also differences in the amount of instructional time versus independent work time while the game was played. While several studies have examined use of the GBG during various parts of the school day, there is little research comparing its use during instructional time versus independent work time, or across specific subjects being taught. Finally, there were several unplanned changes in the school schedule, unexpected absences, assemblies, and extended breaks that may have had an impact on the results. For example, the last two sessions of Scores 2 and the four sessions of Baseline 2 were conducted after students returned from winter break. Baseline 2 showed significantly lower rates of on-task behavior than Baseline 1, which could be attributed to students having to readjust to school after an extended break. One recommendation for future research is to incorporate participant feedback and preference when identifying reinforcement or prizes for winning the game. This study used the generalized reinforcement system utilized throughout the school, which may not have served as a true reinforcer for some participants. In addition, the tickets could only be redeemed once a month, so the schedule on which they could be exchanged for backup reinforcers may not have been dense enough for some students. When a universal intervention such as the GBG is found not to be effective for a student, both fidelity of implementation and the type of reinforcement in use should be examined (Flower et al., 2014; Humphrey et al., 2021).
As fidelity of implementation during this study was high in each phase, the type of reinforcement used may not have been a good match for all learners. Another recommendation would be to consider other experimental designs. The ABACBC design used by Tanol and colleagues (2010) would have allowed more concrete conclusions to be drawn about individual intervention effectiveness for each participant, which is important when considering the different profiles of the participants included in this study. Additionally, an alternating treatment design like the one used by Sharpe and Joslyn (2021) may also have been more appropriate, as it allows for a quicker comparison with fewer phases and sessions. Additional considerations include keeping more components consistent between phases, such as team composition, student seating assignment, time of day, and classroom activity. To conclude, this study found that for most students, there were slightly higher rates of on-task behavior in the Scores version of the game than in the Strikes version. There was also a preference for Scores among both the students and the teacher, given the positive reinforcement aspect of that version. However, given that the difference between versions was very slight, teacher and student preference, classroom contextual fit, and alignment with PBIS goals should be considered when deciding which version of the GBG to implement. Additionally, modifications to implementation may be needed for individuals with greater support needs. While this study continues to support the strong effectiveness of the GBG for general education students and settings, as a universal support it does not necessarily have a universal impact.

REFERENCES

Barrish, H. H., Saunders, M., & Wolf, M. M. (1969). Good behavior game: Effects of individual contingencies for group consequences on disruptive behavior in a classroom. Journal of Applied Behavior Analysis, 2(2), 119–124.
https://doi.org/10.1901/jaba.1969.2-119

Cameron, C., Connor, C., Morrison, F., & Jewkes, A. (2008). Effects of classroom organization on letter-word reading in first grade. Journal of School Psychology, 46, 173–192. https://doi.org/10.1016/j.jsp.2007.03.002

Center on PBIS. (2025). Positive behavioral interventions and supports. Retrieved March 17, 2025, from www.pbis.org

Embry, D. D. (2011). Behavioral vaccines and evidence-based kernels: Non-pharmaceutical approaches for the prevention of mental, emotional and behavioral disorders. The Psychiatric Clinics of North America, 34(1), 1–34. https://doi.org/10.1016/j.psc.2010.11.003

Flower, A., McKenna, J. W., Bunuan, R. L., Muething, C. S., & Vega Jr., R. (2014). Effects of the Good Behavior Game on challenging behaviors in school settings. Review of Educational Research, 84(4), 546–571.

Humphrey, N., Panayiotou, M., Hennessey, A., & Ashworth, E. (2021). Treatment effect modifiers in a randomized trial of the Good Behavior Game during middle childhood. Journal of Consulting and Clinical Psychology, 89(8), 668–681. https://doi.org/10.1037/ccp0000673

Individuals with Disabilities Education Act, 20 U.S.C. § 1400 (2004).

Johansson, M., Biglan, A., & Embry, D. (2020). The PAX Good Behavior Game: One model for evolving a more nurturing society. Clinical Child and Family Psychology Review, 23, 462–482. https://doi.org/10.1007/s10567-020-00323-3

Jornevald, M., Pettersson‐Roll, L., & Hau, H. (2024). The Good Behavior Game for students with special educational needs in mainstream education settings: A scoping review. Psychology in the Schools, 61, 861–886. https://doi.org/10.1002/pits.23086

Noltemeyer, A., Palmer, K., James, A. G., & Wiechman, S. (2019). School-wide positive behavioral interventions and supports (SWPBIS): A synthesis of existing research. International Journal of School & Educational Psychology, 7(4), 253–262.

Oliver, R. M., Wehby, J. H., & Reschly, D. J. (2011).
Teacher classroom management practices: Effects on disruptive or aggressive student behavior. Campbell Systematic Reviews, 7(1), 1–55.

SciSho Psych. (2019, September 9). The Good Behavior Game [Video]. YouTube. https://www.youtube.com/watch?v=Nc0Tw6ISYKk

Sharpe, A. N., & Joslyn, P. R. (2021). Correspondence of product and topographical behavior measures during a comparison of Good Behavior Game arrangements. Education and Treatment of Children, 44(4), 215–231.

Shinn, M. R., Ramsey, E., Walker, H. M., Stieber, S., & O'Neill, R. E. (1987). Antisocial behavior in school settings: Initial differences in an at-risk and normal population. The Journal of Special Education, 21, 69–84.

Tanol, G., Johnson, L., McComas, J., & Cote, E. (2010). Responding to rule violations or rule following: A comparison of two versions of the Good Behavior Game with kindergarten students. Journal of School Psychology, 48(5), 337–355.

Wahl, E., Hawkins, R. O., Haydon, T., Marsicano, R., & Morrison, J. Q. (2016). Comparing versions of the Good Behavior Game: Can a positive spin enhance effectiveness? Behavior Modification, 40(4), 493–517. https://doi.org/10.1177/0145445516644220

Wiskow, K. M., Urban-Wilson, A., Ishaya, U., DaSilva, A., Nieto, P., Silva, E., & Lopez, J. (2021). A comparison of variations of the Good Behavior Game on disruptive and social behaviors in elementary school classrooms. Behavior Analysis: Research and Practice, 21(2), 102–117. https://doi.org/10.1037/bar0000208

Wright, R. A., & McCurdy, B. L. (2012). Class-wide positive behavior support and group contingencies: Examining a positive variation of the Good Behavior Game. Journal of Positive Behavior Interventions, 14(3), 173–180.

APPENDIX

Table 1
Procedural Fidelity Checklist for Strikes and Scores

Strikes
1. Divide students into equal teams
2. Write team names on board
3. Remind students of the rules. Write them on board
4. State the point threshold needed to win
5. Describe the reward for winning
6. Ask students if they have any questions before beginning
7. Announce the start of the game and begin the interval timekeeping device. Prompt students to begin assignment/lesson
8. For Strikes 1: Listen for the chime of the interval timekeeping device and deliver a “Strike” when a rule is broken on the chime (every 30 seconds). For Strikes 2: Deliver a “Strike” whenever a rule is broken
9. Deliver feedback – call out team and behavior, not the individual
10. Play the game for 10 minutes
11. Add up points and announce winners
12. Distribute rewards

Scores
1. Divide students into equal teams
2. Write team names on board
3. Remind students of the rules. Write them on board
4. State the point threshold needed to win
5. Describe the reward for winning
6. Ask students if they have any questions before beginning
7. Announce the start of the game and begin the interval timekeeping device. Prompt students to begin assignment/lesson
8. Listen for the chime of the interval timekeeping device and deliver a “Score” for each team with everyone following the rules (for Scores 1: every 30 seconds; for Scores 2: every ~45 seconds)
9. Deliver feedback – call out positive team behavior, not individual behavior
10. Play the game for 10 minutes
11. Add up points and announce winners
12. Distribute rewards

Table 2
Teacher Social Validity Survey Questions and Responses

Question: Which version of The Good Behavior Game did you prefer to implement (Scores or Strikes), and why? Please be as specific as possible.
Teacher response: “Scores” because it focused on the positive. However, “strikes” was easier to implement.

Question: Are you likely to continue implementing either version of the game after the end of the study?
Teacher response: Yes, I will likely use the Scores version of the game.

Question: Reflect on the Strikes version of the game. How would you rate each of the following on a scale of 1 to 5, 1 being a low score or more difficult rating, and 5 being a high score or easier rating?
Teacher response: Ease of Implementation: 5; Student Engagement: 4; Effect on Behavior: 5; Manageability: 5

Question: Reflect on the Scores version of the game. How would you rate each of the following on a scale of 1 to 5, 1 being a low score or more difficult rating, and 5 being a high score or easier rating?
Teacher response: Ease of Implementation: 5; Student Engagement: 5; Effect on Behavior: 5; Manageability: 4

Question: What challenges or barriers did you encounter while implementing either version of The Good Behavior Game in your classroom?
Teacher response: It was easier to do it when I had regular reminders to check. Implementing it without the prompt is trickier because I often forget to add points.

Question: Lastly, is there anything else you would like to share about your experience with The Good Behavior Game or your participation in the study?
Teacher response: I enjoyed participating in the study! It was interesting to see the data collected. Thanks!

Figure 1
Mean Percent of Intervals On-Task
[Bar graph: mean percent of intervals on-task (0–100%) for Justin, Isabel, Hector, and Louis across the phases Baseline 1, Strikes 1, Scores 1, Strikes 2, Scores 2, and Baseline 2.]

Figure 2
Justin’s Percent of Intervals On-Task
[Line graph: Justin’s percent of intervals on-task (0–100%) by session number (0–30), with phase changes marked for Baseline 1, Strikes 1, Scores 1, Strikes 2, Scores 2, and Baseline 2.]

Figure 3
Isabel’s Percent of Intervals On-Task
[Line graph: Isabel’s percent of intervals on-task (0–100%) by session number (0–30), with phase changes marked for Baseline 1, Strikes 1, Scores 1, Strikes 2, Scores 2, and Baseline 2.]

Figure 4
Hector’s Percent of Intervals On-Task
[Line graph: Hector’s percent of intervals on-task (0–100%) by session number (0–30), with phase changes marked for Baseline 1, Strikes 1, Scores 1, Strikes 2, Scores 2, and Baseline 2.]

Figure 5
Louis’ Percent of Intervals On-Task
[Line graph: Louis’ percent of intervals on-task (0–100%) by session number (0–30), with phase changes marked for Baseline 1, Strikes 1, Scores 1, Strikes 2, Scores 2, and Baseline 2.]