STRIKES VERSUS SCORES: A COMPARISON OF TWO VERSIONS OF THE GOOD BEHAVIOR GAME AND THE IMPACT OF INTERVENTION ON GENERAL EDUCATION STUDENTS AND STUDENTS WITH SPECIAL EDUCATION NEEDS

By

Travis Lunsford

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Applied Behavior Analysis – Master of Arts

2025

ABSTRACT

Disruption in classrooms presents challenges to both teaching and learning. A common and well-researched Tier 1 support to address disruption is the Good Behavior Game (GBG); however, little research exists on the impact of this intervention on students with special learning needs in a mainstream general education setting. This study utilized a single-case research design replicated across four students, two of whom had autism spectrum disorder (ASD), to determine the differential effects of two versions of the GBG. Visual analysis of intervention results was used to draw conclusions about differences in effectiveness in increasing on-task student behavior. Results suggest little difference in effectiveness for the two general education target students and mixed results for the two target students with ASD. This study suggests that more research is needed on the effects of universal interventions on students with special learning needs to understand what modifications may need to be made to improve effectiveness. Implications for practice and other suggestions for future research are also discussed.

Keywords: Good Behavior Game, Caught Being Good Game, Universal Supports, Neurodivergence, Autism, Classroom Management

TABLE OF CONTENTS

LIST OF SYMBOLS AND ABBREVIATIONS
Introduction
Method
Results
Discussion
REFERENCES
APPENDIX

LIST OF SYMBOLS AND ABBREVIATIONS

AIR® – American Institute of Research
ASD – Autism Spectrum Disorder
CBGG – Caught Being Good Game
ELA – English Language Arts
GBG – Good Behavior Game
PAX – Peace, Productivity, Health, and Happiness
PBIS – Positive Behavior Interventions and Supports

Introduction

Challenging, disruptive, and off-task behavior can have detrimental impacts on both teaching and learning (Cameron et al., 2008). The time spent addressing disruptive behaviors is often to the detriment of academic instruction (Oliver et al., 2011). Students who engage in disruptive behaviors often have lower academic achievement and score lower on standardized testing (Oliver et al., 2011; Shinn et al., 1987). Recently, there has been a move towards inclusion in education, meaning students who engage in challenging, disruptive, or off-task behaviors may be in general education settings due to legislation requiring that students be placed in the least restrictive environment (Individuals with Disabilities Education Improvement Act, 2004).
This means that general education classroom teachers may be required to address a range of behavioral needs in addition to providing academic instruction. Given the multitude of behaviors that might be observed across the school day, a comprehensive framework or strategy for preventing and addressing challenging behaviors is necessary (Flower et al., 2014). Many schools address disruptive behaviors by adopting a school-wide framework called Positive Behavior Interventions and Supports (PBIS; Center on PBIS, 2025). PBIS uses a three-tiered approach to provide support and services and promote positive behaviors using positive behavioral interventions. Tier 1 consists of interventions and strategies that are provided to all students and are often the only behavioral supports approximately 80% of students need to be successful. These are often referred to as universal supports and include strategies such as posting visuals that promote school rule following or the implementation of school- or class-wide reinforcement systems (Center on PBIS, 2025; Noltemeyer et al., 2018). Tier 2 is meant to address the needs of the next 10-15% of students, who need more intensive support than what they received in Tier 1. Interventions at Tier 2 include additional explicit teaching of social, behavioral, or academic skills, as well as supports such as individualized token economy systems, self-management strategies, and other individualized visual supports (Center on PBIS, 2025; Noltemeyer et al., 2018). Tier 3 supports are intended for the remaining 1-5% of students and are implemented after Tier 1 and Tier 2 supports have been shown not to be effective (Center on PBIS, 2025; Noltemeyer et al., 2018). Supports at Tier 3 include multi-disciplinary teams, behavioral expertise, and the collection and analysis of behavioral data (Center on PBIS, 2025).
Regardless of which level of support a student receives, in a PBIS school, all students in the general education setting receive Tier 1 universal supports and class-wide interventions. One example of a universal support is the Good Behavior Game (GBG), a classroom management strategy with over 50 years of research support (Jornevald et al., 2024). The GBG utilizes an interdependent group contingency in which students are divided into evenly sized teams. Rules are established and shared with the class, and depending on the version of the game being played, the teacher awards or removes points for rule violations or rule following. A point threshold is determined and, depending on the version, any team scoring above or below that threshold may win a daily or weekly prize. In 1969, Barrish and colleagues first introduced the GBG in an effort to reduce disruptive behavior, including out-of-seat behavior and talking-out behavior, in a fourth-grade classroom (Barrish et al., 1969). The class was split into teams, and points were recorded against a team when individual students on that team violated the classroom rules. If a team had fewer than five points at the conclusion of the game, they could earn classroom privileges, such as lining up early or free time at the end of the day. The game was successful in that it substantially reduced both targeted behaviors. The game used a differential-reinforcement-of-low-rates approach to behavior change, whereby low rates of the target behavior were reinforced and higher rates were not.
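The interdependent group contingency described above amounts to a simple threshold rule. The following minimal sketch illustrates it; the team names, mark counts, and use of Python are illustrative only and not part of the original study:

```python
# Minimal sketch of the original GBG winning contingency (Barrish et al., 1969):
# marks are recorded against a team for rule violations, and every team that
# stays below the threshold earns the classroom privilege. Because the
# contingency is interdependent, any number of teams (including all of them)
# can win. Team names and mark counts below are hypothetical.

def winning_teams(marks_per_team, threshold=5):
    """Return the teams whose violation marks fall below the threshold."""
    return [team for team, marks in marks_per_team.items() if marks < threshold]

example_marks = {"Team A": 2, "Team B": 7, "Team C": 4}
print(winning_teams(example_marks))  # Teams A and C stay under five marks and win
```

Note that teams are not competing against one another for a single prize; each team's outcome depends only on its own count relative to the threshold.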
Since its introduction, the GBG, or some variation of it, has been studied to address a multitude of behaviors, including vocal disruptions such as talking out of turn (Barrish et al., 1969; Sharpe & Joslyn, 2021; Wahl et al., 2016; Wiskow et al., 2021; Wright & McCurdy, 2011), classroom rule following and rule breaking (Tanol et al., 2010), on-task behavior (Sharpe & Joslyn, 2021; Wright & McCurdy, 2011), work completion and work accuracy (Sharpe & Joslyn, 2021), the delivery of teacher praise and teacher reprimands or corrective statements (Tanol et al., 2010; Wahl et al., 2016; Wiskow et al., 2021), student peer prompting, and positive and negative student interactions (Wiskow et al., 2021). Some versions of the GBG have been formalized. The PAX Good Behavior Game® and the American Institute of Research® (AIR) model are two common manualized versions of the GBG with very different proprietary approaches. In PAX, the GBG is paired with evidence-based practices taught and used throughout the school day, such as written praise notes, the use of timeout, and other antecedent and consequence manipulations (Embry, 2011). The game is played for very short periods at first (approximately 3 minutes), with sessions lengthening as students demonstrate success with the game. Additionally, the reward in the PAX version is often the ability to engage in a classroom activity that would not otherwise be allowed, such as a short dance party, rather than a tangible reward (Johansson et al., 2020). The AIR® model follows the original GBG proposed by Barrish and colleagues more closely, using predetermined rules established by the teacher, and limits teacher interaction to times when corrective feedback is given, promoting peer accountability among the students themselves (Jornevald et al., 2024). Several other variations of the GBG have also been studied in peer-reviewed research.
Tanol and colleagues (2010) examined two versions of the GBG: one that utilized response cost, during which students in a kindergarten classroom could lose stars for violating the rules, and another that utilized positive reinforcement, during which students earned stars for following the rules. Rule violations were monitored continuously, whereas rule following was monitored using momentary time sampling on a schedule set by the teacher. The study found that both versions were effective at reducing rule violations; however, a slightly larger reduction was observed in the reinforcement version of the game (Tanol et al., 2010). Wright and McCurdy (2011) were the first to directly compare the traditional GBG to a version that utilized positive reinforcement, which they called the Caught Being Good Game (CBGG). Their study was conducted in both a kindergarten and a fourth-grade general education classroom. Their results indicated that both game versions were effective at reducing classroom disruptions, but neither intervention demonstrated a functional relation to on-task behavior. More recently, Sharpe and Joslyn (2021) conducted a study similar to Wright and McCurdy (2011), comparing the GBG to the CBGG. Their study was conducted with at-risk middle school students in the process of transitioning from an alternative education setting back to mainstream education. They referred to their game versions as Strikes (GBG) and Scores (CBGG) in order to differentiate between the two conditions. Their results indicated a slightly higher reduction in disruption in the Strikes condition, a slightly higher rate of on-task behavior in the Scores condition, and variable results for work completion and accuracy.
Despite the large body of literature in support of the GBG, according to a recent literature review conducted by Jornevald and colleagues (2024), there is a significant lack of research on the effects of the GBG for students with special education needs in general education classrooms. Most studies of the GBG have been conducted either with general education students only or with special education students in special education settings. They recommend that future research include participants with disabilities such as autism spectrum disorder (ASD) or attention deficit hyperactivity disorder, since it is becoming more common for students with these disabilities to be taught in general education classrooms. Jornevald and colleagues (2024) argue that schools should prioritize the utilization of universal interventions such as the GBG but also recognize that universal interventions may need modification to benefit all students, including the highest-risk students. Therefore, the purpose of the current study was to extend the study conducted by Sharpe and Joslyn (2021) by including students with special education needs in a general education setting. In this study, the differential effects on on-task behavior were assessed for two versions of the GBG: Strikes, which utilized a positive punishment procedure to differentially reinforce low rates of the target behavior, and Scores, which utilized a positive reinforcement procedure with praise provided for rule-following behavior. The research question addressed in this study is twofold: (1) what impact does each version, Strikes or Scores, have on on-task behavior, and (2) what impact does each version have for students with ASD in the general education setting?
Method

Participants and Setting

Participants in the study included a fourth-grade general education classroom teacher and four fourth-grade students, three males and one female, who were reported by the teacher to have been frequently off task, disruptive, and not always paying attention during classroom lessons. Preliminary observations were also conducted in the classroom prior to the study to confirm that the recommended students displayed frequent disruptive and off-task behaviors. Justin was a 9-year-old White male with an Individualized Education Program (IEP) under the eligibility category of autism spectrum disorder (ASD) and received Tier 3 supports. His IEP included Speech, Social Work, and Occupational Therapy services, a behavior intervention plan, one-on-one paraprofessional support, and accommodated assignments, which included a reduction in the number of academic demands. His paraprofessional was responsible for assisting the teacher in accommodating assignments, serving as a scribe when asked, supporting him during breaks, and redirecting him back to tasks as needed. During the study, his paraprofessional refrained from redirecting him back to the task; however, assignments were still accommodated, and he had the option to request a scribe or a break if needed. Isabel was a 9-year-old Hispanic female with a Section 504 plan due to a medical diagnosis of ASD and received Tier 2 supports. Her accommodations included scheduled and as-needed movement breaks and additional one-on-one instruction with the teacher. Hector and Louis were both 9-year-old White male general education students receiving only Tier 1 supports. The teacher was a White female with over 30 years of experience teaching elementary education. She had experience implementing classroom-wide reward systems in the past but had not yet done so with this class, nor had she implemented any variation of the GBG.
This study was conducted in a fourth-grade general education classroom at a suburban elementary school within a district near a Midwest capital city. The classroom had 26 students (12 males and 14 females). The classroom was rectangular, with individual student desks arranged side by side in three columns of four rows, with the first column including a fifth row. The desks faced a wall-sized whiteboard on which the teacher projected lesson materials, videos, reading passages, or math problems required for the lesson. The teacher mostly taught from the whiteboard but would occasionally move around the room to assist students or check student engagement. The researcher and second observer were positioned in the back of the room, opposite the whiteboard, behind the middle column of desks, where a clear line of sight could be established for each participant. For sessions 1 through 8, the study was conducted during English Language Arts (ELA); however, due to a change in instruction style necessary for the ELA curriculum, the study took place during Math instruction and work time for the remaining sessions. The study was not conducted if students were taking an assessment, a planned interruption such as a fire drill occurred, a substitute was present, or more than half of the target students were absent.

Materials

Materials for this study included two Android tablets, two pairs of wireless Bluetooth earbuds, an interval-timer application with customizable intervals, and writing utensils. Tickets for the school-wide reinforcement system were used as the prize for winning the game; each ticket was a quarter-sheet-sized colorful piece of paper with the school's PBIS rules printed on it and the teacher's signature.
The tickets were saved and could be redeemed at the school's monthly prize store for rewards including novelty toys, school supplies, pizza party passes, or gift cards for local restaurants and activities.

Dependent Variables and Measurement

The researcher worked with the classroom teacher to define the behaviors targeted in the study. On-task was defined as engaging in behaviors that directly support participation in the assigned lesson or task. This included behaviors such as maintaining eye contact with instructional materials (the teacher, their textbook, a worksheet, the whiteboard, etc.), writing or typing as part of the assignment, asking or answering relevant questions with permission to talk, and following teacher instructions. Non-examples involved engaging in activities not related to the assignment (such as reading an unrelated book), looking around the room, turning around in their seat, talking to peers about topics not related to the assignment, or interacting with objects not relevant to the assignment. Off-task was defined as engaging in behaviors that do not directly support participation in the assigned lesson or task. This included behaviors such as gazing off, putting their head down, or otherwise not being oriented towards the lesson materials or instructor; drawing, writing, or reading materials that are not a part of the assigned lesson; talking without permission or about a topic not related to the lesson; and/or being out of their seat unless required by the lesson. Non-examples included engaging in appropriate choice-time activities if instructed and approved by the teacher. Target behavior was measured using a 10-s momentary time sampling system, with the 10-min game divided into 60 10-s intervals.
Using the interval-timer application on one of the tablets and an earbud, the researcher would, when the 10-s timer went off, scan the room in a predetermined order (Justin, Isabel, Hector, then Louis), determine whether each target student was on task or off task, and mark it on the data sheet.

Experimental Design

This study utilized an ABCBCA design. A baseline condition was followed by a condition called Strikes 1 (B), then Scores 1 (C). Since the study intended to compare the effects of the two interventions against each other, the Strikes condition was then reintroduced (Strikes 2), followed by Scores 2, and the study ended with a final baseline phase to assess the impact of intervention overall. Each phase continued until a stable trend was observed in the data for at least two of the target students. Like the Sharpe and Joslyn (2021) study, this study attempted to demonstrate an applied comparison of the interventions. While the rules in each game version remained the same across the study, the criteria for winning the game were not yoked, meaning some elements of each intervention, such as the number of points needed to win or the frequency of scanning for rule following or violations, were not necessarily consistent from phase to phase.

Procedure

Pre-Teaching

The first author and teacher met before school hours a total of five times before implementation of the first intervention for pre-teaching. During the first meeting, the teacher was provided with an overview of the study and general information about the format of Strikes and Scores, and a video overview of the Good Behavior Game was shown (SciShow Psych, 2019). During subsequent meetings, the teacher was given opportunities to practice each step of implementation without students present, and feedback was provided until the teacher demonstrated proficiency with each version of the intervention.

Baseline

During baseline, the teacher was instructed to provide instruction and manage the class as she typically would.
She was told to redirect students as needed and had the option to provide reward tickets for positive classroom behavior, although this was never observed. Baseline 1 occurred during ELA instruction and Baseline 2 occurred during Math instruction. The session began when the teacher began instruction and lasted for 10 minutes, even if students began to work independently during that time. Data were collected on the target behaviors in the manner described above.

General Procedures

Across all versions of the game, the teacher divided the classroom into five teams of four and one team of six. In an effort to continue the teacher's regular classroom practices, after each phase the teacher changed the composition and seating arrangements of the teams; however, to avoid one team having an advantage over another, only one target student was on any team at any time. At the beginning of the game, the teacher would remind the students of the rules of the game, which remained consistent throughout the study. Rule one was to raise their hand before talking and rule two was to remain facing forward in their seat. The rules were written on the classroom whiteboard, along with numbers that corresponded to the teams, with space below each number to keep a tally score. The point threshold needed to win was also written on the whiteboard. The students were reminded that if their team won, each student on the team would earn a ticket for the school-wide reinforcement system.

Strikes 1

During the Strikes phases, the teacher was instructed to provide academic instruction as she typically would while also playing Strikes with the students. During the first Strikes phase, the teacher wore an earbud connected to the second Android tablet, whose interval timer beeped every 30 s, prompting the teacher to look for instances of rule-breaking behavior.
If a student broke a rule while she scanned the room, the teacher would record a "strike" by adding a tally next to the corresponding team and say something like, "Someone on team 1 wasn't raising their hand. That's a strike," before returning to the lesson. In order to win, a team needed to have fewer than four "strikes," a threshold that was determined by averaging the data collected in baseline. If no team was below the limit, the team with the lowest number of strikes would win the prize. It was also possible for all teams to win.

Strikes 2

Strikes 2 was implemented similarly to Strikes 1, except that the earbud was not used. Instead, the teacher was instructed to continuously scan for and record any instance of a rule violation. This change was made because during Strikes 1, each team won every game, indicating that the teacher may have missed rule-breaking behavior between the 30-s intervals. However, reducing the length of the interval below 30 s would have been too disruptive to the flow of instruction, which is why the teacher was instructed to record any instance of rule-breaking behavior instead. The rules, prize, and point threshold remained the same as in Strikes 1.

Scores 1

During the Scores phases, the teacher was instructed to provide academic instruction as she typically would while also playing Scores with the students. In addition to reviewing the rules and the prize for winning, the students were told the threshold number of points needed to win, which was 16. This threshold was determined by calculating the total number of opportunities to score, which was 20, and subtracting the average number of rule violations observed during baseline, which was four. The teacher wore an earbud connected to the second Android tablet and an interval timer that prompted her every 30 s with a short beep. When the interval timer went off, the teacher would scan the room to identify students who were following the rules.
Teams with all members following the rules at the time the teacher checked received a "score" and a positive callout from the teacher, such as, "Nice job following the rules, team 2! That's a score." Any team with at least 16 points won the game, meaning it was possible for all teams to win. If no team hit the winning threshold, the team with the highest score won.

Scores 2

Scores 2 was implemented similarly to Scores 1, except that the interval length for the teacher's momentary time sampling was changed to 45 s. This was done to address concerns presented by the teacher during Scores 1 that the 30-s interval interfered with the pace and flow of her instruction. Because the interval length was changed but the game length was not, the number of opportunities to earn points decreased, as did the threshold of points needed to win. The threshold of points needed to win in this phase was 10. It was calculated by determining the number of opportunities to earn points during the game, rounded to the next whole number, which was 14, and subtracting the average number of rule violations observed during baseline, which was four.

Interobserver Agreement and Procedural Fidelity

Interobserver agreement (IOA) was assessed by having a second observer separately and independently score at least 40% of sessions in person during each phase of the study. IOA was measured on an interval-by-interval basis. Agreement was calculated by adding the total number of agreements, dividing by the total number of possible agreements, and multiplying by 100. IOA was recorded for 40 to 75% of sessions across all phases for all participants. For Justin, the average IOA across all sessions was 97% (range 93 to 100%). Isabel's average IOA across all sessions was 95% (range 89 to 100%). Average IOA across all sessions was 97% (range 89 to 100%) for Hector and 97% (range 88 to 100%) for Louis.
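The interval-by-interval agreement calculation described above can be expressed as a short sketch. The observer records below are hypothetical illustrations, not data from the study:

```python
# Interval-by-interval interobserver agreement (IOA): the number of intervals
# on which both observers agree, divided by the total number of intervals,
# multiplied by 100. The observer records below are hypothetical.

def interval_ioa(observer_1, observer_2):
    """Percent agreement between two paired interval records ('on'/'off')."""
    if len(observer_1) != len(observer_2):
        raise ValueError("Observers must score the same number of intervals")
    agreements = sum(a == b for a, b in zip(observer_1, observer_2))
    return agreements / len(observer_1) * 100

obs_1 = ["on", "on", "off", "on", "off", "on"]
obs_2 = ["on", "off", "off", "on", "off", "on"]
print(round(interval_ioa(obs_1, obs_2), 1))  # 83.3 (5 of 6 intervals agree)
```

A real session in this study would involve 60 intervals per target student rather than the six shown here, but the computation is identical.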
Procedural fidelity was measured utilizing a procedural integrity checklist adapted from the Sharpe and Joslyn (2021) study during at least 33% of sessions in the Strikes and Scores phases. There were 12 steps on each checklist (see Table 1). Each item was scored as "yes" if it was observed or "no" if it was not. A percentage was obtained by dividing the number of correctly implemented steps by the total number of steps possible. The mean procedural fidelity was 96% (range 83 to 100%) during Strikes 1, 100% during Scores 1, 100% during Strikes 2, and 96% (range 92 to 100%) during Scores 2.

Social Validity

Social validity was assessed through a post-study survey sent to the teacher and through an informal raise-of-hands survey of the classroom students. The class (both target students and non-target students) was asked if they liked playing Strikes or Scores and which they liked better. A total of 46% of the class indicated a preference for Scores, whereas 38% indicated a preference for Strikes, with approximately 15% of students either not expressing a preference or being absent at the time the poll occurred. Among the target students, two (Isabel and Louis) indicated a preference for Strikes and one (Hector) preferred Scores; Justin chose not to participate in the poll. The teacher was also asked which version of the game she liked better and why. She indicated a preference for Scores due to its positive reinforcement aspect, though she noted that Strikes was easier to implement. The teacher was also asked if she was likely to continue implementing either version of the game, and she indicated that she would likely continue using Scores in some way.
On a social validity survey administered via Microsoft Forms (see Table 2), the teacher was asked to rate ease of implementation, level of student engagement, perceived effect on behavior, and manageability on a Likert scale from 1 to 5, with 1 indicating a low score or more difficult rating and 5 indicating a high score or easier rating. The teacher rated both interventions as equally high with regard to ease of implementation and effectiveness. She rated Strikes as slightly more manageable than Scores and reported slightly more student engagement with Scores than with Strikes. When asked whether she had anything to share about the experience, she shared that she enjoyed participating and found the data and data collection process interesting.

Results

Results suggest that the functional relation between baseline and intervention was idiosyncratic, with little difference in effectiveness between the interventions for three of the four target students. Figure 1 displays the mean rates of on-task behavior for all participants across phases. Justin's data (Figure 2) were variable across all phases but trended downward as the study progressed. In Baseline 1, Justin's on-task behavior trended downward with a mean rate of 54% (range 45 to 100%). When Strikes 1 was introduced, there was an initial increase in on-task behavior, with an overall mean rate of 62% (range 48 to 87%); however, as the phase progressed, his on-task behavior declined. Justin's mean rate of on-task behavior decreased by 29 percentage points to a mean of 33% (range 5 to 65%) when Scores 1 was introduced, with a slight upward trend across the phase. When Strikes 2 was introduced, Justin's on-task behavior decreased another 8 percentage points to a mean rate of 25% (range 13 to 55%). When Scores 2 was introduced, Justin's on-task behavior was similar to Scores 1, with a mean rate of 28% (range 3 to 47%). During the return to baseline, his mean rate of on-task behavior decreased to 15% (range 8 to 23%).
Visual analysis of Justin's data indicated that neither intervention had an impact on increasing his on-task behavior. Isabel's data are depicted in Figure 3. During Baseline 1, her mean rate of on-task behavior was 85% (range 73 to 93%). When Strikes 1 was introduced, her on-task behavior maintained at a similar rate of 86% (range 57 to 98%). Isabel's mean rate remained constant at 86% (range 72 to 95%) when Scores 1 was introduced, with an upward trend observed across the phase. When Strikes 2 was introduced, average on-task behavior dropped 9 percentage points to a mean rate of 77% (range 65 to 93%), and a slight downward trend was observed across the phase. When Scores 2 was introduced, on-task behavior increased to a mean rate of 84% (range 67 to 93%). In Baseline 2, Isabel's mean rate of on-task behavior decreased to 63% (range 28 to 82%). Relatively little change was observed from phase to phase in Isabel's data, indicating that neither intervention was more effective than the other at increasing her on-task behavior. Hector's data are shown in Figure 4. In Baseline 1, Hector's mean rate of on-task behavior was 70% (range 43 to 98%) and trended downward across the phase. When Strikes 1 was introduced, Hector's mean rate of on-task behavior increased to 92% (range 75 to 100%). When Scores 1 was introduced, Hector's mean rate increased slightly to 98% (range 92 to 100%). Hector's mean rate remained constant at 98% (range 95 to 100%) when Strikes 2 was introduced, with a flat trend across the phase. When Scores 2 was introduced, Hector's mean rate of on-task behavior maintained at a similar rate of 96% (range 92 to 100%) with a flat trend across the phase. When the intervention was removed and conditions returned to baseline, Hector's on-task behavior decreased by 42 percentage points to a mean rate of 54% (range 37 to 62%) with a downward trend observed across the phase.
There was a substantial change in responding when the intervention was first introduced and again when it was completely removed, but relatively little change in on-task behavior was observed between interventions (Strikes or Scores), indicating no difference in effectiveness between them. Louis' data are shown in Figure 5. In Baseline 1, Louis' mean rate of on-task behavior was 68% (range 27 to 95%) and trended upward across the phase. When Strikes 1 was introduced, Louis' mean rate of on-task behavior increased slightly to 72% (range 23 to 98%) and overall trended upward across the phase with some variability observed. In Scores 1, Louis' on-task behavior increased by 20 percentage points to a mean rate of 92% (range 77 to 98%). Louis was only present for two sessions in the Strikes 2 phase, with a mean rate of on-task behavior of 96% (range 95 to 100%). When Scores 2 was introduced, Louis' mean rate of on-task behavior was 94% (range 85 to 100%). When the intervention was removed and conditions returned to baseline, Louis' mean rate of on-task behavior decreased to 51% (range 30 to 70%). Given Louis' absences and the lack of sufficient data points across phases, a determination on a functional relation is inconclusive.

Discussion

Given the large body of research supporting the use of the GBG, the study did not explicitly investigate the overall effectiveness of the intervention. Instead, the purpose of the current study was to compare two versions of the GBG, Strikes and Scores, to see if a punishment-based version of the game (Strikes) or a positive-reinforcement-based version of the game (Scores) was more effective than the other. The results of the study did not demonstrate that one intervention was more effective at increasing on-task behavior than the other. However, there were slightly higher rates of on-task behavior observed during Scores than Strikes for three of the four participants.
Although it is difficult to say with certainty given the research design, it appears that for most participants, some version of the GBG was more effective at improving on-task behavior than no intervention at all. However, this was not the case for one participant, whose behavior progressively worsened as the study progressed. Overall, the results of the study are consistent with previous research suggesting that a positive-reinforcement-based version of the game may improve on-task behavior (Sharpe & Joslyn, 2021; Wiskow et al., 2021) and align with other research suggesting that multiple versions of the game are equally effective at reducing classroom disruptions (Wahl et al., 2016; Wright & McCurdy, 2012). The results of this study also support the claims by Jornevald and colleagues (2024) that the GBG may not be effective without additional supports for some students. For all target students, the mean percentage of intervals on-task increased when the intervention was first introduced and decreased once conditions returned to baseline. This supports previous research demonstrating that universal supports, such as the GBG, can be an effective classroom management strategy (Jornevald et al., 2024). In addition, slightly higher rates of on-task behavior were observed during Scores for Isabel, Hector, and Louis, which is similar to the findings of Sharpe and Joslyn (2021), who found that the CBGG, or Scores, may be slightly more effective at increasing on-task behavior, at least for most students. However, as stated by Tanol and colleagues (2010), this conclusion should be interpreted with caution given the significant amount of overlap in results between phases. Two of the target students in the study had diagnoses of ASD and received additional educational support through a Section 504 plan or IEP. While Isabel’s on-task behavior was generally high throughout the study, Justin’s on-task behavior decreased substantially across the study.
These findings align with Wright and McCurdy (2012) and Jornevald and colleagues (2024), who found that while the GBG can benefit most students with special education needs in the general education setting, some students with more severe difficulties may need additional support to benefit from the intervention, such as one-to-one support or more frequent student-teacher interaction. Justin’s IEP included one-to-one paraprofessional support, which, as part of his behavior intervention plan, sometimes included redirection back to the task. During the study, however, his paraprofessional was asked to refrain from redirection. This was done to eliminate potentially confounding variables, as we wanted to see the impact of the GBG on his on-task behavior without additional support. However, given the downward trend in his behavior over the course of the study and the student’s documented need for support, it may have been beneficial to include the one-to-one support to see whether that would address issues with the game and improve on-task behavior. One potential reason his behavior did not improve during the GBG is the generalized reinforcer that was used as a reward for winning the game. Previous research has suggested that the choice of rewards may have an impact on the magnitude of effect (Flower et al., 2014). This suggests that a student’s preferences and reinforcers, such as allowing students to choose the rewards and the timing of rewards, should be considered when making decisions about which intervention to implement. The rules of the game themselves may also have an impact on effectiveness, especially for students with ASD. During every session, every team won, including whatever team Justin was on, which indicates that while Justin was following the rules, his rule following did not correspond to being on-task. Careful consideration should be given to the rules selected for the GBG to ensure they align with the goals of the intervention.
Given the similarities in on-task behavior across both versions of the GBG, there are several other factors to consider when deciding which version of the game to implement. The method used to promote behavior change varied depending on which version of the game was being played. During Strikes, a positive punishment contingency for rule violations was established by using corrective feedback when issuing “strikes.” Differential reinforcement of low rates of behavior provided reinforcement for fewer rule violations, similar to the original study by Barrish and colleagues (1969). Conversely, Scores relied on a positive reinforcement contingency delivered through praise and awarding “scores” for rule following. Despite these differences, there was little difference in impact on on-task behavior. This implies that when deciding which version to implement, consideration should be given to the procedure used and its alignment with PBIS goals, which are to create positive learning environments for all students by teaching and reinforcing positive behaviors (Center on PBIS, 2025). Additionally, ease of implementation should be considered. In this study, the teacher rated both interventions equally on the Likert scale with regard to ease of implementation but indicated that Strikes was slightly more manageable, owing to how often the teacher must scan for rule violations or rule following. This suggests that the frequency with which a teacher has to stop providing academic instruction in order to check for rule violations or rule following may influence a teacher’s preference for one intervention over the other. In fact, Tanol and colleagues (2010) found that slightly better rule following occurred during the reinforcement version in their study, during which the teacher was responsible for determining the number and pace of checks. In addition, Jornevald and colleagues (2024) found that implementation fidelity had an impact on the effectiveness of the intervention.
Given that teachers are more likely to implement an intervention that is easy and manageable, it may make sense for teachers to choose the version of the game that they are more likely to implement with high fidelity. Another consideration for implementation is teacher and student preference for the game. Social validity data indicate that both interventions are acceptable to the students who participate in the game and the teachers who implement it. This finding is supported by past comparison studies (Sharpe & Joslyn, 2021; Wahl et al., 2016; Wiskow et al., 2021; Wright & McCurdy, 2012). The mixed preferences among the wider class, the target students, and the teacher indicate that preferences between interventions are idiosyncratic and likely depend on the classroom context. This notion is supported by past research as well. Wiskow and colleagues (2021) found that three of the four participating teachers preferred the GBG version with corrective feedback to the CBGG, which utilized positive reinforcement, each identifying different reasons for their preference, including the immediacy of impact the teachers observed, how natural the intervention felt, and how convenient the monitoring system was in one version (CBGG) versus the other. Sharpe and Joslyn (2021) found a teacher preference for the GBG (Strikes) and a slight student preference for the CBGG (Scores), which was further supported in the students’ free responses. They also found that some students felt like they were “in trouble” when a strike was earned, even though they only received corrective feedback. Extra care should be taken by implementers to explain that earning a “strike” is not getting in trouble (Sharpe & Joslyn, 2021). Decisions on which intervention to implement may also largely depend on the context.
The teacher indicated that she would likely continue implementing the Scores version; however, she noted that one barrier to continued implementation was identifying a better system to prompt her to check for rule following. Continued use of the private audio interval timer was offered, but the teacher did not wish to continue using an earpiece. Other suggestions to prompt regular checking for rule following include visual reminders at the back of the room, the use of haptic feedback through a smart watch or MotivAider timekeeping device, or setting a specific number of checks to complete prior to playing without adhering to a set interval (Tanol et al., 2010). Given the minimal difference in intervention effects on on-task behavior between Strikes and Scores, this study supports the notion that teacher preference, classroom context, and alignment with PBIS goals should be the primary considerations when deciding which variation of the GBG to implement. There were several potential limitations to this study. First, some aspects of the game, such as team composition and student seating arrangements, did not remain constant across phases, making it more difficult to draw conclusions regarding the effectiveness of the GBG on the targeted behavior. Based on observation, it appeared that a participant’s proximity to the teacher may have had an impact on on-task behavior, meaning students who were closer to the teacher often appeared on-task more frequently. Another potential confound was that minor changes were made to both versions of the GBG regarding the frequency with which the teacher scanned the room across phases. However, these differences did not appear to impact the results of this study, although they could limit direct comparisons between phases.
Another limitation was the failure to collect proficiency data on the teacher during training; doing so would have provided comparison data on how quickly the teacher was able to learn each intervention and an objective measure of readiness. There was also a significant change in the time of day the intervention was implemented and in the material being taught across the study, due to the applied nature of the study. There were also differences in the amount of instructional time versus independent work time while the game was played. While several studies have examined use of the GBG during various parts of the school day, there is little research comparing its use during instructional time versus independent work time, or across specific subjects being taught. Finally, there were several unplanned changes in the school schedule, unexpected absences, assemblies, and extended breaks that may have had an impact on the results. For example, the last two sessions of Scores 2 and the four sessions of Baseline 2 were conducted after students returned from winter break. Baseline 2 showed significantly lower rates of on-task behavior than Baseline 1, which could be attributed to students having to readjust to school after an extended break. One recommendation for future research is to incorporate participant feedback and preference when identifying reinforcement or prizes for winning the game. This study used the generalized reinforcement system utilized throughout the school, which may not have served as a true reinforcer for some participants. In addition, the tickets could only be redeemed once a month, so the schedule on which they could be exchanged for backup reinforcers may not have been dense enough for some students. When a universal intervention such as the GBG is found not to be effective for a student, both fidelity of implementation and the type of reinforcement in use should be examined (Flower et al., 2014; Humphrey et al., 2021).
As fidelity of implementation during this study was high in each phase, the type of reinforcement used may not have been a good match for all learners. Another recommendation would be to consider other experimental designs. The ABACBC design used by Tanol and colleagues (2010) would have allowed more concrete conclusions to be drawn about individual intervention effectiveness for each participant, which is important when considering the different profiles of the participants included in this study. Additionally, an alternating treatment design like the one used by Sharpe and Joslyn (2021) may also have been more appropriate, as it allows for a quicker comparison with fewer phases and sessions. Additional considerations include keeping more components consistent between phases, such as team composition, student seating assignment, time of day, and classroom activity. To conclude, this study found that for most students, there were slightly higher rates of on-task behavior in the Scores version of the game than in the Strikes version. There was also a preference for Scores among both the students and the teacher, given the positive reinforcement aspect of that version. However, given that the difference between versions was very slight, teacher and student preference, classroom contextual fit, and alignment with PBIS goals should be considered when deciding which version of the GBG to implement. Additionally, modifications to implementation may be needed for individuals with greater support needs. While this study continues to support the strong effectiveness of the GBG for general education students and settings, as a universal support it does not necessarily have a universal impact.

REFERENCES

Barrish, H. H., Saunders, M., & Wolf, M. M. (1969). Good behavior game: Effects of individual contingencies for group consequences on disruptive behavior in a classroom. Journal of Applied Behavior Analysis, 2(2), 119–124.
https://doi.org/10.1901/jaba.1969.2-119

Cameron, C., Connor, C., Morrison, F., & Jewkes, A. (2008). Effects of classroom organization on letter-word reading in first grade. Journal of School Psychology, 46, 173–192. https://doi.org/10.1016/j.jsp.2007.03.002

Center on PBIS. (2025). Positive behavioral interventions and supports. Retrieved March 17, 2025, from www.pbis.org

Embry, D. D. (2011). Behavioral vaccines and evidence-based kernels: Non-pharmaceutical approaches for the prevention of mental, emotional and behavioral disorders. The Psychiatric Clinics of North America, 34(1), 1–34. https://doi.org/10.1016/j.psc.2010.11.003

Flower, A., McKenna, J. W., Bunuan, R. L., Muething, C. S., & Vega Jr., R. (2014). Effects of the Good Behavior Game on challenging behaviors in school settings. Review of Educational Research, 84(4), 546–571.

Humphrey, N., Panayiotou, M., Hennessey, A., & Ashworth, E. (2021). Treatment effect modifiers in a randomized trial of the Good Behavior Game during middle childhood. Journal of Consulting and Clinical Psychology, 89(8), 668–681. https://doi.org/10.1037/ccp0000673

Individuals with Disabilities Education Act, 20 U.S.C. § 1400 (2004).

Johansson, M., Biglan, A., & Embry, D. (2020). The PAX Good Behavior Game: One model for evolving a more nurturing society. Clinical Child and Family Psychology Review, 23, 462–482. https://doi.org/10.1007/s10567-020-00323-3

Jornevald, M., Pettersson‐Roll, L., & Hau, H. (2024). The Good Behavior Game for students with special educational needs in mainstream education settings: A scoping review. Psychology in the Schools, 61, 861–886. https://doi.org/10.1002/pits.23086

Noltemeyer, A., Palmer, K., James, A. G., & Wiechman, S. (2019). School-wide positive behavioral interventions and supports (SWPBIS): A synthesis of existing research. International Journal of School & Educational Psychology, 7(4), 253–262.

Oliver, R. M., Wehby, J. H., & Reschly, D. J. (2011).
Teacher classroom management practices: Effects on disruptive or aggressive student behavior. Campbell Systematic Reviews, 7(1), 1–55.

SciSho Psych. (2019, September 9). The Good Behavior Game [Video]. YouTube. https://www.youtube.com/watch?v=Nc0Tw6ISYKk

Sharpe, A. N., & Joslyn, P. R. (2021). Correspondence of product and topographical behavior measures during a comparison of Good Behavior Game arrangements. Education and Treatment of Children, 44(4), 215–231.

Shinn, M. R., Ramsey, E., Walker, H. M., Stieber, S., & O'Neill, R. E. (1987). Antisocial behavior in school settings: Initial differences in an at-risk and normal population. The Journal of Special Education, 21, 69–84.

Tanol, G., Johnson, L., McComas, J., & Cote, E. (2010). Responding to rule violations or rule following: A comparison of two versions of the Good Behavior Game with kindergarten students. Journal of School Psychology, 48(5), 337–355.

Wahl, E., Hawkins, R. O., Haydon, T., Marsicano, R., & Morrison, J. Q. (2016). Comparing versions of the Good Behavior Game: Can a positive spin enhance effectiveness? Behavior Modification, 40(4), 493–517. https://doi.org/10.1177/0145445516644220

Wiskow, K. M., Urban-Wilson, A., Ishaya, U., DaSilva, A., Nieto, P., Silva, E., & Lopez, J. (2021). A comparison of variations of the Good Behavior Game on disruptive and social behaviors in elementary school classrooms. Behavior Analysis: Research and Practice, 21(2), 102–117. https://doi.org/10.1037/bar0000208

Wright, R. A., & McCurdy, B. L. (2012). Class-wide positive behavior support and group contingencies: Examining a positive variation of the Good Behavior Game. Journal of Positive Behavior Interventions, 14(3), 173–180.

APPENDIX

Table 1
Procedural Fidelity Checklist for Strikes and Scores

Strikes
1. Divide students into equal teams
2. Write team names on board
3. Remind students of the rules. Write them on board
4. State the point threshold needed to win
5. Describe the reward for winning
6. Ask students if they have any questions before beginning
7. Announce the start of the game and begin the interval timekeeping device. Prompt students to begin assignment/lesson
8. For Strikes 1: Listen for the chime of the interval timekeeping device and deliver a “Strike” when a rule is broken on the chime (every 30 seconds). For Strikes 2: Deliver a “Strike” whenever a rule is broken
9. Deliver feedback – call out team and behavior, not the individual
10. Play the game for 10 minutes
11. Add up points and announce winners
12. Distribute rewards

Scores
1. Divide students into equal teams
2. Write team names on board
3. Remind students of the rules. Write them on board
4. State the point threshold needed to win
5. Describe the reward for winning
6. Ask students if they have any questions before beginning
7. Announce the start of the game and begin the interval timekeeping device. Prompt students to begin assignment/lesson
8. Listen for the chime of the interval timekeeping device and deliver a “Score” for each team with everyone following the rules (for Scores 1: every 30 seconds; for Scores 2: every ~45 seconds)
9. Deliver feedback – call out positive team behavior, not individual behavior
10. Play the game for 10 minutes
11. Add up points and announce winners
12. Distribute rewards

Table 2
Teacher Social Validity Survey Questions and Responses

Question: Which version of The Good Behavior Game did you prefer to implement (Scores or Strikes), and why? Please be as specific as possible.
Teacher response: “Scores” because it focused on the positive. However, “strikes” was easier to implement.

Question: Are you likely to continue implementing either version of the game after the end of the study?
Teacher response: Yes, I will likely use the Scores version of the game.

Question: Reflect on the Strikes version of the game. How would you rate each of the following on a scale of 1 to 5, 1 being a low score or more difficult rating, and 5 being a high score or easier rating?
Teacher response: Ease of Implementation: 5; Student Engagement: 4; Effect on Behavior: 5; Manageability: 5

Question: Reflect on the Scores version of the game. How would you rate each of the following on a scale of 1 to 5, 1 being a low score or more difficult rating, and 5 being a high score or easier rating?
Teacher response: Ease of Implementation: 5; Student Engagement: 5; Effect on Behavior: 5; Manageability: 4

Question: What challenges or barriers did you encounter while implementing either version of The Good Behavior Game in your classroom?
Teacher response: It was easier to do it when I had regular reminders to check. Implementing it without the prompt is trickier because I often forget to add points.

Question: Lastly, is there anything else you would like to share about your experience with The Good Behavior Game or your participation in the study?
Teacher response: I enjoyed participating in the study! It was interesting to see the data collected. Thanks!

Figure 1
Mean Percent of Intervals On-Task
[Bar graph: mean percent of intervals on-task (0–100%) for Justin, Isabel, Hector, and Louis across the phases Baseline 1, Strikes 1, Scores 1, Strikes 2, Scores 2, and Baseline 2.]

Figure 2
Justin’s Percent of Intervals On-Task
[Line graph: Justin’s percent of intervals on-task (0–100%) by session number (0–30), with phase changes marked for Baseline 1, Strikes 1, Scores 1, Strikes 2, Scores 2, and Baseline 2.]

Figure 3
Isabel’s Percent of Intervals On-Task
[Line graph: Isabel’s percent of intervals on-task (0–100%) by session number (0–30), with phase changes marked for Baseline 1, Strikes 1, Scores 1, Strikes 2, Scores 2, and Baseline 2.]

Figure 4
Hector’s Percent of Intervals On-Task
[Line graph: Hector’s percent of intervals on-task (0–100%) by session number (0–30), with phase changes marked for Baseline 1, Strikes 1, Scores 1, Strikes 2, Scores 2, and Baseline 2.]

Figure 5
Louis’ Percent of Intervals On-Task
[Line graph: Louis’ percent of intervals on-task (0–100%) by session number (0–30), with phase changes marked for Baseline 1, Strikes 1, Scores 1, Strikes 2, Scores 2, and Baseline 2.]