SUPPORTING STUDENTS AND TEACHERS WITH TESTING AND DEBUGGING IN THE CONTEXT OF COMPUTATIONAL SYSTEMS MODELING

By Jonathan Robert Bowers

A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Curriculum, Instruction, and Teacher Education – Doctor of Philosophy

2024

ABSTRACT

To make sense of our interconnected and algorithm-driven world, students increasingly need proficiency with computational thinking (CT), systems thinking (ST), and computational modeling. One aspect of computational modeling that can support students with CT, ST, and modeling is testing and debugging. Testing and debugging enables students to analyze and interpret model output to identify aspects that need improvement. Students can subsequently revise their own models or provide meaningful feedback to their peers. Testing and debugging has long been identified as a key learning goal in both science education and computer science. However, current evidence suggests that students have limited opportunities to engage in testing and debugging in K-12 science classrooms. Additionally, both curricular and teacher supports for testing and debugging remain understudied. As such, I set out to investigate how students test and debug computational models within a supportive learning environment and how two teachers supported students with testing and debugging in the context of a high school chemistry unit. Through this research, I developed the ST and CT Identification Tool to categorize student testing and debugging behaviors during computational modeling. Using this tool, I identified that students implemented a variety of patterns of testing and debugging during computational modeling. This suggests that teachers and curricular designers should embrace a diversity of testing and debugging pathways when supporting students with this practice. Likewise, my analysis of pedagogical strategies provides evidence that using synergistic scaffolding and presenting students with clear rationales for engaging with different aspects of testing and debugging encourages students to utilize testing and debugging as a means of improving their computational models.

ACKNOWLEDGEMENTS

I would like to thank the following people for all the support they have given me with respect to this thesis and across my time as a graduate student at Michigan State University. First, I would like to thank my parents, Carlton and Robyn Bowers and Carolyn and Donald Kindell, for always believing in my potential and encouraging me to work hard to achieve my dreams. I am grateful to Lisa Kenyon for introducing me to the field of science education and connecting me with the amazing faculty at Michigan State University. I thank Julie Christensen for being my graduate student mentor during my first year as a graduate student and for serving on my practicum committee. I also would like to acknowledge my colleagues from the Concord Consortium (particularly Daniel Damelin, Cynthia McIntyre, Steve Rodrick, and Lynn Stephens) for their work designing SageModeler and co-developing the evaporative cooling unit used throughout this thesis, their support with creating “A Framework for Computational Systems Modeling,” and their feedback on the many research instruments and manuscripts that comprise this thesis.
I also want to thank my colleagues from CREATE for STEM (Lindsey Brennan, Emil Eidin, Namsoo Shin, Israel Touitou, and Joseph Krajcik) who worked collaboratively on the Multilevel Computational Modeling project. I especially want to thank Namsoo Shin for her leadership and mentorship throughout my time at Michigan State and for her work with developing “A Framework for Computational Systems Modeling.” I am grateful to Emil Eidin and Lindsey Brennan for their contributions towards developing the many research instruments used in this thesis, their hard work with collecting data from “Faraday High School,” and for collaboratively developing the evaporative cooling unit alongside Daniel Damelin and me. I also thank Peng He for contributing his statistical advice to this thesis, and I thank Tingting Li for collecting data for me the day that I had my thesis proposal meeting. I am also thankful to my thesis committee (Amelia Gotwals, Joseph Krajcik, Christina Schwarz, and Gail Richmond) for supporting me throughout this process. Thank you, Amelia Gotwals, for being an amazing instructor for my first three semesters at Michigan State. Thank you, Christina Schwarz and David Stroupe, for sharing your professional wisdom, advice, and encouragement throughout many Science Education lunches. I am also grateful to Gail Richmond for her support with mentoring me throughout the “Native Animals, Native Knowledge” project. I once more want to thank Joseph Krajcik for being an amazing PI and steadfastly supporting me throughout my work on this thesis and as I pursue the next steps in my career. I also want to acknowledge and thank the many members of the science education writing group at Michigan State University, which has been an invaluable source of mentorship and community for me during my time here. Their feedback has greatly influenced the direction of my research and tremendously improved the quality of all three manuscripts found within this thesis. I particularly want to thank Matt Adams for his leadership within the science education writing group and for his support and expertise when we co-taught TE 802 and TE 804. Above all, I want to thank Mr. H and Mr. M of “Faraday High School” for four years of collaboration without which this thesis would have been impossible. Thank you for your willingness to allow us to visit your classroom and work with your students, even during the challenges of a global pandemic. Thank you for all your efforts to take our vision of the evaporative cooling unit and for your willingness to implement its many variations across its long development cycle. Thank you for all the feedback that enabled us to continuously improve the evaporative cooling unit and for your feedback on this thesis. I especially want to thank you for allowing me to tell the narrative of your teaching strategies so that other teachers might gain deeper insights into how to best support students with testing and debugging. And to anyone not listed here who supported me throughout this process, thank you!
TABLE OF CONTENTS
INTRODUCTION …………………………………………………………………………………………1
PAPER 1: DEVELOPING THE SYSTEMS THINKING AND COMPUTATIONAL THINKING IDENTIFICATION TOOL ……………………………………………………………………………….23
PAPER 2: EXAMINING STUDENT TESTING AND DEBUGGING WITHIN A COMPUTATIONAL SYSTEMS MODELING CONTEXT …………………………………………………………………….39
PAPER 3: SYNERGISTIC SCAFFOLDING AND CLEAR RATIONALES: HOW TEACHERS CAN SUPPORT STUDENTS WITH TESTING AND DEBUGGING IN A COMPUTATIONAL MODELING CONTEXT ………………………………………………………………………………………………..76
CONCLUSIONS …………………………………………………………………………………….......170
ACKNOWLEDGMENT OF PREVIOUSLY PUBLISHED WORK …………………………………..186
BIBLIOGRAPHY ……………………………………………………………………………………….189

INTRODUCTION

In an increasingly interconnected world, it is important that students have a deep appreciation of the intricacies of natural systems and a firm grasp on key aspects of systems thinking (ST) and testing and debugging. For example, to articulate how small changes (such as the introduction of an invasive species) can have a massive impact on broader systems, students need to understand feedback mechanisms and other advanced aspects of ST (Hofman-Bergholm, 2018; Keynan et al., 2014; Ledley et al., 2017; Meadows, 2008). Modeling can help students visualize the relationships that exist between different elements in a system, further supporting ST (Hopper & Stave, 2008; Monroe et al., 2015). Likewise, computational modeling software, such as Model-IT and STELLA, helps students create interactive models that can generate a numeric or semi-quantitative model output (Bielik et al., 2019; Mandinach, 1988; Metcalf et al., 2000). Learners can compare this model output to external data and use it to facilitate testing and debugging of their models (Shin et al., 2022; Campbell & Oh, 2015; Fisher, 2018; Sins et al., 2005). Additionally, the process of constructing, testing, and revising these computational models often allows students to develop computational thinking (CT) skills, such as problem decomposition and iterative refinement (Pierson & Brady, 2020; Sengupta et al., 2013; Weintrop et al., 2016). Given the synergies between ST, CT, and modeling, researchers proposed “A Framework for Computational Systems Modeling” to demonstrate the interconnected nature of these three constructs and to help researchers, curriculum developers, and teachers better support student engagement in all aspects of computational systems modeling (Bowers et al., 2022a; Hamidi et al., 2023; Shin et al., 2022; Weintrop et al., 2016). One prominent computational systems modeling practice emerging from this framework that engages students with ST, CT, and modeling is testing and debugging. Testing and debugging processes allow students to analyze a model’s output and structure. If the model does not align with their understanding of the system or with external data, students can make subsequent changes to improve the model (Hadad et al., 2020; Hogan & Thomas, 2001; Sengupta et al., 2013). Through testing and debugging, students can consider how broader structural patterns influence model behavior, thus engaging in ST (Pierson & Brady, 2020; Shin et al., 2022). By systematically analyzing model output to uncover unexpected errors in a model’s structure, students also utilize major aspects of CT during testing and debugging (Lee et al., 2020; Li et al., 2019; Michaeli & Romeike, 2019).
Finally, iteratively refining a model based on new experimental evidence and a growing understanding of the underlying phenomenon is a central tenet of scientific modeling that intersects with and informs understanding of testing and debugging (Grover & Pea, 2018; Metcalf et al., 2000; NRC, 2012; Shin et al., 2022). Although testing and debugging has been recognized by several scholars as a major aspect of computational modeling and an important learning goal for STEM education, students typically have limited opportunities to engage with testing and debugging in most K-12 classrooms (Sins et al., 2005; Swanson et al., 2021; Wilensky & Reisman, 2006). Even in learning environments specifically designed to support students with computational modeling, students often do not spend adequate time analyzing model output to make necessary revisions after constructing their initial models (Grapin et al., 2022; Stratford et al., 1998; Swanson et al., 2021). Likewise, students often use superficial testing and debugging approaches to create models that have functional outcomes that mirror real-world experimental data but demonstrate a lack of internal consistency and thus fail to explain the phenomenon in a meaningful way (Li et al., 2019; Sins et al., 2005; Wilensky & Reisman, 2006). Given the challenges students face with testing and debugging, as well as the many overlapping and competing visions of testing and debugging that exist within the broader STEM education literature, it is important to both clearly define a vision for testing and debugging in the context of computational modeling and show examples of how students use different aspects of testing and debugging as they build and revise a computational model. Additionally, given the lack of clear instructional resources for supporting students with testing and debugging in computational modeling, research needs to investigate how teachers can scaffold students with key aspects of testing and debugging in a computational modeling context. As such, I address the following research questions through the three respective papers in this thesis.
RQ1: How can I categorize student testing and debugging behaviors in the context of computational modeling based on “A Framework for Computational Systems Modeling”?
RQ2: What testing and debugging behaviors do students use as they are revising computational systems models in an evaporative cooling unit?
RQ3: How do teachers scaffold students with testing and debugging in a computational modeling context?

Literature Review

A Framework for Computational Systems Modeling

Systems Thinking

The natural world contains many complex systems with elements that interact with each other in ways that give rise to a multitude of interesting phenomena (Bielik et al., 2023; Hofman-Bergholm, 2018; Meadows, 2008). From the complex web of biochemical reactions within a single cell, to the numerous feedback systems within the human body that help us maintain homeostasis, to the mechanisms by which energy flows and nutrients circulate within ecosystems, natural systems play a critical role in shaping our daily lives. As such, systems thinking (ST), or the process of viewing the natural world as a series of interconnected elements that interact to form complex systems, is an important part of scientific literacy (Arnold & Wade, 2015; Ledley et al., 2017; Meadows, 2008; Stave & Hopper, 2007).
Indeed, many of the most important scientific issues of our contemporary world, including infectious diseases, invasive species, and climate change, can greatly benefit from a systems thinking approach (Hofman-Bergholm, 2018; Keynan et al., 2014; Ledley et al., 2017; Xia et al., 2017). While individual aspects of these systems can sometimes be described in terms of simple “cause and effect” relationships, understanding many of the phenomena associated with these systems requires a holistic systems thinking approach (Forrester, 1994; Hofman-Bergholm, 2018; Ledley et al., 2017; Stave & Hopper, 2007). For example, while a person can describe anthropogenic climate change by stating that “more carbon dioxide in the atmosphere makes the planet warmer,” this simplistic causal explanation ignores the broader complexities of climate change. A more complete, ST-based explanation of climate change would include discussions about how carbon dioxide is transferred from fossil fuels into atmospheric CO2 through industrial processes and how rising temperatures themselves create feedback loops that lead to even higher concentrations of atmospheric greenhouse gases and even higher global temperatures (Hofman-Bergholm, 2018; Ledley et al., 2017). This more complex, ST-based approach to teaching climate change communicates the responsive nature of Earth’s natural systems and the urgency of climate action. Therefore, incorporating ST into science education has the potential to enhance student understanding of key science ideas, including climate change (Ke et al., 2020; Hofman-Bergholm, 2018; Ledley et al., 2017).

Modeling and Computational Modeling

Given the potential that ST has in supporting student understanding of key science phenomena and core science ideas, there have been several efforts to find ways to integrate ST into science classrooms (Boersma et al., 2011; Hmelo-Silver et al., 2017; Yoon, 2008). One promising way to help students develop ST skills is to embed ST into modeling (Arndt, 2006; Forrester, 2007; Sterman, 2002; Svoboda & Passmore, 2013). Modeling is the process of creating a static (paper-pencil or 3-dimensional) or dynamic representation of a phenomenon such that the representation can be used to explain or predict the behavior of that phenomenon (Harrison & Treagust, 2000; Louca & Zacharia, 2012; Mittelstraß, 2005). From this perspective, models are viewed not just as the product of scientific inquiry but as essential tools for supporting scientific reasoning (Bailer-Jones, 1999; Schwarz & White, 2005; zu Belzen et al., 2019). Additionally, models can help one gain insight into previously unknown aspects of a phenomenon and have predictive power (zu Belzen & Krüger, 2010). Models can also act as sensemaking tools by helping learners to synthesize knowledge and by serving as a focal point for asking future questions about a phenomenon (Gouvea & Passmore, 2017; Nersessian, 2008; Schwarz et al., 2009). Models can support systems thinking by allowing students to represent the relationships between different elements in a system and facilitate discussions on how distant elements within a system can impact each other.
Modeling also facilitates more constructivist approaches to science learning, as students can continuously revise their models when they gather new information through investigations, simulations, and analysis of data, or make modifications that enhance their models’ explanatory power (Krell et al., 2015; Passmore et al., 2009; Windschitl et al., 2008). To help students visualize how structural changes to their models impact system behavior, many researchers and educators have turned to computational modeling software (Basu et al., 2016; Nguyen & Santagata, 2020; Shin et al., 2022). Computational modeling uses algorithms or algorithmic thinking to create a model that visually represents the behavior of a system in a quantitative or semi-quantitative manner (Fisher, 2018; Pierson & Clark, 2018; Sengupta et al., 2013; Shin et al., 2021, 2022; Weintrop et al., 2016). Computational models have many affordances that make them valuable tools for science learning. The visual aspect of computational models allows students to explore how various elements of the model interact to generate complex behaviors and to see that changes to model parameters can affect system behavior, thus facilitating ST (Basu et al., 2016; Cronin et al., 2009; Nguyen & Santagata, 2020). The algorithmic nature of computational models also provides opportunities for students to utilize different aspects of CT (Anderson, 2016; Brennan & Resnick, 2012; Irgens et al., 2020; Wang, 2021b). For example, students need to engage in the CT aspect of problem decomposition as they decide how to best represent different aspects of a phenomenon in a format that can be interpreted by the computational modeling software. Computational models are also responsive to new data inputs and can be tested and debugged using a computer (Fisher, 2018; Pierson & Clark, 2018; Sengupta et al., 2013; Weintrop et al., 2016). As such, computational modeling supports students with ST, CT, and testing and debugging. In general, there are two major classes of computational modeling programs: agent-based modeling and icon-based modeling. In agent-based modeling, students use computer programming languages to create or manipulate individual elements or “agents” in a programming canvas (Basu et al., 2016; Sengupta et al., 2013; Wilensky & Reisman, 2006). These agents can be given unique behaviors and be programmed to interact with other agents. In icon-based modeling, the user represents variables as symbols or icons (Costanza & Voinov, 2001; Smith et al., 2005; Xiang, 2011). The user then sets links between these different variables to demonstrate the causal relationships between variables in the system (Damelin et al., 2017; Nguyen & Santagata, 2020). Early examples of computational modeling software used to support students with ST come from the icon-based modeling programs of STELLA and Model-IT (Metcalf et al., 2000; Richmond, 1994; Stratford et al., 1998). Both kinds of software allow students to set distinct relationships between multiple elements in their models and generate a model output. The generation of model output facilitates students in exploring how changes to system structure impact system behavior and encourages them to engage in testing and debugging so that their models better match both their conceptual understanding of the phenomenon and real-world data (Basu et al., 2016; Bravo et al., 2006; Sengupta et al., 2013; Stratford et al., 1998).
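To make the icon-based approach concrete, the sketch below shows, in plain Python, one way a semi-quantitative model could be represented: named variables connected by signed links, with model output generated by propagating input values through those links. This is an illustrative sketch under stated assumptions, not the implementation used by STELLA, Model-IT, or SageModeler; the variable names, the 0–100 scale, and the one-pass update rule are all assumptions made for this example.

```python
# Illustrative sketch only: a toy, icon-based, semi-quantitative model encoded
# as named variables and signed links, with output produced by propagating
# input values downstream. Not any particular software's implementation.

# Each link: (source, target, sign). +1 means "an increase in the source
# increases the target"; -1 means an increase in the source decreases it.
links = [
    ("air temperature", "evaporation rate", +1),
    ("wind speed", "evaporation rate", +1),
    ("evaporation rate", "liquid temperature", -1),
]

def simulate(inputs, links):
    """Propagate semi-quantitative input values (0-100, 50 = 'medium')."""
    values = dict(inputs)
    for source, target, sign in links:            # simple one-pass propagation
        deviation = values.get(source, 50) - 50   # how far the source is from "medium"
        values[target] = values.get(target, 50) + sign * deviation
    return values

# Raising air temperature and wind speed pushes evaporation up and the liquid's
# temperature down -- the kind of output students inspect, compare against their
# expectations, and debug by revising individual links.
print(simulate({"air temperature": 80, "wind speed": 70}, links))
```

Even a toy representation like this makes the affordance described above visible: changing a single link or input value immediately changes the output that students can inspect and question.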
One icon-based computational modeling software program that is particularly promising in its potential to support students with systems thinking and testing and debugging is SageModeler. SageModeler is an open-source, semi-quantitative, icon-based computational modeling software developed by the Concord Consortium (Damelin et al., 2017; Eidin et al., 2023; Nguyen & Santagata, 2020). Several of the features of SageModeler have the potential to support students in various aspects of systems thinking, computational modeling, and testing and debugging. On a basic level, SageModeler allows students to set relationships between elements and define these relationships in semi-quantitative terms (Figure 1A). Students can also set certain elements to be collectors (meaning that these elements can either increase or decrease in value over time) and set flows between these collectors to simulate how two interrelated elements can change over time (Figure 1B). Additionally, SageModeler allows students to generate model output through both simulation features and a specialized graphing tool, facilitating student testing and debugging (Figure 1C). Because these features of SageModeler were designed specifically to support students with computational modeling, the studies found in this document are built around students using this software program. However, many of the principles studied in the context of SageModeler can be applied to other system dynamics programs and other forms of computational modeling.

Figure 1: SageModeler Introduction. The simulation features of SageModeler are activated through the simulate button (1). This allows students to change the relative amount of each input variable (2) and see its impact on model behavior. Using the record button (3), students can record how the system is changing over time and can subsequently generate a graph (4) showing the relationship between any two variables in the system.
Figure 1A: Example of a simple causal relationship in SageModeler
Figure 1B: Example of a simple collector and flow system
Figure 1C: Simulations and graphing features of SageModeler

Computational Thinking

While computational modeling provides a platform for students to visualize a phenomenon as a system of interconnected elements, thereby facilitating ST, it also creates opportunities for students to engage in computational thinking (Basu et al., 2016; Wilensky & Reisman, 2006; Weintrop et al., 2016). Computational thinking (CT) is a form of sensemaking that uses an iterative and quantitative approach to decompose a phenomenon or problem to explore, explain, and predict the behavior of that phenomenon or to find a solution to a problem through the creation and revision of algorithms (Grover & Pea, 2018; Shin et al., 2022; Weintrop et al., 2016; Wing, 2006). Because the CT community has its origins in computer science education, CT literature emphasizes the algorithmic nature of computational models, both in how students construct and revise their models (Brennan & Resnick, 2012; Weintrop et al., 2016). Two aspects of CT deeply intertwined with ST and computational modeling are testing & debugging and iterative refinement (Brennan & Resnick, 2012; Li et al., 2019; Swanson et al., 2021; Wilensky & Reisman, 2006). As students examine and test their model behavior and model output, they often notice aspects of model behavior that do not match their conceptual understanding of the phenomenon or external data (Hadad et al., 2020; Weintrop et al., 2016).
Using debugging strategies, they can identify a specific aspect of their model (be it an individual element or relationship between elements) that needs modification, so their model better fits their conceptual understanding or external data (Aho, 2012; Shin et al., 2022; Türker & Pala, 2020; Weintrop et al., 2016). Likewise, as students’ conceptual understanding evolves over the course of a unit, they will inevitably need to make iterative refinements to their models to match their changing understanding of the phenomenon (Barr & Stephenson, 2011; Basu et al., 2016; Shin et al., 2021). As these aspects of CT encourage students to consider how model structure influences model behavior, they provide opportunities to reinforce aspects of students’ understanding of ST.

A Framework for Computational Systems Modeling

While systems thinking and computational thinking are part of many science education policy documents (ACARA, 2017; KMK, 2005, 2020; Yadav et al., 2017), including the Next Generation Science Standards (NGSS, 2013), and the idea of using computational modeling to support ST and CT in science classrooms has existed for several decades, this approach to science education remains absent from most science classrooms (Boersma et al., 2011; Riess & Mischo, 2010; Verhoeff et al., 2018). Integrating ST and CT with computational modeling (to create “computational systems modeling”) requires that science educators recognize the synergy between modeling, CT, and ST and abandon the “siloing” that has previously defined these three bodies of literature. While many researchers have explored computational modeling and strived to integrate CT into science classrooms, these researchers have focused on agent-based modeling and computer science principles and thereby have not explored or addressed icon-based models (such as SageModeler and Model-IT) that help students visualize relationships between variables (Basu et al., 2016; Sengupta et al., 2013; Wilensky & Reisman, 2006). Likewise, ST modelers (particularly in the System Dynamics community) often focus on having students understand specific types of relationships between elements in a model, while largely avoiding discussions on the broader CT principles at work in the modeling process (Assaraf & Orion, 2005; Cronin et al., 2009; Stave & Hopper, 2007). Finally, much of the traditional modeling community prioritizes diagrammatic models and investigating these models as tools to facilitate student sensemaking (Schwarz et al., 2009) and support students in explaining and predicting the behavior of a real-world phenomenon (zu Belzen & Krüger, 2010). To help synthesize key ideas and contributions from the ST, CT, and modeling literature into a cohesive vision for computational systems modeling, Shin and colleagues (2022) proposed “A Framework for Computational Systems Modeling” (Figure 2). This framework consists of three major components: ST aspects (on the left-hand side of the diagram), CT aspects (on the right-hand side of the diagram), and computational modeling practices (green boxes in the middle of the diagram). When constructing this framework, Shin and colleagues (2022) took inspiration and guidance from the ST, CT, and modeling literature to define the five ST aspects, the five CT aspects, and the five computational modeling practices shown in this framework (Grover & Pea, 2018; NRC, 2012; Richmond, 1994; Schwarz et al., 2017).
While the ST and CT aspects of this framework serve to summarize the authors’ conceptualization of systems thinking and computational thinking respectively, the five computational modeling practices are concrete actions that students perform as they design, construct, test, and revise their computational systems models. Each of the five computational modeling practices is informed by the ST and CT aspects defined in this framework and provides students with the opportunity to develop and demonstrate various aspects of ST and CT (Shin et al., 2022). For example, as students “test, evaluate, and debug model behavior”, they often utilize the ST aspects of “predicting system behavior based on system structure” and “engaging in causal reasoning” (Hadad et al., 2020; Lee et al., 2020) alongside the CT aspects of “testing and debugging” and “making iterative refinements” (Aho, 2012; Barr & Stephenson, 2011; Shin et al., 2022). As such, this framework acknowledges the deep synergy between ST, CT, and modeling that occurs as students construct and revise computational models. This framework also recognizes that computational systems modeling is an iterative process, where students will identify changes that need to be made to their models through the practice of “test, evaluate, and debug model behavior” and subsequently reconsider system boundaries and model structures, thus reengaging in those practices (Shin et al., 2022).

Figure 2: “A Framework for Computational Systems Modeling”

Testing and Debugging

Testing and Debugging: An Overview

One of the main advantages of incorporating computational modeling into K-12 STEM education is that it enables students to engage in testing and debugging. Testing and debugging is a multi-faceted process by which students actively seek to identify flaws in their algorithmic representations of a phenomenon and make subsequent corrections to their representations to more accurately reflect their evolving understanding of the underlying phenomenon (Hadad et al., 2020; Hogan & Thomas, 2001; Sengupta et al., 2013; Shin et al., 2022). After students have constructed an initial algorithmic product (e.g., a text-based computer program, an agent-based computational model, a SageModeler model, etc.), they will often need to test their algorithmic product to see if its output aligns with their expectations (Griffin, 2016; Hadad et al., 2020; Shin et al., 2022; Wilensky & Reisman, 2006). Such testing is often built directly into the computational modeling software or is accomplished by executing the algorithmic code. If the output of the algorithmic product does not match the expected outcome or is found to differ from experimental results, students are then tasked with identifying specific structural flaws within their algorithmic representation (Bravo et al., 2006; Michaeli & Romeike, 2019; Sengupta et al., 2013). This might require students to engage in debugging by systematically going through lines of computer code to find syntactical errors or interrogating their reasoning behind each individual relationship in a computational model. Having knowledgeable peers review their algorithmic products can also help students identify specific flaws in their representations of the phenomenon. Once students have identified flaws in their algorithmic representations and made appropriate revisions, they should once more test their algorithmic products to see if their changes have improved their algorithmic outputs and to identify additional errors.
Through this cycle of testing, debugging, and revising, students will iteratively refine their algorithmic product to better reflect the underlying science phenomenon they are trying to represent (Grover & Pea, 2018; Hutchins et al., 2020; Shin et al., 2022; Windschitl et al., 2008). There are several benefits to encouraging students to test and debug their algorithmic products. By seeking out structural and syntactical flaws in their algorithmic representations, students build a deeper understanding of how to encode ideas in an algorithmic environment, thus enhancing their CT skills (Michaeli & Romeike, 2019; Grover et al., 2015; Shin et al., 2021). For example, identifying that a missing parenthesis in a text-based computer program renders it unable to compile teaches students the importance of proper syntax in computer science. Likewise, testing and debugging often requires that students investigate how various aspects and elements of their algorithmic representations interact with each other to create complex behavioral patterns (Abar et al., 2017; Fretz et al., 2002; Sengupta et al., 2012; Weintrop et al., 2016). Through seeing how interactions between various elements in their algorithmic products influence output behavior, students can develop ST competency (Shin et al., 2022; Weintrop et al., 2016). Finally, by engaging in the iterative refinement aspect of testing and debugging, students have multiple opportunities to revisit their understanding of the underlying scientific phenomenon or the goals of their algorithmic product (Grover & Pea, 2018; Hutchins et al., 2020; Shin et al., 2022). Through this reflective process, particularly when paired with collecting and analyzing data from real-world experiments, students can reconsider previously held assumptions about the phenomenon and make changes to their models based on new knowledge (Basu et al., 2016; Bravo et al., 2006; Grapin et al., 2022; Windschitl et al., 2008). As such, testing and debugging supports students in making sense of the phenomenon they seek to represent, thus benefiting their understanding of disciplinary core ideas. Testing and debugging (along with the closely related construct of iterative refinement) is found across a broad spectrum of STEM education literature (Grover & Pea, 2018; Michaeli & Romeike, 2019; Stratford et al., 1998; Weintrop et al., 2016). In the scientific practice of modeling, students are encouraged to frequently return to their models after learning new science content and/or conducting real-world experiments to make model revisions (Louca & Zacharia, 2012; Metcalf et al., 2000; NRC, 2012; Schwarz et al., 2009). Through this process of iterative refinement, students gradually improve their models to better reflect the real-world science phenomenon and deepen their understanding of underlying disciplinary core ideas (Clement, 2000; Schwarz et al., 2007, 2009; Windschitl et al., 2008). Computer science scholars view testing and debugging as the process of searching for anomalies in a software program, finding specific flaws (i.e., “bugs”) in the algorithmic code, and subsequently replacing these flawed sections of computer code so that the program can run as intended (Griffin, 2016; McCauley et al., 2008; Michaeli & Romeike, 2019). Given the importance of being able to identify and correct flaws in computer programming, computer science educators often view testing and debugging proficiency as an essential indicator of programming skill.
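The computer science view of testing and debugging summarized above can be illustrated with a small, hypothetical example: a short function is checked against a known expected value, and the failing check points the programmer toward a specific flawed line. The function, data values, and tolerance below are invented for illustration only.

```python
# Hypothetical example of the test-and-debug cycle: a short function is tested
# against a known expected value, and the failing test points to a specific
# flawed line ("bug"). The function, data, and tolerance are invented here.

def average_temperature(readings):
    # Intended behavior: return the mean of a list of temperature readings.
    # Bug: divides by a hard-coded 10 instead of the number of readings.
    return sum(readings) / 10

def test_average_temperature():
    readings = [20.0, 22.0, 24.0]
    expected = 22.0                      # the value the output should match
    actual = average_temperature(readings)
    assert abs(actual - expected) < 1e-6, f"expected {expected}, got {actual}"

# Running the test exposes the anomaly (6.6 rather than 22.0). Debugging means
# locating the flawed line, replacing 10 with len(readings), and re-running the
# test to confirm the fix and to check for additional errors.
try:
    test_average_temperature()
except AssertionError as err:
    print("Test failed:", err)
```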
Computational modeling environments offer another instructional context that can support students with testing and debugging. The computational modeling environment requires that users encode information in an algorithmic manner so that the software can generate a functioning model (Fisher, 2018; Pierson & Clark, 2018; Sengupta et al., 2013; Shin et al., 2022). The complexity of the encoding process ranges from using drop-down text menus to set semi-quantitative relationships between individual variables in icon-based modeling programs such as SageModeler to using relatively sophisticated text-based programming to set the behavior of individual agents in NetLogo (Damelin et al., 2019; Goldstone & Janssen, 2005; Wilensky & Reisman, 2006). Despite these varying levels of complexity, the common algorithmic nature of computational modeling software programs leads to an environment where syntactical errors can occur, thus creating opportunities for students to use debugging to locate and correct these flaws in their models. Most computational modeling software programs also have a visual output, allowing students to more easily see how their algorithmic products behave under different initial conditions, thus facilitating model testing (Abar et al., 2017; Campbell & Oh, 2015; Fisher, 2018; Sins et al., 2005). For example, some computational modeling programs allow students to see the behavior of the various agents they have programmed on a set canvas (Basu et al., 2014; Goldstone & Janssen, 2005; Ginovart, 2014; Sengupta & Farris, 2012). Other computational modeling programs let students explore how manipulating the relative amount of input variables impacts the relative amount of intermediate and output variables in their models (Damelin et al., 2017; Metcalf et al., 2000; Nguyen & Santagata, 2020; Richmond, 1994). Such a visual output enables students to see if their model’s behavior contradicts their expectations and determine if further revisions are needed. In computational modeling programs with a quantitative or a semi-quantitative visual output, students can often compare their model output with real-world experimental data, allowing students to validate their models (Campbell & Oh, 2015; Shin et al., 2021; Weintrop et al., 2016). Using external data to validate model behavior is an important aspect of testing and debugging, as it allows students to demonstrably determine if their model behavior accurately reflects the targeted real-world phenomenon. It also helps emphasize the importance of the experimental aspect of science, showing that theoretical models, such as their computational model, need to be supported by external, experimental data (Bravo et al., 2006; Sengupta et al., 2013; Stratford et al., 1998).

Testing and Debugging in “A Framework for Computational Systems Modeling”

Expanding upon prior studies, my colleagues and I developed a comprehensive definition of testing and debugging as part of the broader computational systems modeling framework, “A Framework for Computational Systems Modeling” (Basu et al., 2016; Hadad et al., 2020; Sengupta et al., 2013; Shin et al., 2022; Figure 3). As previously mentioned, “A Framework for Computational Systems Modeling” is deeply rooted in our collective understanding of ST, CT, and computational modeling and was influenced by our experiences with the SageModeler software program (Damelin et al., 2019; Shin et al., 2021, 2022).
“A Framework for Computational Systems Modeling” incorporates the term “testing and debugging” both in the context of CT and within the computational modeling practice of “testing, evaluating, and debugging model behavior”. However, for the purposes of this thesis, I am focusing on testing and debugging as a multifaceted computational modeling practice. While testing and debugging, students will often begin by interrogating the different variables and relationships within their models or by analyzing the visual output of their models (Hadad et al., 2020; Lee et al., 2020; Shin et al., 2022). During this analytical phase, students will often identify aspects of their models that do not correspond with their evolving understanding of the phenomenon or fail to align with experimental data. This, in turn, motivates students to seek specific relationships and variables that can be changed to improve their model. Through iteratively assessing model output and refining model structures, students generally succeed in aligning their models more closely with the behavior of the targeted real-world phenomenon. This inclusive perspective on testing and debugging, inspired by the insights of various scholars, acknowledges how students embody elements of CT and ST as they participate in this practice (Aho, 2012; Brennan & Resnick, 2012; Sengupta et al., 2013; Yadav et al., 2014). Within this framework, aspects of the scientific practice of “using mathematics and computational thinking” along with aspects of the crosscutting concepts of “systems and system models” and “cause and effect” are seamlessly woven into the computational systems modeling practice of “testing, evaluating, and debugging model behavior” and the broader scientific practice of “developing and using models” (NGSS, 2013; Shin et al., 2022). As students examine the visual outputs of their computational models to identify aspects that deviate from their expectations, they are actively engaging in the CT aspect of “testing and debugging” (Barr & Stephenson, 2011; Sengupta et al., 2013; Sullivan & Heffernan, 2016). Similarly, when students compare this model output against external real-world data, they are simultaneously involved in “generating, organizing, and interpreting data” (Aho, 2012; Selby & Woollard, 2013). Furthermore, as students discuss the validity of various relationships within their models, they are exemplifying the ST aspect of “causal reasoning,” which intersects with the crosscutting concept of “cause and effect” (NGSS, 2013; Shin et al., 2022). When these discussions progress towards assessing how structural elements within a model (such as feedback loops) influence broader facets of model behavior, students are effectively “interpreting and predicting system behavior based on system structure”. Finally, students engage in “iterative refinements” when they modify their models to ensure that their model’s behavior more accurately mirrors that of the real-world phenomenon (Hadad et al., 2020; Weintrop et al., 2016).
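As a concrete illustration of comparing model output against external data during this practice, the hedged sketch below checks whether a model-generated temperature series shows the same qualitative trend as a measured series and whether the numeric disagreement stays within a tolerance. The data values, tolerance, and decision messages are hypothetical; they merely stand in for the kind of evaporative cooling data students collect in the unit described later.

```python
# Hedged sketch of comparing model output against external experimental data:
# check whether the two series share the same qualitative trend and whether the
# numeric disagreement stays within a tolerance. All values are hypothetical.

experimental = [24.0, 21.5, 19.8, 18.9, 18.5]   # measured liquid temperature (degrees C)
model_output = [24.0, 22.0, 20.5, 19.5, 19.0]   # temperature values produced by the model

def same_trend(a, b):
    """True if both series rise and fall in the same places."""
    return all((x2 - x1) * (y2 - y1) >= 0
               for (x1, x2), (y1, y2) in zip(zip(a, a[1:]), zip(b, b[1:])))

def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

if not same_trend(experimental, model_output):
    print("Model behavior contradicts the data: revisit the model's relationships.")
elif mean_abs_error(experimental, model_output) > 1.0:      # assumed tolerance
    print("Trend matches but values drift: consider refining relationship strengths.")
else:
    print("Model output is consistent with the experimental data.")
```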
Figure 3: Aspects of Systems Thinking and Computational Thinking exhibited through the computational modeling practice of “Test, Evaluate and Debug Model Behavior”

Challenges with Testing and Debugging

Although testing and debugging has been identified as a key aspect of computational modeling and is recognized as an important learning goal in computer science education and science education, students often find testing and debugging challenging (Barr & Stephenson, 2011; Eidin et al., 2023; Grapin et al., 2022; Li et al., 2019). When building computational models, students are often reluctant to make changes to their initial models, even when presented with new evidence that contradicts their initial ideas about a phenomenon (Grapin et al., 2022; Stratford et al., 1998; Swanson et al., 2021). Students also tend to take an ad hoc, outcome-oriented approach to testing and debugging when tasked with using real-world external data to validate their model output (Li et al., 2019; Sins et al., 2005; Wilensky & Reisman, 2006). In these cases, students will make modifications to their models to try to generate a model output that matches the real-world experimental results without considering the implications of these changes on the explanatory power of their models. This outcome-oriented approach results in models that superficially reflect real-world data but that are unable to properly explain the mechanisms of the targeted phenomenon. For example, based on experimental observations, a student might generate an agent-based geocentric computational model of the rotational paths of the Sun, Earth, and Moon that accurately predicts the occurrences of lunar and solar eclipses. While this model technically fits the experimental data they have been provided, the underlying mechanisms of the model reflect a non-canonical understanding of the targeted disciplinary core ideas. While the literature demonstrates the challenges students face with testing and debugging, relatively few studies discuss how teachers can support students with this practice in the context of science education and computational modeling (Barr & Stephenson, 2011; Grover & Pea, 2018; Michaeli & Romeike, 2019). Most of the existing studies on pedagogical practices for supporting students with testing and debugging center on computer science contexts, particularly with respect to debugging traditional text-based computer programs (Katz & Anderson, 1989; McCauley, 2008; Michaeli & Romeike, 2019; Vessey, 1985). Meanwhile, studies on pedagogical practices for computational modeling tend to offer generic advice on supporting students with iterative refinement and largely avoid suggesting specific scaffolds for testing and debugging (Fretz et al., 2002; Snyder et al., 2022; Wilkerson et al., 2018; Wilkerson-Jerde et al., 2015). Given the established need to better support students with the computational modeling practice of testing and debugging and the dearth of literature on effective pedagogical supports for this practice, I set out to investigate how to best support students with testing and debugging in a computational modeling environment.

Thesis Overview

Building on “A Framework for Computational Systems Modeling” and the broader testing and debugging literature, I explore ways to support students with testing and debugging in a computational modeling context.
It is important to note that the research found in this thesis was part of a larger research partnership between the Concord Consortium (Concord, Massachusetts) and the CREATE for STEM Institute (Michigan State University, East Lansing, Michigan) that sought to investigate how to integrate computational systems modeling into high school science classrooms using curricula built around SageModeler software. This project, influenced by project-based learning and design research principles (Barab & Squire, 2016; Krajcik & Shin, 2022), centers on the implementation of a high school chemistry unit, where students are tasked with building and revising a computational model of evaporative cooling using SageModeler software. Papers 1 and 2 of the thesis represent data collected from the initial (year 1) implementation of this project, while Paper 3 consists of data collected in year 4 of this project. All data for this project were collected at Faraday High School (FHS), a pseudonym for a STEM magnet school in the Midwestern United States, in collaboration with two high school teachers: Mr. H and Mr. M (both pseudonyms). Before investigating how teachers and the broader learning environment could support students with testing and debugging, it was necessary to develop a research instrument for assessing how students test and debug computational models. The development of this research instrument is described in the first paper of this thesis, titled “Developing the Systems Thinking and Computational Thinking Identification Tool”. Based on “A Framework for Computational Systems Modeling”, this research instrument, known as the Systems Thinking and Computational Thinking Identification Tool (or the ID Tool), was initially intended to analyze student behaviors across all five computational modeling practices found in “A Framework for Computational Systems Modeling”. However, the unwieldy nature of the original instrument necessitated a narrowing of the focus towards seven behaviors associated with testing and debugging. In addition to creating a useful qualitative research instrument, this paper helped to define testing and debugging in the context of computational modeling by describing discrete testing and debugging behaviors students engage with as they test, debug, and revise computational models. Using the ID Tool instrument developed and validated in the first manuscript, I subsequently investigated how five student groups tested and debugged their models in the context of a high school chemistry unit centered on computational modeling in Paper 2: “Examining Student Testing and Debugging within a Computational Systems Modeling Context”. Through this paper, I identified several different approaches these students took towards testing and debugging, ranging from systematically using the simulation features built into SageModeler to find structural flaws in their models to gaining regular insights from peers on which aspects of their models needed further support. I also found evidence of certain testing and debugging behaviors (as measured by the ID Tool instrument developed in the first manuscript) being more common than others, suggesting that additional support from teachers and the learning environment would be helpful for engaging students in the less represented testing and debugging behaviors.
While the first two papers of this thesis focus on analyzing how students test and debug computational models, Paper 3 (“Synergistic Scaffolding and Clear Rationales: How Teachers can Support Students with Testing and Debugging in a Computational Modeling Context”) instead explores how teachers can scaffold students with testing and debugging and which scaffolds seem to impact student testing and debugging behaviors. Narrowing the scope of this paper, I chose to focus on how teachers supported students with three key testing and debugging behaviors from the original ID Tool: analyzing model output, using peer feedback, and using external data to validate model output. This decision was based on the relative importance of these aspects of testing and debugging in the broader literature so that any conclusions from this paper could be more targeted and concise. Given the design-based nature of the broader project, the evaporative cooling unit as enacted in Paper 3 (year 4 of the project) was significantly modified based on the results of Papers 1 and 2 to better support students with testing and debugging. Immediately prior to the enactment of the evaporative cooling unit, Mr. H and Mr. M participated in a professional learning community organized by two colleagues and me to discuss how to best support students with testing and debugging. The results of this paper suggest that when Mr. H and Mr. M used synergistic scaffolding, by supporting students with making use of existing curricular and technological scaffolds embedded in the learning environment, students were more likely to engage in the targeted testing and debugging behaviors. Additionally, providing students with clear rationales for using certain testing and debugging behaviors to revise their models seems to have been an effective pedagogical strategy for scaffolding students with testing and debugging.

Author Positionality

As a scholar in the field of science education, I take a strong stance on promoting what I believe to be best science teaching practices and student science learning. Taking an asset-based approach, I believe that all students are capable of learning science and that all students should have the opportunity to experience the joy and wonder of the natural world through instruction centered on science practices. I also have a strong belief that science learning should be contextualized to student lived experiences and be relevant to their everyday lives. I also have deep philosophical commitments to constructivist pedagogies and student-centered learning. To me, constructivism is a theory of learning that posits that all student learning must build upon student prior knowledge and that students learn best when they can actively engage in the process of knowledge construction. In practice, this means that I believe that good science teaching must allow students to engage with a meaningful phenomenon, ask questions and conduct investigations to gain insight into said phenomenon, and ultimately construct a knowledge product that demonstrates their sensemaking and understanding of the key scientific principles underlying the phenomenon. I believe that having students construct and revise models (both paper-pencil and computational models) facilitates student sensemaking and that students must be given multiple opportunities to revise their models as they gain new insights into the phenomenon through hands-on investigations.
Finally, I believe that developing a strong appreciation for systems thinking is a critical component of contemporary science education. Given that many of the most important scientific issues of our time involve systems composed of complex webs of interconnected elements that change over time, it is imperative that every student have a firm grasp on ST concepts before leaving high school. Given these philosophical commitments, I seek to amplify these core constructivist principles and their subsequent corollaries through my work. As a researcher, I recognize that classroom research is not value-neutral and that my presence as a researcher impacts student learning and teacher behavior. Given that planning and enacting the curriculum at the core of my research takes a substantial amount of time and effort on the part of the cooperating teachers (Mr. H and Mr. M), I have sought to build and maintain a strong professional working relationship with both teachers. As I had limited experience in K-12 classrooms prior to beginning this work, I have deeply valued the input and expertise of Mr. H and Mr. M in this project. Over the many PLC meetings we have had, I have endeavored to avoid taking an overtly authoritative role. I have instead sought to maintain a collaborative environment where expertise is mutually shared and respected as both the MSU team and Mr. H and Mr. M work together to improve the quality of the unit. Additionally, I have had Mr. H and Mr. M review my findings, particularly in Paper 3, to ensure that my work accurately portrays their perspectives and resonates with their experiences as research participants. Within Mr. H and Mr. M’s classrooms, I aimed to take on primarily an observer role and sought to minimize my disruption to the learning environment. Earlier in this research, at times I took a more active role in the classroom and regularly supported students in the modeling process. As this research progressed, I came to value a less hands-on approach, as it allowed for a more authentic look at the interactions between teachers and students in constructing and revising computational models. It also minimized the potentially negative impacts that active interference can have on student learning outcomes and emotional well-being. I instead sought to focus more on observing how Mr. H and Mr. M support their respective students in constructing and revising their computational models and emphasized supporting Mr. H and Mr. M through professional learning opportunities that took place outside of the classroom environment. However, technical difficulties emerging from the learning management system led me to take on a more active role in these classrooms than I would have preferred. During the final implementation of this unit, I regularly helped students with troubleshooting and at times provided support with navigating the technical aspects of SageModeler (such as creating collector and flow relationships) so that Mr. H and Mr. M could spend more time focusing on helping students with testing and debugging and ST. Finally, it is important to address how my identity as a native English-speaking White man impacts my interactions with the students and teachers who are participating in this research project. Given that I am doing this work in a school building that is majority White, I acknowledge that my identity largely matches that of many of the students and of the teachers involved in this study.
As such, I must recognize that some of the relational aspects of building this research partnership are easier than they would be if I were a person of color, an immigrant, and/or a non-native English language speaker. I also admit that given that science has long been a male-dominated field and that both teachers I am working with are also White men, my presence can reinforce assumptions about science identity, even though that is not my intention. As such, I feel the need to give space so that female students and students of color can also feel empowered in their science identity through this work. I also must acknowledge that as a White man, I can often be blind to the hidden power dynamics and implicit biases present in the classroom environment and curricular design choices. In concrete terms, this process of supporting female students and students of color took three distinct forms: supporting Mr. H and Mr. M in using equitable discourse moves in our professional learning sessions, providing positive feedback and affirming the emerging science identities of female students and students of color, and prioritizing participation of female students and students of color in data collection practices. Given that one of my colleagues was interested in using this project to investigate how to support more equitable discourse between students in small groups, we made it a priority to address the role that teachers can play in supporting this process during our professional learning meetings with Mr. H and Mr. M. As such, I supported her in having these potentially sensitive conversations around equitable discourse with Mr. H and Mr. M. Once I was in the classroom, I took key opportunities in conversations with female students and students of color to provide these students with supportive feedback that affirmed their science identity. I prioritized (where possible given the challenges we faced with student recruitment) having appropriate representation of female and POC students in student screencasts and student interviews. When analyzing these data, I sought to take an asset-based perspective that displayed their emerging science identities and provided examples of how these students used testing and debugging strategies and systems thinking discourse to make sense of the evaporative cooling phenomenon.

PAPER 1: DEVELOPING THE SYSTEMS THINKING AND COMPUTATIONAL THINKING IDENTIFICATION TOOL

Abstract

We developed the Systems Thinking (ST) and Computational Thinking (CT) Identification Tool (ID Tool) to identify student involvement in ST and CT as they construct and revise computational models. Our ID Tool builds off the ST and CT Through Modeling Framework, emphasizing the synergistic relationship between ST and CT and demonstrating how both can be supported through computational modeling. This paper describes the process of designing and validating the ID Tool with special emphasis on the observable indicators of testing and debugging computational models. We collected 75 hours of students’ interactions with a computational modeling tool and analyzed them using the ID Tool to characterize students’ use of ST and CT when involved in modeling. The results suggest that the ID Tool has the potential to allow researchers and practitioners to identify student involvement in various aspects of ST and CT as they construct and revise computational models.

Introduction

Many of our current societal and ecological challenges involve complex systems composed of interconnected elements.
From global pandemics to climate change, these challenges require systems thinking (ST) to identify how various elements contribute to emergent effects in large-scale systems. ST enables individuals to investigate how a single part of a system can have broader impacts on the whole system (Meadows, 2008). Given the complexity of most systems, computational thinking (CT) is often required to approach these problems. CT is a sensemaking process where one decomposes a problem in a systematic way, translates it into an algorithm that can be interpreted by an information processing agent, and iteratively refines it based on new observations and new data inputs (Grover & Pea, 2018; Wing, 2006). Because both ST and CT are important for addressing problems involving complex systems, it is fruitful to consider their synergies for investigating phenomena (Shin et al., 2022; Weintrop, 2016). ST and CT are also increasingly being emphasized as important elements of science education on a global 23 scale, being incorporated into official policy documents in many countries including the U.S., the U.K., and Taiwan (Csizmadia et al., 2015; NGSS Lead States, 2013; So et al., 2020). These efforts to include ST and CT as key aspects of science education create a need for developing new research tools for characterizing and monitoring student use of these types of thinking (Grover & Pea, 2018). One framework that recognizes the interconnected relationship between ST and CT is the “ST and CT Through Modeling Framework,” which describes how student use of ST and CT can be supported through the construction of computational models (Bowers et al., 2022; Shin et al., 2022). This framework seeks to clarify and expand the conceptualizations of ST and CT as proposed by the NGSS as well as demonstrate the synergy between ST, CT, and modeling (NGSS Lead States, 2013; Shin et al., 2022). Given its focus on the interconnectedness of ST and CT, this framework provides a foundation for developing an instrument for observing student use of ST and CT as they construct and revise computational models. Such a tool may facilitate researchers in recognizing instances of and patterns in students’ use of specific ST and CT aspects as they construct and revise models. In this paper, we first summarize our conceptualization of the three main components of our framework (ST, CT, and modeling) and how these components combine to form five computational modeling practices. We then describe how we developed a research tool based on this framework to explore student use of ST and CT as they constructed and revised computational models using a semi-quantitative computational modeling tool. Finally, we provide examples of how this tool can be used to identify and categorize student use of ST and CT. Theoretical Approach Systems thinking is an approach to exploring a phenomenon as a network of elements that work together to create a system with emergent behavior that is more than the sum of its constituent parts (Arnold & Wade, 2015; Forrester, 1971; Meadows, 2008). We define an “element” as a key part of a system that can be independently described yet interacts with other aspects of the system to impact the overall behavior of that system. Many complex phenomena can be described as a series of interacting elements with feedback relationships and informational delays that often generate counterintuitive 24 behaviors (Booth-Sweeney & Sterman, 2007; Cronin et al., 2009). 
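To make the notion of feedback and informational delay concrete, the brief sketch below simulates a single stock whose corrections are based on outdated information. The scenario, variable names, and parameter values are invented for illustration; they are not drawn from the unit or from any particular modeling tool.

```python
# Minimal stock-and-flow sketch of a delayed negative feedback loop.
# The scenario, names, and parameters are invented for illustration.

def simulate(steps=60, delay=5, gain=0.5, target=100.0, start=50.0):
    """Adjust a stock toward a target, basing each correction on the value
    of the stock `delay` steps ago (an informational delay)."""
    history = [start]
    for _ in range(steps):
        perceived = history[max(0, len(history) - 1 - delay)]  # outdated info
        history.append(history[-1] + gain * (target - perceived))
    return history

# With delay=0 the stock settles smoothly at the target; with a delay the
# corrections keep overshooting and the stock swings back and forth around
# the target -- the kind of counterintuitive behavior that simple linear
# causal reasoning tends to miss.
print([round(x) for x in simulate(delay=0)[:10]])
print([round(x) for x in simulate(delay=5)[:20]])
```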
To fully engage in ST, students need to move beyond simple linear causal reasoning to a system behavior perspective so that they can identify common structural patterns found within and across systems. Our framework identifies five major aspects of ST: (1) defining a system’s structure and boundaries, (2) engaging in causal reasoning, (3) recognizing interconnections and identifying feedback, (4) framing problems or phenomena in terms of behavior over time, and (5) predicting system behavior based on system structure (Shin et al., 2022). Computational thinking has many definitions ranging from being grounded in mathematics and data analysis (NRC, 2012) to being an aspect of sensemaking centered on formulating questions through testing models and simulations (Schwarz et al., 2017; Weintrop et al., 2016) to thinking like a computer scientist (Grover & Pea, 2018; Wing, 2006). Synthesizing these approaches, we define CT as a form of sensemaking that uses an iterative and quantitative approach to decompose a phenomenon or problem to explore, explain, and predict the behavior of that phenomenon or to find a solution to a problem through the creation and iterative revision of algorithms (Shin et al., 2022). Our framework identifies five major aspects of CT: (1) decomposing problems such that they are computationally solvable, (2) creating computational artifacts using algorithmic thinking, (3) generating, organizing, and interpreting data, (4) testing and debugging, and (5) making iterative refinements. In addition to ST and CT, modeling forms the third component of our framework. Modeling is the process of creating a representation of a phenomenon such that the representation can be used to explain or predict the behavior of that phenomenon (Harrison & Treagust, 2000; Schwarz et al., 2009). From this perspective, models are viewed not just as the product of scientific inquiry but as essential tools for supporting scientific reasoning and sensemaking (Schwarz et al., 2009). Additionally, analyzing existing models can help one gain insight into different aspects of a phenomenon and predict its future behavior (zu Belzen & Krüger, 2010). Scientists and students often use models to represent their conceptualization of a phenomenon so that they can synthesize and communicate their ideas to others (Gilbert & Justi, 2016). Within our framework, students utilize both ST and CT approaches as they engage in the process of modeling. 25 While researchers (Berland & Wilensky, 2015; Wing, 2017) claim that CT and ST are intertwined and support each other, we view CT and ST as co-equal, yet distinct in the context of modeling because of their unique ways of approaching problems. CT focuses on designing solutions through computation while ST analyzes the various relationships among elements in a system (Shute et al., 2017). Our framework thus defines CT and ST as separate entities and identifies five computational modeling practices that combine aspects of ST and CT: (M1) characterize problem or phenomenon to model, (M2) define the boundaries of the system, (M3) design and construct model structure, (M4) test, evaluate, and debug model behavior, and (M5) use model to explain and predict behavior of phenomenon or design solution to a problem (Bowers et al., 2022). Students engage in these modeling practices as they construct, test, revise, and use their computational models. 
Students characterize the phenomenon (M1) as they discuss and unpack key elements of the phenomenon under study and as they learn about new elements of the phenomenon. Students define the boundaries of the system (M2) and design/construct model structure (M3) as they discuss which variables to add to their models and set relationships between these variables respectively. Once students have built their initial models, they can analyze the model output and should compare this output to real-world data or their emerging understanding of the phenomenon to identify and modify flaws in their model, thus testing and debugging of model behavior (M4). Finally, students use their models to construct explanations of the phenomenon or predict how the system will behave under different circumstances (M5). Each of these practices are supported by a combination of aspects of ST and CT (Table 1). 26 Table 1: The computational modeling practices and associated ST and CT aspects Computational Modeling Practice Associated ST and CT Aspects M1. Characterize Problem or Phenomenon ST: Define a System CT: Decompose Problems M2. Define System Boundaries ST: Define a System, Frame Phenomena in Terms of Behavior over Time CT: Decompose Problems, Create Algorithmic Artifacts M3. Design and Construct Model Structure ST: Engage in Causal Reasoning, Recognize Interconnections and Feedback, Frame Phenomena in Terms of Behavior over Time CT: Create Algorithmic Artifacts M4. Test, Evaluate, and Debug Model Behavior ST: Define a System, Predicting System Behavior Based on System Structure CT: Generate and Interpret Data, Test and Debug, Make Iterative Refinements M5. Use Model to Explain and Predict Behavior of Phenomenon ST: Predict System Behavior Based on System Structure, Engage in Causal Reasoning CT: Generate and Interpret Data, Test and Debug, Make Iterative Refinements Although the science education community has established ST, CT, and modeling as key learning goals, we know relatively little about how to support students in these practices. We used the ST and CT Through Modeling Framework to develop a research tool that could help researchers investigate student use of ST and CT as they build, test, and revise models. We hypothesize that such a tool could help researchers identify which aspects of ST and CT students use more frequently or find challenging. Therefore, we investigate these research questions: How can one characterize patterns of student use of specific aspects of ST and CT as they construct and revise models? Which aspects seem to be more challenging for learners? To address these questions, we developed the ST and CT Identification Tool (ID Tool) to classify instances of students using aspects of ST and CT as they build, test, and revise models. 27 Methods Study Context and Data Sources The data used to develop and evaluate our ID Tool came from a high school chemistry unit on evaporative cooling designed to meet NGSS learning goals and enacted at a Midwestern U.S. STEM school. We designed this unit around Project-Based Learning (PBL) principles (Krajcik & Shin, 2022) in which students explore the phenomenon of evaporative cooling, use a driving question and a driving question board and conduct investigations to address the driving question. This unit also centered on students building and revising models of phenomena using an open-source semi-quantitative computational modeling tool called SageModeler (Figure 4). 
SageModeler is a modeling tool that allows students to construct semi-quantitative models without using a formal programming language (https://sagemodeler.concord.org). Students can test these models using a simulation function to generate model output and using graphs constructed from the output or imported from real-world data. To collect data on students building and revising their models, we used 15 hours of screencasts from each of five pairs of students, for a total of 75 hours. Screencasts record the students’ actions on their laptop screens and record student audio while they are building and manipulating their computational models.

Instrument Development

Content validity refers to the extent to which all aspects of our framework align with the literature. To establish content validity, we conducted an extensive literature review of CT, ST, and modeling, and deconstructed each practice into smaller aspects and sub-aspects (Shin et al., 2022). We examined specific aspects of ST and CT to define how students should be able to use their knowledge through five modeling practices. During the development process, our research team – including experts in science, learning sciences, learning technology, and science education – defined, reviewed, and revised the modeling process and specified aspects of ST and CT in the context of modeling by discussing disagreements and ambiguities, continuing (or updating) our literature review, and examining teachers’ and students’ data collected from implementation. Our research team expanded on this work, developing a theoretical framework describing how specific aspects of ST and CT are applied through five distinct modeling practices. These processes confirmed the theory-based modeling process outlined in the framework, ensuring that the ST and CT aspects were operationalized to monitor student involvement in these aspects while modeling. Construct validity is the extent to which the indicators of our ID Tool measure our intended constructs. To establish construct validity, our approach focuses on defining indicators (evidence) clearly and comprehensively and describing measurable (observable) behaviors that are present when learners are utilizing the desired ST and CT aspects through modeling. We first decomposed the various aspects of ST and CT associated with each modeling practice into smaller sub-aspects. For example, the computational modeling practice of “test, evaluate, and debug model behavior” (M4) is supported by the ST aspects of “defining a system” and “predicting system behavior based on system structure” along with the CT aspects of “generating, organizing, and interpreting data,” “testing and debugging,” and “making iterative refinements” (Table 1). These ST and CT aspects can in turn be broken down into more specialized sub-aspects. Within the CT aspect of “testing and debugging” we identified three key sub-aspects associated with the modeling practice of “test, evaluate, and debug model behavior”: “detecting issues in an inappropriate solution,” “fixing issues based on the behavior of the artifact,” and “confirming the solution using multiple starting conditions.” We then identified specific learner-generated behaviors or knowledge products that can be associated with one or more of these sub-aspects of ST and CT. These behaviors were operationalized as indicators.
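As an illustration of the decomposition just described, the fragment below encodes one branch of the hierarchy (practice, aspects, sub-aspects) for practice M4 in a form that could feed a qualitative coding workflow. The dictionary layout and key names are our own sketch, not part of the published ID Tool.

```python
# A minimal, illustrative encoding of the decomposition described above for
# practice M4; keys and layout are ours, not part of the published ID Tool.
M4 = {
    "practice": "Test, evaluate, and debug model behavior",
    "st_aspects": {
        "Defining a system": [],
        "Predicting system behavior based on system structure": [],
    },
    "ct_aspects": {
        "Generating, organizing, and interpreting data": [],
        "Testing and debugging": [
            "Detecting issues in an inappropriate solution",
            "Fixing issues based on the behavior of the artifact",
            "Confirming the solution using multiple starting conditions",
        ],
        "Making iterative refinements": [],
    },
}

# Indicators (observable behaviors) would then reference one or more of these
# sub-aspects, giving coders a concrete target to look for in screencasts.
print(len(M4["ct_aspects"]["Testing and debugging"]), "sub-aspects to look for")
```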
In this study, we are specifically focusing on indicators associated with “testing and debugging” as the literature strongly suggests that students often have difficulty fully participating in its associated ST and CT aspects (Grover & Pea, 2018). The six indicators associated with the (M4) modeling practice behavior are listed below, along with their respective ST and CT aspects and sub-aspects. 4A. Analyzing and Sensemaking through Discourse. ST: defining a system (redefining model structure) and predicting system behavior. CT: testing and debugging (detecting faults and fixing faults) 4B. Analyzing Model Output: Simulations. ST: Not applicable (NA). CT: interpreting data (generating data, analyzing data), testing and debugging (detecting faults, confirming solutions), and iterative refinement (verifying solutions) 29 4C. Analyzing Model Output: Graphs. ST: NA. CT: interpreting data (generating data, analyzing data), testing and debugging (detecting faults, confirming solutions), and iterative refinement (verifying solutions) 4D. Analyzing and Using External Data. ST: NA. CT: interpreting data (generating data, analyzing data) and iterative refinement (verifying solutions) 4E. Using Feedback. ST: defining a system (redefining model structure). CT: testing and debugging (fixing issues) and iterative refinement (making modifications and verifying solutions) 4F. Reflecting upon Iterative Refinement. ST: defining a system (redefining model structure). CT: testing and debugging (fixing issues) and iterative refinement (making modifications and verifying solutions). The modeling research team also reviewed the indicators to determine if they plausibly indicated student use of individual aspects of ST and CT. Once these indicators were reviewed, we further refined and developed them into a four-part classification system (ranging in ascending order from Level 1 to Level 4) to explore the sophistication of student use of these aspects. We then conducted an interrater reliability test for these indicators, which demonstrated a 91.7% agreement (Cohen’s Kappa, .87) between two independent coders. Data analysis and findings To code students’ collaborative interactions as they built a computational model with SageModeler, we analyzed screencast data using Atlas.ti to organize the data according to the four levels of the ID Tool and to determine the relative frequency of each of these six indicators. The patterns of each group as well as among groups were analyzed to characterize how students used ST and CT aspects during modeling and which aspects seemed challenging for learners. Below we summarize how student behaviors served as evidence for ST and CT by matching indicators from the ID tool with observations from our study. The computational modeling practice of “test, evaluate, and debug model behavior” occurs as students evaluate their models and consider changes they need to make so that their models more accurately reflect their understanding of the phenomenon. Given the various approaches students can take to evaluating and 30 revising their models, we have identified six observable indicators as evidence that students are involved in different aspects of this modeling practice as listed above as 4A-4F. As students worked on refining their models with their partners, they often discussed specific model relationships and/or broader model behavior. 
For example, when evaluating their early model, two students had this conversation regarding relationships between variables: “Student 1: As the molecular energy increases, that makes the molecular spacing of the substance increase. That’s good. And then the spacing of the air molecules increases, this [molecular spacing] stays the same until it [spacing of the air molecules] gets small as there’s not a lot of space. Student 2: Makes sense. Student 1: We can change this [the relationship between spacing of air molecules and molecular spacing]. Student 2: Yeah, I don’t think that makes sense. Student 1: Is it the other way then? Student 2: Maybe?” This conversation is an example of indicator 4A, students participating in analyzing and sensemaking of model structure through discourse. At Level 1, students verbalize the changes they made to their model or describe the area of their model they believe needed further revisions, but do not provide reasoning for making these changes or why an area of their model needs improvement. If a student verbalizes reasoning but does not participate in a mutual dialogue with their partner, they are considered performing at Level 2. To progress to Level 3 requires that students engage in a back-and-forth dialogue by providing reasoning for making key changes to their model. Evidence for Level 4 would be a student exchange in which they consider how changes to their model would impact the behavior of the model. Because the students in this example provide a brief amount of reasoning as they are trying to justify the relationship between the spacing of air molecules and molecular spacing and both students contribute to the discourse (although Student 2 does so in a minimalist manner), we consider this to be evidence of students participating at Level 3 for Indicator 4A. In addition to listening to student discourse, we observe students using model output features to test their model’s behavior. SageModeler offers two ways of generating and analyzing model output: manipulating variables using the simulation tool and generating graphs. Both actions produce observable 31 indicators (4B and 4C, respectively). The simulation tool allows students to manipulate the relative amount of each input to test its impact on model behavior (Figure 4A). If students adjust the relative amount of one or more input variables, but do not verbalize their interpretation of this process, we consider this evidence of some attempt to interpret data, even if only at Level 1. Once students begin verbalizing their interpretation of the testing process (either by identifying specific flaws in their model or by stating that their model is functioning in accordance with their expectations), we can map their progress to Level 2. Identifying Levels 3 and 4 requires that students participate in a meaningful back and forth dialogue with either their partner, one or more peers, or a teacher. If the conversation focuses on the smaller aspects of model behavior (centering on a single causal chain), we consider this evidence of Level 3 for interpreting data using the simulation tool, while evidence of Level 4 would require a more holistic discussion of the model (e.g., focusing on how multiple causal chains impact each other). Students can also use SageModeler to generate graphs that analyze the relationships between two variables from their model (Figure 4B). 
If students unsuccessfully attempt to make a graph of two variables from the model output, we consider this evidence of Level 1 for indicator 4C. If students successfully make a graph of two variables, but do not discuss their interpretation of this graph, their behavior can be categorized as Level 2. Evidence for Level 3 requires students to participate in a dialogue where they discuss their interpretation of the graph and its implications for those two variables in isolation from the broader context of the model. If students consider the implications this graph has for both these two variables and broader model behavior, this is considered evidence for Level 4 for generating and analyzing data, detecting faults, and confirming and verifying a solution. In Figure 4A, these students used the simulate feature to look at their model output, but neither student verbalized this process in a meaningful way, so we inferred that the students were performing at Level 1 for indicator 4B. The students who made the graph in Figure 4B verbalized their interpretation of the graph, stating, “This graph shows that as IMF increases the potential energy increases but then plateaus. That makes sense to me.” This is evidence of Level 3 performance for indicator 4C. 32 Figure 4: Student use of model output analysis features in SageModeler Figure 4A: Student use of “Simulate” feature in SageModeler Figure 4B: Student use if graphs in SageModeler Just as students can analyze their model output to see if their model behaves according to their understanding of the phenomenon, students can also examine external data sources to verify if their models accurately describe the phenomenon as evidenced by indicator 4D. When students superficially refer to the existence of data or loosely reference dubious data sources, they are at Level 1. At Level 2, students reference external data (from real-world observation or specific information provided by instruction or readings) to inform or justify changes made to their models, but do not actively compare these data to their model output. Once students progress to comparing specific pieces of real-world data to their model output, they can be said to be engaging at Level 3. This is particularly evident if they input quantitative external data into the modeling program and directly compare it to their model output (Figure 33 5). Finally, if students compare and contrast their external data to model output and discuss the validity of the external data, this is evidence of Level 4 performance. In Figure 5, the students input real-world data from an experiment into SageModeler but did not actively compare these data to their model output, indicating a Level 2 performance. Figure 5: Students inputting external data Another important way students can receive feedback on their models is through discussions with peers or a teacher, which can inform further revisions, allowing students to engage in using feedback to inform model revisions (Indicator 4E). If the feedback students receive does not inform any changes to their model or prompt further analytical discourse, they are at Level 1. Note that if the feedback they receive is inappropriate and students do not discuss why this feedback was inappropriate their behavior would still be indicative of a Level 1 performance. Students who use this feedback to make changes to their models but have neither a discussion with their partner before making these changes nor test their models after making these changes are at Level 2. 
Once students use this feedback to either spark an analytical discussion or analyze their model’s behavior after making recommended changes, they are operating at Level 3. If students then address the originator of the feedback or have a conversation with another student group about why they made these changes or what new insights have emerged from their testing and debugging of these changes, their behavior can be categorized as being at Level 4. For instance, one of their peers asked another pair to remove “density” as a variable from their model, arguing that it was not necessary to explain the phenomenon. This pair of students then removed the density 34 variable but did not discuss why they were removing this variable. We classify this as performing at Level 2. Finally, students should be given opportunities to reflect on the changes they have made to their models. Students can participate in reflecting on iterative refinement (4F) through discussion or writing as seen in Figure 6. Student level of expertise is suggested by the depth and richness of the insights they exhibit into their own revision process. When students give surface-level feedback on the quality of their models at a given point in time, without considering the changes they have made or the reasons for making these changes, they are performing at Level 1. To infer Level 2 performance, students list specific changes that they have made to their models, but do not provide any detailed reasons for making them. Evidence for Level 3 performance requires that students reflect upon specific changes to their models and explain their reasoning behind these changes. Finally, students performing at Level 4 emphasize broader changes that have occurred to their models over a longer period (often across multiple revisions) and provide an explanation as to how their model has evolved. In Figure 6, a pair of students list the changes they have recently made to their models and give specific reasons for making these changes (peer feedback and changes in conceptual understanding). As this reflection focuses on more immediate changes and not broader patterns, it is evidence of Level 3 performance. Overall, these results support Research Question 1 (How can one characterize patterns of student use of specific aspects of ST and CT as they construct and revise models?) as they demonstrate how our ID Tool can be used to identify and classify specific instances of students using ST and CT as they are testing and debugging their models. Figure 6: Student’s written reflections on iterative refinement 35 Using our ID Tool, we examined the screencasts of five student groups during an evaporative cooling unit. We then compared the relative amount of time (as determined by the number of 10-minute intervals where students were involved in at least once in a respective indicator) these students spent participating in each of the six sets of behaviors we viewed as indicators of involvement with the modeling practice of testing and debugging (Table 2). Incidents were recorded for each 10-minute block and data from all five groups were aggregated to compare the relative amount of time coded for the presence of each indicator. Time points where students were not exhibiting any indicators were excluded from this data set and students could exhibit multiple indicators within one 10-minute block. 
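The block-based counting scheme described above can be summarized in a short sketch. The per-group counts are those reported in Table 2 below; the denominator (the number of 10-minute blocks containing at least one indicator) is not reported directly, so the value used here is an assumption implied by the published percentages.

```python
# Sketch of the block-level aggregation described above. Per-group counts are
# those reported in Table 2; the total number of blocks with at least one
# indicator is not reported directly, but the published percentages are
# consistent with roughly 126 such blocks, which we assume for illustration.
counts = {  # indicator -> blocks per group (G1..G5) where it appeared at least once
    "4A discourse":      [13, 24, 16, 9, 13],
    "4B simulations":    [4, 15, 12, 5, 12],
    "4C graphs":         [0, 0, 4, 0, 0],
    "4D external data":  [6, 6, 3, 2, 2],
    "4E using feedback": [11, 5, 3, 11, 6],
    "4F reflection":     [3, 12, 11, 5, 10],
}
BLOCKS_WITH_ANY_INDICATOR = 126  # assumed; implied by the reported percentages

for indicator, per_group in counts.items():
    total = sum(per_group)
    pct = 100 * total / BLOCKS_WITH_ANY_INDICATOR
    print(f"{indicator:18s} total={total:3d}  relative={pct:4.1f}%")

# Because one block can contain several indicators, these percentages sum to
# more than 100%, as noted beneath Table 2.
```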
The results suggest that students spent a large portion of their time using discourse-based strategies to analyze their models (as seen in their high use of Analyzing and Sensemaking of Models through Discourse, Indicator 4A, 59.5%) and often utilized the simulation features present within the modeling program. However, these students seemed less likely to use external data sources to drive their revision process (Indicator 4D, 15.1%) and even more hesitant to use graphs to analyze their model output (Indicator 4C, 3.2%). This suggests that additional scaffolds are likely needed to support student participation in these activities. It is important to note that although students were more likely to participate in Analyzing and Sensemaking of Models through Discourse (Indicator 4A), many exhibited performance only at Levels 1 and 2, which might indicate that performing at higher levels for this indicator was more challenging for them. For instance, we observed several instances where the student in charge of the cursor dominated the sensemaking discussion while their partner provided minimal feedback or verbal sensemaking support. Overall, these results address Research Question 2 (Which aspects seem to be more challenging for learners?) by suggesting that aspects associated with Indicator 4C are more challenging for students or are less supported by either the curriculum or their teachers compared to aspects associated with Indicators 4A or 4B.

Table 2: Relative time spent participating in each indicator for all five groups
Indicator | Total Time Spent* | Relative Percentage
4A. Analyzing Models: Discourse | 75 (G1: 13, G2: 24, G3: 16, G4: 9, G5: 13) | 59.5
4B. Analyzing Model Output: Simulations | 48 (G1: 4, G2: 15, G3: 12, G4: 5, G5: 12) | 38.1
4C. Analyzing Model Output: Graphs | 4 (G1: 0, G2: 0, G3: 4, G4: 0, G5: 0) | 3.2
4D. Analyzing External Data | 19 (G1: 6, G2: 6, G3: 3, G4: 2, G5: 2) | 15.1
4E. Using Feedback | 36 (G1: 11, G2: 5, G3: 3, G4: 11, G5: 6) | 28.6
4F. Reflecting upon Iterative Refinement | 41 (G1: 3, G2: 12, G3: 11, G4: 5, G5: 10) | 32.5
Total | 223 (G1: 37, G2: 62, G3: 49, G4: 32, G5: 43) |
Note: * Total number of 10-minute coding blocks where the indicator is present at least once. G: group. GX: total coding blocks for that group where the indicator is present. Because students often participated in multiple indicators within a single 10-minute block, and the “relative percentage” refers to the percentage of coding blocks where the indicator is present, the relative percentages do not sum to 100%.

Conclusions and implications

These results demonstrate how the ID Tool can be used to characterize patterns and challenges of student use of specific aspects of ST and CT as they construct and revise models. While the instrument described in this paper focuses on aspects of ST and CT used during model revision, we have also developed other indicators for ST and CT that need to be validated. A draft of these additional indicators can be found at https://tinyurl.com/2ft6rkza. Building off the ST and CT Through Modeling Framework, our ID Tool seeks to connect abstract ideas of student cognition with concrete indicators that can be observed through screencasts, classroom videos, or direct observation of students in classrooms. As each indicator is grounded in specific aspects and sub-aspects of ST and CT, it can be used to track how students are using ST and CT in various learning activities across disciplines.
Therefore, this tool can be used to develop future research instruments such as teacher and student interview protocols and classroom observation instruments as well as assist with creating ST and CT integrated learning activities. While the ID tool is primarily designed for research use, it can be modified to be used by teachers to help them identify moments where students are using ST and CT. Overall, our ID Tool represents an important step in developing a meaningful instrument for monitoring student use of ST and CT while constructing and revising models in realistic classroom settings. Further validity studies based on students’ data in various learning contexts are needed to iteratively revise this ID Tool as an evidence-based principled tool to observe student use of ST and CT as they construct and revise computational models. We are also in the process of utilizing this tool to further investigate how students utilize ST and CT aspects within a computational modeling context. Given the increased need and benefits to incorporate ST, CT, and modeling into science education, there is a growing demand for tools that can support researchers, curriculum developers, and teachers in classifying instances of student use of these practices. Our research efforts on the ID Tool seek to further research in ST, CT, and computational modeling and promote the integration of these three research fields to support student learning. 38 PAPER 2: EXAMINING STUDENT TESTING AND DEBUGGING WITHIN A COMPUTATIONAL SYSTEMS MODELING CONTEXT Abstract Interpreting and creating computational systems models are important goals of science education. One aspect of computational systems modeling that is supported by modeling, systems thinking, and computational thinking literature is “testing, evaluating, and debugging models.” Through testing and debugging, students can identify aspects of their models that either do not match external data or conflict with their conceptual understandings of a phenomenon. This disconnect encourages students to make model revisions, which in turn deepens their conceptual understanding of a phenomenon. Given that many students find testing and debugging challenging, we set out to investigate the various testing and debugging behaviors and behavioral patterns that students use when building and revising computational systems models in a supportive learning environment. We designed and implemented a six-week unit where students constructed and revised a computational systems model of evaporative cooling using SageModeler software. Our results suggest that despite being in a common classroom, the three groups of students in this study all utilized different testing and debugging behavioral patterns. Group 1 focused on using external peer feedback to identify flaws in their model, Group 2 used verbal and written discourse to critique their model’s structure and suggest structural changes, and Group 3 relied on systemic analysis of model output to drive model revisions. These results suggest that multiple aspects of the learning environment are necessary to enable students to take these different approaches to testing and debugging. Introduction Science education researchers and policymakers increasingly recognize the importance of involving learners in modeling. 
From the Next Generation Science Standards (NGSS) in the United States to South Korea’s new Korean Science Education Standards (KSES) and Germany’s science educational standards (KMK), policymakers have written scientific modeling into their science standards (KMK, 2005a, 2005b, 2005c; National Research Council [NRC], 2012; NGSS Lead States, 2013; Song et al., 2019). While each of these key policy documents has somewhat different viewpoints on using modeling in science 39 classrooms, they, along with many scholars, generally agree that scientific modeling is a process of creating or interpreting a representation of a phenomenon that can be used to explain or predict the behavior of that phenomenon (Harrison & Treagust, 2000; Louca & Zacharia, 2012; Mittelstraß, 2005; Schwarz et al., 2009; Schwarz & White, 2005). There are multiple ways of approaching modeling within science classrooms. Teachers can have students examine and interpret pre-existing models, investigating what these models demonstrate about natural phenomena and their inherent limitations (Krell et al., 2015). Students can also construct models of phenomena as sensemaking tools and to communicate their ideas to others (Bierema et al., 2017; Passmore et al., 2014; Schwarz et al., 2009). Just as there are multiple approaches to using models, students can construct multiple types of models, including mathematical models, diagrammatic models, and computational models (Grosslight et al., 1991; Harrison & Treagust, 2000; Zhang et al., 2014). Computational modeling uses algorithms or algorithmic thinking to create a model that represents the behavior of a system in a quantitative or semi-quantitative manner (Fisher, 2018; Pierson & Clark, 2018; Sengupta et al., 2013; Shin et al., 2021, 2022; Weintrop et al., 2016). Computational models can be valuable tools for science learning; by combining the visual aspects of diagrammatic models with the mathematical capabilities of mathematical models, computational models are responsive to new data inputs and can be tested and debugged (Campbell & Oh, 2015; Fisher, 2018; Pierson & Clark, 2018; Sengupta et al., 2013; Shin et al., 2022; Sins et al., 2005; Weintrop et al., 2016; Wilensky & Reisman, 2006). While computational modeling programs have existed for decades, their use in K-12 classrooms remains limited. This absence can partially be attributed to the siloed nature of the three main bodies of literature underpinning our conceptualization of computational modeling: modeling, systems thinking (ST), and computational thinking (CT) (Shin et al., 2022). These three cognitive processes are all recognized individually as important goals for science learning, and their intrinsic synergy is a growing interest in the field (NRC, 2012; Sengupta et al., 2013; Shin et al., 2022; Shute et al., 2017; Weintrop et al., 2016). 40 Within computational modeling, several overlapping practices allow students to utilize ST and CT as they build computational models (Shin et al., 2022). One computational modeling practice that has strong foundations in ST and CT literature is the practice of testing, evaluating, and debugging model behavior (Aho, 2012; Basu et al., 2016; Grover & Pea, 2018; Shin et al., 2022; Stratford et al., 1998; Weintrop et al., 2016; Wilensky & Reisman, 2006). 
Debugging is a practice unique to computational contexts as it requires that the model be defined in an algorithmic manner such that its output can be calculated by changing the relative amount of each input variable (Emara et al., 2020; Li et al., 2019; McCauley et al., 2008). By manipulating the relative amount of each input variable, students can test their models to see if they behave according to their understanding and expectations of the phenomena and make changes based on these tests (Brennan & Resnick, 2012; Hadad, 2020; Li et al., 2019; Shin et al., 2022; Stratford et al., 1998). Likewise, students can compare their model output to real-world data to further modify and improve their computational models (Campbell & Oh, 2015; Shin et al., 2021; Weintrop et al., 2016; Wilensky & Reisman, 2006). While testing and debugging is an important aspect of computational modeling, students often find it challenging (Li et al., 2019; Sins et al., 2005; Stratford et al., 1998; Swanson et al., 2021; Wilensky & Reisman, 2006). Grapin et al. (2022) and Stratford et al. (1998) suggest that students are reluctant to examine and interpret model output to inform later model revisions. Given that testing and debugging is both an affordance and a challenge within computational modeling, it is important to investigate how students test and debug as they revise computational models. In this paper, we categorize how students test and debug computational models within a constructivist classroom environment. We are interested in the different testing and debugging behavioral patterns students utilize during the model revision process. By categorizing how students test and debug their models within a constructivist learning environment (centered on Project-Based Learning [PBL] principles), we can hypothesize which aspects of the learning environment best support students in this endeavor. Before summarizing our investigative methods, we review the literature underpinning our 41 conceptualization of constructivism, computational modeling, and the modeling practice of testing and debugging. Literature Review Constructivism and Project-Based Learning For the past several decades, efforts at improving science education have centered on enacting constructivist philosophies and pedagogies in science classrooms (Fosnot, 1996; NRC, 2007, 2012). Constructivism argues that people do not absorb new knowledge in a pure form, but instead interpret new information through the lens of prior knowledge, experiences, and social relationships, thereby constructing their own knowledge based on their interactions with the world around them (Fosnot, 1996; Krahenbuhl, 2016; Pass, 2004). Advocates for constructivist approaches in science education push back against transmission-based approaches to teaching and learning, such as the Initiate, Respond, and Evaluate (IRE) model of classroom discourse (Berland & Reiser, 2009; Lemke, 1990; Mehan, 1979). Instead, they endorse classroom environments that allow students to engage in meaningful investigations of real-world phenomena so that they can build a deeper understanding of both science content and scientific practices (Berland et al., 2016; Krajcik & Shin, 2022; NRC, 2012; Windschitl et al., 2020). 
In the United States, this push for a constructivist approach to science education led to the development and adoption of the Next Generation Science Standards (NGSS), which prioritizes having students engage in authentic science practices, including modeling and computational thinking (NRC, 2012; NGSS, 2013). Within the broader umbrella of constructivist approaches to science education, there are several frameworks for designing and implementing constructivist lessons in K-12 classrooms, including Ambitious Science Teaching (Windschitl et al., 2020), the 5E instructional model (Duran & Duran, 2004), and Project Based Learning/PBL (Krajcik & Shin, 2022;). PBL is a student-centered, constructivist approach to teaching and learning science (Krajcik & Blumenfeld, 2006) that emphasizes collaboration, inquiry, authentic problem solving, student autonomy, and teacher facilitation. The PBL approach to curriculum design is built around five key principles: centering lesson planning on learning goals that allow students to show mastery of both science ideas and science practices, building student engagement 42 using intriguing phenomena and driving questions, allowing students to explore the driving question and phenomena using authentic scientific practices, tasking students with creating knowledge products (models, explanations, or arguments) that demonstrate student learning, and scaffolding student learning through the use of appropriate learning technologies (Krajcik & Shin, 2022; Shin et al., 2021). This approach has been shown to enhance students’ understanding of scientific concepts (Geier et al., 2008; Hmelo-Silver et al., 2007; Karacalli & Korur, 2014; Schneider et al., 2022) and positively impact some affective aspects like self-efficacy and motivation for learning (Fernandes et al., 2014; Schneider et al., 2016; Wurdinger et al., 2007). Computational Modeling Computational models are algorithmic representations that allow users to simulate the behavior of a phenomenon under multiple starting conditions (Brennan & Resnick, 2012; Fisher, 2018; Pierson & Clark, 2018; Shin et al., 2021, 2022; Sengupta et al., 2013). Students engage in computational modeling as they construct, test, revise, and evaluate computational models. Computational modeling is rooted in constructionist pedagogies, some of which strongly advocate for computational modeling as a mechanism for science learning (Papert, 1980; Papert & Harel, 1991; Pierson & Clark, 2018; Sengupta et al., 2013). Constructionist pedagogies argue that students learn best when given opportunities to construct and revise knowledge products in ways that promote authentic sensemaking (Kafai, 2005; Papert & Harel, 1991; Pierson & Clark, 2018). As computational models provide an environment where students can build and test different ways of representing a phenomenon, computational modeling facilitates sensemaking and, therefore, connects well with constructionism (Farris et al., 2019; Fisher, 2018; Papert, 1980; Pierson & Clark, 2018; Sengupta et al., 2013). Over the past few decades, the integration of computational modeling in science classrooms has been piloted by many researchers from both systems thinking (ST) and computational thinking (CT) perspectives (Arnold & Wade, 2017; Booth-Sweeney & Sterman, 2007; Brennan & Resnick, 2012; Forrester, 1971; Stratford et al., 1998; Weintrop et al., 2016; Wilensky & Reisman, 2006). 
Systems thinking approaches the exploration of a phenomenon as a series of interconnected elements that work 43 together to create a system with emergent behavior that is more than the sum of its constituent parts (Arnold & Wade, 2015; Cabrera et al., 2008; Forrester, 1971; Hmelo-Silver & Azevedo, 2006; Meadows, 2008; Riess & Mischo, 2010). ST literature encompasses both agent-based modeling and system dynamics modeling. In the context of system dynamics, this literature tends to focus on how students include key structural elements in their computational models and how they represent a system’s behavior over time (Booth-Sweeney & Sterman, 2007; Cronin et al., 2009; Sterman & Sweeney, 2002). Other researchers often focus on how students use CT as they build and revise computational models (Brennan & Resnick, 2012; Swanson et al., 2021; Weintrop et al., 2016; Wilensky & Reisman, 2006). CT is a form of sensemaking that uses an iterative and quantitative approach to decompose a phenomenon or problem to explore, explain, and predict the behavior of that phenomenon or to find a solution to a problem through the creation and revision of algorithms (Grover & Pea, 2018; Psycharis & Kallia, 2017; Schwarz et al., 2017; Shin et al., 2022; Weintrop et al., 2016; Wing, 2006). Because the CT community has its origins in computer science education, CT literature emphasizes the algorithmic nature of computational models, in how students construct and revise their models (Brennan & Resnick, 2012; Weintrop et al., 2016). Additionally, the relationship between computational modeling and computational thinking has been well-established in the fields of mathematics and engineering education (Bakos & Thibault, 2018; Benton et al., 2017; Magana & Coutinho, 2017; Zhang et al., 2020). Zhang et al. (2020) found that engineering students who incorporated the practice of computational thinking within their model construction practices experienced a significant increase in learning outcomes. Similarly, Magana and Coutinho (2017) demonstrated the consensus among engineering experts in academia and industry on the crucial role of preparing future engineers to use computational models in problem solving. Furthermore, in mathematics education, studies have shown improved learning outcomes as students engage in computational thinking through basic programming (Bakos & Thibault, 2018; Benton et al., 2017; Gleasman & Kim, 2020). Researchers in both disciplines have at various times addressed similar research questions and agree on many of the core components of computational modeling, including the crucial nature of testing 44 and debugging (Barlas, 1996; Brennan & Resnick, 2012; Shin et al., 2022; Sins et al., 2005; Stratford et al., 1998; Swanson et al., 2021; Wilensky & Reisman, 2006). Given this overlap between the ST and CT literature within computational modeling, “A Framework for Computational Systems Modeling” describes how ST and CT are expressed within computational systems modeling and support students in building, testing, and revising computational models (Shin et al., 2022). Within this framework, five computational systems modeling practices build on key aspects of both ST and CT (Shin et al., 2022). While each of these modeling practices represent possible avenues for students to develop ST and CT competencies, it is impractical to develop a singular research instrument to assess all aspects of this framework. 
Therefore, to conduct a more cohesive study, we chose to specifically focus on the modeling practice of “test, evaluate, and debug model behavior” as it is a particularly challenging aspect of computational systems modeling for many students (Figure 7) (Grapin et al., 2022; Li et al., 2019). Figure 7: Visual Representation of our Framework for “Test, Evaluate, and Debug Model Behavior” This diagram is a visual representation of the various ST and CT aspects that are included in our understanding of the computational systems modeling practice of “Test, Evaluate, and Debug Model Behavior” based on the work of Shin and colleagues (2022). On the left-hand side are the various ST subaspects that flow into the ST aspects that support this practice while the right-hand side shows the CT aspects and subaspects involved in this practice. 45 Test, Evaluate, and Debug Model Behavior Testing, evaluating, and debugging model behavior describes a broad range of strategies found across modeling, system dynamics, and computational thinking literature (Barlas, 1996; Brennan & Resnick, 2012; Campbell & Oh, 2015; Csizmadia et al., 2015; Gilbert, 2004; Li et al., 2019; Sins et al., 2005). Testing and evaluating hypotheses is a core aspect of scientific inquiry (Gilbert, 2004; Lederman, 2013; NRC, 2012). Through this iterative process, scientists revise their understanding of natural phenomena. Testing and evaluating are also crucial for students constructing scientific models in K-12 settings (Campbell & Oh, 2015; Gilbert, 2004; Louca & Zacharia, 2012; Schwarz et al., 2009). Ideally, students have multiple opportunities to test their models through experiments and revise their models based on their results. As iterative refinement helps students make sense of a phenomenon in a constructionist manner, it is considered a key element of metamodeling knowledge (Schwarz et al., 2009; Krell & Kruger, 2016). In computational modeling, both the systems dynamics and CT communities agree on the importance of testing and debugging (Barlas, 1996; Brennan & Resnick, 2012; Csizmadia et al., 2015; Sins et al., 2005). Several system dynamics studies recognize model evaluation or interpretation (i.e., students’ ability to meaningfully analyze model output data and determine how their model functions based on its structures) and model revision (i.e., changes students make to their models based on their model evaluations) as core components of computational modeling (Barlas, 1996; Hogan & Thomas, 2001; Stave, 2002). Likewise, CT literature also emphasizes the importance of troubleshooting or debugging and iterative refinement (Brennan & Resnick, 2012; Csizmadia et al., 2015; Katz & Anderson, 1987; Li et al., 2019; Swanson et al., 2021; Wilensky & Reisman, 2006). Troubleshooting occurs when a problem is identified in an algorithmic system (Jonassen & Hung, 2006; Li et al., 2019). Once identified, a systematic search for the source of the problem is often conducted through debugging techniques (Aho, 2012; Li et al., 2019; Sullivan & Heffernan, 2016). Iterative refinement involves making gradual changes to an algorithmic system (in this case, a computational model), and often happens in response to new information (Brennan & Resnick, 2012; Ogegbo & Ramnarain, 2021; Shute et al., 2017). 
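To make the idea of a systematic search for the source of a problem concrete, the sketch below checks one causal link at a time against the modeler’s expectation and flags any mismatch. Everything in it (the links, the expected signs, and the seeded fault) is invented for illustration and is not tied to any particular modeling tool.

```python
# Illustrative sketch of systematic troubleshooting in a small causal model:
# test one link at a time and flag any link whose simulated direction of
# effect disagrees with the modeler's expectation. The links, expectations,
# and the seeded fault below are all invented for illustration.

EXPECTED = {
    # (cause, effect): expected sign of the relationship
    ("A", "B"): +1,
    ("B", "C"): -1,
    ("C", "D"): +1,
}

def observed_sign(cause, effect):
    """Stand-in for running the model twice (low vs. high cause value) and
    comparing the effect's output; one link is deliberately 'buggy' here."""
    seeded_fault = {("B", "C"): +1}  # wrong sign
    return seeded_fault.get((cause, effect), EXPECTED[(cause, effect)])

for (cause, effect), expected in EXPECTED.items():
    observed = observed_sign(cause, effect)
    verdict = "ok" if observed == expected else "mismatch -- candidate for revision"
    print(f"{cause} -> {effect}: expected {expected:+d}, observed {observed:+d} ({verdict})")
```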
46 Building on this literature, our view of the modeling practice of testing, evaluating, and debugging involves students first evaluating model structure (Hmelo-Silver et al., 2017) and model output (Hadad et al., 2020), then comparing their model to their conceptual understandings and/or external data (Weintrop et al., 2016), and finally, making informed changes to their model based on these analyses (Aho, 2012; Sengupta, 2013). Within our framework (Figure 1), the synergy between ST and CT in supporting students in this practice is thoroughly fleshed out (Shin et al., 2022). The ST aspects of causal reasoning and predicting system behavior based on system structure often help students evaluate their model structure and make informed decisions about model revisions (Lee & Malyn-Smith, 2020; Shute et al., 2017). The CT aspects of iterative refinement, data analysis, and systematic troubleshooting help students identify flaws in their models so that they can make necessary changes (Aho, 2012; Sengupta et al., 2013; Türker & Pala, 2020; Yadav et al., 2014). Despite being identified as a core aspect of computational modeling across many studies, testing and debugging is challenging (Grapin et al., 2022; Li et al., 2019; Sins et al., 2005; Stratford et al., 1998). Students often hesitate to revise their models based on new evidence, and those who make changes tend to be conservative with their model revisions (Grapin et al., 2022; Swanson et al., 2021; Wilensky & Reisman, 2006). Another study suggests that students often take an ad hoc outcome-oriented stance toward testing and debugging (Sins et al., 2005). In these cases, students seek to modify their models so that they match an external set of data using the minimal number of changes possible, rather than focusing on having their models match their conceptual understanding of the phenomenon (Li et al., 2019; Sins et al., 2005; Wilensky & Reisman, 2006). This often results in models that functionally produce the correct outcome, but often lack internal consistency and explanation power (Sins et al., 2005; Wilensky & Reisman, 2006). Additionally, this outcome-oriented approach greatly reduces the potential of testing and debugging to support student learning by shifting the modeling process away from being a sensemaking tool towards being an ad hoc engineering problem (Hogan & Thomas, 2001; Sins et al., 2005). Given these challenges, finding evidence of students using testing and debugging in sophisticated ways and identifying aspects of a learning environment that can support students in this work becomes critical. 47 Research Questions Although the ST and CT literature argues for the importance of students using the modeling practice of test, evaluate, and debug model behavior, research shows that students often have challenges with this practice (Li et al., 2019; Sins et al., 2005; Stratford et al., 1998; Swanson et al., 2021; Wilensky & Reisman, 2006). Our goal was to identify instances of students testing and debugging their computational models and to examine different behavioral patterns that student groups can use to engage in this practice. In this paper, we define “behavior” as a distinct student action or series of actions occurring within a discrete timeframe and “cognitive behavioral pattern” as a long pattern of behaviors found across multiple episodes that suggest a generalized approach to testing and debugging. 
Additionally, we recognize that the learning environment can either support or hinder students (Assaraf & Orion, 2005) in building proficiency with testing and debugging. We thus set out to answer the following research questions within a design-based research environment centered on a high school chemistry unit on evaporative cooling developed according to PBL principles. RQ1. What different cognitive behavioral patterns do students use to approach testing and debugging within a computational modeling unit on evaporative cooling? RQ2. What testing and debugging behaviors do students seem to use more frequently within the context of a computational modeling unit on evaporative cooling? Methods Study Context and Learning Environment Learning Environment and Participants This study is based on data collected in January-February 2020 from a six-week high school chemistry unit on evaporative cooling. This unit was implemented at a STEM magnet school (which we call Faraday High School or FHS as a pseudonym) in a small Midwestern city. While it is a publicly funded institution, students need to apply to this school from across a broad catchment area consisting of three counties with admission based primarily on student academic test scores. Approximately 21% of the student body is part of a racial or ethnic minority and approximately 54% percent of students are on free 48 or reduced lunches. Two of the authors (Observers A and B) partnered directly with two high school chemistry teachers (Mr. H and Mr. M). Mr. H is a middle-aged White male with approximately 15 years of teaching experience and Mr. M is a young White male with 4 years of teaching experience. For this unit, Mr. H and Mr. M each taught 2-3 sections of 10-25 students for a total of 103 student participants. As a sophomore chemistry class, this was the first high school level chemistry class for these students, with their first year spent learning key physics concepts. Because FHS runs on a block schedule, each section meets for 80 minutes every other day. Curriculum The evaporative cooling unit was developed using PBL principles, which include starting the unit with a driving question grounded in a real-world phenomenon, exploring the driving question and the phenomenon through engaging in science practices, and scaffolding the unit with learning technologies (Krajcik & Shin, 2022). The evaporative cooling process results in liquids getting colder during evaporation as faster moving liquid particles with a high average kinetic energy (KE) tend to be the first particles to overcome the intermolecular forces (IMFs) of attraction. Overcoming these forces is what causes molecules in the liquid phase to enter the gas phase. As these high KE particles leave the liquid phase, the average KE of the remaining liquid molecules decreases, making the substance colder. The KE of the faster moving liquid particles is transferred to the potential energy (PE) of the gas particles. At the beginning of the unit, students were initially tasked with drawing a two-dimensional model of the evaporative cooling phenomenon on whiteboards. Students were then introduced to the SageModeler computational modeling program (Damelin et al., 2017) along with some of the key aspects of computational modeling, such as the need to recontextualize the phenomenon as a set of interacting variables to create a workable computational model in SageModeler. 
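A toy sketch of what “a set of interacting variables” can look like is shown below. The causal chain follows the unit’s target explanation of evaporative cooling, but the variable names, the 0–100 scale, and the propagation rule are our own simplification; this is not how SageModeler itself computes model output.

```python
# A minimal analogue of recontextualizing evaporative cooling as interacting
# variables with signed, semi-quantitative links. This is our own sketch for
# illustration, not SageModeler's algorithm.

links = [
    # (cause, effect, sign): +1 "increases", -1 "decreases"
    ("strength of IMFs", "rate of evaporation", -1),
    ("rate of evaporation", "average KE of remaining liquid", -1),
    ("average KE of remaining liquid", "temperature of liquid", +1),
]

def propagate(inputs, links, passes=3):
    """Push relative values (0-100) through the link list a few times."""
    values = dict(inputs)
    for _ in range(passes):
        for cause, effect, sign in links:
            x = values.get(cause, 50)
            values[effect] = x if sign > 0 else 100 - x
    return values

# In this toy model, a liquid with weak IMFs evaporates readily and ends up
# cooler, mirroring the unit's explanation of evaporative cooling.
print(propagate({"strength of IMFs": 20}, links))
print(propagate({"strength of IMFs": 80}, links))
```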
Students then worked in small groups (two to three students) to construct and revise a computational model of this phenomenon that addressed the cooling effect of evaporation and included IMF to explain why some liquids evaporate faster than others. Students had multiple opportunities to test, debug, and revise their computational models over the six-week unit. 49 These opportunities for students to test and debug their computational models included teacher and peer feedback, written reflections, and specific features embedded in the computational modeling program. The classroom teacher regularly visited each student group to ask them questions about their computational models. These questions provided opportunities for students to identify areas in their models that needed improvement and make changes accordingly. Student groups provided structured feedback to each other. By examining the computational models of other student groups and receiving feedback on their own models, the peer feedback cycle helped students identify aspects of their computational models that needed improvement. The students were also instructed to write down their reflections on the revision process after each revision cycle. These written reflections helped students assess any recent changes they had made to their models and consider what additional revisions might be needed in later modeling sessions. Finally, students were encouraged to use the testing and debugging features embedded in the computational modeling program (defined below) as they worked within their small groups to make changes to their computational models. SageModeler Within the evaporative cooling unit, students build, test, and revise computational models using SageModeler, a free, browser-based, open-source software program. SageModeler allows students to set certain variables as “collectors” (variables that can accumulate an amount over time) and transfer valves or flows between these collector variables. Additionally, SageModeler offers two main testing and debugging features that students can use to evaluate model output and compare their models to real-world data: simulation and graphing features. Using the simulation feature students can generate model output for all variables in their model, enabling them to test how the model output changes under different initial conditions (Figure 8A). Students can assess both the overall behavior of their model and examine how specific structural changes might impact this behavior. The graphing feature of SageModeler facilitates students in testing the relationship between any two variables in their model as one input variable is being manipulated (Figure 8B). Graphs serve two principal functions; they allow students to 1) look at the correlation between two distal variables and 2) compare their model’s output to real-world data. Students 50 can generate a graph between two variables in their model and then compare this model-generated graph to a graph of real-world data (Figure 8C). Figure 8: Testing and Debugging Features of SageModeler Figure 8A: Simulation Feature This figure demonstrates the simulation feature of SageModeler. The simulate function is turned on to allow for the student to generate model output (1). The student then manipulated the input variable “IMF” (by moving its associated slider bar up and down) (2) to determine its impact on downstream variables. 51 Figure 8 (cont’d) Figure 8B: Graphing Feature of SageModeler This figure demonstrates the graphing feature of SageModeler. 
The students begin by using the Record Continuously icon (1), which allows them to record how the different variables change as the input variable (2) is manipulated. Using these recorded data, the students can then create a graph in SageModeler showing the relationship between any two variables (3).
Figure 8C: Data Comparison Using SageModeler
This figure shows how students can input external data into SageModeler and compare it to their model output. Notice that the external data (graph on the right) shows an exponential relationship between potential energy and kinetic energy, which suggests that these students need to revise their model.
Data Collection
The primary source of data for this research is screencasts, which combine video recordings of the activity on a laptop screen with audio recordings from the computer's microphone. Screencasts allow researchers to observe how students construct and revise their computational models and the dialogue between group partners. From these screencasts, we can observe the changes students make to their models, ascertain their reasoning for making these changes through their dialogue, and glean insights into their approach to testing and debugging. For this study, we focus on screencast data from five groups of students, three from Mr. H's class and two from Mr. M's class (Table 3). These screencast groups were recommended to us by Mr. H and Mr. M because they were among their more talkative students and gave consent for the screencast process. While other students were not chosen to be screencasted, they were present in the classroom and gave permission for their classroom discourse (including conversations with screencast groups) to be recorded for this project. Note that all names described in this manuscript are pseudonyms meant to protect student identities. Non-screencast students are given letter-based pseudonyms (e.g., Student A, Student B, etc.) when engaging in conversations with screencasted students.
Table 3: Student Demographics
Student Group | Teacher | Grade Level | Gender Identity
Group 1: Andy and Ben | Mr. H | 10th | Male/Male
Group 2: Leslie and Aubrey | Mr. H | 10th | Female/Female
Group 3: Robert, Mark, and Jerry | Mr. H | 10th | Male/Male/Male
Group 4: Ron and Tom | Mr. M | 10th | Male/Male
Group 5: Rashida and Donna | Mr. M | 10th | Female/Female
Instrument Development
To categorize how students test and debug their models, we use the ST and CT Identification Tool (ID Tool). The ID Tool is based on "A Framework for Computational Systems Modeling" (Bowers et al., 2022b; Shin et al., 2022) and was validated by a team of experts, who reached 92% agreement (Cohen's kappa = .87) among raters. Given that the six indicators of this tool are all contextualized within the computational modeling practice of test, evaluate, and debug model behavior, we used these indicators to investigate student testing and debugging behaviors in the evaporative cooling unit. The six testing and debugging indicators are listed in Table 4. Each indicator contains a four-part classification system (from Level 1 to Level 4 in ascending order) that captures the sophistication of the observed student behavior.
Table 4: Description of Key Indicators from the ST and CT Identification Tool

Indicator A: Sensemaking through Discourse
Description: Students either verbalize their reasoning for making changes to their models or engage in conversations about why specific aspects of their models need to be improved.
Level 1: Verbalize changes to model or identify areas needing revisions, but no reasoning
Level 2: Verbalize reasoning but no mutual dialogue
Level 3: Back and forth dialogue with verbal reasoning
Level 4: Back and forth dialogue with verbal reasoning and impact on other parts of model

Indicator B: Analyzing Model Output: Simulations
Description: Students use embedded model output tools to analyze how their model behaves under different input conditions. In this case, students use the simulation tool in SageModeler to test their models.
Level 1: Adjusting one or more input variables, but no verbal reasoning
Level 2: Adjusting input variables with verbal reasoning but no dialogue
Level 3: Adjusting input variables with verbal reasoning and dialogue, focus on local behavior
Level 4: Adjusting input variables with verbal reasoning and dialogue, holistic model discussion

Indicator C: Analyzing Model Output: Graphs
Description: Students use embedded model output tools to analyze how their model behaves under different input conditions. In this case, students generate and analyze graphs in SageModeler.
Level 1: Unsuccessful attempt to make a graph in SageModeler
Level 2: Successful graph creation, but no interpretation
Level 3: Successful graph creation with discussion of implications for the graphed variables
Level 4: Successful graph creation with discussion of the broader implications for model behavior

Indicator D: Analyzing and Using External Data
Description: Students use external data sources to verify model behavior. At more sophisticated levels, students compare specific external data sources directly to their models and discuss the validity of the external data.
Level 1: Superficial reference to data or referencing inaccurate data
Level 2: Reference external data to inform revisions but no direct comparisons to model output
Level 3: Compare specific external data to model output without discussion of data validity
Level 4: Compare specific external data to model output with discussion of data validity

Indicator E: Using Feedback
Description: Students receive meaningful feedback from others (teachers or peers), discuss the validity of the feedback, and use feedback to inform model revisions. At more sophisticated levels, students test their models after making recommended changes and have a follow-up discussion with others to share their new insights.
Level 1: Students receive feedback but do not discuss it or use it to inform revisions
Level 2: Students make changes to their models based on feedback but do not discuss the validity of the feedback
Level 3: Students receive feedback, discuss its validity, and make or do not make changes to their models based on feedback
Level 4: Students receive feedback, discuss its validity, make or do not make changes to their models based on feedback, and share reflections with another group

Indicator F: Reflecting upon Iterative Refinement
Description: Students reflect through writing or discourse on the changes they have made to their models. At more sophisticated levels, students give a defined rationale for the changes they have made.
Level 1: Ambiguous surface level reflection without reasoning
Level 2: List specific model changes but do not provide detailed reasoning
Level 3: List changes and reflect upon reasoning
Level 4: List changes, reflect upon reasoning (with a defined rationale), and discuss broader changes to models

Data Analysis
Using the ID Tool and Primary Analysis
We used the ID Tool to conduct a primary analysis of the screencast data. Using Atlas.ti software, we annotated the screencast videos to mark instances where students were exhibiting testing and debugging behaviors based on the rubric described by our ID Tool. This annotation method was previously validated by Bowers and colleagues (2022b). To maintain interrater reliability throughout this study, we engaged in periodic member checking in which all scorers independently scored a 30-minute segment of student video to see whether our coding results had drifted from each other. As we annotated these specific instances using the ID Tool, we also developed descriptive memos to record notes on what was occurring in each specific episode. These descriptive memos summarized student actions in a narrative manner to make it easier for us to begin looking at broader behavioral patterns governing student testing and debugging. Because the ID Tool is primarily useful for identifying the extent of student testing and debugging behaviors, the descriptive memos were necessary for determining broader testing and debugging behavioral patterns. The primary analysis using the ID Tool, along with the supplementary memos, allowed us to identify instances where students were testing and debugging their models. In subsequent analysis, the ID Tool coding of the screencast results was used to create a timeline of the testing and debugging behaviors, which, along with the supplementary memos, informed our narrative analysis of testing and debugging behavioral patterns.
Timeline Construction and Analysis
After analyzing the screencast videos, we constructed a spreadsheet-based timeline for each of the five screencast groups that shows which indicators students exhibited within each five-minute interval (Table 5). If students had separate or overlapping episodes between two adjacent time points in which they exhibited, for example, indicators A, B, and E, all three indicators were included within the timeline for that interval.
Along with marking which indicators were present in each time interval, we also noted the highest level of student performance associated with each indicator within that time frame. The constructed timeline served as a tool for recording and organizing patterns of student testing and debugging behaviors and subsequently informed both the later narrative analysis of student testing and debugging cognitive behavioral patterns and a summative quantitative analysis of student testing and debugging behaviors.
Table 5: Testing and Debugging Timeline for Group 1
Episode Time | Codes
13-Jan 10:00 | A(3)
15-Jan 55:00 | A(1)
15-Jan 60:00 | A(1), E(2), F(2)
27-Jan 50:00 | E(1)
27-Jan 55:00 | A(1), E(2)
27-Jan 60:00 | E(2)
27-Jan 65:00 | A(3), D(2), E(2)
29-Jan 5:00 | A(3), D(2), E(3)
29-Jan 10:00 | A(3), D(2), E(3)
29-Jan 15:00 | A(3), D(2), E(2)
29-Jan 25:00 | B(3), E(1)
29-Jan 30:00 | B(3), D(2), E(4), F(2)
31-Jan 5:00 | A(1), F(2)
10-Feb 60:00 | A(3), B(4), E(2)
10-Feb 65:00 | A(2), B(2)
10-Feb 70:00 | B(2)
12-Feb 5:00 | A(3), E(2)
12-Feb 10:00 | A(2), B(3)
12-Feb 15:00 | A(3), B(2), D(2), E(1), F(2)
14-Feb 5:00 | D(2)
14-Feb 10:00 | D(2)
14-Feb 15:00 | D(2)
14-Feb 20:00 | D(2), E(1)
14-Feb 25:00 | D(1), F(3)
14-Feb 35:00 | D(2)
14-Feb 40:00 | D(2)
14-Feb 55:00 | A(2), B(2), E(2)
14-Feb 60:00 | F(3)
Each row lists the class date, the five-minute interval, and the indicators observed during that interval with the highest level exhibited.
Narrative Analysis
Once the initial timeline was constructed, we conducted a narrative analysis for three student groups. While the timeline demonstrated general patterns of student testing and debugging behaviors, a more comprehensive analysis, focusing on key episodes from the screencasts, was needed to describe student testing and debugging cognitive behavioral patterns. Returning to the descriptive memos of each group, we started by looking for specific episodes that clearly demonstrated students exhibiting specific indicators. We also looked for patterns and outliers between episodes within the same student group and between student groups, so that we could articulate the major differences in the testing and debugging behaviors of these groups and write a cohesive narrative for each group. We then compared these narratives to the timeline analysis to check for internal consistency. This allowed us to address RQ1. Although we conducted a quantitative analysis using data from all five groups, we selected three groups for the narrative analysis that represent the breadth of student testing and debugging cognitive behavioral patterns. We did not select Groups 4 and 5 for the narrative analysis because their behavioral patterns overlapped greatly with those of Groups 1 and 2, respectively. We also endeavored to show the diversity of behavioral patterns that can occur within a single class of students, so our narrative analysis deliberately includes only students from Mr. H's class.
Semi-quantitative Analysis
After conducting the narrative analysis, we returned to the timeline (which also served to help structure our narrative analysis) to examine student testing and debugging behaviors from a more quantitative perspective. We constructed a frequency table based on this timeline to show how frequently each indicator was observed across all five student groups and how each group differed in exhibiting the six testing and debugging indicators. By aggregating the timeline data into a single frequency table, we were able to determine which testing and debugging behaviors were most common across these five student groups.
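As a concrete illustration of this aggregation step, the short Python sketch below tallies interval-level indicator codes into counts and percentages in the style of Table 7. It is only a sketch: the actual analysis was conducted in a spreadsheet, and the three timeline entries shown are an excerpt of the Group 1 timeline in Table 5 used here as sample input.

from collections import Counter

# Each entry: (group, interval label, indicators observed in that interval)
timeline = [
    ("Group 1", "13-Jan 10:00", {"A"}),
    ("Group 1", "15-Jan 60:00", {"A", "E", "F"}),
    ("Group 1", "29-Jan 30:00", {"B", "D", "E", "F"}),
]

def frequency_table(rows):
    """Count how many coded intervals per group exhibit each indicator,
    and express each count as a percentage of that group's intervals."""
    indicator_counts = {}
    interval_totals = Counter()
    for group, _interval, indicators in rows:
        interval_totals[group] += 1
        for indicator in indicators:
            indicator_counts.setdefault(group, Counter())[indicator] += 1
    for group, counts in indicator_counts.items():
        for indicator in sorted(counts):
            n = counts[indicator]
            pct = 100.0 * n / interval_totals[group]
            print(f"{group} {indicator}: N={n}, %={pct:.1f}")

frequency_table(timeline)

Because a group can exhibit several indicators within a single interval, the percentages for a group sum to more than 100, which matches the note under Table 7.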
This semi-quantitative comparison of student testing and debugging behaviors primarily served to supplement the qualitative analysis of student testing and debugging behavioral patterns and functions as an additional method for visualizing our findings from our narrative analysis to address RQ2. 58 Results Research Question 1: What different cognitive behavioral patterns do students use to approach testing and debugging within a computational modeling unit on evaporative cooling? Table 6: Student Testing and Debugging Behavior Patterns Group Behavior Pattern Summary Group 1 ● Initially focused on receiving external evaluation and feedback from peers ● Shifted towards internal analysis of model output using simulation feature Group 2 ● Sensemaking discourse drove model evaluation and revision ● Reflected on reasoning behind modeling decisions to identify areas of uncertainty in their models Group 3 ● Utilized simulation and graphing features of SageModeler to systematically assess model output and drive revision This table presents a summary outlining the general testing and debugging behavior patterns each student group used and how these behaviors shifted over this unit. Group 1: Andy and Ben Compared to the other groups, Group 1 relied more on collaboration with the broader classroom community (Indicator E: Using Feedback) as a form of checking the validity of their model and figuring out ways to refine their conceptual understanding (Table 6). For example, on Day 2 when trying to set the boundaries of their system, Ben wrote to Students C and D (both from a non-screencast group) in the online platform, “What is the scale range you will be using to model the system? Will you focus only on what you have been able to observe?” Student C responded, “We will focus on what we have observed in combination with what is going on at a particle level.” The nature of this question is further clarified by Observer A who explained to these students that the idea of a “scale range” is the level at which they are modeling the phenomenon. Group 1 then decided that their model should focus on the particle level of evaporative cooling. 59 While this behavioral pattern of borrowing ideas from other groups was generally beneficial to these students, it also occasionally led them towards considering adding non-canonical variables to their model. The following excerpt is an example of a conversation between Group 1 and Students C and D that took place during a peer revision discussion. Student C: Spacing of the molecules? Isn’t that density? Ben: I mean, it is talking about how far apart they are. Student C: That is density. Student C (to Student E from a second non-screencast group): Didn’t you use density in your model? Student E: Do not use density in your model. He (Mr. H) will get upset. But the spacing of the particles is important. In this conversation, Student C tried to convince Andy and Ben to add density as a variable to their model; they were stopped from doing so by Student E’s appeal to authority (Mr. H). Although Group 1 is heavily influenced by this appeal to authority, because Andy and Ben do not simply accept peer feedback at face value but discuss it with multiple individuals and consider the validity of this feedback, they were coded at Level 3 for Indicator E. 
Later in the unit, their behavioral patterns shifted away from focusing on peer feedback and towards incorporating the use of simulations of their model output (B: Analyzing Model Output: Simulations) as the complexity of their model increased. While they previously opted not to use the simulation features embedded in SageModeler, they began a more deliberate testing and debugging approach. For example, after including a positive relationship between the variable “Spacing of Molecules” and the transfer valve between “Kinetic Energy” and “Potential Energy”, Group 1 decided to simulate their model output (Figure 9). Through this simulation, they recognized that although they conceptually agreed with this specific relationship, they questioned the overall behavior of their model. They were concerned about the decrease in “Potential Energy” that occurs after all of the “Kinetic Energy” has been converted into “Potential Energy.” Yet despite their use of the simulation function to 60 identify this behavioral anomaly within their model, they did not determine which specific relationship was responsible for this behavior and, therefore, were unable to make the necessary changes so that their model matched their conceptual understanding. While this was an example of Ben and Andy systematically testing their model, they had difficulty interpreting their model’s structure in a way where they can identify the source of the behavioral anomaly, suggesting a gap in their computational thinking skills. Overall, Group 1 seemed to rely initially on external feedback to help them identify flaws in their models before shifting toward using the simulation features to interpret their model’s output. Figure 9: Screenshot of Group 1 testing and debugging their dynamic model In this figure, the students from Group 1 used the simulation features to determine how kinetic energy is impacting other variables in this model. Group 2: Leslie and Aubrey While Group 1 tended to utilize discussions with other groups, Group 2 often depended on discussions with each other to make sense of the phenomenon as a system of interconnected elements and to identify where revisions were needed (Indicator A: Sensemaking through Discourse). Early in the unit, as the students were trying to decide which variables to include in their model, they had the following discussion: 61 Aubrey: I don’t know if it is right, but it makes sense. Leslie: Now we need to add another box. Aubrey: The only other variable we have is temperature. But isn’t temperature a constant? Leslie: Yes, it is. So, our model is just two things long. That’s boring. So, molecular energy goes into molecular spacing of substance. Is this all about evaporation? Aubrey: Yeah. From this conversation, we see that these students were considering the boundaries of the system while they were also using causal reasoning by reviewing the relationship between “molecular energy” (which appears to be a student-generated term that is roughly equivalent to kinetic energy) and molecular spacing. Through this discussion, they were also identifying an area of their model that needed revision, proposing a change, and considering the ramifications of this change on their model’s behavior. Thus, this is an example of students verbally testing and debugging their model. 
This testing and debugging is also evident in later discussions where they verbalized their interpretation of their model’s structure as they considered which changes were necessary for creating a more robust model of evaporative cooling (Indicator A: Sensemaking through Discourse). In this example, Leslie and Aubrey were trying to figure out how to revise their models in response to a recent investigation on the role of potential and kinetic energy in evaporation (Day 7). In particular, they were trying to determine how the “spacing of molecules” variable (formerly called “molecular spacing”) fits in their new conceptual understanding of evaporative cooling. Leslie: So, maybe the temperature of water also affects the spacing of molecules and then kinetic energy affects potential energy, which also affects the spacing of molecules. Aubrey: Maybe. Leslie: We’ll try it. But maybe it doesn't. Aubrey: Well, for sure this one [pointing to the “Temperature of Water” variable]. Leslie: Okay, spacing of molecules. As the temperature of water increases, the spacing of molecules increases more and more. So, remember that one model that we did. 62 Aubrey: Yeah, like the hexagon thing where they kept on getting more and more spread apart (referencing a simulation that showed how as the liquid heated up, the kinetic energy increased until it hit the boiling point, after which the potential energy started to increase as the molecules moved farther apart). Not only does this excerpt show how dialogue is manifested in the practice of testing and debugging, but it is also a clear example of how these students used external data to validate their sensemaking (Indicator D: Analyzing and Using External Data). This is subsequently followed up by the use of written reflections as an additional form of sensemaking. At first the students wrote, “as the temperature of the water (average kinetic energy) increases the molecules start gradually moving faster and hitting each other and breaking their force of attraction keeping them together, and become gas.” While this initial written explanation is an accurate justification for this relationship, they disagreed with the second part of this explanation and replaced everything after “hitting each other” with “. . . and move farther apart. As the temperature of water increases they move more quickly than before and move farther apart than they were.” This writing seemed to help these students reflect upon their causal reasoning for this relationship. Group 2 expanded upon this use of written reflection by placing explanations of each relationship directly on the SageModeler canvas (Figure 10). Writing these notes supported their causal reasoning about relationships and served as a means of considering the validity of each relationship they had encoded into SageModeler, thereby acting as an alternative approach to the type of formal testing and debugging that is often conducted at this stage in model development. Their embedded notes also had the potential to support later revision efforts as they could have identified their original rationale behind a particular relationship and considered if new evidence supported or undermined that explanation for that causal relationship (Indicator F: Reflecting upon Iterative Refinement). Collectively, their verbal dialogue and written reflections demonstrate that Group 2 engaged with testing and debugging primarily through discourse (Table 6). 
63 Figure 10: Screenshot of Group 2’s Annotated Model The students in Group 2 wrote their rationales for each relationship on their model as a form of sensemaking during the testing and debugging process. Group 3: Robert, Mark, and Jerry Group 3 utilized the testing and debugging features embedded within SageModeler (Indicator B: Analyzing Model Output: Simulations and Indicator C: Analyzing Model Output: Graphs) as they analyzed their models to determine which changes to make. One interesting example of systematic testing and debugging occurred when the students inserted a “dummy variable” into their model to see the effects of adding a fourth variable on the behavior of their model (Figure 11). After inserting this dummy variable, they used the simulation feature to observe its impact on the behavior of the model as a system (Indicator B: Analyzing Model Output: Simulations). However, they quickly removed the dummy variable, suggesting their dissatisfaction with its effect on model behavior. This use of a dummy variable along with their subsequent discourse is strong evidence that these students were using testing and debugging as they made a deliberate change to their model to see how it would impact model behavior and then removed this after testing this change and determining that it was unsatisfactory. 64 Figure 11: Example of Group 3 Utilizing a “Dummy” Variable to Facilitate Testing and Debugging When Group 3 was trying to decide if any additional variables might be needed in their model, they inserted a “dummy” variable (which they named “random thing”) to see how it would impact model behavior. Another example of Group 3 using the model simulation features to support their testing and debugging occurred as they were trying to decide which relationships to set between the variables of “Kinetic Energy,” “Potential Energy,” and “Density.” It is important to note that other screencast excerpts demonstrate that these students held non-canonical ideas about “density” at this stage in the unit. Most notably, they viewed “density” as an extrinsic characteristic of a substance that decreased as a substance changed from a liquid to a gas. As such, their understanding of “density” is closer to the canonical understanding of “molecular spacing of molecules.” Jerry: Kinetic energy does what? Robert: Okay, so as intermolecular force (IMF) increases, does density increase in the end? Jerry: It would be the other way around. As IMF increases, density decreases. But . . . Robert: ...which means one of these [relationships] has to be increases and the other has to be decreases, or they both have to be decrease. Jerry: The kinetic energy (KE) is opposite of potential energy (PE). Robert: So, this would probably be the one that is decreasing? 65 Jerry: But that doesn’t make sense because of the graph he [Mr. H] showed us. Just put decreasing. Wait. Actually, it would be increasing [KE to PE] and the last one [PE to Density] would be decreasing? The students changed the relationship between “PE” to “Density” to decreasing and then simulated the model and saw that an increase in IMF causes the density to decrease (which in their understanding would mean an increase in the molecular spacing of the substance). In this example, the students first considered the overall behavior of their model of evaporative cooling. The students then analyzed the individual relationships within this system to determine how these relationships would influence system behavior. 
Upon identifying how these relationships would impact the model's output, they considered how these individual relationships reflected their understanding of the real-world phenomenon and ultimately selected a specific relationship to modify. After modifying this relationship, they once again used the simulation features to see the impact of this change upon the model output. This is an example of Indicator A: Sensemaking through Discourse and Indicator B: Analyzing Model Output: Simulations. Group 3 also used the model output generated from SageModeler to make a graph of the relationship between IMF and PE. After making several changes to their model, the students tested to see how these changes impacted the overall behavior of the system. They used the simulate feature to look at how manipulating the input variable (Intermolecular Force) of their model would affect intermediate and distal output variables. Given that they were specifically interested in how the IMF impacted PE, they used the simulation output to generate a graph in SageModeler (Figure 12A). Upon making this graph, they recognized that apart from a few outlier points at the end (likely artifacts from previous simulations), there was a linear relationship between IMF and PE, which was not in line with their understanding of the relationship between these two variables. They later changed the individual relationship between IMF and PE to an exponential one, which made its associated graph into an exponential relationship (Figure 12B). This is an example of students using Indicator C: Analyzing Model Output: Graphs. While other student groups periodically used the simulation features to explore the output of their models, only this group used the model output to successfully make graphs of the relationships between two variables in their model. Overall, Group 3 tended to focus on testing and debugging behavioral patterns that prioritized systematically analyzing their model output to identify areas of their model that needed improvement (Table 6).
Figure 12: Example of Group 3 Exhibiting Evidence of Indicator C: Analyzing Model Output: Graphs
Figure 12A: Group 3 Pre-Revision Model (Day 8) with Graphical Representation of Relationship Between IMF and PE
Figure 12B: Group 3 Revised Model (Day 8) with Graphical Representation of Relationship Between IMF and PE
Research Question 2: What testing and debugging behaviors do students seem to use more frequently within the context of a computational modeling unit on evaporative cooling?
Table 7: Relative Occurrence of Each Testing and Debugging Behavior
Indicator | Group 1 N (%) | Group 2 N (%) | Group 3 N (%) | Group 4 N (%) | Group 5 N (%) | Total N (%)
Sensemaking via Discourse (A) | 15 (53.6) | 39 (83.0) | 24 (66.7) | 12 (50.0) | 22 (68.8) | 112 (67.1)
Model Output: Simulations (B) | 8 (28.6) | 23 (48.9) | 26 (72.2) | 7 (29.2) | 19 (59.4) | 83 (49.7)
Model Output: Graphs (C) | 0 (0.0) | 1 (2.1) | 6 (16.7) | 0 (0.0) | 0 (0.0) | 7 (4.2)
Use External Data (D) | 13 (46.4) | 11 (23.4) | 6 (16.7) | 3 (12.5) | 6 (18.8) | 39 (23.4)
Feedback (E) | 15 (53.6) | 16 (34.0) | 11 (30.6) | 13 (54.2) | 12 (37.5) | 67 (40.1)
Reflecting on Refinement (F) | 6 (21.4) | 13 (27.7) | 12 (33.3) | 8 (33.3) | 12 (37.5) | 51 (30.5)
Total Number of Intervals | 28 | 47 | 36 | 24 | 32 | 167
N is the number of five-minute intervals in which we observed a particular behavioral indicator. % is the percentage of that group's testing and debugging intervals during which the indicator was observed. Note that because student groups often exhibited multiple behaviors within any given five-minute interval, the percentages do not add up to 100. Total represents data from all groups.
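As a worked check of how the N and % columns in Table 7 relate, the percentage for a given group and indicator is simply the indicator count divided by that group's total number of coded intervals:

\[
\%_{\text{group},\,\text{indicator}} = \frac{N_{\text{group},\,\text{indicator}}}{N_{\text{group}}} \times 100,
\qquad \text{e.g., Group 1, Indicator A: } \frac{15}{28} \times 100 \approx 53.6\%.
\]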
Based on our semi-quantitative analysis of student screencasts from all five focus groups, there is evidence that all six testing and debugging indicators were used at least once by a student group within this dataset (Table 7). Although all indicators were present, some indicators were more commonly used 68 than others. Indicators associated with sensemaking discourse (as exemplified by A) were more frequently used while indicators that are linked to comparing models to external data (D) were less frequently used. This implies that while the learning environment supported students in using sensemaking discourse, it did not sufficiently support students in comparing their models to external data (D). Additionally, there is a sharp contrast between the frequency at which students used SageModeler’s simulation (B) and graphing (C) features. Even though both of these SageModeler features allow students to examine their model output, the simulation feature focuses on how the relative amount of each variable changes as the input variables are manipulated. In contrast, the graphing feature allows students to compare the relationship between two individual variables in isolation from other aspects of the model (Figure 2b). Student preference for the simulation feature (B) over the graphing feature (C) suggests that these students found the simulation feature easier to navigate and/or more useful for the learning tasks in this unit. Given that the graphing feature tends to better support direct comparison with external data and the noted low use of external data by students in this unit, the latter explanation has merit. The results from the indicator analysis are largely consistent with the results from the narrative analysis as these data show a preferred set of behaviors for each group (Table 7). Group 1 was more apt to reference external data (D) and use external feedback (E) to drive their revision process while minimizing their use of model output simulations (B) and not using model output graphs (C). This contrasts with Group 3, which strongly prioritized analyzing model output (B) over using external data (D) and external feedback (E). Indeed, Group 3 is one of only two groups to use the graphing feature (C) and the only one to do so successfully. While Group 2 attempted to use the graphing feature (C), they strongly preferred using discourse as their primary means of model analysis (A). Group 5 also prioritized discourse (A) for testing and debugging, but also frequently utilized model simulation features (B). Finally, Group 4 had a strong preference for using feedback (Indicator E) but tended to have limited discussions on the meaningfulness of the feedback, so their behavior was assessed at a lower level using the ID Tool. Overall, these comparative results from both our narrative and semi-quantitative analyses show that despite having a common learning environment, student groups found opportunities to approach testing 69 and debugging in unique ways. This suggests that multiple scaffolds and supports are needed to help all students test, evaluate, and debug their models. Discussion Our results demonstrate that within the evaporative cooling learning environment, there is evidence of student behavior that corresponds to all six testing and debugging indicators in the ID Tool. 
As anticipated, some behaviors corresponding to certain testing and debugging indicators occurred more frequently than others, with students particularly spending more time analyzing their models through discourse (A) compared to other indicators (Table 7). However, it is also important to mention that differences in student behaviors, as noted by both narrative analysis and quantitative analysis, demonstrate that even within a common learning environment, student groups may adopt different approaches to testing and debugging (Tables 6 and 7). Importance of Learning Environment Because each student group used a different set of cognitive behavioral patterns for testing and debugging within a common PBL-aligned learning environment, multiple supports are likely needed to accommodate these different approaches to testing and debugging. Having multiple pathways for students to engage in the learning process allows students to leverage their unique strengths and prior knowledge to further their sensemaking endeavors (Basham & Marino, 2013; Hansen et al., 2016; Scanlon et al., 2018). Because this study suggests that students in the same class can utilize different testing and debugging behaviors and behavioral patterns, it reinforces the need to design multifaceted learning environments, so that all learners can fully participate in computational modeling. Although the learning environment in our evaporative cooling unit provided multiple pathways for students to participate in testing and debugging, two features that seemed to be the most meaningful for supporting students in testing and debugging were the simulation feature embedded in SageModeler and the use of student small groups, which facilitated discourse. Because the simulation feature allowed students to generate model output data in real time, students could test how changes in their model structures impacted model behavior and detect flaws in their models. This allowed students to analyze model output in a way that 70 would not have been possible through traditional paper-pencil modeling. By making it easier to detect areas of their models that needed improvement, the simulation feature further assisted students in revising their models, thereby encouraging testing and debugging (Fan et al., 2018; Lee et al., 2011; Shen et al., 2014). Another feature of the learning environment that supported students in testing and debugging is the use of student groups. In this unit, students worked in small groups of two to three students and were encouraged to collaborate with each other and verbalize their thought processes. By the nature of using collaborative student groups as opposed to having each student build their own models independently, students were implicitly encouraged to share their design choices and modeling behavioral patterns with their partners. These partners could in turn ask each other to provide evidence or reasoning to defend their design choices or provide a counterclaim of their own. For example, one student might state that density impacts the rate of evaporation because higher density particles evaporate slower. Another student could then argue that density does not impact the rate of evaporation because oil is less dense than water but evaporates far slower (if at all). Through such productive argumentative discourse, students can identify flaws in their reasoning and in their model construction, prompting them to revise their models (Campbell & Oh, 2015; Kyza et al., 2011; Lee et al., 2015). 
As such, placing students in pairs or small groups encourages them to have these sensemaking conversations (Indicator A), which in turn facilitate model evaluation and model revision, both of which are key aspects of testing, evaluating, and debugging model behavior. In a similar manner, peer reviews provided further support for the model revision process (King, 1998). Having another group of students analyze their models and provide meaningful feedback often gave students a fresh perspective on their models. Their peers could detect flaws in their models that the student pair might have otherwise ignored. Students then used that feedback to prompt additional sensemaking discourse and inform future model revisions. Thus, by receiving and using feedback (Indicator E), students were able to have an external party evaluate their models and provide key insights on aspects of their models that needed further review, thereby supporting students in the model revision process. 71 Limitations Although this study shows some promising aspects of the design of our learning environment to support students in testing and debugging, there are both limitations with our methodology and aspects of the learning environment. It is important to note that this research took place at a STEM magnet school in a classroom environment that encouraged student discourse and collaboration. Because traditional classroom environments often lack a strong culture of student discourse, our results might not be fully applicable to all classrooms (Grifenhagen & Barnes, 2022; Jimenez-Aleixandre et al., 2000; Kelly, 2013). We also recognize that the limited sample size makes it difficult to draw broader conclusions from our semi-quantitative analysis. While these results do suggest a diversity in student approaches to testing and debugging and that certain testing and debugging behaviors are more common than others within this class, we cannot argue from this analysis alone that these are universal patterns. Given the design-based nature of this study, it was not feasible to isolate specific aspects of the learning environment to determine definitively if either the SageModeler simulation feature or the use of student groups are the most important scaffolds for testing and debugging for these students. While our qualitative analysis does suggest that these factors helped support students in testing and debugging, additional factors, such as teacher instructions and prior student experiences, also might have contributed to our results. Additionally, it is difficult to determine why any specific student groups chose to use a particular set of testing and debugging behaviors. It is possible that more introverted students preferred testing and debugging behavioral patterns that were more focused on analyzing model outputs (Indicator B) compared to extroverted students who might have gravitated towards more social approaches to model validation, such as peer feedback (Indicator E). Another explanation could be that more mathematically inclined students preferred using simulations and graphs (Indicators B and C) to interpret their models compared to using more verbally intensive behavioral patterns. However, both ideas are difficult to assess without targeted interviews and/or additional written assessments, neither of which occurred for this study. 72 Conclusion Testing, evaluating, and debugging models is an important competency. 
Being able to identify the flaws in a model helps students engage in revisions, improving both their representational and conceptual models of a phenomenon (Barlas, 1996; Grapin et al., 2022; Stratford et al., 1998; Sterman, 1994). Frequent testing, debugging, and revision cycles also reinforce the scientific principle of iterative refinement through experimentation, which is essential to the scientific thinking process (Gilbert, 2004; NRC, 2012; Schwarz et al., 2009). Within our framework, we view testing, evaluating, and debugging model behavior as an integral aspect of computational modeling, drawing upon modeling, ST, and CT traditions (Shin et al., 2022). As it facilitates model revision and iterative refinement, this practice benefits student modeling (Schwarz et al., 2009). Likewise, model evaluation, model interpretation, and model revision are all important concepts in system dynamics that overlap with our understanding of testing, evaluating, and debugging model behavior (Barlas, 1996; Martinez-Moyano & Richardson, 2013; Richardson, 1996). Additionally, students often need to consider parts of their model structure from a systems thinking perspective to accurately identify areas of their model that need improvement and to guide the subsequent revision process (Lee & Malyn-Smith, 2020; Shute et al., 2017). Finally, our understanding of testing, evaluating, and debugging model behavior incorporates the CT concepts of debugging, wherein students systematically review their computational models to identify flaws and structural errors, and iterative refinement, the process by which students make changes to their models in response to new information (Aho, 2012; Sengupta et al., 2013; Türker & Pala, 2020; Yadav et al., 2014). This investigation into how students used testing and debugging within the context of an evaporative cooling unit demonstrates both the possibilities and challenges of integrating this practice into secondary science education. Although we have evidence of students testing and debugging their models, the relative absence of using external data to directly validate their models is an area of concern (Table 5). It suggests that more direct curricular support is needed to encourage students to compare their models to external data. This, along with a desire to better support other aspects of testing and debugging (such as the peer review process), has led us towards making several curriculum changes. We have developed a set 73 of model design guidelines to help students identify areas of their models that can be improved during the model revision process. These model design guidelines ask students to consider if their models have appropriately named/relevant variables, define appropriate relationships between variables, have clearly defined boundaries, and work appropriately when simulated. The last section of these guidelines asks students to consider how their models compare to real-world data, further emphasizing the importance of using external data to validate their models. In addition to scaffolding the model revision process, students are encouraged to use these guidelines when giving feedback to their peers. We also plan on being more explicit with students about which specific pieces of experimental data they should use to validate their models during model revisions. For example, we have added a built-in table to the SageModeler canvas where students can input experimental data showing how the temperature of liquids changes over the course of evaporation. 
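To make concrete the kind of model-versus-data check that this built-in data table is meant to prompt, the Python sketch below compares a model-generated temperature series with an experimental one and flags a large discrepancy. Both data series, the error measure, and the threshold are invented placeholders for illustration; they are not the unit's actual experimental results, SageModeler output, or any feature of SageModeler itself.

# Hypothetical comparison of model output to experimental evaporation data.
experimental_temps = [22.0, 20.5, 19.3, 18.4, 17.8]   # measured liquid temperatures (assumed values)
model_temps        = [22.0, 21.5, 21.0, 20.5, 20.0]   # temperatures produced by a student model (assumed)

def mean_absolute_gap(observed, predicted):
    """Average absolute difference between measurements and model output."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

gap = mean_absolute_gap(experimental_temps, model_temps)
print(f"Mean absolute gap: {gap:.2f} degrees")
if gap > 1.0:   # arbitrary illustrative threshold
    print("Model cools too slowly relative to the data; revisit the IMF and KE relationships.")

A check of this kind mirrors the last section of the model design guidelines, which asks students whether their model output matches real-world data rather than merely producing a plausible-looking trend.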
Finally, we are streamlining the unit to allow for more in-depth classroom discourse and more scaffolded model revisions. In this way, we hope to reduce student and teacher fatigue over this unit. In addition to curricular scaffolds, future iterations of this design-based research could investigate the role of teacher scaffolds in supporting students in using external data to validate their models, as instructional supports offer another avenue to bring this aspect of testing and debugging into the classroom. We also might further investigate the role of student groups in supporting collaborative discourse around testing and debugging and find additional ways to leverage peer revisions to best support student model revisions. Finally, future work needs to investigate how different testing and debugging behavioral patterns are linked to model outcomes. Given our small sample size, it was difficult to determine any meaningful correlations between student testing and debugging behavioral patterns and either their post-unit understanding of disciplinary core ideas or the conceptual accuracy of their final models. While we hypothesize that student groups that systematically analyze the model output and frequently compare their models to real-world experimental data will end up with models that more accurately represent a canonical understanding of the system they are modeling, it is also possible that such behaviors lead to model fitting and limit opportunities for students to reflect on their evolving 74 conceptual understanding of the phenomenon (Sins et al., 2005; Wilensky & Reisman, 2006). Therefore, while it is likely that student groups with the most robust models and deepest understanding of underlying disciplinary core ideas will use testing and debugging behavioral patterns that combine dialogic analysis (a la Group 2) with systematic analysis and comparison of model output to experimental data (a la Group 3), future work will be needed to address this hypothesis. This research can guide future research and inform teacher educators and teachers in their efforts to effectively engage students in computational tasks involving testing and debugging. By increasing awareness of the different behavioral patterns exhibited during testing and debugging, such as seeking advice from peers or making inferences based on simulations, teachers can provide a more nuanced facilitation that supports students’ strategies. For instance, teachers can prompt students to utilize tools such as graphs. Teachers can also encourage the use of effective simulation strategies, such as holding all variables constant except for one and comparing results across various scenarios. To foster productive discussion and critical thinking during testing and debugging, teachers can guide students in asking questions such as “How do you know that?” and “What does your model show?” and encourage simulation as a means of exploration and validation. Ultimately, by showing some of the different ways that students can test and debug their models, we hope that this research will encourage teachers to adopt holistic approaches to supporting students with this practice. 
PAPER 3: SYNERGISTIC SCAFFOLDING AND CLEAR RATIONALES: HOW TEACHERS CAN SUPPORT STUDENTS WITH TESTING AND DEBUGGING IN A COMPUTATIONAL MODELING CONTEXT
Abstract
In our technology-driven and algorithm-dominated world, students need familiarity with computational thinking (CT), systems thinking (ST), and computational modeling to make sense of their lived experiences. One aspect of computational modeling that allows students to meaningfully integrate various aspects of CT and ST is testing and debugging. Through testing and debugging, students can analyze model output to identify areas of their models that need revisions, give and receive peer feedback on their models, and use external data to validate model output and revise model structure. Although testing and debugging has been identified as an important aspect of CT and computational modeling, pedagogical strategies for supporting students with testing and debugging in a computational modeling context remain understudied. In this study, I investigated how two teachers supported their students with testing and debugging in the context of a secondary-level chemistry unit involving computational modeling. Both teachers in this study demonstrated key pedagogical strategies for supporting students with testing and debugging. Their use of synergistic scaffolding and their efforts to present students with a clear rationale for engaging with different aspects of testing and debugging empowered their students to use testing and debugging to make meaningful revisions to their computational models. While these teachers' synergistic scaffolding strategies seemed to support these students with testing and debugging, certain advanced testing and debugging behaviors (such as comparing and contrasting model output with experimental results to find errors in model structure) were seldom found in this study. This suggests that additional support, beyond what was observed in this study, is needed for students to perform at higher levels of testing and debugging.
Introduction
Given the increasing prevalence of algorithmic programs, computer software, and complex iterative tasks across all aspects of society, it is increasingly necessary for students to become familiar with computational thinking (CT), regardless of their future career paths (Barr et al., 2011; Bourgault & E., 2023; Wing, 2006). Computational thinking (CT) is a form of sensemaking that uses an iterative and quantitative approach to decompose a phenomenon or problem to explore, explain, and predict the behavior of that phenomenon or to find a solution to a problem through the creation and revision of algorithms (Grover & Pea, 2018; Shin et al., 2022; Weintrop et al., 2016; Wing, 2006). Although CT has been increasingly recognized as a key goal for K-12 education (NRC, 2012), with many nations taking steps to incorporate CT into their educational standards, there remain many conflicting definitions and ideas about how CT should be introduced and supported in classrooms (Brackmann et al., 2016; Hsu et al., 2019; Heintz et al., 2014; Webb et al., 2017). The Next Generation Science Standards (NGSS) emphasizes the synergy between CT and mathematical thinking and focuses on having students use computational tools to analyze and interpret large data sets (NRC, 2012; NGSS, 2013; Shin et al., 2022).
Some scholars focus on having students learn standard programming languages, such as C++, Java, or Python, due to their practical uses outside of the classroom (Abid et al., 2015; Grandell et al., 2006; Price & Price-Mohr, 2018; Tabet et al., 2016). Others argue for a broader definition of CT as “thinking like a computer scientist” and examine specific CT concepts and practices needed to solve algorithmic problems (Grover & Pea, 2018; Nardelli, 2019; Shute et al., 2017). By emphasizing broader CT concepts and practices, it is possible for students to use CT to approach and solve algorithmic problems through constructing, testing, and revising computational models (Hutchins et al., 2020; Sengupta et al., 2013; Shin et al., 2021; Weintrop et al., 2016). Computational modeling describes efforts to create models using algorithms or algorithmic thinking to represent a phenomenon in a quantitative or semiquantitative manner, typically using computational modeling software (Fisher, 2018; Pierson & Clark, 2018; Sengupta et al., 2013; Shin et al., 2022). When students construct computational models of scientific phenomena, they often need to employ various aspects of computational thinking, such as problem decomposition, testing and debugging, and making iterative refinements (Anderson, 2016; Brennan & Resnick. 2012; Irgens et al., 2020; Wang, 2021b). In addition to serving as a platform to assist students in practicing CT, 77 computational models can allow students to visualize phenomena as systems of interacting elements and explore complex relationship patterns such as stock and flow/collector and flow systems and feedback loops (Bowers et al., 2023; Basu et al., 2016; Cronin et al., 2009; Nguyen & Santagata, 2020). As such, computational modeling exists at a critical intersection between CT and systems thinking (ST) (Hamidi et al., 2023; Shin et al., 2022; Weintrop et al., 2016). Systems thinking (ST) describes the cognitive processes necessary to explore how the various aspects of a phenomenon interact with each other to form a more complex system (Arnold & Wade, 2015; Meadows, 2008; Stave & Hopper, 2007). The synergy between computational modeling, CT, and ST was explored by Shin and colleagues (2022) in “A Framework for Computational Modeling” (Figure 13). When constructing this framework, Shin et al. (2022) took inspiration and guidance from the ST, CT, and Modeling literature to define five ST aspects, five CT aspects, and five computational modeling practices shown in this framework. While the ST and CT aspects of this framework serve to summarize the authors’ conceptualization of systems thinking and computational thinking, respectively, the five computational modeling practices are concrete actions that students perform as they design, construct, test, and revise their computational systems models. Each of the five computational modeling practices is informed by the ST and CT aspects defined in this framework and provides students with the opportunity to develop and demonstrate various aspects of ST and CT (Shin et al., 2022). 78 Figure 13: A Framework for Computational Systems Modeling (Shin et al., 2022) One computational modeling practice from this framework that illustrates how computational modeling supports students with CT and ST is “test, evaluate, and debug model behavior”, often shortened to “testing and debugging” (Bowers et al., 2023; Shin et al., 2022). 
When testing and debugging models, students are actively analyzing and interpreting model structures and model behavior, often through examining model output, to find aspects of their models that need to be revised (Barlas, 1996; Hogan & Thomas, 2001; Sengupta et al., 2013; Stave, 2002). As such, students need to understand both how the structural aspects of their models influence its behavior (ST) and how to make changes to their algorithmic system (CT) to fully test and debug their models (Bowers et al., 2023). Testing and debugging is often an iterative process that involves frequent analysis of model output to identify structural components of their models that are not functioning as intended (i.e., “bugs”) as well as model aspects that no longer fit their conceptual understanding of the phenomenon (Jonassen & Hung, 2006; Li et al., 2019; Ogegbo & Ramnaraian, 2021). Students can also utilize peer feedback and compare model output to external data to help facilitate testing and debugging (Bowers et al., 2023; Emara et al, 2020; Weintrop et al., 2016; Yoon et al., 2016). 79 Although testing and debugging are core aspects of computational modeling and CT, many students find it challenging (Eidin et al., 2023; Grapin et al., 2022; Li et al., 2019; Sins et al., 2005). Previous studies suggest that students are often reluctant to interpret model output to inform model revision, preferring an “ad hoc” approach to model revision (Grapin et al., 2022; Stratford et al., 1998; Swanson et al., 2021). Research also suggests that students are likely to ignore key opportunities to compare the model output of their computational models to external data (Bowers et al., 2023; Grapin et al., 2022; Swanson et al., 2021). This reluctance to interpret model output or to compare model output to external data could correlate to students not fully understanding the technical capabilities of the computational modeling environment and thus not knowing all the tools that are available to use. Additionally, students might hesitate to use external data to validate their models if their teachers do not consistently reinforce the principle that computational models should reflect real-world experimental results. Other studies show that when students compare their models to external data, they often adopt an “outcome oriented” stance to model revisions, forcing their models to fit this external data without considering how the structural changes they are making to their model reflect their conceptual understanding of the phenomenon (Li et al., 2019; Sins et al., 2005; Wilensky & Reisman, 2006). This can lead students towards creating final models that can structurally replicate the net behavior of the phenomenon but lack internal consistency and are less useful for explaining how the phenomenon behaves in the real world. For example, a student might create a model that shows that an increase in human industrial activity leads to a net increase in global temperatures. However, this model internally suggests that increasing carbon dioxide levels increases the acidity of the oceans, thereby making sea-ice less stable, increasing the loss of sea ice, leading to higher global temperatures. Although the net conclusion of this model (that increased industrial activity leads to increased global temperatures) is scientifically accurate, there are some key internal relationships that defy the scientific consensus. 
Increasing carbon dioxide levels does increase the acidity of the oceans, but the increasing acidity of the oceans has a negligible impact on the stability of sea-ice and thus invalidates this line of causal reasoning. 80 So, while this explanation arrives at the scientifically accurate conclusion that increasing carbon dioxide emissions increase global temperatures, its lack of internal logic and inconsistency with key scientific principles, suggest a lack of understanding of disciplinary core ideas around the mechanisms of climate change. Given these differential modeling outcomes, it is important to consider the role that teachers can play in supporting students with testing and debugging. Studies on synergistic scaffolding have demonstrated that learning in computerized learning environments is often enhanced by whole class discussions and targeted support from teachers (Baker et al., 2004; Li & Lim, 2008; Wu & Pedersen, 2011). Previous literature has also investigated how teachers can support students in computational modeling and in coding-based debugging tasks (Fretz et al., 2002; McCauley, 2008; Michaeli & Romeike, 2019; Snyder et al., 2022). However, none of these research studies narrowed in specifically on aspects of testing and debugging that are particularly critical to computational modeling, including analyzing a model’s output, analyzing and using external data to validate model output, and using peer feedback to support model revisions. As such, I investigated how teachers can support students in these three major aspects of testing and debugging by addressing the following research questions. Research Questions 1. How does a teacher support students with testing and debugging in a secondary science unit involving computational systems modeling? 2. How do these pedagogical strategies compare to those used by another teacher teaching the same secondary science unit? 3. What pedagogical strategies correlate with student testing and debugging behaviors in this secondary science unit? Literature Review Testing and Debugging across Disciplines Testing and debugging has been identified as a core practice of computational thinking and computational modeling across several STEM related disciplines. The computer science literature often 81 emphasizes debugging as a key skill essential for programming and argues that proficiency with debugging is a major indicator that separates a novice from expert programmers (Griffin, 2016; Murphy et al., 2008; Soloway & Spohrer, 2013). Griffin (2016) views testing and debugging as the process of searching for anomalies in a software program, finding specific parts of the code that are not working appropriately (i.e., “bugs”), and then fixing these aspects of the computer code. Given the inherent complexities of computer programming languages, the scope of testing and debugging can range from finding simple syntax errors (e.g., missing parentheses) to larger structural errors, such as unresolved recursion loops (Ahmadzadeh et al., 2005; Ford & Teorey; McCauley et al., 2008; Michaeli & Romeike, 2019). Due to these complexities, other scholars emphasize first analyzing a computer program as a broader system of interconnected segments of code, then testing to see if the program behaves as planned before going through individual lines of code to identify errors that need to be corrected (Fix et al., 1993; McCauley et al., 2008; Vessey, 1985). 
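To make this range of bugs concrete, the brief sketch below offers my own illustrative example (written in Python and not drawn from the studies cited above) contrasting a syntax error, which prevents a program from running at all, with a logic error, which only surfaces when a program's actual output is compared against its expected output.

```python
# Illustrative sketch (not from the cited studies): two kinds of bugs a novice
# programmer might need to find while testing and debugging.

# 1) A syntax error stops the program before it ever runs. For example, a
#    missing closing parenthesis:
#       print("average:", sum(values) / len(values)    # SyntaxError
#    The interpreter flags this immediately, so it is usually easy to locate.

# 2) A logic error runs "successfully" but produces the wrong behavior, so it
#    can only be found by comparing actual output with expected output.
def average_kinetic_energy(energies):
    """Return the mean of a list of energy values."""
    total = 0.0
    for e in energies:
        total += e
    return total / (len(energies) - 1)   # BUG: should divide by len(energies)

# Testing: run the program on a case whose answer is known and compare.
observed = average_kinetic_energy([2.0, 4.0, 6.0])
expected = 4.0
print(f"observed={observed}, expected={expected}")
if observed != expected:
    print("Output does not match expectations -> inspect the code line by line.")
```

Finding the second kind of error typically requires the broader "test the behavior first, then inspect individual lines" approach described above, because the program itself runs without complaint.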
Testing and debugging is also prominent in studies focusing on block programming (Kim et al., 2018; Emara et al., 2020; Tsan et al., 2022). Block programming describes computer programs that contain preset “blocks” of code that students can use to help construct more complex programs (Lye & Koh, 2014; Resnick et al., 2009; Tsan et al., 2022). While these block programs often allow students to engage in programming tasks without needing to learn more complex programming languages and syntax (Akcaoglu, 2014; Bers et al., 2014; Lye & Koh, 2014), students have a tendency towards surface-level engagement and often arrive at functioning code through ad-hoc tinkering rather than more nuanced coding techniques (Brennan & Resnick, 2012; Grover et al., 2015; Kim et al., 2018). As such, some computer science educators push for teachers to demonstrate more sophisticated and deliberate efforts at testing and debugging when teaching with block programming to help students better understand broader organizational patterns that are useful across programming contexts (Grover et al., 2015; Kim et al., 2018). Block programming is often used in conjunction with physical computing systems, including robots (Bers, 2010; Kazakoff & Bers, 2012; Wang et al., 2021a). In these cases, student testing and debugging must also consider errors that can occur in assembling these physical computing systems in 82 addition to traditional software bugs (Bers et al., 2014; Elliott et al., 2023). Despite these additional points of failure, physical computing systems can help students better identify the behavioral outputs of their code, further facilitating testing and debugging (Bers et al., 2014.; Elliott et al., 2023) Another area of STEM education where students often engage in testing and debugging is through computational modeling (Emara et al., 2020; Hutchins et al., 2020; Weintrop et al., 2016; Yoon et al., 2016). Computational modeling, particularly when it involves text-coded or block-coded agent- based modeling programs, has a fair degree of overlap with code-based programming. However, testing and debugging in computational modeling also has been influenced by the model revision process (Emara et al., 2020; Lin et al., 2021; Papaevripidou et al., 2007; Shin et al., 2022). Scientific models represent natural phenomena that explain or predict the behavior of a system (Harrison & Treagust, 2000; Louca & Zacharia, 2012; Mittelstraß, 2005; Schwarz et. al, 2009). As scientists conduct experiments and collect new data, they will often need to make modifications to their models so that they better explain the behavior of observed phenomena (Louca & Zacharia, 2012; Oh & Oh, 2011). By having students go through the process of constructing a model, testing it through experimentation, and making changes based on experimental data, students are given a first-hand experience with the iterative nature of scientific investigations (Louca & Zacharia, 2012; Metcalf et al., 2000; NRC, 2012; Shin et al. 2021). This process of continuous model revision based on new experimental data (or new information made available to students) is often referred to as iterative refinement and is a major aspect of testing and debugging within computational modeling (Bowers et al., 2023; Grover & Pea, 2018; Hutchins et al., 2020; Shin et al., 2022). As students engage in iterative refinement, they often need to reassess their understanding of the phenomenon and the underlying science ideas they are modeling. 
This reassessment can help students identify gaps in their understanding of the phenomenon and thereby improve their learning of science content (Clement, 2000; Schwarz et al., 2007, 2009; Windschitl et al., 2008). In addition to facilitating iterative refinement, the ability to visualize model output is another key facet of computational modeling software that supports students in testing and debugging (Bowers et al., 2023; Fretz et al., 2002; Sengupta et al., 2012; Weintrop et al., 2016). Computational modeling programs 83 typically produce a visual model output that allows students to see model behavior to facilitate model testing (Sengupta et al., 2012; Shin et al., 2022; Campbell & Oh, 2015; Fisher, 2018). In agent-based modeling programs (e.g. NetLogo), students can program various aspects or agents in their models that have distinct behavioral traits which are visualized when the program runs (Dickes & Sengupta, 2012; Goldstone & Janssen, 2005; Sengupta & Farris, 2012; Wilensky & Reisman, 2006). If the visual output of the program behaves contrary to the student’s expectations, they reexamine their programming choices to change the outcome behavior of two or more agents. In icon-based modeling programs (e.g., Stella, Model-it, and SageModeler), the model output is often visualized through graphs of the relative amounts of different variables present in the model (Damelin et al., 2017; Metcalf et al., 2000; Nguyen & Santagata, 2020; Richmond, 1994). These model output graphs can often be impacted by changing the relative amount of each input variable or making changes to the overall model structure (e.g., changing relationships between variables in the model). Agent-based modeling software can also offer graphical outputs that summarize the behavior of various model agents in more quantifiable terms (e.g., how the population of wolves interacts with the population of sheep in an ecosystem model) (Dabholkar et al., 2018; Gkiolmas et al., 2013; Wilensky & Reisman, 2006). Because these graphical outputs are often quantitative or semi-quantitative, they allow students to compare their models to data collected from real- world experiments (Campbell & Oh, 2015; Shin et al., 2021; Wilensky & Reisman, 2006). These comparisons between model output and real-world data are a critical aspect of testing and debugging as it provides students with the opportunity to have their models validated (Basu et al., 2016; Sengupta et al., 2013; Shin et al., 2021; Stratford et al., 1998). This validation process helps students connect their computational models back to the real-world phenomenon and develop a greater appreciation for the experimental aspect of science. Testing and Debugging in “A Framework for Computational Systems Modeling” Building off these previous efforts to describe how students can engage in testing and debugging, my colleagues and I compiled a description of testing and debugging rooted in our understandings of CT, Computational Modeling, and ST in “A Framework for Computational Systems Modeling” (Shin et al., 84 2022). While the term “testing and debugging” appears in both the CT aspect of “testing and debugging” and in the computational modeling practice of “test, evaluate, and debug model behavior” in this framework, for the purpose of this study, I am focusing on testing and debugging as a computational modeling practice. 
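To give a concrete sense of the semi-quantitative model output described above, the sketch below steps a minimal two-collector model forward in time. It is a generic illustration written in Python for this discussion; it does not reproduce the internal engine of SageModeler, Stella, Model-it, or NetLogo, and the variable names are hypothetical.

```python
# Generic illustration (not the engine of any cited tool): a tiny semi-quantitative
# "collector and flow" model in which a liquid collector drains into a gas collector.
# Students testing such a model would compare the resulting time series (or its graph)
# against their expectations or against external data.

liquid_mass = 100.0           # collector: relative amount of liquid
gas_mass = 0.0                # collector: relative amount of gas
evaporation_fraction = 0.10   # flow: fraction of the remaining liquid that evaporates each step

history = []
for step in range(10):
    flow = evaporation_fraction * liquid_mass   # flow depends on how much liquid remains
    liquid_mass -= flow
    gas_mass += flow
    history.append((step, round(liquid_mass, 1), round(gas_mass, 1)))

# "Model output": a time series that could be graphed and inspected for the expected
# qualitative behavior (liquid decreases and levels off; gas mirrors it).
for step, liquid, gas in history:
    print(f"step {step}: liquid={liquid}, gas={gas}")
```

A student inspecting this kind of output could ask whether the trends match their expectations or external data and, if not, which relationship in the model structure needs to be revised.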
When testing and debugging, students will often begin by analyzing the visual output of their model or discussing the various relationships and variables that are present in their model (Bowers et al., 2023; Hadad et al., 2020; Lee et al., 2020; Shin et al., 2022). Through this process, students will often identify aspects of their model that do not match their evolving understanding of the phenomenon or do not align with experimental data. This will, in turn, prompt students to search for specific relationships and variables that can be changed to improve their model’s behavior. Through an iterative process of critiquing model output and refining model structures, student models will generally come closer to matching the behavior of the real-world phenomenon. This inclusive view of testing and debugging, inspired by several scholars (Aho, 2012; Basu et al., 2016; Lee et al., 2020; Yadav et al., 2014), acknowledges how students manifest aspects of CT and ST as they engage in this practice (Bowers et al., 2022, 2023; Shin et al., 2022; Figure 14). In this framework, aspects of the scientific practice of “using mathematics and computational thinking” and the crosscutting concepts of systems and systems models and cause and effect are embedded into the computational systems modeling practice of “testing, evaluating, and debugging model behavior” and the broader scientific practice of “developing and using models” (Shin et al., 2021, 2022). By running the computational model to examine graphical model output to find aspects of their models that are not behaving as expected, students utilize the CT aspect of “testing and debugging.” Likewise, when students compare this model output to external real-world data, they are also “generating, organizing, and interpreting data”. As students discuss the validity of various relationships in their model, they are exhibiting the ST aspect of “causal reasoning”, which also overlaps with the crosscutting concept of “cause and effect”. When these conversations shift towards dissecting how these structural elements impact broader aspects of model behavior, students are “interpreting and predicting system behavior 85 based on system structure”. Finally, students “make iterative refinements” when they make changes to their models so that their model’s behavior better matches the real-world phenomenon. Figure 14: Aspects of Systems Thinking and Computational Thinking exhibited through the computational modeling practice of “Test, Evaluate and Debug Model Behavior” As students often simultaneously utilize multiple aspects of ST and CT as they are testing and debugging their models, it is not practical to subdivide student testing and debugging behaviors based on these categories. Instead, my colleagues and I sought to describe the different testing and debugging behaviors of students as they created, tested, and revised computational models (Bowers et al., 2022). By developing the ST and CT Identification Tool (ID Tool), we identified six categories of testing and debugging behaviors that are evidence of students utilizing CT and ST: Sensemaking through Discourse, Analyzing Model Output: Simulations, Analyzing Model Output: Graphs, Analyzing and Using External Data, Using Feedback, and Reflecting upon Iterative Refinement (Table 8). 
These six categories were chosen to reflect the diversity of approaches students took to testing and debugging within the context of a high school chemistry unit centered on using icon-based computational modeling (Bowers et al., 2022). For each of these categories, we created a four-level system for describing the complexity of student behaviors. This coding scheme was subsequently validated by external reviewers and an extensive literature review; internal interrater reliability tests further established its usefulness as a tool for demonstrating student testing and debugging behaviors.

Table 8: Description of key indicators from the ST and CT Identification Tool

Indicator A: Sensemaking through Discourse
Description: Students either verbalize their reasoning for making changes to their models or engage in conversations about why specific aspects of their models need to be improved.
Brief Level Descriptions:
Level 1: Verbalize changes to model or identify areas needing revisions, but no reasoning
Level 2: Verbalize reasoning but no mutual dialogue
Level 3: Back and forth dialogue with verbal reasoning
Level 4: Back and forth dialogue with verbal reasoning and impact on other parts of model

Indicator B: Analyzing Model Output: Simulations
Description: Students use embedded model output tools to analyze how their model behaves under different input conditions. In this case, students use the simulation tool in SageModeler to test their models.
Brief Level Descriptions:
Level 1: Adjusting one or more input variables, but no verbal reasoning
Level 2: Adjusting input variables with verbal reasoning but no dialogue
Level 3: Adjusting input variables with verbal reasoning and dialogue, focus on local behavior
Level 4: Adjusting input variables with verbal reasoning and dialogue, holistic model discussion

Indicator C: Analyzing Model Output: Graphs
Description: Students use embedded model output tools to analyze how their model behaves under different input conditions. In this case, students generate and analyze graphs in SageModeler.
Brief Level Descriptions:
Level 1: Unsuccessful attempt to make a graph in SageModeler
Level 2: Successful graph creation, but no interpretation
Level 3: Successful graph creation with discussion of implications for the graphed variables
Level 4: Successful graph creation with discussion of the broader implications for model behavior

Indicator D: Analyzing and Using External Data
Description: Students use external data sources to verify model behavior. At more sophisticated levels, students compare specific external data sources directly to their models and discuss the validity of the external data.
Brief Level Descriptions:
Level 1: Superficial reference to data or referencing inaccurate data
Level 2: Reference external data to inform revisions but no direct comparisons to model output
Level 3: Compare specific external data to model output without discussion of data validity
Level 4: Compare specific external data to model output with discussion of data validity

Indicator E: Using Feedback
Description: Students receive meaningful feedback from others (teachers or peers), discuss the validity of the feedback, and use feedback to inform model revisions. At more sophisticated levels, students test their models after making recommended changes and have a follow-up discussion with others to share their new insights.
Brief Level Descriptions:
Level 1: Students receive feedback but do not discuss it or use it to inform revisions
Level 2: Students make changes to their models based on feedback but do not discuss the validity of the feedback
Level 3: Students receive feedback, discuss its validity, and make or do not make changes to their models based on feedback
Level 4: Students receive feedback, discuss its validity, make or do not make changes to their models based on feedback, and share reflections with another group

Indicator F: Reflecting upon Iterative Refinement
Description: Students reflect through writing or discourse on the changes they have made to their models. At more sophisticated levels, students give a defined rationale for the changes they have made.
Brief Level Descriptions:
Level 1: Ambiguous surface level reflection without reasoning
Level 2: List specific model changes but do not provide detailed reasoning
Level 3: List changes and reflect upon reasoning
Level 4: List changes, reflect upon reasoning (with a defined rationale), and discuss broader changes to models

While each of these testing and debugging behaviors is rooted in CT, ST, and Computational Modeling literature and was present in previous studies, three behavioral categories (analyzing model output, using feedback, and analyzing and using external data) stand out as being particularly relevant to key aspects of testing and debugging (Bowers, 2022, 2023). As previously stated, being able to visualize model output is a key feature of computational modeling tools that can strongly support students in testing and debugging (Bowers et al., 2023; Fretz et al., 2002; Sengupta et al., 2012; Weintrop et al., 2016). One computational modeling tool that allows students to generate visual model output is SageModeler, an icon-based, open-source modeling program that was used to design the ID Tool. In SageModeler, there are two main ways that students can generate model output from their computational models: the simulation feature, which shows how the relative amount of each variable changes when the input variables are manipulated, and the graphing feature, which can demonstrate the relationship between any two variables in the model (Damelin et al., 2017; Bowers et al., 2023; Figure 15). A previous study (Bowers et al., 2023) suggests that students use the graphing features of SageModeler primarily when comparing model output data to external data and use the simulation features throughout the rest of the modeling process to drive testing and debugging. In this study, I will focus on students using the simulation features to analyze model output. The ability to learn from the feedback of others, especially peers, is an important goal of social constructivist approaches to science education (Ben-Ari, 2001; Louca & Zacharia, 2012; Schreiber & Valle, 2013; Tsivitanidou et al., 2018). My earlier work shows that students often use peer feedback to identify aspects of their models that need revisions and gain new insights into model design from analyzing peer models (Bowers et al., 2023). However, teacher and curricular support are necessary for students to get the most benefit out of the peer review process (Louca & Zacharia, 2012; Luxton-Reilly, 2009; Reynolds & Moskovitz, 2008; Wen & Tsai, 2008).
Given these benefits and challenges, I am continuing to focus on students using feedback (Indicator E) to drive testing and debugging in this unit. 89 Figure 15: Simulation and Graphing Features of SageModeler Figure 15A: Using the Simulation Features of SageModeler Figure 15B: Comparing Model Output Data to Experimental Data using Graphing Features 90 Lastly, using external data to verify computational models has long been identified as a key learning goal for computational modeling (Basu et al., 2016; Bravo et al., 2006; Sengupta et al., 2013; Stratford et al., 1998). Previous studies have shown that this is a task that students find challenging (Bowers et al., 2023; Grapin et al., 2022; Sins et al., 2005; Stratford et al., 1998). In some studies, students largely ignore external data as a means of validating model output (Bowers et al., 2023; Grapin et al., 2022; Stratford et al., 1998). In other studies, students who do use external data to drive testing and debugging often focus on forcing their model to fit the output patterns suggested by the external data rather than using the incongruence between their computational model and external data to drive discussions on the conceptual ideas embodied in the model structures they are revising (Li et al., 2019; Sins et al., 2005; Wilensky & Reisman, 2006). In both cases, the students are not fully engaging with a key affordance of computational modeling and therefore miss critical learning opportunities from comparing model output to external data. As such, I argue that it is important to investigate additional mechanisms for supporting students in analyzing and using external data (indicator D) as they are testing and debugging their computational models. Scaffolding and Synergistic Scaffolds Scaffolding is a common framework for supporting students in learning a new task or practice. Building off Vygotsky's ideas about the zone of proximal development, Wood and colleagues (1976) postulated that students need supports or “scaffolds” to enable them to achieve tasks beyond their present abilities (Lin et al., 2012; Rogoff, 1990; Tabak & Kyza, 2018; Wertsch, 1979). As such, scaffolding describes the combination of verbal instructions, written directions, and technological tools that enable students to perform learning tasks that they would not necessarily be able to complete otherwise. Scaffolding enables students to experience a complex task, even if they are unable to complete certain portions of the task independently (Lin et al., 2012; Tabak, 2004; Tabak & Kyza, 2018). For example, many children have initial difficulty maintaining the velocity necessary to keep a bike in an upright position. As such, training wheels are often a necessary scaffold so that the child can focus on mastering pedaling and steering without needing to focus on maintaining balance during this phase of learning. 91 Additionally, having a parent available to talk to the child through the importance of peddling and to catch the child when they are losing balance is another form of scaffolding children in learning how to ride a bicycle. Over time, students gain the ability to complete a task without the support of the scaffolds (Collins et al., 1989; Lin et al., 2012; Tabak & Kyza, 2018). In anticipation of such growth, teachers and curriculum developers can adopt a “fading scaffolds” strategy, where scaffolds are gradually removed as students become more proficient in a task (Collins et al., 1989; Lin et al., 2012; Wu & Pedersen, 2011). 
In the bicycle example, the training wheels are often removed once the child has proficiency with pedaling, steering, and stopping the bicycle; the child can then focus on learning how to balance the bicycle without the training wheels as scaffolds. In science classrooms, these scaffolds can take many forms including: whole class demonstrations on how to perform a task, computer text boxes encouraging students to write down their reasoning during the modeling process, and sentence starter guidelines to support students with making claim, evidence, reasoning statements during classroom argumentation (Basu et al., 2017; Lin et al., 2012; McNeill & Krajcik, 2009; Tabak & Kyza, 2018). Because many tasks and practices, such as testing and debugging, are too complex to be covered in a single lesson or through a single scaffold, scholars advocate for distributed scaffolding (Hsu et al., 2015; Puntambekar & Kolodner, 2003; Tabak, 2004; Tabak & Kyza, 2018). Distributed scaffolding describes efforts to create a set of scaffolds spread across several types of media and/or multiple timepoints and is usually subdivided into three major categories: differentiated scaffolding, redundant scaffolding, and synergistic scaffolding (Puntambekar & Kolodner, 2003; Tabak, 2004). Differentiated scaffolding emphasizes using different tools to support different learning needs around a common practice or task (Krajcik et al., 2000; Tabak, 2004; Tabak & Reiser, 1999). For example, a teacher might provide verbal instruction on the importance of frequently testing the output of computational models and then distribute handouts for how to provide feedback during peer review sessions. While both of these scaffolds help support students with the broader practice of testing and debugging, they focus on different learning goals and do not naturally build on each other. Redundant scaffolding describes efforts to use 92 different tools and techniques or the same tools and techniques across multiple time points to support a common learning goal (Puntambekar & Kolodner, 2003; Tabak, 2004). Redundant scaffolding often either involves repetition of past scaffolds or using different sets of supports for the same practice in a disconnected or disjointed manner. Teachers and curriculum developers engage in synergistic scaffolding when they design different tools and techniques to work in tandem to support a set of learning needs associated with a more complex task or practice (Tabak, 2004; McNeill & Krajcik, 2009). Because the different supports are designed to build off each other to reinforce student learning over the course of the unit, such scaffolding creates synergy that transcends that of redundant scaffolding. In implementing synergistic scaffolds, a teacher might provide a brief whole class demonstration on how to use the simulation features embedded in a computational modeling program. They will later have a brief informational talk on the importance of using this simulation feature to find flaws in model structures and guide model revisions. When the teacher subsequently sets up a whole class discussion on how to evaluate peer models, they will again reference their earlier informational talk and whole class demonstration to reemphasize the importance of analyzing model output through simulations and to build towards the next learning goal of giving quality peer feedback. 
As the teacher provides feedback to individual student groups, they will then reference their earlier informational talks and whole class demonstrations to help remind students of the importance of using the simulation features and to review the mechanics of the simulation features. Synergistic scaffolding allows for teachers to use multiple mediums to reach a larger cross-section of students to support them with using more complex practices, thus bolstering student learning (Tabak, 2004; McNeill & Krajcik, 2009). Given the multi-media nature of learning in computerized settings, there has been an emphasis on creating synergistic scaffolds to support students in these learning environments. Hutchins and colleagues (2020) investigated how a block coded agent-based modeling program served as a synergistic learning environment for supporting students in learning both physics content and CT practices. Other studies focus on how scaffolding supports, such as adaptive mentor agents, can be embedded within a 93 computerized learning environment to support students in performing key practices, including computational modeling (Baker et al., 2004; Fretz et al., 2002; Grawemeyer et al., 2017; Putnambekar & Hubscher, 2005). Wu and Pedersen (2011) recognized that many studies on scaffolding in computerized settings were ignoring the role of teachers in supporting student learning. They argued that solely relying on computer-based scaffolds in these learning environments overlooks how students often ignore text- based supports and how students tend to focus on task completion and thereby dismiss key reflection opportunities embedded into these learning environments (Baker et al., 2004; Li & Lim, 2008; Wu & Pedersen, 2011). Wu and Pedersen (2011) subsequently demonstrated that synergistic scaffolding provided by both the computerized environment and in-person teachers supported student learning better than either set of scaffolds in isolation. Snow and colleagues (2022) also reflected on the role of teachers in supporting student learning in computerized learning environments as they investigated how teachers used discourse to create synergy between computerized simulations and classroom discussions in a chemistry classroom. Given the importance of teachers in supporting student learning in computerized learning environments, I aim to investigate how the pedagogical strategies used by teachers in a computational modeling unit helps scaffold students in testing and debugging computational models. Pedagogical Strategies for Supporting Students with Testing and Debugging Computer science educators recognize that testing and debugging is a specialized practice that does not come naturally to novice programmers but must instead be supported by explicit instruction (Kessler & Anderson, 1986; McCauley, 2008; Michaeli & Romeike, 2019; Murphy et al., 2008). As such, scholars have identified several examples of successful pedagogical strategies for supporting students with testing and debugging in traditional text based programming environments (Michaeli & Romeike, 2019; Wilson, 1987; Chmiel & Loui, 2004). Wilson (1987) observed computer science professors using Socratic questioning to support students in adopting a more systems level view of their code. With questions about the purpose of their code and how various elements of their code interacted with each other, students were able to identify more broadly where the program might be malfunctioning and were subsequently able to debug that part of their code. 
In addition to having teachers ask students 94 reflective questions, the literature also emphasizes the need for students to be given guiding questions and frameworks to guide their own debugging practices. Carver & Risinger (1987) showed that when students were given 30 minutes of explicit debugging instruction, centered on teaching students how to use a flow chart of questions designed to help students locate and repair bugs in their code, students were able to identify and fix bugs in independent programming tasks more effectively. In a similar manner, Michaeli and Romeike (2019) suggest that students can build greater self-efficacy and competency with testing and debugging by using the “Compile, Run, Compare” debugging approach. In “Compile, Run, Compare,” students first ask if the program is compiling and/or running within a timely manner to see if there are compile time and/or runtime errors. Then students are tasked with comparing their program output to their expected output to see if logical errors exist in their code. Through giving students reflective questions and straightforward paradigms, these teachers helped them develop a core framework for testing and debugging their computer programs that they could build upon in future assignments. Beyond flow charts and Socratic questioning to help students master generic debugging strategies, Chmiel & Loui (2004) demonstrated the importance of practice problems and reflective practices for supporting students in building competency with testing and debugging. In this study, students were regularly given practice problems where they had to identify and repair bugs that existed in a teacher created program. As students worked on their own programs, they were also encouraged to keep a journal to reflect on changes they made to their code, where bugs were occurring in their code, and how they would code differently in the future. Lastly these students also regularly gave and received peer feedback on their code, providing more opportunities to identify problems and make improvements to their computer programs. This approach of providing students with multiple opportunities to practice testing and debugging (through practice problems and peer review) and reflect on the debugging process appears to help students develop a deeper understanding of testing and debugging that transcends the “rules based” paradigms laid out in flowcharts or other simplified frameworks. Both modeling and computational modeling literature often focus on broader efforts to support students with revising their models. In their endeavors to develop a learning progression for scientific 95 modeling, Schwarz and colleagues (2009), emphasized the importance of students learning that models should be revised to better explain and communicate key science ideas. Li & Schwarz (2020) identified that a key way that teachers can support students with model revisions is through using generative questions redirecting students to consider the nature and purpose of modeling. For example, when a student’s model was not adequately addressing a key aspect of the phenomenon, the teacher asked the student “How and why do you think those changes happened? . . . How and why do you think the liquid seemed to disappear?” (pg. 187). Such questions prompted the student to revise their model to include a deeper explanation of the phenomenon. 
In a similar manner, Ambitious Science Teaching encourages teachers to use “back pocket” questions such as “how does it do that?” and “does that reflect our experimental results” to encourage students to revise their models to better explain the underlying mechanisms of a phenomenon (Windschitl et al., 2020). In addition to questions that encourage students to revise their models to unpack mechanistic reasoning, Justi recommends that teachers support students with analyzing and interpreting experimental data (2009). This assists students with recognizing if their models reflect the real-world phenomenon or if further revisions are needed. In all these cases, teachers are using generative questions to help students identify aspects of their models that can be improved as well as giving students a clear rationale for model revisions. Compared to computer science and modeling, there are few studies of pedagogical strategies for testing and debugging in computational modeling literature. Pierson and colleagues (2017) examined computational modeling as an extension of scientific modeling and thus centered their vision of testing and debugging firmly within the ideas of model revision developed by Schwarz et al., 2009. As such, they identified that students were more driven to revise their models to increase their explanatory power rather than to match an external data source. Pierson and colleagues (2017) attribute this emphasis on explanatory power to the culture of collaboration and peer review created by the teacher, as students were revising their models to better communicate their ideas with other students in their classroom. In a similar study, Pierson and Clark (2018) found that having students present their computational models to an external audience of younger students was a motivating factor for encouraging model revisions. As such, 96 both studies show that creating an authentic need to use computational models as a communication tool is an effective pedagogical strategy for encouraging students to engage in the revision aspects of testing and debugging. In contrast to Pierson and colleagues (2017), Basu and colleagues (2016), firmly center their vision of computational modeling within a computational thinking framework and are thus more focused on how teachers support students with identifying bugs in their block-based coding program. As such, they encourage teachers to ask their students to break down their code into different subsystems to help narrow down the source of any potential bugs. Such types of questions largely mirror the Socratic questions discussed by Wilson (1987) and the flow charts of Carver & Risinger (1987). Beyond coding- based approaches to testing and debugging, Basu and colleagues (2016) identified that the simulation features embedded in their computational modeling program should be used to help students test their model output and determine if it matches external data or the output of an “expert model”. While the computer science, modeling, and computational modeling studies all provide insight into potential strategies for teachers to use to support students with testing and debugging, none of them fully address the vision of testing and debugging laid out in “A Framework for Computational Modeling”. While text-based programming studies support teachers in encouraging students to compare computer output to their expected outcomes, the sources of text-based programming errors differ from those common in computational modeling. 
Pedagogical strategies described in scientific modeling literature that provide students with a clear rationale for engaging in model revisions are helpful for computational modeling contexts. However, without properly supporting students with analyzing model output or comparing model output to external data, students could easily adopt ad-hoc revision strategies that ignore the affordances of computational modeling. Likewise, few computational modeling studies have gone sufficiently in-depth on testing and debugging to adequately address how teachers should be supporting students with this practice in a computational modeling context. As such, I set out to investigate how two teachers supported their students with the computational modeling practice of testing and debugging in a high school chemistry unit on evaporative cooling.

Methods

Study Context and Learning Environment

Learning Environment and Participants

Two high school chemistry teachers, Mr. H and Mr. M (both pseudonyms), collaborated with me and each other to implement an evaporative cooling unit in their classrooms during November-December 2022. Both Mr. H (a 44-year-old White male with approximately 20 years of teaching experience) and Mr. M (a 32-year-old White male with approximately 10 years of teaching experience) teach 10th grade chemistry at Faraday High School (pseudonym; FHS). FHS is a Midwestern STEM magnet school; while publicly funded, students must apply to this school from a tri-county catchment area, with admissions based on academic test scores, teacher recommendations, and considerations for equity (as the school tries to take a representative population from the many districts in its catchment area). Around 79% of FHS students identify as White and around 54% of students receive free or reduced lunches. FHS runs on a block schedule, meaning each chemistry class meets for 80 minutes every other day. Prior to implementing this unit, Mr. H and Mr. M participated in a professional learning program (PLP) focused on supporting students with creating, testing, debugging, and modifying computational systems models. Both Mr. H and Mr. M fully participated in this PLP and implemented the evaporative cooling unit in their classrooms; full narratives of their implementations serve to address research questions 1 and 2, respectively. However, given that few students from Mr. M's class agreed to participate in the screencast data collection process, I am only able to address the impact of Mr. H's pedagogy on his students' testing and debugging through research question 3. As such, this is primarily a case study of how Mr. H supported his class in testing and debugging their computational models in a unit on evaporative cooling, with one of Mr. M's sections used to compare with Mr. H's pedagogical strategies (Table 9).

Table 9: Demographic Data of Mr. H and Mr. M's Classes
Teacher                   Mr. H       Mr. M
# of Students             29          14
# of Female Students      13 (45%)    8 (57%)
# of Hispanic Students    0 (0%)      0 (0%)
# of Black Students       2 (7%)      1 (7%)
# of Asian Students       2 (7%)      3 (21%)
# of White Students       25 (86%)    10 (72%)

Professional Learning

To prepare for implementing this unit, Mr. H and Mr. M participated in a professional learning program (PLP) geared towards supporting them in using the evaporative cooling unit to engage students in testing and debugging practices. In the two months leading up to implementing this unit, Mr. H and Mr. M participated in weekly 45-minute PLP meetings over Zoom.
Linsey Brennan (a fellow graduate student), Emil Eidin (a post-doctoral researcher), and I collaborated and co-organized these PLP meetings. Early in these meetings, we deliberately unpacked the curriculum and mapped out a timeline to encourage fidelity of implementation. Next, we reviewed an approach for introducing the students to using SageModeler, through an online self-directed learning module publicly available on the SageModeler website (https://sagemodeler.concord.org/app/#file=examples:Getting%20Started). The teachers ended up having students work through this module for a single class period a few days before this unit started. We also discussed how to support students with building their initial models, including the use of an embodied modeling experience where students took on the role of evaporating liquid molecules and co-constructing the initial model backbone together as a class. To help foster student driven discussions and better support students in small group work, we reviewed a series of practitioner focused guidelines for classroom discourse and student collaboration strategies. We encouraged Mr. H and Mr. M to have students adopt a “copilot” strategy where one student would be designated to control the cursor during the process of modeling, while the other student(s) provided ideas and insights to help the first student build or revise the model. The students would periodically switch roles to help ensure that all students got a chance to control the cursor during the process of modeling. Another aspect of supporting students with collaboration was the peer review 99 guidelines. These peer review guidelines aimed to help students identify aspects of their peers’ models that needed improvement and share this feedback with their peers. We also discussed how students could gain deeper insights from the peer review process, which could help them improve their models. We also highly encouraged Mr. H and Mr. M to have frequent whole class model review sessions, where they would place an anonymized student model in front of the whole class and help students provide feedback and critique for this model. Mr. H and Mr. M also had the opportunity to reflect on the importance of having students revise their models using real world data and discussed strategies to support students in using real-world data to validate their models, such as having a whole class demonstration on how to input external data into SageModeler and how to overlay the graphs from the external data with model output data to validate model output. Throughout these meetings, we frequently pointed out how the curriculum and the SageModeler software were designed to support students in the three main aspects of testing and debugging: analyzing model output through simulation features, analyzing external data to validate model output, and using peer feedback to further model revision. By highlighting the scaffolds for testing and debugging already embedded in the curriculum and the SageModeler software, we helped Mr. H and Mr. M identify areas where they could have informational talks, whole class discussions, or conversations with small groups that further supported and scaffolded these learning goals. 
For example, at one professional learning meeting, we discussed how after building their initial model, students would have the opportunity to begin analyzing model output using the simulation feature embedded in SageModeler and suggested that the teachers provide additional support and scaffolding at that time for this aspect of testing and debugging. In this manner, our professional learning program encouraged the teachers to develop pedagogical strategies and scaffolds for testing and debugging that synergized with the supports already built into the curriculum and the SageModeler software, beyond the specific synergistic strategies we discussed regarding the use of the peer review guidelines, the importance of frequent whole class model review sessions, and the classroom discussions and demonstrations necessary to support students in analyzing external data to validate model output. 100 Curriculum The five-week evaporative cooling unit implemented in this study was designed according to project-based learning (PBL) principles. Building off the work of Krajcik & Shin (2022), my vision of PBL includes: exploring a meaningful driving question based on a real-world phenomenon, investigating the driving question and phenomenon through scientific practices, creating knowledge products such as computational systems models, encouraging productive collaboration, and utilizing learning technologies. Evaporative cooling describes the phenomenon of liquids becoming colder as fast moving, high kinetic energy (KE) particles evaporate first. These faster moving particles overcome the intermolecular forces (IMFs) and transition from a liquid to a gas. As they evaporate, the KE of these liquid particles is transferred to the potential energy (PE) of the gas particles. This loss of high KE particles reduces the average KE of the remaining liquid, causing the liquid to become colder and reducing the rate of evaporation. To represent this phenomenon using SageModeler, students need to demonstrate how both the mass and kinetic energy of a liquid transition into the mass and potential energy of a gas via evaporation (Figure 16A). Figure 16: Examples of Evaporative Cooling Models Figure 16A: Example of a Final Form Evaporative Cooling Model 101 Figure 16 (cont’d) Figure 16B: Backbone of Evaporative Cooling Model Students first interacted with this phenomenon by observing how rubbing alcohol, acetone, and water feel as they evaporate from their skin. They were then tasked with creating a diagrammatic model of evaporative cooling that addressed the unit’s driving question: Why do I feel colder when I am wet than when I am dry? Students were next introduced to SageModeler by Mr. H and Mr. M, who helped them construct the initial “backbone” relationship of their models, showing the mass of a liquid transforming into the mass of a gas via evaporation (Figure 16B). Students then worked in small groups (two to three students) to expand on this initial model. As the unit progressed, students were exposed to additional concepts (e.g., IMF, kinetic energy, and potential energy) through hands-on experiments, computerized learning modules (Figure 17), and classroom discussions, encouraging them to test, debug, and revise their models. Students were also given opportunities to receive structured feedback from other groups. Mr. H and Mr. 
M also anonymously presented student models during whole class discussions to allow for the class to collectively provide feedback to this anonymous group and to discuss aspects of these selected models that could benefit all students in their model revision process. Figure 17: Example of a Computerized Learning Module from the Evaporative Cooling Unit 102 SageModeler Throughout the evaporative cooling unit, students constructed, tested, debugged, and revised computational systems models using SageModeler – a free, browser-based, open-source software program. SageModeler is an icon-based modeling program that enables students to create variables and set relationships between these variables using a dropdown menu (Figure 18A). Students also design appropriate variables as “collectors” (variables that can accumulate an amount over time) and make transfer relationships/flows between these collector variables (Figure 18B). This can allow them to model how the mass of a liquid transitions into the mass of a gas during evaporation. SageModeler also has a simulation feature that allows students to manipulate the relative amount of each input variable to see how their model behaves under different initial conditions and how their model behavior changes over time (Figure 18A). Students can also input real world data into SageModeler and make graphs to compare their simulated model generated graphs with real world data (Figure 18C). Figure 18: Introduction to SageModeler Features Figure 18A: Setting Relationships and using the Simulate Feature Figure 18B: Collector and Flow Relationships 103 Figure 18 (cont’d) Figure 18C: Comparing Model Output Data to Experimental Data Data Collection I, along with Linsey Brennan and Tingting Li, collected data for this study in collaboration with Mr. H and Mr. M in November and December of 2022. My primary targets for data collection were teacher videos and student screencasts. Teacher videos were captured using a specialized microphone alongside whole class video using an iPad camera attached to a tripod stand. This allowed me to record how Mr. H and Mr. M supported student testing and debugging through whole class discussions, conversations with small groups/individual students, and behind the scenes efforts to troubleshoot learning technology. As such, the teacher audio is my primary data source for assessing Mr. H and Mr. M’s teacher moves. The student screencasts capture student audio and student screen actions as they constructed and revised their computational systems models using SageModeler software. These student screencasts were used to determine how students were testing and debugging their models during this unit. Student screencasts were collected from five student groups in Mr. H’s class (11 students total); their pseudonyms and demographics are listed in Table 10. Given the smaller class size and fewer student 104 volunteers, screencasts were not collected in Mr. M’s class. Students who are not screencast students (e.g., the students in Mr. M’s class) are given letter-based pseudonyms (i.e. Student A, Student B, etc.). 
Table 10: Screencast Student Pseudonyms and Demographics
Student Group   Student Pseudonyms       Demographics
Group 1         Reese and Eric           South Asian Male, White Male
Group 2         Esme and Lilly           White Female, White Female
Group 3         Carter, Sam, and Fred    White Male, White Male, White Male
Group 4         Tiffany and Anna         South Asian Female, White Female
Group 5         Morty and Isabelle       White Male, White Female

It is also important to note that either I or another colleague was present each day of this unit. While our primary purpose was to set up our data collection equipment, we also assisted Mr. H and Mr. M with troubleshooting the various technology-related problems their students encountered with SageModeler during this unit. As such, we can be said to have been active participants in this classroom environment. However, we did our best to limit our support to aiding with technology-related challenges and to avoid providing any prompting or other scaffolds that would have helped students with learning science content, CS Modeling, or testing and debugging. While we occasionally gave brief, in-class feedback to Mr. H and Mr. M, we largely avoided efforts to influence student learning or teacher pedagogy during classroom enactment, providing most of our suggestions after the lesson was finished on any given day. This was to show respect to Mr. H and Mr. M as professionals and to minimize the impact of our interference on student learning outcomes.

Data Analysis

I uploaded all teacher videos and all screencast videos into Atlas.ti for data analysis. Atlas.ti is a qualitative data analysis program that allows users to highlight specific segments of video, assign qualitative codes to these video segments, and make additional notes to summarize these video segments in addition to the broader qualitative codes. For the purposes of this study, I used separate sets of qualitative codes for the teacher videos and for the screencast videos so I could narrow in on teacher pedagogical moves and student testing and debugging behaviors, respectively.

Teacher Pedagogical Moves

To categorize how teachers scaffolded students in testing and debugging, along with broader teacher pedagogical moves, I developed a three-tier coding system loosely based on Fretz and colleagues' (2002) efforts to classify how teachers were supporting students with computational modeling. Fretz and colleagues created three major categories to describe teacher pedagogical actions: pedagogical activities (what activities teachers have assigned to students at a specific moment in time), scaffold focus (the specific types of information/support teachers provide to help students), and targeted indicator (aspects of computational modeling the instruction is aiming to support). In my qualitative coding, I adapted these three categories into pedagogical method, pedagogical focus, and computational systems modeling content (CS Modeling content). This coding scheme was reviewed by two external experts in computational modeling, whose feedback was incorporated into the final version of this coding scheme. It was also validated by four researchers collaborating on this project, who each coded one set of three 30-minute segments of video to reach a coding consensus and an interrater reliability of 87%. Note that each quote used in this paper will include all corresponding categories in its description to provide further examples of how these categories were used to analyze the data.
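As a point of reference for readers less familiar with reliability statistics such as the 87% reported above, the sketch below shows one common way a simple percent-agreement figure can be computed from two coders' labels for the same video segments. It is a minimal, hypothetical illustration written in Python; the segment labels shown are invented, and it does not reproduce this study's actual consensus-coding procedure or data.

```python
# Minimal sketch of percent agreement between two coders who labeled the same
# video segments. The labels below are hypothetical examples using the
# pedagogical method subcategories; the study's actual procedure is described
# in the text.

coder_1 = ["Informational Talk", "Whole Class Discourse", "Discussions with Small Groups",
           "Computer Demonstrations", "Behind the Scenes", "Informational Talk"]
coder_2 = ["Informational Talk", "Whole Class Discourse", "Computer Demonstrations",
           "Computer Demonstrations", "Behind the Scenes", "Informational Talk"]

# Count the segments where both coders assigned the same code.
agreements = sum(1 for a, b in zip(coder_1, coder_2) if a == b)
percent_agreement = 100 * agreements / len(coder_1)
print(f"Interrater agreement: {percent_agreement:.0f}%")   # 5 of 6 segments -> 83%
```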
The pedagogical method category differs from the pedagogical activities category developed by Fretz and colleagues (2002). Rather than focusing on the actions teachers have assigned to students, my pedagogical method category describes the broader actions and methods Mr. H and Mr. M used to support student learning during any moment of teaching. Within the pedagogical method category are the following seven subcategories: information sharing/informational talk, teacher-centered whole class discourse, teacher discussions with small groups/individual students, computer demonstrations, laboratory demonstrations, video demonstrations, and behind-the-scenes actions/conversations (Table 11). These pedagogical method subcategories were based on my own initial observations about the different ways Mr. H was communicating with students during his teaching, with the initial subcategories being “informational talks”, “whole class discourse”, “small group discussions”, and “demonstrations”. Upon watching the first few classroom videos, I decided to subdivide the demonstration subcategory into three separate subcategories to better capture the diversity of delivery methods used in these classrooms. I also added the “behind the scenes” subcategory after noticing that Mr. H and Mr. M frequently had important generative conversations about their teaching as students worked in their independent groups.

Table 11: Pedagogical Method Subcategories

Informational Talk. Description: The teacher speaks to the whole class and students are not verbally participating in discourse; can last from 1 minute to 12 minutes based on data collected in this study. Example: Mr. M tells students the agenda for that class period while students listen.

Whole Class Discourse. Description: The teacher addresses the whole classroom and either asks students to share their ideas or has students ask questions, which might be answered by the teacher or by other students; includes conversations where the teacher shares information with students but asks frequent questions to foster student participation. Example: As Mr. M discusses evaporation, he asks students to share their experiences with evaporation with the whole class.

Discussions with Small Groups. Description: The teacher visits small groups and talks with them. Example: Mr. H asks Reese and Eric to explain their model and provides feedback to them.

Computer Demonstrations. Description: The teacher demonstrates how to use an aspect of SageModeler or another piece of software in front of the whole class. Example: Mr. H shows students how to use the simulate features of SageModeler.

Laboratory Demonstrations. Description: The teacher presents a key scientific principle or laboratory technique through a visual experiment or demonstration. Example: Mr. H shows that, despite being less dense than water, canola oil does not seem to evaporate after 30 minutes.

Video Demonstrations. Description: The teacher demonstrates a key concept through a video. Example: Mr. M has students watch a video about diagrammatic models.

Behind the Scenes. Description: The teacher talks with the researcher or another teacher, or troubleshoots a technology problem on his own; usually out of view of students. Example: Mr. H shares his experiences with leading a discussion of evaporation with Mr. M while students are revising their models.

The “pedagogical focus” category classifies the various types of information that Mr. H and Mr. M are seeking to communicate with students through their teaching (Table 12).
Several of the subcategories of “pedagogical focus” were directly adapted from the “teacher scaffolds” category of Fretz and colleagues (2002). One of the broader categories of pedagogical focus is classroom housekeeping, which includes efforts to organize student groups, describe present and future tasks, and redirect students back to their assigned work. This was inspired by the “task” scaffold subcategory of Fretz and colleagues, which classified efforts taken by teachers to help redirect students back towards computational modeling tasks. In addition to classroom housekeeping, I added “relationship building” to describe interactions where the teachers were primarily focused on building rapport and community with their students. The “science content” subcategory was adapted from the “conceptual” scaffold subcategory as a means of describing episodes where teachers were explicitly supporting students with learning key principles of science content knowledge. The “technology utility” and “SageModeler utility” subcategories come from the “utility” scaffold, which likewise emphasizes how to use learning software and technology. Given the many challenges students faced when software and hardware malfunctioned, I also added the “technology troubleshooting” category to cover efforts by the teachers to help students with technology when it was not working as intended. Lastly, CS Modeling is the broader pedagogical focus category for the entire concept of computational systems modeling, which is explained in more depth in the next category of codes.

Table 12: Pedagogical Focus Categories

Classroom Housekeeping. Description: Teacher discusses the class agenda, student tasks, and classroom organization; the focus is on managing the classroom as a learning community. Includes task redirection. Example: Mr. H helps a student group that has been absent for a few days figure out what tasks they need to focus on to get caught up to their peers.

Science Content. Description: Teacher talks about key scientific concepts or principles; in this unit, the focus is on evaporative cooling, energy, and IMF. Example: Mr. M discusses different types of Potential Energy and Kinetic Energy that exist in nature.

Relationship Building. Description: The teacher talks with students about topics that pertain to student personal lives and non-academic interests. Example: Mr. H has an opening discussion asking students about their favorite Pixar movies.

Technology Utility. Description: Teacher demonstrates or talks about how to use technology and software; includes the learning management system and data collection tools for the experiments. Does not include any use of SageModeler. Example: Mr. H demonstrates how to log into the learning management system; Mr. M demonstrates how to use the data collection tools for the temperature vs. time experiment.

SageModeler Utility. Description: The teacher demonstrates or discusses how to use key aspects of SageModeler. Example: Mr. H demonstrates how to use the simulate feature.

Technology Troubleshooting. Description: Teacher focuses on fixing problems that have arisen from issues with technology; includes SageModeler, the learning management system, and physical technology. The focus is on when technology is not behaving as intended (as contrasted with technology utility). Example: Mr. H works with individual students to try and recover their models, which have not been saved by the learning management system; Mr. M tells students how to work around a key glitch in the SageModeler software.

CS Modeling. Description: The teacher highlights key concepts and key technology tools associated with supporting students in the process of constructing, testing, debugging, and revising computational systems models. Example: Mr. H provides students with the peer review guidelines to help scaffold the peer review process; Mr. M asks individual groups to explain their models.

CS Modeling content describes the specific aspects of CS modeling that Mr. H and Mr. M are focusing on during their teaching (Table 13). As such, CS Modeling content is technically a subset of the pedagogical focus category. The CS modeling categories are based directly on key indicators from the Systems Thinking and Computational Thinking Identification Tool, along with key ideas from “A Framework for Computational Systems Modeling.” Given that this study predominantly focuses on how Mr. H and Mr. M are supporting their students with testing and debugging, the categories of CS modeling reflect the emphasis on the three aspects of testing and debugging (analyzing model output through simulations, using external data to support model revisions, and using external feedback through peer review) that I am targeting in this study. In addition to those three categories, I also chose to categorize instances where teachers pointed out specific model components or where they discussed broader concepts related to systems thinking. Using these categories to classify instances where these two teachers were instructing students in the three main aspects of testing and debugging served as the foundation for more in-depth narrative analyses.

Table 13: CS Modeling Content Categories

Analyzing Model Output. Description: Teacher supports students with using the simulate features to analyze model output to revise their models. Example: Mr. H gives a 5-minute informational talk about the importance of using the simulate features to speed up the revision process.

External Data. Description: Teacher supports students with using external data (both quantitative and qualitative) to support model revisions. Example: Mr. M shows students how to input external data into SageModeler and how to compare these data with model output.

Peer Review. Description: Teacher assists students in both critiquing peer models and in utilizing peer feedback to drive future model revisions. Example: Mr. H leads a whole class model critique, showing students how to critique peer models; Mr. M organizes peer reviews.

Model Components. Description: The teacher points out specific model components (variables and relationships) that students should revise in their models. Example: In a small group discussion, Mr. H asks students about the “density” variable and why they have density in their model.

Systems Thinking. Description: The teacher assists students in understanding and utilizing systems thinking principles. Example: Mr. M discusses feedback loops during a whole class model critique.

Student Testing and Debugging Behaviors
To classify student testing and debugging behaviors, I used three key indicators from the ST and CT Identification Tool (Table 14). This coding scheme, originally developed based on A Framework for Computational Systems Modeling, identifies six major testing and debugging behaviors that students often use to test and debug computational systems models (Shin et al., 2022; Bowers et al., 2022).
The ST and CT Identification Tool was originally designed for analyzing screencasts of students building and revising SageModeler models and was reviewed by a panel of five external expert reviewers and by four internal reviewers (with a 91.7% agreement among all four raters). Given that this instrument has previously been validated, both externally and internally, using screencast data that closely mirror the student screencast data in this study (Bowers et al., 2022, 2023), I find it appropriate for assessing student testing and debugging behaviors in this study. Although there are a total of six indicators in the original ST and CT Identification Tool, I only used the three indicators associated with the aspects of testing and debugging that were most relevant for this study (i.e., analyzing model output, analyzing and using external data, and using feedback). These three indicators were chosen based on earlier literature suggesting that students either found these aspects of testing and debugging particularly challenging or that additional teacher support is needed for students to fully demonstrate these aspects of testing and debugging (Bowers et al., 2023; Grapin et al., 2022; Li et al., 2019; Louca & Zacharia, 2012; Sins et al., 2005). As I analyzed student screencasts, I also used the level descriptions in this rubric to assign appropriate levels to student testing and debugging behaviors. These three indicators were strongly supported by previous efforts to validate this instrument (Bowers et al., 2022, 2023; Table 4), achieving a higher degree of agreement across the four reviewers than the other three indicators. Removing the other three indicators (Sensemaking through Discourse, Analyzing Model Output: Graphs, and Reflecting upon Iterative Refinement) narrows the scope of my results. However, my previous studies (Bowers et al., 2022, 2023) suggest that these additional aspects of the ST and CT Identification Tool were both more challenging to track and less meaningful indicators of student proficiency with testing and debugging. Additionally, because none of these three indicators was emphasized in the professional learning program or specifically targeted by curricular or software scaffolds, their removal serves to streamline this study so that it focuses on the most relevant and well-supported aspects of testing and debugging present in this unit.

Table 14: Description of Key Indicators from the ST and CT Identification Tool (Bowers et al., 2022)

Indicator B: Analyzing Model Output: Simulations
Description: Students use embedded model output tools to analyze how their model behaves under different input conditions. In this case, students use the simulation tool in SageModeler to test their models.
Brief Level Descriptions:
Level 1: Adjusting one or more input variables, but no verbal reasoning
Level 2: Adjusting input variables with verbal reasoning but no dialogue
Level 3: Adjusting input variables with verbal reasoning and dialogue, focus on local behavior
Level 4: Adjusting input variables with verbal reasoning and dialogue, holistic model discussion

Indicator D: Analyzing and Using External Data
Description: Students use external data sources to verify model behavior. At more sophisticated levels, students compare specific external data sources directly to their models and discuss the validity of the external data.
Brief Level Descriptions:
Level 1: Superficial reference to data or referencing inaccurate data
Level 2: Reference external data to inform revisions but no direct comparisons to model output
Level 3: Compare specific external data to model output without discussion of data validity
Level 4: Compare specific external data to model output with discussion of data validity

Indicator E: Using Feedback
Description: Students receive meaningful feedback from others (teachers or peers), discuss the validity of the feedback, and use feedback to inform model revisions. At more sophisticated levels, students test their models after making recommended changes and have a follow-up discussion with others to share their new insights.
Brief Level Descriptions:
Level 1: Students receive feedback but do not discuss it or use it to inform revisions
Level 2: Students make changes to their models based on feedback but do not discuss the validity of the feedback
Level 3: Students receive feedback, discuss its validity, and make or do not make changes to their models based on feedback
Level 4: Students receive feedback, discuss its validity, make or do not make changes to their models based on feedback, and share reflections with another group

Preliminary Analysis and Summary Table Construction
Once I developed these qualitative coding rubrics, I began coding both the teacher videos and the student screencasts using these qualitative codes and making additional notes to summarize key teaching moments and key examples of student testing and debugging. Once I had finished my initial coding, I constructed a summary table for the teacher videos and for the student screencasts. For the teacher videos, I added up the minutes for each category in the coding scheme for each day. This allowed me to explore how frequently Mr. H and Mr. M used different pedagogical methods, emphasized different pedagogical foci, and focused on specific aspects of CS modeling, and to see how their emphases compared with each other and shifted over the course of the unit. For the student screencasts, I also added up the minutes for each coding category for each student group for each day and then combined data from all five groups into a single unified data set. This enabled me to compare student testing and debugging behaviors with Mr. H’s pedagogical methods and pedagogical foci. While the summary tables alone were unable to provide an adequate answer to the research questions, they helped to inform the narrative analyses.

Narrative Analyses
Building on the summary tables and my preliminary data analysis, I revisited the teacher videos with a focus on teaching moments where Mr. H and Mr. M supported students with CS modeling content. I specifically rewatched every part of the videos where I had coded for “Analyzing Model Output,” “External Data,” and “Peer Review,” taking notes on their pedagogical moves and the specific aspects of these testing and debugging behaviors they chose to focus on in those moments. Based on these additional notes and my initial findings from the summary tables, I wrote a detailed narrative analysis of Mr.
H’s pedagogical strategies for supporting students in these three areas of testing and debugging, thus addressing Research Question 1: How does a teacher support students with testing and debugging in a secondary science unit involving computational systems modeling? Additionally, I conducted a parallel narrative analysis of Mr. M’s pedagogical strategies to address Research Question 2: How do these 113 pedagogical strategies compare to those used by another teacher teaching the same secondary science unit? Once I had finished the teacher focused narrative analyses, I returned to the summary tables to see how student testing and debugging behaviors compared to Mr. H’s efforts to support them with CS modeling content. I also systematically rewatched student screencasts to see what aspects of the learning environment prompted them to test and debug their models and if there were specific aspects of Mr. H’s teaching that were particularly helpful for encouraging students to use more advanced testing and debugging techniques. These determinations were made based on the proximity of student testing and debugging behaviors to specific moments of Mr. H’s teaching as well as student appropriation of key phrases or testing and debugging behaviors previously shared by Mr. H. This further investigation culminated in a narrative analysis that summarizes my findings and addresses Research Question 3: What pedagogical strategies correlate with student testing and debugging behaviors in this secondary science unit? After completing these narrative analyses, I had Mr. H and Mr. M review my findings as a form of member checking to further validate my interpretation of their pedagogical strategies. Both Mr. H and Mr. M largely agreed with my interpretation of these data and offered some additional context, particularly on their rationale for certain pedagogical strategies, that informed the final version of this manuscript. Results Research Question 1: How does a teacher support students with testing and debugging in a secondary science unit involving computational systems modeling? The summary tables illustrate that Mr. H used a diverse array of pedagogical methods to support his students throughout this unit (Table 15). Mr. H commonly supported students through discussions with small groups and individual students (256 minutes, 26.6% of class time). These discussions often corresponded to moments where the small groups were working on their evaporative cooling models, and Mr. H offered individualized support with the modeling process. Mr. H also spent a substantial amount of time providing information to the whole class (153 minutes, 15.9% of class time). This included both brief comments, such as when he shares an issue he has found while working with a small group with the 114 whole class and longer informational talks, where Mr. H explained a specific scientific concept or modeling principle. In addition to these informational talks, Mr. H provided opportunities for interactive whole class discussions (167 minutes, 17.4% of class time). These whole class discussions included discussing opening questions meant to ease students into learning and build classroom community, sharing and answering phenomenon-driven questions from the driving question board, and critiquing anonymous models shared by Mr. H to build student familiarity with reviewing peer models and encourage self-reflection and revisions of their own models. In addition to incorporating these three main pedagogical methods, Mr. 
H also had to balance different pedagogical foci across this unit. The three most common categories Mr. H focused on were: CS Modeling (278 minutes, 29% of class time), Science Content (237 minutes, 24.7% of class time) and Classroom Housekeeping (230 minutes, 23.9% of class time). While the focus on CS Modeling and Science Content are self-explanatory (given the design goals of this unit), the substantial amount of time spent on Classroom Housekeeping seems to originate from the organizational complexity of this unit and the need to support students through key transitions between classroom activities. Students frequently shifted between hands-on investigations, whole class informational talks and discussions, and small group work on their computational systems models and associated learning modules. Through these transitions, Mr. H provided instructional support to keep students moving forward. Additionally, Mr. H needed to organize student peer reviews and other key logistical aspects of this unit. It is also possible that conducting this unit towards the end of the fall semester (right before winter break) could have contributed to more Classroom Housekeeping being necessary. Beyond these three main categories, Mr. H spent a significant amount of class time teaching students how to use SageModeler (SageModeler Utility, 95 minutes, 9.9% of class time) and other classroom technologies (Technology Utility, 59 minutes, 6.1% of class time) such as the learning management system associated with this unit and laboratory technology for recording temperature data. Mr. H also spent a significant amount of class time troubleshooting SageModeler and the learning management system (100 minutes, 10.4% of class time). 115 Such troubleshooting efforts limited Mr. H’s ability to provide more support with student models and with key aspects of testing and debugging. Within his focus on CS modeling, Mr. H provided targeted support for the testing and debugging aspects of analyzing model output (78 minutes, 8.1% of class time), analyzing and using external data, (63 minutes, 6.6% of class time), and using feedback/peer review (99 minutes, 10.3% of class time). Mr. H’s emphasis on these three aspects of testing and debugging varied throughout this unit, largely mirroring the changing emphasis placed on each by the curriculum. For the first three days of this unit, Mr. H provided little direct support for testing and debugging as students were building their initial models. On Nov 17th (Day 4) and Nov 21st (Day 5), Mr. H provided targeted instruction for analyzing model output, emphasizing the need to use the simulation features of SageModeler to drive model revisions. Likewise, Mr. H helped scaffold student peer reviews on Nov 21st (Day 5) and Dec 1st (Day 7), providing both organizational and instructional supports. Mr. H had extended (10-minute) informational talks on using external data to validate their models on Dec 12th (Day 10) and Dec 15th (Day 11), which coincided with students collecting quantitative data on how the temperature of water, rubbing alcohol, and acetone change during the process of evaporation. Additionally, when showing students how to compare external data to model output on Dec 15th (Day 11), Mr. H also reinforced the need to analyze model output using the simulation features, showing a synergistic approach for supporting both practices. On both Dec 12th (Day 10) and Dec 19th (Day 12), Mr. 
H facilitated a whole class review of anonymized student models, once more showcasing key aspects needed for peer reviews. A more detailed narrative analysis of how Mr. H supported each of these practices is found below. In addition to focusing on testing and debugging, Mr. H also dedicated a substantial amount of time to systems thinking (160 minutes, 16.6% of class time) and to targeting specific model components (118 minutes, 12.3% of class time), e.g., pointing out specific variables and relationships students should revise in their models.

Table 15: Summary Table of Mr. H’s Pedagogical Methods
Note that time is in minutes (rounded to the nearest 0.25 minutes). Percentages are based on 960 minutes of class time across all 12 days (80 minutes per day). Each category is listed with its total minutes and percentage of class time.
Info Sharing: 153 (15.9%)
Whole Class: 167 (17.4%)
Small Group: 256 (26.6%)
Comp Demos: 48 (5%)
Lab Demos: 38 (3.9%)
Videos: 7 (0.8%)
BTS: 57 (6%)
Classroom Housekeeping: 230 (23.9%)
Science Content: 237 (24.7%)
Relationship Building: 61 (6.4%)
Tech Utility: 59 (6.1%)
Sage Utility: 95 (9.9%)
Tech Tshoot: 100 (10.4%)
CS Modeling: 278 (29%)
Model Output: 78 (8.1%)
External Data: 63 (6.6%)
Peer Review: 99 (10.3%)
Model Components: 118 (12.3%)
Systems Thinking: 160 (16.6%)

Analyzing Model Output
Mr. H first introduced the students to the simulation features within the context of a whole class critique of an anonymized student model. He began by pulling up a student model that had an undefined relationship between temperature and the number of liquid particles (Figure 19). Mr. H then asked the students what was impacting the model beyond the initial model backbone (the transfer relationship between the number of liquid particles and the number of gas particles), to which they responded, “the temperature.” Mr. H then said,
So, we have temperature affecting this. Or does it? I am going to zoom out a little bit because I need to be able to click this one (the simulate button). (Mr. H clicks the simulate button). Since we should start with a lot of particles, I am going to keep that up there. (Mr. H keeps the slider on the number of liquid particles high). But you said temperature is a factor.
So, I am going to be changing this line right here (slider bar for temperature) which means I am going to be changing the amount of heat or temperature and if we had a temperature factor that would mean that that would change other aspects of this model but look what happens. (Mr. H moves this slider bar up and down and nothing happens). (Relevant Categories: Whole Class Discussion, Computer Demonstration, SageModeler Utility, CS Modeling, Analyzing Model Output, Peer Review, Systems Thinking) Through this initial introduction, Mr. H showed the students how to use the simulate feature of SageModeler to identify an area of their model that needed to be changed (in this case, the students needed to define the relationship between temperature and number of liquid particles so that temperature can have an impact on model behavior). Given that the simulation feature is a built-in software scaffold for supporting students in analyzing model output, this is an example of Mr. H supporting students by highlighting technological scaffolds built into SageModeler. 119 Figure 19: Student Model with undefined temperature relationship After showing the students a few other examples of using the simulate feature to analyze model output, Mr. H made a strong case for why analyzing model output through using the simulation feature can help students improve their models faster. I am saying “make sure you run your simulation,” because sometimes when you run a simulation, you go “uh oh.” Because sometimes this doesn't work the way my mind said it should, the way my evidence in the back (referencing hands on experiments done in the back of the room) said it should, the way that all of the stuff I worked on, and my understanding said it should. So, take the time to change it. Play around with it and change. There is something that I want you guys to do in this unit and it will help you an infinite amount. I want you guys to fail faster. And I know you are looking like me like "did Mr. H say he wants us to fail?" No. I want you to fail faster. And what that means is I want you to throw those ideas down, make those connections, run that simulation, and say "Oh crud that's not working. Alright let's change this around." Because the more iterations and the faster you fail and the quicker you go through this process, the quicker that you will get your model into something you like instead of debating for 5 or 10 minutes where you are going to connect heat. Instead of debating, connect heat, run the simulation, and change it. That is what I mean when I say, "fail faster". (Relevant Categories: Information Sharing, CS Modeling, Analyzing Model Output). 120 The idea of “failing faster” by making model modifications and testing them immediately through the simulation feature demonstrated Mr. H’s interpretation of the importance of analyzing model output as a primary mechanism for facilitating student testing and debugging. By making a strong case for having students frequently analyze their model output through simulations early in the unit, Mr. H aimed to encourage this testing and debugging behavior to help students improve their models throughout the unit. It is also important to note how this “fail faster” talk builds directly off his earlier demonstrations of how to use the simulation features embedded in SageModeler and adds additional context to underscore the importance of these earlier supports. As such this is an example of synergistic scaffolding. 
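To make concrete what students see when they follow this advice and “play with the slider bars,” the short sketch below illustrates the kind of computation a collector/flow model performs when an input such as temperature is adjusted: raising the input shifts mass from the liquid collector to the gas collector more quickly, which is the behavior change students are prompted to check after each revision. This is a minimal Python illustration only, not SageModeler’s actual engine; the variable names, rate constant, and linear relationship are assumptions made for the example.

```python
# Minimal illustration (not SageModeler's engine) of a two-collector model:
# a "liquid particles" collector drains into a "gas particles" collector
# through a flow whose size scales with a temperature input.

def run_simulation(liquid: float, temperature: float, steps: int = 10,
                   rate_constant: float = 0.02):
    """Return (step, liquid, gas) triples showing how the collectors change over time."""
    gas = 0.0
    output = []
    for step in range(steps):
        flow = rate_constant * temperature * liquid   # evaporation flow this step
        liquid -= flow                                # liquid collector loses the flow
        gas += flow                                   # gas collector gains the flow
        output.append((step, round(liquid, 2), round(gas, 2)))
    return output

# Moving the "temperature" slider up shifts mass from liquid to gas faster,
# which a student would notice by comparing the two runs below.
for temp in (2.0, 8.0):
    print(f"temperature slider = {temp}")
    print(run_simulation(liquid=100.0, temperature=temp))
```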
In addition to demonstrating how to use the testing and debugging features in SageModeler and arguing for the benefits of frequently using the simulation features to assess model behavior, Mr. H reinforced this practice throughout the unit in his discussions with small groups. In this example, the students are struggling to figure out where to go next with their model and call on Mr. H for assistance. Anna: So, we don't really know what to change this arrow to or how to change our model. Mr. H: Simulate. Tiffany: Should we mention the IMF (Intermolecular force)? Mr. H: Absolutely, if you think that it belongs there. See in the upper right-hand corner. The way that you make it a better model is that you click that simulate button. We want to fail faster. So, you have an idea of what each of those boxes should look like, right? How the graphs should look over time. Anna: Those are graphs? Mr. H: Yeah, those are graphs that depict what is happening to that variable overtime. So, is that overtime happening the way you think it should? Tiffany: No. Mr. H: So, play with the slider bars and see what happens. (Relevant Codes: Discussions with Small Groups, SageModeler Utility, CS Modeling, Analyzing Model Output, Model Components.) 121 Through this conversation, Mr. H not only showed these students how to use the simulation features to test model output but also reinforced key testing and debugging features by directly referencing his earlier “fail faster” informational talk, once again demonstrating synergy across multiple supports. Once these students have recognized that their model does not behave the way they think it should, Mr. H suggested that they “stop the simulation, change your relationships and see if you can make it work the way you think it should.” In this example, Mr. H provided individualized support on using the simulation features of SageModeler to interpret the model output. His support focused on the technical aspects of SageModeler (turning on the simulation features, recognizing how model output is displayed in the program) and the rationale behind analyzing and interpreting model output as a form of testing and debugging while still allowing students to freely draw their own conclusions from the model output and refraining from telling them to make specific changes to their models. This example further spotlights how Mr. H is building on earlier discussions and demonstrations, specifically his fail faster informational talk, to reinforce student learning. Analyzing External Data for Model Verification Early on in this unit, Mr. H tried to have students use experimental data to validate their models. After the students had finished an activity where they counted the number of drops of each liquid (water, acetone, and rubbing alcohol) that would fit on a penny, Mr. H asked them to input the data into a preset data table in the online learning program and use it to inform their modeling process. “First of all, when you take your data on this lab activity, before you segue into building your model, it is important that you get all three trials in. Because if you don't get all three trials in, you won't be able to get your average and do the thing I am about to show you. 
So once you get your isopropyl, acetone, and water data put that name of the liquid right here and what we are doing right now is figuring out how to put raw data from a lab into here so that with the other thing that we are going to be doing today, you guys are going to be able to start running some simulations and then you are going to see if you are going to get your numbers to match your 122 simulations.” (Relevant Categories: Computer Demonstration, SageModeler Utility, CS Modeling, Analyzing Model Output, External Data). Although Mr. H had aimed to have students use the raw data from this experiment as a means of model evaluation, most student groups were actively constructing their initial models at that time and were, therefore, not yet ready to consider using external data to validate their models. Later in the unit, Mr. H returned to focusing on having students make use of external data in the testing and debugging process. On December 8th (Day 9), the class period before students were expected to collect experimental data on how the temperature of acetone, rubbing alcohol, and water change during the process of evaporation, Mr. H reiterated the importance of using external data to validate student models. Today is about “revision, revision, revision”. The goal by the end of this class is “Can we incorporate potential and kinetic energy into the models so that we can make it behave the way that we know it should.” The way we experienced it with the embodied model. The way that we saw with a lot of other simulations that we saw within the unit. Can we get this model to represent, as best as possible, what is going on so that on Monday, when we jump right in and we take lab data, when we put that lab data in on Thursday next week, we can see how awesome our models are. But those models have to be ready for when we hit Thursday. (Relevant Categories: Information Sharing, CS Modeling, External Data, Model Components). Through this short informational talk, Mr. H reiterated the importance of model revisions and sets the stage for students using external data from their laboratory experiments to drive testing and debugging. He also reminded students of prior classroom experiences that showcased different key aspects of the evaporative cooling phenomenon and should be influencing their models. Because Mr. H uses this talk to prepare students for the demonstrations of how to use SageModeler to compare model output with external data, it is also an example of synergy between scaffolds. After students had collected their experimental data on how the temperature of the three liquids changed during evaporation, Mr. H showed the students how to input the data into SageModeler. 123 We did 15 second intervals, I am not going to ask you for all of the data. I am just going to ask you for a little bit. You should have a zero point, so every 15 seconds . . . You guys said it stopped (the acetone stopped decreasing in temperature) after 90 seconds? We will throw 105 in there. I will do the same for the isopropyl and the water. . . . All you are going to do to get to this data table is on page 5.2, go up here to tables and hit temperature vs. time. When you do that, this table will open up for you. (Relevant Categories: Computer Demonstrations, SageModeler Utility, CS Modeling, External Data). Once Mr. 
H had finished showing the students how to input the external data into SageModeler, he then showed them how to generate a graph from this external data, how to create a similar graphical output from their model output, and how to overlay these two graphs to compare their model output to real-world data (Figure 6C). As such, Mr. H supported students in using existing technological scaffolds that are important for students to be able to analyze external data and use said data to verify model output. His demonstration of these technological scaffolds for using external data to validate model output largely mirrors conversations during the PLC on how to support students with this practice. In addition to showing the students the mechanics of how to use SageModeler to compare model output to external data, Mr. H gave his students the task of using this data analysis to revise their models. Your goal today is to spend about half to two thirds of the remaining time to enter your data into 5.2 and then play. . .. Your job is going to be to overlay your graph (model output graph) with this graph (external data graph) and then play. Try to figure out how to make things match. (Relevant Categories: Information Sharing, Classroom Housekeeping, CS Modeling, External Data). This direction asked students to spend much of the remainder of class time using the external data to drive model revisions and to improve their models so that it matches the experimental results. Across this set of computer demonstrations and informational talks, Mr. H used multiple modalities in a synergistic manner to better support students in analyzing external data to validate model output. 124 Just as Mr. H showed the class how to use SageModeler to compare model output data to external experimental data, he also helped individual student groups with this task. In one instance, the online learning management system accidentally did not save a group’s experimental data, leading Mr. H to troubleshoot. Tiffany: We can’t find our graph. I can’t find the graph we made in last class. Do I have to do all of this again? Mr. H: Well as long as. . .. Oh my. All of your data is not there either. Anna: Well, we have nothing apparently. Mr. H: Alright, so here is what I would like you to do. Did you try to copy over from the previous pages. . .. . If you can’t get it to work, just run your simulations. You know what that graph looked like every single time. So, you know what the graph should look like. So do some model revisions based on what you know it should look like. (Relevant Categories: Small Groups, Tech Troubleshooting, CS Modeling, Analyzing Model Output, External Data). In this instance, Mr. H first tried to help the students recover their data from an earlier page in the program. Once that option led to a dead end, rather than having the students go through the tedious task of reentering their data into SageModeler, Mr. H suggested a more efficient solution. By having students compare their model output to the exponential curve of temperature vs. time they remembered from the previous class period, Mr. H helped the students overcome the hurdle of repeating their past work, allowing them to more easily revise their models based on their experimental data. Although issues with the software forced the student to deviate from the designed technological scaffolds associated with verifying model output using external data, Mr. 
H was able to provide additional support so that the student could still complete the task while still building off the principles of this practice that he shared earlier in the unit. 125 Peer Review On November 21 (Day 5), Mr. H provided direct instructions on how to get the most out of the peer review process when reviewing computational systems models. At the beginning of the lesson, Mr. H introduced his students to a set of general model design guidelines (which were intended to help scaffold the model revision process by being a general checklist that students could use to help them identify aspects of their models that needed to be revised) and peer review guidelines. These peer review guidelines, which were reviewed by Mr. H and Mr. M during the PLC, included three general goals for peer reflections, and several sentence starters meant to help students respectfully provide constructive feedback to their peers and were projected in front of the whole class (Figure 20). Figure 20: Peer Reflection guidelines As Mr. H had these Peer Review Guidelines displayed for the whole class, he began sharing his advice for peer review of student models. 126 “Today we are doing our first peer reflection. And when you do a peer reflection, what you are trying to do is . . . First, leave your ego at the door. No model is perfect. Even with models that I create, Mr. X (the author) and Mr. M always look at them and say, “Why did you do this right here? How could we do this better?” Even my models aren’t perfect. Because there is no perfect model. There are only steps along the way. So, realizing that your model isn’t perfect, be open to feedback from someone else. Be open to “ooh why did you do this?” from someone else. Because these are the three things that are important inside of that peer reflection. You are trying to help your classmates refine your model, trying to make it as best as you can. You are trying to prepare yourself for the whole class discussion. Also, there are so many different ways that we can model the phenomenon that there is no correct way to create your model. Sometimes seeing it from a different perspective and how the phenomenon is being modeled helps us understand it better. When you have your discussion, use these (peer reflection questions) as sentence starter guidelines. Take notes inside of your lab book as you are looking at someone else’s model on things that look cool that you want to incorporate into your model or questions that you have for them, based on these five sentence starters.” (Relevant categories: Information Sharing, CS Modeling, Peer Review). Mr. H’s informational talk mirrors the peer reflection guidelines as they were discussed in the PLC (and were visually presented to the students), but his focus on the importance of humility in this process represents his personal interpretation of these guidelines. This brief informational talk on peer reviews provided Mr. H with an opportunity to emphasize the key goals of the peer review process (as defined by the curriculum), showcase the sentence starters (a built in curricular scaffold) designed to help scaffold student conversations around peer reflections, and communicate the importance of humility to getting the most out of peer reflections all in a manner that synergized with the curricular scaffolds (e.g. the peer review guidelines) present in the unit. By using both visual (the projection of the peer reflection guidelines) and auditory (the informational talk) media, Mr. 
H made use of multiple modalities to scaffold students with using peer feedback in addition to sharing existing curricular scaffolds with students. 127 Mr. H also spent a lot of time helping organize and troubleshoot the peer review process. After a behind the scenes effort to troubleshoot the peer-sharing features embedded into the online learning platform, Mr. H told the students how to navigate this feature. “So, you are going to go to page 3.4. You are going to grab your boxes (model variables) and you are just going to shift your boxes around a little bit (move them slightly on the screen). You are not going to actually change anything about your model. But you are going to force it to save. Then you are going to click the little up arrow to share.” (Relevant categories: Information Sharing, Tech Utility, CS Modeling, Peer Review). This is another example of Mr. H supporting students in using existing technological scaffolding. After providing instruction on how to use the peer sharing features, Mr. H identified student groups who were ready to share their models and organized these peer review sessions. “Esme, I want you to share out with Tiffany and Anna. So, Esme to Tiffany, Tiffany to Esme. . . Eric, I want you to share with Isabelle.” (Relevant categories: Small Group, Classroom Housekeeping, CS Modeling, Peer Review). These organizational and troubleshooting efforts were meant to help ensure that all students could get the most out of the peer feedback process. Mr. H also found opportunities to reinforce the need to reflect on peer feedback as he met with small groups as they were actively revising their models. Mr. H: So, what changes would you want to make to your model based on the comments you got. Morty: They were not very constructive; I hope we would have got some constructive criticism on our model. But yeah, I think it is pretty good. Mr. H: What do you want to change? How do you think you can make it better? Isabelle: I think there is another variable we could add to make this flow easier. Morty: I don’t really know exactly how Kinetic Energy affects this whole thing so if I understood more, it would probably be easier. (Relevant Categories: Small Groups, CS Modeling, Peer Review, Model Components, Systems Thinking). With this conversation, Mr. H aimed to get the students to consider the feedback they received from their peers as a starting point for model revisions. As it appears that this group did not receive the most 128 sophisticated feedback from their peers, Mr. H further asked them to consider other aspects of their model that they can improve upon. Thus Mr. H managed to encourage students to reflect on peer feedback while also pushing them to consider other aspects of their models to revise. In another example, Esme and Lilly were unsure of what to do as they waited for their turn to use the laboratory equipment to test how the temperature changed over time. Mr. H suggested, “While you are biding time, if you want to go to your peer review model and make any modification, because I know you guys were in the middle of trying to turn your chain into a feedback loop.” (Relevant Categories: Small Groups, Classroom Housekeeping, CS Modeling, Peer Review, Systems Thinking). This redirection encouraged the students to return to making changes to their model based on peer feedback. By emphasizing the specific aspect of peer feedback that was most relevant to their model structure (creating a feedback loop), Mr. 
H helped simplify the overall task, which made it more feasible for these students to accomplish before they began collecting data to test their model behavior. Through his conversations with students, Mr. H reinforced key messages about peer feedback from his earlier informational talks and helped students use other scaffolds embedded into SageModeler software and the curriculum. Lastly, Mr. H supported students in the peer review process by having students practice evaluating peer models through whole-class model evaluations. In these evaluations, Mr. H displayed a student model anonymously and walked students through the process of critiquing said model. Mr. H: So here is the first one I want you to look at. Remember we don’t claim models, we look at them, analyze them, and offer feedback blindly to whomever’s model this is so they can make improvements. First of all, what is the very first thing you notice about this model? Does it have the ability to answer the modeling question? What does the final model output go to? Sam: Temperature Mr. H: So, the final model output goes to temperature. So, does this have the ability to answer the driving modeling question? Isabelle: Yes. 129 (Relevant Categories: Whole Class Discussion, CS Modeling, Analyzing Model Output, Peer Review, Systems Thinking). In this initial part of the model evaluation process, Mr. H reaffirmed the goals of model evaluation. He then asked the students to consider the model output and how this relates to its ability to address the driving question of the unit (why do we feel colder when we are wet than when we are dry?). As this whole class evaluation continued, Mr. H asked his students to consider the input variables that are present in this model (by considering the number of manipulation bars/slider bars that are present in the model). Mr. H: How many manipulation bars are we going to have on this? Lilly 6 Mr. H: 6? Where do you see six? Remember what always gets a bar. What always gets a bar? Reese: The collector? Mr. H: Yep collectors, what else? Carter: Anything that doesn’t have anything else feeding into it? Mr. H: Yeah, any that doesn’t have anything else feeding into it. So, l anything at the beginning of a chain will have bars as well. So how many bars are here? Fred: 3. (Relevant Categories: Whole Class Discussion, SageModeler Utility, CS Modeling, Peer Review, Systems Thinking). Through these lines of questioning, Mr. H demonstrated the sorts of questions that students should ask as they analyze the models of their peers as well as when reflecting on their own models. He also used these questions to support students with other key aspects of CS modeling, including analyzing model output and systems thinking, showing an ability to use a singular scaffold to support multiple learning goals. Summary Overall, Mr. H used synergistic pedagogical strategies to support students with the testing and debugging behaviors of analyzing the model output, analyzing external data for verifying model behavior and using peer feedback (Table 16). Across these three aspects, Mr. H provided targeted demonstrations 130 of the associated software tools and scaffolds and modeled the reasoning pathways necessary for students to engage in these aspects of testing and debugging. 
In particular, he showed the whole class how to use the simulation features of SageModeler to analyze model output and went in-depth on how to input external data into SageModeler and how to compare the model output with the external data to verify model behavior. He also used whole class model reviews to showcase the types of questions students should ask during peer review as well as other aspects of testing and debugging and CS Modeling, supporting multiple learning goals with a unified set of scaffolds. Mr. H also gave relevant informational talks on the rationale for and importance of each aspect of testing and debugging. For analyzing model output, Mr. H encouraged students to adopt a strategy of frequently analyzing model output via the simulation features after making changes to their models so that they could more quickly identify flaws in their models or “fail faster” and more rapidly improve their models compared to discussing model structures without analyzing model behavior. Meanwhile, Mr. H explained the peer reflection guidelines in a manner that emphasized that the goal of peer feedback was to help fellow students improve their models and to be exposed to different ways of constructing evaporative cooling models to gain insights into refining one’s own model. These informational talks often built on ideas previously introduced during his demonstrations of the mechanics of each feature of SageModeler (or, in the case of “using external data to verify model output,” presaged the later demonstration), building a cohesive and synergistic narrative for these testing and debugging practices. Many of these talks took place alongside demonstrations of visual scaffolds, with Mr. H making use of multimodality in his teaching. Finally, Mr. H reinforced the mechanics and rationale behind each of the three targeted testing and debugging behaviors in his direct interactions with small groups, building on prior discussions and demonstrations, showcasing synergistic scaffolding over time.

Table 16: Mr. H’s Pedagogical Strategies for Supporting Students with Testing and Debugging

Analyzing Model Output:
• Whole Class Demonstration of Behavior (Showing the simulation features)
• Demonstration of Technological Scaffolds
• Informational Talk on Rationale for Behavior
• Direct Interactions with Small Groups to reinforce Mechanics and Rationale for Behavior

External Data:
• Whole Class Demonstration of Behavior (Inputting Data into SageModeler; Comparing Model Output with External Data)
• Demonstration of Technological Scaffolds
• Informational Talk on Rationale for Behavior
• Direct Interactions with Small Groups to reinforce Mechanics and Rationale for Behavior

Peer Review:
• Whole Class Demonstration of Behavior (Whole Class Model Reviews)
• Demonstration of Technological and Curricular Scaffolds
• Informational Talk on Rationale for Behavior
• Direct Interactions with Small Groups to reinforce Mechanics and Rationale for Behavior

Research Question 2: How do these pedagogical strategies compare to those used by another teacher teaching the same secondary science unit?
Based on my analysis of Mr. M’s pedagogical methods and pedagogical foci, there are differences and similarities in how Mr. M and Mr. H approached teaching the evaporative cooling unit (Table 17). Because Mr. M was absent on Nov 14 (Day 3) and Nov 28 (Day 6), I have removed those two dates from Mr.
H’s data set to have a fair comparison of their pedagogical methods and pedagogical foci. With regard to pedagogical methods, Mr. H spent more class time sharing information through informational talks (16.8%) than Mr. M did (7.7%). Additionally, Mr. M spent more class time having students partake in whole class discussions (25.2%) than Mr. H did (15.2%). With respect to pedagogical foci, Mr. M spent more time than Mr. H discussing science content with students, particularly as they revised their computational models (Mr. M, 35.6%; Mr. H, 23.6%). For all other pedagogical categories used to compare Mr. H and Mr. M, there were no noteworthy differences between the two teachers.

Table 17: Comparison between Mr. H’s and Mr. M’s pedagogical methods and foci
Note that both teachers’ times and percentages are based on 800 minutes of class time, as I eliminated all data points for Mr. H from Days 3 and 6 (when Mr. M was absent) for a fair comparison. Each row lists total minutes and the percentage of class time.
Info Sharing: Mr. H 134 (16.8%); Mr. M 61.5 (7.7%)
Whole Class: Mr. H 121.75 (15.2%); Mr. M 201.25 (25.2%)
Small Group: Mr. H 223 (27.9%); Mr. M 278.25 (34.8%)
Comp Demos: Mr. H 24.5 (3.1%); Mr. M 10 (1.3%)
Lab Demos: Mr. H 30.75 (3.8%); Mr. M 18.75 (2.3%)
Videos: Mr. H 7.25 (0.9%); Mr. M 4.5 (0.6%)
BTS: Mr. H 50.5 (6.3%); Mr. M 50.5 (6.3%)
Classroom Housekeeping: Mr. H 193 (24.1%); Mr. M 166.5 (20.8%)
Science Content: Mr. H 189 (23.6%); Mr. M 284.5 (35.6%)
Relationship Building: Mr. H 54.75 (6.4%); Mr. M 57.75 (7.2%)
Technology Utility: Mr. H 55 (6.9%); Mr. M 38 (4.75%)
SageModeler Utility: Mr. H 75.5 (9.4%); Mr. M 48.5 (6.1%)
Tech Troubleshooting: Mr. H 87.25 (10.9%); Mr. M 52 (6.5%)
CS Modeling: Mr. H 241 (30.1%); Mr. M 222.5 (27.8%)
Analyzing Model Output: Mr. H 70.5 (8.8%); Mr. M 62.5 (7.8%)
External Data: Mr. H 58.5 (7.3%); Mr. M 53.25 (6.7%)
Peer Review: Mr. H 94.25 (11.8%); Mr. M 78.75 (9.8%)

Although Mr. H and Mr. M spent a similar amount of time addressing CS Modeling, there are some key differences in the amount of time they spent supporting students with the three targeted testing and debugging behaviors in this unit (Table 18). Mr. H spent about the same amount of class time assisting students with analyzing model output (8.1%) as did Mr. M (7.8%) throughout the whole unit. However, Mr. H appears to have spent much more time supporting students in analyzing model output earlier in the unit than Mr. M did, as Mr. H pushed students to use the simulation features of SageModeler to help them identify flaws in their models through his “fail faster” approach. Additionally, Mr. H spent more time making connections between model output analysis and analyzing external data than Mr. M on Dec 15 (Day 11) when introducing students to inputting external data into SageModeler, whereas Mr. M spent more time having students use the simulation features to make sense of more complex model structures on Dec 19 (Day 12) as part of the final whole class model review. Mr. H and Mr. M spent roughly equal class time supporting students with analyzing external data (6.6% and 6.7%, respectively). While Mr. H did present the rationale for this behavior a bit earlier than Mr. M, both primarily taught students how to input external data into SageModeler to verify their models on Dec 15 (Day 11), the class period after students collected data from the temperature vs. time experiment. Although Mr. H and Mr. M, in general, followed a common schedule for the unit and spent roughly equal time supporting students with using peer feedback (10.3% and 9.8%, respectively), there are noticeable differences in when they conducted whole class model reviews. Because Mr.
M was absent on Nov 14 (Day 3), he conducted a whole class model review on Nov 17 (Day 4) to help ensure that all students had a common backbone for their models. Likewise, Mr. H conducted a whole class model review on Dec 1 (Day 7), while Mr. M allowed extra time for students to work on revising their models and give feedback to their peers. Finally, Mr. M had a short whole class model review on Dec 8 (Day 9), prior to students conducting the temperature vs. time experiment on Dec 12 (Day 10) whereas Mr. H had his whole class review after students finished collecting data on Dec 12 (Day 10). 134 Table 18: Mr. H vs. Mr. M Pedagogy for Testing and Debugging Behaviors Please note that for this table, Mr. H’s time is calculated out of 960 total minutes whereas Mr. M’s time is calculated out of 800 minutes to account for Mr. M’s two absences. Mr. H Analyzing Model Output 0 Mr. M Analyzing Model Output 0 0 4 10.5 17 3.25 4.5 0 3.75 9 20.25 5.5 78 8.1 0.5 Abs 6.25 12.5 Abs 6 0 6.25 0 13 18 62.5 7.8 Nov 7 Day 1 Nov 10 Day 2 Nov 14 Day 3 Nov 17 Day 4 Nov 21 Day 5 Nov 28 Day 6 Dec 1 Day 7 Dec 5 Day 8 Dec 8 Day 9 Dec 12 Day 10 Dec 15 Day 11 Dec 19 Day 12 Total Percent Mr. H External Data Mr. M External Data Mr. H Peer Review Mr. M Peer Review 0 0 4.5 0 0 0 0 2.5 4.5 17.75 30 3.75 63 6.6 0 0 Abs 2.75 0 Abs 0 0 1.5 8.75 40 0.25 0 0 0 9 0 0 Abs 16.75 13 19.5 4.75 Abs 29.75 7.5 0 3.5 0 10.75 17.75 0 0 1.25 21.25 23 53.25 6.7 99 10.3 78.75 9.8 Analyzing Model Output Mr. M’s overall approach to supporting students with the testing and debugging behavior of analyzing mode output was different from Mr. H’s pedagogical strategies. It is important to note that both Mr. H and Mr. M had their students complete a one lesson introduction to SageModeler one week before the start of this unit on Nov 3 (Day 0), which did include a brief introduction of the simulate features present in SageModeler. Mr. M did not provide students with an informational talk on the rationale for analyzing 135 model output as a means of expediting the model revision process in the same manner as Mr. H’s “fail faster” informational talk during the early part of this unit. The closest Mr. M came to addressing a rationale for analyzing model output comes when he contrasts diagrammatic models with computational systems modeling. “And they (diagrammatic models) are really useful for us being able to see our understanding of the situation, but they are limited as there is no feedback within the model. There is no simulation within the model to show you whether or not your model is accurately representing something. It also doesn’t help you build understanding of how relationships fit together (Relevant Categories: Information Sharing, CS Modeling, Analyzing Model Output, Systems Thinking).” Here Mr. M listed the limitations of diagrammatic modeling from a system thinking perspective (lack of feedback structures, obscuring relationships between different variables in the system) and a testing and debugging lens. In particular, he remarked on how the absence of simulation features is a hinderance of diagrammatic modeling. While this informational talk did not directly endorse using simulation features to analyze model output, it suggested that the students should be using the simulation features present in SageModeler to see if their model is accurately representing the phenomenon. As with his presentation of an indirect rationale for analyzing model output, Mr. 
M provided whole class instruction on the mechanics of the simulation features of SageModeler that allow students to analyze model output, instruction that was embedded in a discussion on aspects of systems thinking. During his first whole class model review on Nov. 17th (Day 4), Mr. M asked students to consider the structural meaning of "sliders" (the sliders on SageModeler variables that allow students to manipulate the relative amount of each initial variable in SageModeler).

Mr. M: This is a dynamic model. In a dynamic model, do we set initial conditions by changing whole variables that way or do we set initial conditions by changing the sliders?
Student A: Changing the sliders.
Mr. M: Yes, because all these little boxes (the model output boxes present in collector variables in SageModeler) are graphs over time (showing how the collector variables are changing over time) so in this situation it stays the same because there are no liquid particles to start off with. And so, we can't have a variable that is stating an initial condition because the variables are changing throughout. (Relevant Categories: Whole class discourse, SageModeler Utility, Analyzing Model Output, Peer Review, Systems Thinking).

In this discussion, Mr. M pointed out that the sliders represent the relative amount of each initial variable and therefore negate the need to have an additional variable as an initial starting condition. As such, Mr. M indirectly communicated that the sliders allow students to manipulate the relative amount of each variable when they are analyzing model output. Mr. M also stated that "these little boxes are graphs over time," showing students how to look at model output in their models without directly addressing the rationale for analyzing model output. As the conversation continued, Mr. M noted that students need to push the simulate button to access the slider bars by saying "I can give it a slider by pushing simulate," but he never directly showed the whole class the mechanics of analyzing model output with the simulate features. This indirect approach to presenting both the rationale and the mechanics behind analyzing model output strongly contrasts with Mr. H's approach of walking students step by step through this process. While this indirect approach does not clearly point out the existing technological scaffolds to the same extent as Mr. H's demonstration, Mr. M was supporting students with systems thinking through these discussions and demonstrations, thus creating some synergy between his scaffolding for analyzing model output and systems thinking. Additionally, the combination of visual and verbal components in these demonstrations shows a multimodal approach to supporting students in this practice, which parallels Mr. H's pedagogical methods.

Although Mr. M adopted an indirect approach towards communicating the rationale and mechanics for analyzing model output via the simulation features of SageModeler, he did support students in this testing and debugging practice through direct conversations with small groups. Early on, Mr. M directly told a small group that they need to "push the simulate button on the top" to analyze model output. In this way, Mr. M reinforced the procedural skills and technological scaffolds needed to generate model output in SageModeler in a more direct manner than in his whole class discussions. In a parallel to his whole class discussion, Mr.
M had a discussion with students about the mechanics of analyzing model output based on the absence of the slider bar from a variable the students want to manipulate. Mr. M: Oh you want to know why it doesn’t have a slider. Well what is true about all of the ones that do have sliders and how they are connected? Student B: Nothing is going into them, only outgoing stuff? Mr. M: Because if there are inputs going in and outputs going out you might not get to set the amounts. (Relevant Categories: Small Groups, SageModeler Utility, CS Modeling, Analyzing Model Output, Systems Thinking). Here Mr. M assisted students with analyzing the output of their model by pointing out the structural flaws that prevent them from manipulating model input. In both instances, Mr. M used conversations with small groups to have more direct conversations about how to generate and interpret model output using SageModeler, better supporting them with analyzing and interpreting model output. This parallels Mr. H’s approach to supporting individual student small groups with Analyzing Model Behavior during the model revision process. It is also important to note that many of these small group conversations strongly parallel Mr. M’s earlier whole class conversation around the slider bars, therefore showing how these small group conversations reinforced earlier ideas addressed through informational talks and whole class discussions. Mr. M also tended to use the analysis of model output to prompt further discussions on specific structural issues in student models, such as the presence of unnecessary starter variables. This suggests that Mr. M aimed to have students use the testing and debugging behavior of analyzing and interpreting model output to start conversations on model structure and system thinking that were not as clear in Mr. H’s “fail faster” approach to this behavior. As such, Mr. M seems to be creating a sense of coherence between his scaffolds for analyzing model output and systems thinking in ways that differ from Mr. H’s approach. 138 External Data Mr. M had a similar approach to Mr. H for supporting students with analyzing external data to validate model output. Given the structure of the unit, both teachers tended to wait until the later part of the unit to begin introducing this aspect of testing and debugging to their students, in line with the occurrence of the major quantitative experiment (temperature vs. time) for this unit. As with Mr. H, Mr. M began laying out the rationale for data collection and its ultimate use for verifying student models on Dec 8 (Day 9), the class before the temperature vs. time experiment. Mr. M: And it is kind of important for us today to make sure we get our models working correctly with the flows of energy and the impact it has on things like temperature because next time we are going to be in lab and we are going to be quantifying the evaporation rates of these different substances and we are going to be overlaying it into our models and we are going to see how well our models actually work. (Relevant Categories: Information sharing, Classroom Housekeeping, CS Modeling, External Data). In this brief informational talk, Mr. M encourages the students to revise their models in preparation for the temperature vs. time experiment. In discussing the need to revise prior to the experiment, Mr. M emphasized the importance of having the best possible model possible prior to using external data for model validation, thereby elevating the value of the external data. Mr. 
M also conveyed that by overlaying the experimental data on top of student models, it will allow students to see how well their models reflect the real-world phenomenon. While not explicit in this informational talk, Mr. M heavily implied that external data is of higher value than student model output and if their model output does not match their external data, more revisions will be needed. In a discussion with a small group on Dec 12 (Day 10), after students had collected experimental data, but before they have input the data into SageModeler, Mr. M asked the students to consider making additional revisions prior to the next class when they will be more explicitly comparing model output to external data. 139 Do you think that there is anything else that you can add that will make it better line up with the experiment? If the answer to that is no, you should be fine for now. If the answer is yes, you should make those changes. (Relevant Categories: Small group, Classroom Housekeeping, CS Modeling, External Data). In this conversation, Mr. M told the students that their models should reflect real-world data and “line up with the experiment.” He then suggested that in preparation for using the quantitative data from the experiment to validate their models (which will happen in the following class session) that the students make changes to their models to better match their preliminary understanding of their experimental results. In doing so, Mr. M further emphasized the importance and rationale for the upcoming task of using external data to validate model output. Mr. M’s approach to introducing the students to the mechanics of inputting external data into SageModeler along with how to compare their model output data to the experimental data largely mirrored Mr. H’s approach, as both were influenced by the PLC. Mr. M began by having the students follow along as he demonstrates where to find the data tables prebuilt into the program for this unit. He then asked a student to share the general trend of their temperature vs. time data for acetone. Mr. M: Can somebody walk me through what the graph for acetone did? Student C: It went down sharply and then bounced back up and then went kind of straight. (Relevant Categories: Whole Class Discussion, Science Content, CS Modeling, External Data) Mr. M used this moment to point out that the “bounce back” happens after the acetone has completely evaporated and the thermometer returns to the temperature of the room. He goes on to state, “If you only have a certain amount of time for the acetone, you only need to go about as far for the other two (liquids),” thus informing the students that they only need to include enough data points to match the time for the low point of acetone (when all the acetone has evaporated) as any additional data would be superfluous. Mr. M followed this up with a multimodal presentation showing students how to input their experimental data into SageModeler using “dummy data.” 140 So I am going to go ahead and make some dummy data. So dummy data is what we use if we want to see how something works but we don’t want to bias our data by messing around with it that can cause us to make inappropriate conclusions off of it. (Relevant Categories: Computer Demonstration, SageModeler Utility CS Modeling, External Data). This use of “dummy data” differed from Mr. 
H’s use of actual student data to demonstrate how to input external data into SageModeler, but it did seem to encourage students not to draw “inappropriate conclusions” off this demonstration and instead focus on their own experimental data when they return to their own models. Mr. M then showed the students how to create graphs of both the external data and model output data and how to make comparisons between them, demonstrating the technological scaffolds necessary to use external data to validate model output data. I can change the background color to transparent and I can actually overlay it and you can see through. Now I can compare the data and ask, “Do these two match?” And if the answer is yes, you can say this is all good. But if not, our model maybe doesn’t work as well as we think it does and we need to make some adjustments to it. (Relevant Categories: Computer Demonstration, SageModeler Utility, CS Modeling, Analyzing Model Output, External Data). Mr. M used this opportunity to reinforce the rationale behind analyzing external data by stating that the aim of this exercise is for student model output to match experimental data. As with Mr. H, Mr. M further advocated that students needed to make changes to their models so that they would match experimental data, thereby giving students a clear rationale for using external data to validate model output as a testing and debugging behavior. This discussion thus reinforced earlier talking points from Mr. M on the importance of this testing and debugging practice and therefore is evidence of synergy between Mr. M’s different supports for this practice. After demonstrating how to use external data to validate model output, Mr. M set out to help individual student groups that needed further assistance with this aspect of testing and debugging. Once a student group had finished inputting their experimental data into SageModeler they asked Mr. M to come over to validate their progress. 141 Student D: Does this look okay? Mr. M: Don’t show me the table, show me the graph. The graph is the important bit. You already recorded your data right? Student D: Yes. Mr. M: Ok. Now make a separate graph. Now drag the variables from the graph on to wherever you want. Now you can spread out the graph. That might be good enough because you can tell which one is which. (Relevant Categories: Small Groups, SageModeler Utility, CS Modeling, External Data). In this example, Mr. M assisted this student group by walking them through the process of making two sets of graphs (one for external data, the other for model output data) to allow for a side-by-side comparison of model output with external data. By helping students with using the technological scaffolds present in SageModeler, Mr. M reinforced his earlier demonstration of these features. Later, Mr. M returned to this same group to help them with making revisions based off their data analysis. Once students recognized that their model output shows the hand temperature staying constant even though their experimental data shows a decrease in temperature due to evaporation, Mr. M suggested that they revise that part of their model. Mr. M: But should your hand temperature stay the same? Student D: No. Mr. M: So, through this data investigation, we have discovered that your model needs a revision in the hand temperature department. Student E: Yep. (Relevant Categories: Small Group, CS Modeling, Analyzing Model Output, External Data, Model Components). In this example, Mr. 
M reinforced the main rationale for analyzing external data (to revise models so that the models match experimental data and thus better represent real-world phenomenon) and encouraged the students to make further revisions to their models accordingly. 142 In another conversation with a student group, Mr. M reiterated that part of the purpose of analyzing external data is for students to compare their models with experimental data. Student F: When you did the simulation what were your collectors named? Mr. M: It should be your model. You shouldn’t be using my model because my model wasn’t good and didn’t really reflect it (the phenomenon). The whole point is to find out if your model matches your data. (Relevant Categories: Small group, CS Modeling, External Data). In this example, the student appeared to be trying to change his model to match that shown by Mr. M during the whole class demonstration. To help the student refocus on using their experimental data to improve their own model, Mr. M stated that the model he showed on the board was flawed and that the student should not replicate it. This enabled the student to refocus on the task and continue to revise their own model rather than recreating a different flawed model. In addition to using small group conversations to assist students with the mechanics and rationale behind analyzing external data to validate model output, Mr. M also took opportunities to help further student understanding of the underlying science content they are attempting to model. When a student asked Mr. M why it was important to include “time zero” when inputting his data into SageModeler, Mr. M gave an explanation that also references some key ideas from the phenomenon itself. Time zero? Time zero tends to be important because you want them to all start at the same point. They should all start at the same temperature. Also, the biggest drop tends to be at the beginning, when the liquid is evaporating fastest, so if it is absent, you are missing an important part of the phenomenon. (Relevant Categories: SageModeler Utility, Science Content, CS Modeling, External Data). Here Mr. M explained that from an experimental standpoint, time zero is a critical moment as it is the point where all three liquids are at room temperature. He also stated that because the liquid is evaporating fastest at the beginning of the experiment and that this drop is “an important part of the phenomenon,” the student needed to include it when inputting data into SageModeler. By suggesting that the student include 143 “time zero” in the data set they are inputting into their model, Mr. M helped them have more accurate external data to validate their model output. Additionally, by discussing the importance of the “big drop” Mr. M highlighted the exponential decrease in temperature that is a key part of the phenomenon of evaporative cooling, potentially furthering that student’s science knowledge. As such, Mr. M used these small group conversations as opportunities to simultaneously address using external data to validate model output and key science content associated with evaporative cooling. Peer Feedback Mr. M’s approach to supporting students by using peer feedback strongly mirrored Mr. H’s approach and paralleled discussions from the PLC. On Nov 17 (Day 4), Mr. M had his first whole class model review session with his students. As with Mr. H, Mr. 
M used these whole-class model review sessions to show the perspective and positionality students should take when reviewing peer models and the questions they should ask each other. At the beginning of this process, Mr. M illustrated the overall disposition students should have towards the model review session and peer feedback in general. When we look at the models, we are going to look at them in an anonymous way. You don’t claim one as yours. And we are going to be talking about what the model is showing, strengths of the model things that can be improved, things that are missing. And we are going to try to do that in a constructive way. And when we do that, remember that it is worth writing down some ideas and if there is one group that did something that you want to incorporate that you write that down too. We aren’t just analyzing one group’s model just for them. It is for all of us together. (Relevant Categories: Information Sharing, CS Modeling, Peer Feedback). Here Mr. M pointed out that the goal of model reviews is not only to suggest improvements to these models, but to identify strengths that can be used to further improve one’s own model. By emphasizing the importance of learning from other models as a key aspect of the peer feedback process, Mr. M built a deeper rationale for using peer feedback and encouraging students to see all models as potential inspirations for further improving their own models. These key points strongly parallel the informational 144 talk given by Mr. H where he encouraged students to use the peer feedback process as an opportunity to gather ideas from other groups and explore new ways to represent key aspects of the phenomenon in their models. In addition to building a rationale for peer reviews, Mr. M showcases the types of questions students should ask during future peer feedback sessions through these whole class model reviews. For example, Mr. M asked the students to identify the strengths of a peer model. Mr. M: Is there anything else that is strong about this model. Student B: It is easy to read. Mr. M: It is easy to read, why is it easy to read? Student B: Because there are only four things we need to think about. Mr. M: Does just adding more stuff into your model necessarily make it better? Student C: No Mr. M. Exactly. We only want to include things that are actually impacting the phenomenon. (Relevant Categories: Whole Class, CS Modeling, Peer Review, Systems Thinking). Here Mr. M demonstrates the use of an open-ended question (What is strong about this model?) that can spark a deeper conversation about several aspects of the models (in this case, their efficient simplicity). He also shows the importance of follow-up questions in the model review process and uses this as an opportunity to discuss a key Systems Thinking principle (more complex representations of phenomena with a higher number of variables is not necessarily a better model). Paralleling Mr. H’s pedagogical strategies, Mr. M shared the peer reflection guidelines. These peer reflection guidelines were included in the evaporative cooling curriculum as a scaffold for reinforcing the main goals of peer review and for sharing some questions students can use during the peer review process. As such these guidelines were reviewed during the PLC and both Mr. H and Mr. M were highly encouraged to present these to their students as a means of scaffolding the peer review process. Mr. M began by projecting the peer review guidelines in front of the whole class. 
He then reviewed the three main goals of peer review as written in the peer review guidelines: helping other students improve 145 their models, preparing for whole class model reviews, and gaining insights from peer models to improve One’s own model. When you pair up, your goal is to help them refine their models so that their models actually match the phenomenon. You should familiarize yourself with how other people have been able to model these things so that you are ready for a whole class discussion. And you should see that are potentially multiple different ways to model this phenomenon. So, what we are doing is that you are not trying to tell them, ‘This is how you build your model.’ You are trying to give them tips for how they can build their model better. (Relevant Categories: Info Sharing, CS Modeling, Peer Feedback). As with Mr. H’s informational talk on the peer review guidelines (and the guidelines themselves), Mr. M also emphasized that peer feedback should not be centered on telling other students how to build their models or trying to make a peer’s model conform to your expectations of an ideal model. Instead, Mr. M, like Mr. H, advocated for students to offer suggestions and help to improve their peer’s models in a manner that preserved the unique strengths of the original model. Mr. M further supported students in giving generative and supportive feedback by sharing key examples of the sort of questions he wants students to ask each other during peer feedback. Instead of saying “You shouldn’t have this variable.” Or “You need this variable, it’s not included” Ask them questions like, “How do you include this variable?”, “How does this variable impact the rest of your system?” “What makes you think that this variable is necessary to include?” Ask them questions like that. Ask them about the shape of the graph. But don’t be like “You should do, this, this and this.” (Relevant Categories: Information Sharing, CS Modeling, Peer Review). In this part of his informational talk, Mr. M provided several strong examples of generative questions that students can ask during peer review and the types of questions/comments they should avoid. Mr. M advised the students to refrain from using judgmental language (i.e., “You shouldn’t have this variable”) and instead encouraged students to use generative questions that allow for further discussion. Although 146 these questions paralleled those found on the peer reflection guidelines, these example questions were independently crafted by Mr. M. These generative questions aimed to help students be less defensive about the peer feedback they received and be more likely to revise their models; therefore, these questions were meant to also benefit the students receiving feedback. Additionally, these generative questions (along with those embedded in the peer reflection guidelines) also were designed to allow students to share their reasoning behind their model design choices and, therefore, can facilitate deeper discourse between students. Through sharing these questions and the peer review guidelines, Mr. M aimed to help the students who gave feedback to ask more meaningful questions and to help the students who received the feedback have more meaningful conversations during the process. As with Mr. H, in addition to whole class instruction, Mr. M supported students by using peer feedback through his interactions with small groups as they were revising their models. Mr. 
M spent a substantial amount of time organizing students into group dyads so that they could give feedback to and receive feedback from other groups. During his routine check-ins with different student groups, he asked them if they were ready to share their models to receive and give feedback to another group.

Mr. M: So, you are ready to share out your models?
Student A: We are sharing out our models?
Mr. M: With another group. Right.
Student F: Sure?
Mr. M: Or not, are you not ready for it?
Student A: I mean I am technically ready. So yes, we are ready. (Relevant Categories: Small Groups, Classroom Housekeeping, CS Modeling, Peer Feedback).

Upon recognizing that these students were ready to receive peer feedback, Mr. M helped arrange another group to meet with them for peer review.

Mr. M: Hey Student G and Student H would you be willing to come look at Student A's model because there is an odd number of groups? (Relevant Categories: Small Groups, Classroom Housekeeping, CS Modeling, Peer Feedback).

As such, Mr. M created an environment where student groups could meet to provide peer feedback as an aspect of testing and debugging. Beyond facilitating peer feedback between student groups, Mr. M also helped students with interpreting the models of their peers, another key goal of peer feedback. In this example, as the students were initially reviewing a peer's model (prior to a more in-depth conversation with this other group), Mr. M asked the students if the model made sense to them.

Mr. M: Does their model make sense?
Student C: Yeah, but there's this part.
Mr. M: Are there problems with it?
Student D: We had a question on the size of liquid droplets. We didn't see how it affects the rate of evaporation.
Mr. M: Go ask them. (Relevant Categories: Small Groups, CS Modeling, Peer Feedback, Model Components).

Here the students recognized an aspect of the other group's model that they questioned. However, the students were a bit uncertain as to whether they should immediately ask the other group to explain their reasoning behind the relationship between the size of liquid droplets and the amount of liquid particles (which in turn undergo evaporation) or wait until the other group was done reviewing their model. To encourage further group discourse, Mr. M went ahead and asked the other group to explain their reasoning about this relationship.

Mr. M: Hey Student B. What's the deal with the droplet size?
Student B: The droplet should set the initial value of the amount of liquid particles.
Mr. M: They are saying that the droplet size determines the initial value. (Relevant Categories: Small Groups, CS Modeling, Peer Feedback, Model Components).

By asking Student B to explain his reasoning behind this relationship, Mr. M modeled the sort of questions this student group should be asking their peers during the peer feedback process. It also had the immediate impact of providing the students with an interpretation of this model structure/component. The students were then able to offer better feedback to their peers and have a deeper conversation around the relationship between the size of liquid droplets and the rate of evaporation.

Summary

There are many differences and similarities in how Mr. H and Mr. M approached teaching the evaporative cooling unit and in how they supported students with the three targeted aspects of testing and debugging. Due to differences in class sizes, Mr. H spent more time on informational talks and less time on whole class and small group discussions compared to Mr. M.
In contrast, Mr. M found more opportunities to have small group discussions and to have more conversations about science content compared to Mr. H. While both teachers spent about the same amount of class time on all three aspects of testing and debugging, Mr. H provided a more explicit rationale for analyzing model output in his "fail faster" informational talk compared to Mr. M. Likewise, Mr. M gave students more time to analyze model output in his final round of whole class model review than Mr. H did. As for analyzing external data to validate model output and using peer feedback, Mr. M's pedagogical strategies largely aligned with those used by Mr. H, with only modest differences in their approaches to supporting students with these two behaviors.

Both Mr. H and Mr. M showed synergy in how they supported students with testing and debugging. Mr. H and Mr. M both used informational talks and whole class discussions to highlight existing curricular and technological scaffolds, often using multimodal presentations to do so. Additionally, both teachers often used small group conversations to reinforce ideas previously covered in a whole class environment. While they both found opportunities to cover multiple learning goals within a unified context (e.g., addressing both analyzing model output and using feedback through the practice of whole class model critiques), Mr. M was more likely to directly incorporate science content related to evaporative cooling and systems thinking in his efforts to support students with testing and debugging practices.

Research Question 3: What pedagogical strategies correlate with student testing and debugging behaviors in this secondary science unit?

Based on student screencasts, evidence shows students utilizing all three target aspects of testing and debugging (i.e., analyzing model output, analyzing and using external data, and using feedback and peer reviews) (Table 19). Once again, it is important to note that because very few of Mr. M's students agreed to be screencast, we are unable to include student data from his class to address this research question. While these student behaviors in general seem to parallel Mr. H's instructional patterns, with students using specific approaches to testing and debugging soon after Mr. H discussed them, this is not a universal pattern amongst the five screencast groups. It is also important to note how the general course of the unit impacted student testing and debugging. For example, students for the most part did not begin building their models in SageModeler until November 14th (Day 3), were focused on completing the learning modules on November 17th (Day 4) and December 5th (Day 8), and spent most of December 12th (Day 10) collecting experimental data outside of SageModeler. While these tasks were important for creating a context for later testing and debugging and the overall goals of the unit, the students were largely unable to test and debug their models on these dates. Given the complexity of these data, I have combined my semi-quantitative analysis with a narrative analysis to further illustrate the correlation between Mr. H's pedagogical strategies and student testing and debugging behaviors for each of the three target aspects of testing and debugging.

Table 19: Summary Table of Student Testing and Debugging Behaviors
Note that data from all five screencast groups have been aggregated in this table.
The percentages for this table are calculated out of 4,800 minutes (960 total minutes of class time multiplied by five groups). For Mr. H, we used the amount of time Mr. H spent supporting students with the respective behavior. Nov 7(1) 0 Nov 10 (2) 0 Nov 14(3) Nov 17(4) Nov 21(5) Nov 28(6) Dec 1(7) Dec 5(8) Dec 8(9) 4 0.5 46.75 13.5 14.25 0 17. 25 Dec 12 (10) 0.5 0 0 4 10.5 17 3.25 4.5 0 3.75 9 0 0 0.25 0 6.5 1.25 0 0 2 0 Dec 15 (11) 12. 75 20. 25 86. 75 Dec 19 (12) 15.5 Total % 125 2.6 5.5 78 8.1 115 2.4 18.2 5 0 0 4.5 0 0 0 0 2.5 4.5 17. 75 30 3.75 63 6.6 0 0 10.5 0 33.25 3 13.75 0 6.5 0 4.25 5.5 76.75 1.6 0 0 0 9 12 4.75 29.75 0 3.5 0 17. 75 21.2 5 99 10.3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.25 0.5 20 9.25 4.5 1.5 2.25 0 0.25 0 0 0 0 0 0 0 0 0 0 12.25 2.25 2.5 12.5 1.5 3.75 2 0.5 3.5 2.5 1.25 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7.5 0.5 5 60.25 1.3 12.7 5 5.25 0 6.25 1.5 31.5 0.7 4.5 0 0 1.25 25.75 0.5 1.25 0 1.5 0 8.75 0.2 0.5 0 0 1 5.5 0.1 1.5 0 0 0 0 0 76. 75 10 0 9.25 91.5 1.9 8 0 18 0.4 0 0 Category Student Analyzing Model Output Mr. H Analyzing Model Output Student Analyzing External Data Mr. H Analyzing External Data Student Feedback and Peer Review Mr. H Feedback and Peer Review Analyzing Model Output L1 Analyzing Model Output L2 Analyzing Model Output L3 Analyzing Model Output L4 Analyzing External Data L1 Analyzing External Data L2 Analyzing External Data L3 Analyzing External Data L4 151 Table 19 (cont’d) Nov 10 (2) 0 0 0 0 Category Nov 7(1) 0 0 0 0 Feedback and Peer Review L1 Feedback and Peer Review L2 Feedback and Peer Review L3 Feedback and Peer Review L4 Nov 14(3) Nov 17(4) Nov 21(5) Nov 28(6) Dec 1(7) Dec 5(8) Dec 8(9) Dec 12(10) 0.25 0 12 0.5 2 0 0.75 0 Dec 15 (11) 2.5 Dec 19 (12) 1.5 Total % 19.5 0.4 10.25 0 6 2.5 9.5 0 5.25 0 0 4 37.5 0.8 0 0 0 0 6.75 0 2.25 0 0.5 0 1.75 0 11.25 0.2 8.5 0 0 0 0 0 0 0 8.5 0.2 Analyzing Model Output Although Mr. H briefly showed students how to use the simulation feature of SageModeler on November 17th (Day 4), the students generally did not use these simulation features to analyze model practice prior to November 21st (Day 5). In total, only 4.5 minutes were spent analyzing model output across all five groups before November 21st (Day 5). One student group (Group 1) stumbled across the simulation features on November 14th (Day 4; Figure 21). While they briefly used the simulation features to make sense of their model, correctly interpreting that their evaporation rate was constant, they did not make subsequent changes based on this analysis. Many of the other students were actively focused on constructing their initial models or learning about IMF through the learning modules and therefore largely ignored the simulation features on November 14th (Day 3) and 17th (Day 4). 152 Figure 21: Student use of Simulate Features On November 21st (Day 5), Mr. H discussed the importance of analyzing model output via the simulation feature in his “fail faster” informational talk. This informational talk and his subsequent conversations with small groups seems to have been a major catalyst for getting students to begin regularly using the simulate feature of SageModeler to analyze model output. On November 21st (Day 5), the five groups collectively spent 46.75 minutes analyzing and interpreting model output through the simulation feature. 
While most subsequent days showed a more modest use of this practice (13 to 16 minutes a day, with December 5th (Day 8) and December 12th (Day 10) being exceptions because they were content-focused and experiment-focused days, respectively), students remained consistent in their use of this testing and debugging strategy for the remainder of the unit.

Analyzing External Data

There were few examples of students using external data to directly drive model revisions early in the unit, with only 10 minutes of this practice across all five groups prior to December 15th (Day 11). This corresponds with the limited direct support for this practice provided by the curriculum and Mr. H at the beginning of the unit, along with its being a more advanced aspect of testing and debugging more commonly used towards the end of a unit. However, the general principle that their models should reflect their real-world experiences (which Mr. H did reference throughout the early part of this unit) is reflected in some student conversations justifying certain parts of their models. During a peer review on November 21 (Day 5), Isabelle used her experiences with the initial phenomenon (comparing how water, rubbing alcohol, and acetone feel as they evaporate off human skin) as external data to justify her model's relationship between evaporation speed and temperature felt on hand.

Isabelle: So I used "about the same" for the evaporation speed and temperature on the hand because, if it evaporates faster like the alcohol and the acetone, the temperature felt really cold. But with the water, it evaporated slower but it didn't feel that much colder. So I just did, I changed it to "about the same" because it evaporates at about the same rate if that made sense. (Using External Data Student Level 2)

Here we can see the students apply the general principle that their models should reflect real-world data to the qualitative data they collected, as Isabelle used her experiences with evaporative cooling to justify this relationship in her model. Once students had collected external data, they spent a substantial amount of class time inputting these data into SageModeler. As the average student spent between 10 and 25 minutes inputting the experimental data from the temperature vs. time lab into SageModeler on December 15th (Day 11), the total class time spent analyzing external data was 86.75 minutes for all five groups on this date. While this does correspond with Mr. H's efforts to get students to compare their models to real-world data, little time was spent using said data to make meaningful comparisons with model output. Because comparing quantitative data with model output corresponds directly to Level 3 behavior for this practice, the evidence shows that students spent a total of 18 minutes across December 15th and 19th (Day 11 and Day 12, respectively) meaningfully using these external data to validate their CS models. After inputting the experimental temperature vs. time data into SageModeler, Morty and Isabelle tried to line up their model output data (in blue) with their experimental data (three lines in orange) (Figure 22).

Isabelle: You might want to try lining it up, but that is not even close.
Morty: It looks close to me.
Isabelle: It's not lined up, Morty.
Morty: (after spending some time trying to line the two graphs up) It still isn't matching any of them. Maybe it matches water... See, it is not horrible.
Isabelle: Yeah, but we can fix this (points to their model).
(Using External Data Student Level 3) Figure 22: Student comparison of experimental and model data By comparing their model output with real world data (as Mr. H had expressly requested of them), Morty and Isabelle figured out that the linear nature of their model’s temperature vs. time graph did not line up with the exponential graph present in their experimental data and therefore were encouraged to make further revisions to their CS model. Peer Review Although Mr. H did not address peer review of SageModeler Models until November 17th (Day 4), with an in-depth discussion on November 21st (Day 5) there is evidence of students receiving and utilizing meaningful feedback from their peers as early as November 14th (Day 3), the first day of building their SageModeler Models. Tiffany and Anna had an opportunity to look at the diagrammatic model of Carter, Sam, and Fred. From this experience, Tiffany and Anna concluded that the water particles in liquid form were moving slower than the water particles that had evaporated (note this is not an accurate representation of the phenomenon). Thus, when they were writing down a justification for the transfer 155 relationship from number of liquid particles to number of gas particles they wrote, “because of another person’s drawing, it showed that the water particles were going slowly versus the evaporation speed, which was significantly faster.” (Using Peer Feedback, Level 2) Likewise, Carter, Sam, and Fred also borrowed from other student models in the initial construction of their own SageModeler model. When making a justification for their transfer relationship from number of liquid particles to number of gas particles Carter said, “I am going to write that down in here and do what he (Morty) said.” James subsequently wrote down, “There is an increase of speed of the particles which changes the liquid particles to gas particles.” Using Peer Feedback, Level 2) Both examples show that students were already cooperating and borrowing ideas from each other without formal direction from Mr. H in this unit. Once Mr. H provided students with more explicit scaffolding in the form of peer review guidelines on November 21st (Day 5) and organized the students into peer review dyads and triads, the amount of time spent in peer review increased substantially to 33.25 minutes. Student screencasts also show that students were explicitly using the peer review prompts to scaffold their analysis of these models, such as when Morty and Isabelle were looking at Reese and Eric’s model (Figure 23). Isabelle: (reading from the peer review prompts) ‘So I was wondering why you included blank. How does that help explain the phenomenon?’ Do we have any questions as to why he included something? Morty: Why does the IMF decrease? Isabelle: Oh, fix this (She moves the sliders, so they are even and then she continues to move the IMF slider) It doesn’t really change it a lot though. Morty: Temperature does though. Isabelle (to Eric): I just was wondering why the IMF of the liquid doesn’t change it much, but it might not have much to do with it. Eric: Well, I have temperature to be exponentially increasing. So, the lower down the temp is, the less impact the IMF will have. 156 (Analyzing Model Output, Student Level 3, Using Peer Feedback, Student Level 3) In this conversation, Isabelle is directly using the peer review guidelines that Mr. H shared with them to help them identify an area of Reese and Eric’s model that they think needs further discussion. 
They then utilize the simulation features of SageModeler to test the model behavior before sharing their conclusion that the IMF “doesn’t really change it (evaporation rate) a lot.” Eric is then able to defend their model by explaining that temperature is having a more dramatic effect than IMF, meaning that it is hard to see the impact of IMF on evaporation when temperature is low. Note that while temperature does impact the rate of evaporation (with a higher initial temperature leading to a higher rate of evaporation), Reese and Eric’s model is missing a critical feedback loop showing how temperature is decreasing as the liquid evaporates. Figure 23: Reese and Eric’s pre-peer review model Later in this model review, Isabelle and Morty recommend that Eric include an evaporation speed variable so that they can model how evaporation is affecting the temperature felt on hand (to answer the driving question more directly). After redesigning their model Eric shares their revisions with Isabelle and Morty (Figure 24). Eric: The faster it evaporates, the colder it feels. Yeah. I just completely copy pasted the things that affected evaporation (temperature and IMF) and just entirely made an evaporation speed on 157 its own. Completely not touching anything because I tried it the other way (having evaporation speed come out of the evaporation rate) and completely messed it up. Isabelle: That is kind of what we did too. Instead of making the valve our evaporation we made evaporation its own box and then connected that back into the valve. (Using Peer Feedback, Student Level 4). This conversation demonstrates how the students were able to use the peer review process to make meaningful changes to their models. It also shows students sharing their ideas with each other in an iterative manner as Eric took Morty and Isabelle’s suggestions, made changes to their model and then discussed those revisions with Morty and Isabelle to complete the revision and feedback cycle. Figure 24: Reese and Eric’s Post-Peer Review Model Although Mr. H did intend for students to engage in peer review towards the end of the unit, students only collectively spent 9.75 minutes of the last three days of class sharing, receiving, or utilizing peer feedback to drive model revisions. The whole class model reviews during these last few class periods did take up a significant amount of class time. While they theoretically could have helped scaffold later 158 peer review discussions, the upcoming winter break set a hard deadline on this unit. As such, Mr. H was unable to add an extra day or two to allow for a final round of peer review and peer feedback. Instead, student efforts during these last days were focused on inputting experimental data from the temperature vs. time experiment and on revising their models to better match these experimental data. Overall, these data demonstrate a correlation between student testing and debugging behaviors and both teacher pedagogical moves and trends in the broader unit. For example, students seldom used the simulation to analyze and interpret model output prior to being formally introduced to these features by Mr. H. After his subsequent “fail faster” informational talk, where he provided a clear rationale of the benefits of using the simulation features, students were far more likely to test their model output using this built in tool. 
As for analyzing external data, students' behaviors tended to mirror the general course of the unit, which was reinforced by informational talks and demonstrations provided by Mr. H. Because neither the unit nor Mr. H emphasized using external data until around the time of the temperature vs. time experiment (Dec 12, Day 10), students largely did not show evidence of this behavior until Dec 15th (Day 11) and Dec 19th (Day 12). Students also showed evidence of peer review early in the unit, prior to a formalized introduction by either the unit or Mr. H. Such early evidence of peer review suggests that Mr. H's general classroom management style encouraged students to share ideas across groups and that students had previously engaged in collaborative projects at Faraday High School. After being provided with explicit scaffolding in the form of the peer review guidelines disseminated by Mr. H, students were both more likely to engage in peer review and more likely to use the peer review guidelines to ask their peers more meaningful questions. As such, it appears that the pedagogical support for peer review provided by Mr. H reinforced and enhanced the quality of student peer review in this unit.

Discussion and Conclusion

Discussion

The findings show several key strategies that teachers used to help support students with testing and debugging and how these strategies compare to those discussed in previous literature (Table 20). Mr. H's "fail faster" talk emphasized the importance of continuously testing model output using the simulation feature embedded in SageModeler. This philosophy of frequent model testing strongly parallels the "Compile, Run, Compare" strategy described by Michaeli & Romeike (2019) for testing and debugging text-based programs. Encouraging students to frequently use the simulation features also aligns well with similar suggestions from Basu and colleagues (2016). As the "Compile, Run, Compare" strategy emphasizes the importance of comparing model output with external data, much of Mr. H's and Mr. M's efforts to encourage students to use external data to validate model output aligns well with this earlier work (Michaeli & Romeike, 2019). Additionally, Mr. H's and Mr. M's informational talks on the importance of using external data to validate model output, along with their conversations with individual small groups, have strong parallels to many of the "back pocket" questions proposed by Windschitl and colleagues (2020). Although neither study provides detailed insights on how to support peer review, both Pierson and colleagues (2017) and Chmiel & Loui (2004) emphasize the importance of peer review in supporting students with model revisions and testing and debugging, respectively. As such, Mr. H's and Mr. M's efforts to encourage and support students with peer review share a common philosophy with these studies. Likewise, the broader idea of the importance of "practice" for developing student proficiency with testing and debugging from Chmiel & Loui (2004) is present both in Mr. H's "fail faster" talk and in the frequent efforts by both teachers to have students practice analyzing and interpreting peer models through whole class model reviews. Lastly, the types of questions Mr. H and Mr. M asked the small groups to support them across all three aspects of testing and debugging largely mirror the strategies envisioned in Wilson's (1987) Socratic questioning and Li & Schwarz's (2020) generative questioning.
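To make the logic of this "compare model output with external data" check concrete, the short sketch below expresses it in plain Python rather than in SageModeler's visual environment: generate output from a simple model, set it against measured temperature vs. time values, and treat a poor match as a signal that the model needs revision. This is only an illustrative sketch under assumed values; the data points, function names, error threshold, and the linear-versus-exponential contrast are invented for illustration and are not taken from the unit, the student data, or SageModeler itself.

```python
# Illustrative sketch only (plain Python, not SageModeler): the kind of
# "does my model output match the external data?" check described above.
# All values, names, and the error threshold below are hypothetical.

import math

# Hypothetical "external data": measured temperature (deg C) each minute
# while a volatile liquid evaporates off a thermometer bulb.
measured = [22.0, 16.5, 13.0, 11.0, 9.8, 9.1]  # roughly exponential drop

def linear_draft(t, start=22.0, slope=-2.5):
    """A first-draft model: temperature falls at a constant rate."""
    return start + slope * t

def exponential_revision(t, room=22.0, drop=13.0, rate=0.6):
    """A revised model: temperature decays toward a floor as the liquid evaporates."""
    return room - drop * (1 - math.exp(-rate * t))

def mean_abs_error(model, data):
    """Average gap between model output and the measured data points."""
    return sum(abs(model(t) - temp) for t, temp in enumerate(data)) / len(data)

for name, model in [("linear draft", linear_draft),
                    ("exponential revision", exponential_revision)]:
    error = mean_abs_error(model, measured)
    verdict = "matches reasonably well" if error < 1.0 else "needs revision"
    print(f"{name}: mean absolute error = {error:.2f} deg C -> {verdict}")
```

Run on these invented numbers, the linear draft is flagged for revision while the exponential revision is not, which parallels the kind of mismatch students noticed when overlaying a straight-line model output on an exponentially decaying experimental curve.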
Table 20: Summary of Pedagogical Scaffolds used to Support Students in Testing and Debugging

Analyzing Model Output
Examples of Pedagogical Supports: Demonstrating the Simulation Feature; Mr. H's "Fail Faster" Talk; Discussions with Small Groups
Comparisons to Previous Literature: "Compile Run Compare" vs. Fail Faster (Michaeli & Romeike, 2019); Frequent use of Simulation Features (Basu et al., 2016); Importance of "practice" (Chmiel & Loui, 2004); Socratic and Generative Questioning (Wilson, 1987; Li & Schwarz, 2020)

Analyzing and Interpreting External Data to Validate Model Output
Examples of Pedagogical Supports: Demonstrating how to input external data into SageModeler; Demonstrating how to compare model output directly with external data in SageModeler; Talks on importance of using external data to validate model output; Discussions with Small Groups
Comparisons to Previous Literature: "Compile Run Compare" (Michaeli & Romeike, 2019); Importance of external data (Windschitl et al., 2020); Socratic and Generative Questioning (Wilson, 1987; Li & Schwarz, 2020)

Using Peer Feedback
Examples of Pedagogical Supports: Reviewing the Peer Reflection Guidelines; Talks on how to give and receive Peer Feedback; Whole Class Model Critiques; Discussions with Small Groups
Comparisons to Previous Literature: Importance of peer review for model revisions (Pierson et al., 2017); Using peer review to support testing and debugging (Chmiel & Loui, 2004); Importance of "practice" (Chmiel & Loui, 2004); Socratic and Generative Questioning (Wilson, 1987; Li & Schwarz, 2020)

Beyond parallels with earlier studies, these pedagogical strategies also align with the ideas of synergistic scaffolding. Across all three targeted aspects of testing and debugging, these teachers explicitly showcased relevant technological and curricular scaffolds embedded in the unit. From showing students SageModeler's simulation feature to demonstrating how to input external data into SageModeler and subsequently using those data to validate model output, these teachers presented these technological and curricular scaffolds to students while also demonstrating how to use them to carry out key aspects of testing and debugging. By introducing students to the planned technological and curricular scaffolds and using these tools to structure demonstrations of key testing and debugging behaviors, these teachers exhibited synergy between their pedagogical scaffolds and the other scaffolds embedded into the learning environment (Puntambekar & Kolodner, 2003; Tabak, 2004; Tabak & Kyza, 2018). Additionally, these teachers provided explicit rationales (as exemplified by Mr. H's "Fail Faster" informational talk) for students to use these aspects of testing and debugging to help them revise their models. Through their whole class model reviews, Mr. H and Mr. M demonstrated to students how to analyze each other's models when giving peer feedback while also addressing other learning goals associated with CS Modeling, including "analyzing model output" and "systems thinking." This use of a common set of pedagogical supports to address several different learning goals provides another example of how these teachers created cohesion within their scaffolding (Tabak, 2004; Tabak & Kyza, 2018). Lastly, both Mr. H and Mr. M reinforced earlier supports and scaffolds through their discussions with individual small groups. As building on earlier supports is a key aspect of synergistic scaffolding, these discussions with individual small groups further demonstrate the synergistic scaffolding embedded in Mr. H and Mr.
M’s pedagogies (Puntambekar & Kolodner, 2003; Tabak, 2004). In addition to providing examples of synergistic pedagogical strategies that can be used to support students with testing and debugging, this study reinforces the importance of using synergistic scaffolding strategies to support students in constructing and revising models, particularly in the context of testing and debugging computational systems models. Synergistic scaffolding strategies involve using multiple, overlapping, and complementary scaffolds to support student learning (Tabak, 2004; Tabak & Kyza, 2018). Within computerized learning environments, such as computational systems modeling, multiple technological scaffolds are often embedded into the learning environment to help students navigate the program and perform key tasks that would otherwise be beyond their abilities (Baker et al., 2004; Basu et al., 2017; Fretz et al., 2002; Grawemeyer et al., 2017; Putnambekar & Hubscher, 2005). While these technological scaffolds are often beneficial to students in developing and revising models, additional teacher support that synergizes with these technological scaffolds is often necessary for students to obtain the greatest benefit from these technological scaffolds (Baker et al., 2004; Li & Lim, 2008; Wu & Pedersen, 2011). In this study, students seldom used key technological scaffolds, such as the simulation 162 feature or the features associated with inputting external data into SageModeler to validate model output prior to being given explicit demonstrations and informational talks from their teacher. Once students were given presentations on how to use these technological scaffolds and provided a clear rationale on why the related testing and debugging practices are important for revising their computational models, they began using these technological scaffolds to support them in testing and debugging their models. As such the synergy between these teacher-led demonstrations and informational talks and the existing technological scaffolds was essential for students to test and debug their models. By demonstrating that students benefit from the teachers’ synergistic scaffolds, this study supports previous findings that technological scaffolding often needs to be supported in a synergistic manner by additional scaffolds provided by a teacher. One particularly resonant finding from this study is the importance of teachers providing students with a clear, explicit rationale for engaging in intellectually rigorous tasks. In this study, providing students with access to key curricular and technological scaffolds (i.e., the peer review guidelines and the simulation feature) was insufficient for motivating students to use these resources to test and debug their computational models. Instead, Mr. H and Mr. M’s informational talks centered on sharing a clear rationale for using existing scaffolds to test and debug their models were instrumental in supporting students with testing and debugging. Once Mr. H and Mr. M presented a clear rationale for a respective testing and debugging behavior, students were more likely to exhibit evidence of said behavior during the model revision process. The students even occasionally directly referenced the informational talks provided by their teachers when explaining their testing and debugging behaviors. 
The need for teachers to provide students not only with the knowledge of how to perform a scientific practice but also with the logical rationale or epistemic aims of that practice has been documented in other studies (Kuhn et al., 2000; McNeill & Krajcik, 2008). McNeill & Krajcik (2008) demonstrated that when teachers share meaningful reasons for students to participate in the practice of scientific explanation, students develop greater proficiency with this practice than peers whose teachers focused primarily on the mechanics of scientific explanation. Likewise, the Epistemologies in Practice framework encourages teachers to support students in moving beyond the mechanics of scientific practices and towards considering the broader epistemic goals and rationales underlying the practices they engage in as they construct a meaningful knowledge product (Berland et al., 2016). While our study resonates with this established literature, it is important to note the key differences that show the novelty of our results. Both McNeill & Krajcik (2008) and Berland and colleagues (2016) centered their work on the practice of argumentation at the middle school level. Because this study focuses on high school students engaged in computational modeling, it demonstrates how providing a clear rationale can support students at a different grade level and with a different scientific practice than in previous studies. As such, these earlier studies and our results underscore the critical role that pedagogical guidance and rationale-setting play in fostering students' meaningful engagement with complex intellectual tasks and scientific practices, including testing and debugging.

Another key insight from this study pertains to how differences in class size between the two teachers impacted their pedagogical practices in this unit. It has long been established that class size has a substantial impact on how teachers approach and implement pedagogical practices and can therefore affect students' academic achievement (Brühwiler & Blatchford, 2011; DiBiase & McDonald, 2015; Rice, 1999; Rockoff, 2004). Larger class sizes make it more challenging for teachers to engage students in whole class discussions: having more students often means either that each individual student has fewer opportunities to contribute to the conversation, limiting the participatory nature of discourse, or that classroom discussions must take additional class time to allow every student a chance to share their ideas (Blatchford et al., 2011; Cuseo, 2007). As such, classes with more students likely have fewer opportunities for whole class discussion than smaller classes. Because Mr. H had roughly double the number of students (29) compared to Mr. M (14) in the classes we observed, Mr. H's pedagogy inevitably diverged from Mr. M's.

Although this study does provide evidence for several ways in which teachers can support students with building competency in testing and debugging, it also shows several areas where additional supports are needed. While the existing curricular and technological scaffolds, combined with synergistic pedagogical supports from Mr. H, helped support students at the lower levels of the three targeted aspects of testing and debugging, students rarely performed at the highest levels for these three aspects.
For example, most students from our screencast focus groups used the simulation features to analyze model behavior at a local level (Indicator B: Analyzing Model Output: Level 3), but students seldom used the simulation features to discuss how changing the relative amounts of various input variables impacted system behavior on a more holistic level (Indicator B: Analyzing Model Output: Level 4). As the existing supports did not allow most students to reach this higher level of model analysis, additional scaffolds could help students achieve this higher-level learning goal. It also seems likely that a greater emphasis on holistic ST throughout previous grade bands is necessary for students to regularly have in-depth discussions about how changing one or more input variables impacts the behavior of the whole system. Likewise, when analyzing and using external data to validate model output, students mostly spent time putting their external data into SageModeler and relatively little time (18 minutes across all five groups) comparing their model output to these external data. This suggests that both additional time and additional support are needed to help more students perform at this higher level. As neither the curriculum, nor SageModeler, nor Mr. H substantially prompted the students to consider the validity of the external data they collected and used to validate model output, it is unsurprising that there is no evidence of students exhibiting Level 4 behavior for this aspect of testing and debugging in this study. Finally, while most of the screencast focus groups showed evidence of using peer feedback to make substantial revisions to their models, the overall setup of the unit and the organization of student groups limited their opportunity to have a second round of discussion in which students could share feedback on the revision process with their peers. Consequently, there is only one clear example of students having the reflective conversations indicative of Level 4 behavior for using feedback. It is therefore likely that both changes to curricular design and additional instructional support would be needed for more students to engage in higher-level behaviors for all three targeted testing and debugging practices.

Limitations

While this study does provide some important insights into the ways teachers can support students with different aspects of testing and debugging in the context of computational modeling, several factors affect its scope. As a case study that focuses on the teaching strategies and scaffolds developed by two teachers who work together in the same school building and participated in the same PLC, the pedagogical strategies investigated in this study do not represent all the different ways that teachers can support students with testing and debugging. Additionally, given the magnet school nature of the school in which this study took place, it is likely that these students had more familiarity with giving and receiving feedback from their peers and with using digital learning tools, such as SageModeler. Therefore, while this study showed that Mr. H's pedagogical strategies appeared effective at supporting students with testing and debugging, it is likely that additional supports would be needed if this curriculum were implemented in a less privileged environment. Not only does the case study nature of this research limit the scope of my findings, but time limitations also impacted the results of this study.
In particular, the hard deadline imposed by the arrival of Winter Break meant that Mr. H and Mr. M were unable to add an additional day to allow for a final round of peer review and model revisions. This truncated ending also reduced the amount of time available for students to make meaningful comparisons between their model output and the external data they inputted into SageModeler, therefore limiting their opportunities to "Analyze and Use External Data to Validate Model Output" at higher levels.

Conclusion

Testing and debugging is the process of identifying anomalies and/or logical inconsistencies in an algorithmic artifact and making changes to correct these problems (Bowers et al., 2023; Griffin, 2016; Shin et al., 2022). Testing and debugging is often associated with computational thinking and computational modeling and cuts across a number of STEM disciplines (Griffin, 2016; Michaeli & Romeike, 2019; Sengupta et al., 2012; Shin et al., 2022; Weintrop et al., 2016). In this study, I examined testing and debugging through the lens of "A Framework for Computational Systems Modeling," which views testing and debugging as a core computational modeling practice that students engage in as they build and revise computational models (Bowers et al., 2023; Shin et al., 2022). In this framework, there are six behavioral categories associated with students using testing and debugging to build and revise computational models: Sensemaking through Discourse, Analyzing Model Output: Simulations, Analyzing Model Output: Graphs, Analyzing and Using External Data, Using Feedback, and Reflecting upon Iterative Refinement (Bowers et al., 2022). For the purposes of this study, I chose to focus primarily on three of these behavioral categories (analyzing model output: simulations, analyzing and using external data, and using feedback) because these three aspects of testing and debugging were clearly defined by the framework, have been established as key testing and debugging learning goals by other authors, and are areas where students often need additional support and scaffolding (Bowers et al., 2023; Fretz et al., 2002; Grapin et al., 2022; Louca & Zacharia, 2012).

Although testing and debugging has been established as a key learning goal across several STEM disciplines, many scholars agree that it is often a difficult task for students, requiring explicit supports from teachers (Grapin et al., 2022; Michaeli & Romeike, 2019; Weintrop et al., 2016; Yadav et al., 2011). While computer science educators have proposed several pedagogical strategies for supporting students with testing and debugging (Katz & Anderson, 1989; McCauley, 2008; Michaeli & Romeike, 2019), these studies are typically embedded in a traditional text-based programming context and are therefore less relevant to the computational modeling context. Meanwhile, computational modeling studies tend to focus on the broader processes involved in computational modeling rather than narrowing in specifically on testing and debugging (Fretz et al., 2002; Snyder et al., 2022; Wilkerson et al., 2018).
Because neither the computer programming nor the computational modeling literature fully addresses how to support students with testing and debugging in a manner that aligns with the vision of testing and debugging laid out in "A Framework for Computational Systems Modeling," this study sought to identify teacher pedagogical strategies that support students in using three targeted aspects of testing and debugging: analyzing model output, analyzing and using external data to validate model output, and using peer feedback.

This study demonstrates some of the different scaffolding strategies that teachers can use to support students in these three targeted aspects of testing and debugging (Table 19). It also provides evidence of the benefits of using synergistic scaffolding to support students with building competency in testing and debugging. The results suggest that the curricular and technological scaffolds designed to support students with testing and debugging were not sufficient on their own, as students only utilized these scaffolds after their teacher provided synergistic instructional scaffolding on how to use them. As such, this study reinforces the need for explicit instructional supports from teachers if students are to get the most out of embedded curricular and technological scaffolds. In addition to showing the importance of synergistic scaffolds, the results of this study highlight that teachers should support students not only with the mechanics of scientific practices (including testing and debugging) but should also provide students with a clear rationale for engaging in these practices and making use of relevant scaffolds. This study also suggests that differences in class size impact pedagogical practices. Both Mr. H's relative affinity for informational talks and the additional time he spent helping students troubleshoot malfunctioning technology largely mirror the anticipated effects of larger class sizes on teacher pedagogy. Lastly, while this study suggests that these pedagogical strategies were helpful for facilitating student testing and debugging, the relative absence of higher-level testing and debugging behaviors implies that additional pedagogical and technological supports, as well as changes to the curriculum, are needed for students to reach their full potential in this unit.

Implications and Future Directions

Based on the results of this study, there are several recommendations for practitioners, curriculum developers, and researchers moving forward. The scaffolding strategies used by Mr. H and Mr. M for supporting students with testing and debugging can be adapted by other teachers to help them support students with testing and debugging in their own classrooms (Table 19). While many teachers will likely find individual supports from Mr. H and/or Mr. M's implementation of this unit beneficial, the overall synergistic nature of their scaffolding can benefit science teachers when approaching a variety of topics beyond testing and debugging. This study demonstrates that technological and curricular scaffolds need to be supported by synergistic instructional scaffolding from a teacher for students to fully utilize those other scaffolds. Additionally, these results further uphold the importance of teachers providing students with a clear rationale for engaging in scientific practices (i.e., testing and debugging) and for using the scaffolds that support these practices.
Curricular developers can add some of these strategies to their teacher guides to help support teachers in using the scaffolds that have emerged from this study. This study also suggests that additional teacher scaffolding and a restructuring of this unit would be useful for supporting higher-level behaviors across all three targeted aspects of testing and debugging. This is especially relevant for scaffolding student conversations around using external data to validate model output, as this aspect of testing and debugging seems to be an area where students seldom perform at higher levels. While this study does suggest that these scaffolding techniques were beneficial for supporting students in testing and debugging, given the small sample size and case study nature of this work, these results are not fully conclusive. As such, a larger study involving multiple teachers across several school districts with diverse student populations would be beneficial for identifying which specific supports used by Mr. H were most helpful for supporting students with testing and debugging and whether these supports also correspond to a deeper understanding of the underlying science content. While the case study nature of this research does limit the scope of this conclusion, these findings still provide key insights into how teachers can support students with testing and debugging in the context of computational modeling. Such insights are useful for teachers, curriculum developers, and researchers aiming to better understand how to support students in this practice.

CONCLUSIONS

Table 21: Summary of Findings

Major Contributions and Findings: Development of the ST and CT ID Tool to measure student testing and debugging behaviors
• Allows researchers to assess student testing and debugging behaviors in-situ
• Can be adapted for practitioners to help them recognize where students need additional supports with testing and debugging
Evidence:
• Content and construct validity established (Paper 1; 28-30)
• Evidence of testing and debugging behaviors (Paper 1; 30-37)
Connections to Previous Literature:
• Definitions of ST (Arnold & Wade, 2015; Sweeney & Sterman, 2007)
• Definitions of CT (Grover & Pea, 2018; Wing, 2006)
• Definitions of Modeling (Schwarz et al., 2009; Zu Belzen & Kruger, 2010)
• Efforts to synthesize ST, CT, and Modeling (Weintrop et al., 2016; Shin et al., 2021; Shin et al., 2022)
• A Framework for Computational Systems Thinking (Shin et al., 2022; Grover & Pea, 2018; Arnold & Wade, 2017)
• Definitions of Testing and Debugging (Hadad et al., 2020; Weintrop et al., 2016; Lee & Malyn-Smith, 2020; Sengupta et al., 2013)
• ST and CT ID Tool Development (Paper 1; Bowers et al., 2022)

Major Contributions and Findings: Identification of discourse as a major indicator of testing and debugging
Evidence:
• Student discourse provides key evidence of student testing and debugging patterns (Paper 2; 59-67)

Major Contributions and Findings: Students develop different strategies for approaching testing and debugging
• External Feedback from Peers: students relying on peer feedback to identify flaws in their models
• Verbal and written discourse to identify flaws in computational models: students engage in discourse to identify and correct flaws in their models
• Analysis of model output through simulation features in SageModeler: students frequently test model output to determine if the model needs revisions
Evidence:
• Students using external feedback from peers (student dialogue; Paper 2; 59-61)
• Students use discourse to identify flaws in models (dialogue and screencasts; Paper 2; 61-64)
• Analysis of model output through the simulation feature (dialogue and screencasts; Paper 2; 64-67)

Major Contributions and Findings: Some testing and debugging behaviors (Analysis through Discourse, Analyzing Model Output: Simulations) are more common than others (Using Graphs and Using External Data to validate models)
• More accessible behaviors (Analysis through Discourse; Model Simulations) appear more frequently
• Students need more support with using the graphing features of SageModeler
• Students need more support with using external data to validate their models
Evidence:
• Table 7, Paper 2, Page 68
• Discussion of semi-quantitative results; Paper 2, 68-70
Connections to Previous Literature:
• Students are often hesitant to interpret model output to inform model revision (Grapin et al., 2022; Stratford et al., 1998; Swanson et al., 2021)
• Students are likely to revise their models without reviewing external data to verify model output (Grapin et al., 2022; Swanson et al., 2021)
• When students do use external data to revise their models, they often adopt an "outcome oriented" approach with little regard for internal logic or consistency (Li et al., 2019; Sins et al., 2005; Wilensky & Reisman, 2006)

Major Contributions and Findings: Strategies for supporting students with Analyzing Model Output
• Mr. H's demonstration of how to input data into SageModeler
• Mr. H's "Fail Faster" informational talk (laying out a clear rationale for analyzing model output)
• Mr. H and Mr. M's demonstrations of using the simulation features to test model output
• Mr. H and Mr. M's small group discussions using questions that built off earlier supports
Evidence:
• Mr. H's strategies for supporting students with Analyzing Model Output (Paper 3, 118-122; Summary Table, Table 16, 132)
• Mr. M's strategies for supporting students with Analyzing Model Output (Paper 3, 135-138)
Connections to Previous Literature:
• Definitions of Testing and Debugging and Analyzing Model Output (Shin et al., 2022; Paper 1; Bowers et al., 2022)
• Similarities in philosophy between "Compile Run Compare" and "Fail Faster" in encouraging students to frequently test model output (Michaeli & Romeike, 2019)
• Importance of encouraging students to frequently use simulation features (Basu et al., 2016)
• Efficacy of "practice" environments in supporting students with testing and debugging (Chmiel & Loui, 2004)
• Use of Socratic and Generative Questioning to support students with testing and debugging and model revisions (Wilson, 1987; Li & Schwarz, 2020)

Major Contributions and Findings: Strategies for supporting students with Analyzing and Using External Data to Validate Model Output
• Mr. H's and Mr. M's informational talks on the importance of data prior to the experiment
• Mr. H's and Mr. M's demonstrations of how to input external data into SageModeler (including Mr. M's use of "dummy data")
• Mr. H's and Mr. M's demonstrations of how to compare model output to external data using SageModeler
• Mr. H's and Mr. M's small group discussions using questions that built off earlier supports
• Mr. H's troubleshooting support for students
Evidence:
• Mr. H's strategies for supporting students with Analyzing and Using External Data to Validate Model Output (Paper 3, 122-125; Summary Table, Table 16, 132)
• Mr. M's strategies for supporting students with Analyzing and Using External Data to Validate Model Output (Paper 3, 139-144)
Connections to Previous Literature:
• Definitions of Testing and Debugging and Analyzing and Using External Data to Validate Model Output (Shin et al., 2022; Paper 1; Bowers et al., 2022)
• Similarities between "Compile Run Compare" and Mr. H and Mr. M's approach to having students use external data to validate model output (Michaeli & Romeike, 2019)
• Importance of external data in supporting students with model revisions (Windschitl et al., 2020)
• Use of Socratic and Generative Questioning to support students with testing and debugging and model revisions (Wilson, 1987; Li & Schwarz, 2020)

Major Contributions and Findings: Strategies for supporting students with Giving and Using Peer Feedback
• Mr. H's and Mr. M's reviews of the peer review guidelines and rationale for engaging in peer review (including Mr. H's talk on humility being necessary to give and receive feedback)
• Mr. H's and Mr. M's whole class reviews of student models demonstrating "how to give peer feedback"
• Mr. H's and Mr. M's small group discussions
Evidence:
• Mr. H's strategies for supporting students with Giving and Using Peer Feedback (Paper 3, 126-130; Summary Table, Table 16, 132)
• Mr. M's strategies for supporting students with Giving and Using Peer Feedback (Paper 3, 144-149)
Connections to Previous Literature:
• Definitions of Giving and Using Peer Feedback (Paper 1; Bowers et al., 2022)
• Importance of peer review for supporting model revisions (Pierson et al., 2017)
• Examples of using peer review to support testing and debugging (Chmiel & Loui, 2004)
• Efficacy of "practice" environments in supporting students with testing and debugging (Chmiel & Loui, 2004)

Major Contributions and Findings: Synergistic scaffolding supports students with developing proficiency with testing and debugging
• Both Mr. H and Mr. M created synergy across their various scaffolds throughout this unit
• Students only began using different testing and debugging behaviors after receiving support in both the mechanics of and the rationale for the behavior
• Students referenced specific supports (quotes from teacher informational talks, quotes from the peer review sheet) during the unit
• Both Mr. H and Mr. M returned to ideas from earlier talks and demonstrations when providing support to small groups
Evidence:
• Analysis of student results (Paper 3, 150-159; Table 19, Pages 151-152)
• Summary of Mr. M's strategies (Table 16, 132)
• Mr. H's conversation with a student group referencing his "fail faster" talk (121)
• Mr. H's peer review sheet informational talk (126-127)
Connections to Previous Literature:
• Definitions of Synergistic Scaffolding (Tabak, 2004; Tabak & Kyza, 2018; McNeill & Krajcik, 2009)
• Benefits of synergistic scaffolding within computerized learning environments (Basu et al., 2017; Fretz et al., 2002; Grawemeyer et al., 2017)
• Importance of teacher support and need for teacher instruction to promote synergy within computerized learning environments (Baker et al., 2004; Li & Lim, 2008; Wu & Pedersen, 2011)

Major Contributions and Findings: Clear, explicit rationales help support students with developing proficiency with testing and debugging
• Mr. H provided a clear rationale for analyzing model output through his "fail faster" talk; students subsequently spent more time analyzing model output
• Mr. H provided a clear, explicit rationale for using the peer review guidelines
• Students displayed higher-level behaviors when giving and receiving peer review
Evidence:
• Analysis of student results (Paper 3, 150-159; Table 19, Pages 151-152)
• Mr. H's "Fail Faster" informational talk (Paper 3, 120)
• Mr. H's peer review informational talk (Paper 3, 126-127)
Connections to Previous Literature:
• Earlier studies showing the importance of a clear rationale for supporting students with scientific practices (Kuhn et al., 2000; McNeill & Krajcik, 2008)
• Providing a clear rationale and a meaningful reason to engage in scientific explanations supports students with scientific explanations better than focusing primarily on the mechanics of creating explanations (McNeill & Krajcik, 2008)
• The Epistemologies in Practice framework encourages teachers to support students in considering the broader epistemic goals and rationales underlying various scientific practices (Berland et al., 2016)

Major Findings

Across these three papers investigating how students test and debug computational models and how teachers and the broader learning environment can support students in various aspects of testing and debugging, several findings and themes emerged. One major outcome of these studies has been solidifying a clearer vision of testing and debugging in the context of computational modeling. While "A Framework for Computational Systems Modeling" (Shin et al., 2022) unpacked how ST and CT can be expressed through students testing, evaluating, and debugging model behavior, and proposed a set of testing and debugging aspects, the theoretical nature of this framework did not fully operationalize what student testing and debugging can look like in real-world classrooms. As such, Papers 1 and 2 of this thesis set out to categorize the practice of testing and debugging into a meaningful set of testing and debugging behaviors based on classroom evidence from students. Building off "A Framework for Computational Systems Modeling" and classroom observations, I identified six major testing and debugging behaviors: sensemaking through discourse, analyzing model output: simulations, analyzing model output: graphs, analyzing and using external data, using feedback, and reflecting upon iterative refinement. Through categorizing these six testing and debugging behaviors, I created a validated research instrument that can be used to assess how students test and debug computational models and how their testing and debugging behaviors evolve over time. Additionally, I described broader behavioral patterns and approaches students took towards testing and debugging computational models. These patterns include a model output approach centered on using the simulation feature of SageModeler, a revision technique emphasizing the use of peer feedback to identify flawed aspects of model structure, and a discourse-based method focusing on unpacking the reasoning behind individual relationships within a model.

Creating a clear, theory-driven, and evidence-based vision for how students can engage in testing and debugging in a computational modeling context represents a major step forward for the field of science education. By unpacking testing and debugging using concrete examples from classroom data, I developed a useful lexicon that researchers and practitioners can use across STEM disciplines to support students in this computational modeling practice. The six testing and debugging behaviors can help guide curriculum developers interested in designing instructional, curricular, and technological supports to better facilitate students in testing and debugging. Teaching practitioners can also use this framework to help create formative assessments for testing and debugging and to structure their teaching to better support students with this practice.
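To make the six behavioral categories concrete for researchers or practitioners who wish to adapt the ST and CT ID Tool, the sketch below illustrates one hypothetical way the categories might be encoded when coding transcript or screencast segments. The category names come from this thesis; the data structures, field names, and example segments are illustrative assumptions rather than part of the published instrument.

```python
# A minimal, hypothetical sketch of encoding the six testing and debugging
# behaviors from the ST and CT ID Tool for coding classroom data.
# The category names come from the thesis; everything else is illustrative.
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class TDBehavior(Enum):
    SENSEMAKING_THROUGH_DISCOURSE = "Sensemaking through Discourse"
    ANALYZING_OUTPUT_SIMULATIONS = "Analyzing Model Output: Simulations"
    ANALYZING_OUTPUT_GRAPHS = "Analyzing Model Output: Graphs"
    ANALYZING_EXTERNAL_DATA = "Analyzing and Using External Data"
    USING_FEEDBACK = "Using Feedback"
    REFLECTING_ON_REFINEMENT = "Reflecting upon Iterative Refinement"


@dataclass
class CodedSegment:
    group_id: str        # e.g., "Group A" (hypothetical label)
    start_minute: float  # timestamp within the class period
    behavior: TDBehavior # which of the six behaviors was observed
    level: int           # ordinal level assigned by the coder (e.g., 1-4)


def tally_behaviors(segments: list[CodedSegment]) -> Counter:
    """Count how often each behavior appears in a set of coded segments."""
    return Counter(seg.behavior for seg in segments)


# Example: two hypothetical coded segments from one small group.
segments = [
    CodedSegment("Group A", 12.5, TDBehavior.ANALYZING_OUTPUT_SIMULATIONS, 3),
    CodedSegment("Group A", 14.0, TDBehavior.USING_FEEDBACK, 2),
]
print(tally_behaviors(segments))
```

A practitioner-facing adaptation could replace the numeric levels with the descriptive level language used in the tool, but the underlying idea of tagging classroom evidence with one of the six behaviors remains the same.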
The "ST and CT ID Tool" also has the potential to serve as a novel research instrument that can be adapted to assess how students are testing and debugging computational models across multiple contexts. Researchers can use the language of the six testing and debugging behaviors to identify where students need additional supports for testing and debugging and to target interventions focused on scaffolding specific testing and debugging behaviors.

In addition to proposing an evidence-based vision for testing and debugging alongside an associated research instrument, these studies provide guidance on how teachers and the broader learning environment can support and scaffold students with testing and debugging. Student results from Papers 2 and 3 demonstrate how many of the technological scaffolding features built into SageModeler facilitate student testing and debugging. In particular, the simulation feature helps students visualize model output and identify how specific model structures impact model behavior at both a local and a systemic level. These same results also showcase the unintuitive nature of some of the technological scaffolds, such as the graphing and data input features of SageModeler, as students rarely used these tools to test and debug their models without first being given extensive instructional support from their teacher. Because students required substantial, explicit instructional support to use the graphing features and to validate their models against external data in SageModeler, more built-in support is likely needed for students to use these features consistently and independently.

Another key finding suggests that having students collaboratively build and revise computational models in small groups, while also providing frequent opportunities to share their models with other small groups, facilitates student testing and debugging. Working with peers encourages students to verbalize the reasoning behind their modeling design choices. If their partners disagree with their design choices, discourse ensues, helping the students determine whether the design element (e.g., a specific relationship between two variables) is supported by reasoning and evidence or is inappropriate for describing the phenomenon. Such sensemaking conversations help facilitate iterative model refinement in ways that are only possible through collaborative model construction. In a similar manner, semi-structured peer review sessions are also a critical aspect of testing and debugging, as they allow students to get additional feedback on their own models and to be exposed to alternative ideas on how to structure different aspects of their models. Although peer review and collaboration are generally considered to be important aspects of constructivist approaches to STEM education and have long been established as pedagogical supports for modeling, relatively little research has investigated how peer review and collaboration can support students in testing and debugging (Ben-Ari, 2001; Louca & Zacharia, 2012; Schreiber & Valle, 2013; Tsivitanidou et al., 2018). As such, these results represent a significant shift towards acknowledging the potential of using peer collaboration to support students in testing and debugging in a computational modeling context.

In addition to technological scaffolds and the benefits of a collaborative learning environment, I also investigated how teacher instructional supports assisted students with testing and debugging.
My findings suggest that when teachers provided direct, synergistic instruction on how to use key technological and curricular scaffolds, students were better equipped to engage in the corresponding testing and debugging behaviors. Indeed, even though students had initial access to relevant technological and curricular scaffolds built into the learning environment, additional synergistic support from their teacher, often in the form of whole class informational talks or conversations with individual small groups, was often necessary for students to use these scaffolds to assist in testing and debugging. When teachers revisited their earlier instructional scaffolds by referencing them in later discussions and informational talks, it helped reinforce the importance of key testing and debugging practices. These results, which show the importance of synergistic scaffolding, reflect earlier studies (McNeill & Krajcik, 2009; Tabak, 2004; Wu & Pedersen, 2011), further emphasizing how in technology-centered learning environments, such as a unit centered on computational modeling, technological scaffolds should be supported by instructor-led synergistic scaffolds.

My results also demonstrate the value of teachers providing students with a clear rationale for engaging in specific aspects of testing and debugging. For example, Mr. H gave an informational talk laying out the importance of using the simulate feature of SageModeler after making changes to model structure in order to quickly examine model output behavior and determine whether the output matched either experimental results or the students' understanding of the phenomenon. After this talk, students were more likely to use the simulate feature to test model output, and key phrases from the talk were referenced in their subsequent conversations with Mr. H. Although the benefits of providing students with clear rationales for engaging with scientific practices have been established by both McNeill & Krajcik (2008) and Berland and colleagues (2016), both studies centered on supporting middle school students with argumentation. Because my work focuses on supporting high school students with computational modeling, it demonstrates how the principles established by these earlier studies can be applied to both a different grade level and a different scientific practice.

Implications

Curricular and Technology Implications

One important implication of this study is the identification of aspects of testing and debugging that remain challenging for students despite teacher supports and the designed learning environment. While the learning environment and instructor scaffolds implemented in the evaporative cooling unit encouraged students to use model simulation and engage in discourse (both within small groups and between small groups), outside of lessons where they were expressly told to do so, students were highly unlikely to input external data into SageModeler and use the graphing features to compare model output to external data to validate their models. Previous studies suggest that using external data to validate model output is a challenge common across computational modeling environments (Grapin et al., 2022; Li et al., 2019; Sins et al., 2005; Wilensky & Reisman, 2006). However, it is also likely that additional revisions to the curriculum and the SageModeler learning environment could better scaffold students in using external data to validate model output.
While Mr. H did make initial efforts to stress that student models need to match real-world data, the absence of early quantitative experiments in the curriculum meant that students had little opportunity to input real-world data into SageModeler prior to the temperature vs. time experiment towards the end of the unit. Additionally, the tight deadlines at the end of the unit limited the opportunities students had to do an in-depth comparison of their model output to external data; in future implementations, I would recommend securing an additional day for students to compare model output more thoroughly to external data.

Another aspect of the learning environment whose redesign could benefit students with testing and debugging is the set of technological scaffolds that allow students to input external data and compare model output directly to these external data. For students to use this feature of SageModeler, they must navigate multiple screens with little to no written prompts to scaffold the process. First, students must open the Tables tab (1) in SageModeler and input the external data manually into a table (2) of their creation (Figure 25A). While it is possible to input external data into SageModeler in the form of a CSV file, there are no internal prompts built into the program to suggest that this is an option. To put these data in a graph, students then need to open the Graph tab (1) and drag the labels from their data table (2) over to their respective axes (3) in the graph (Figure 25B). Next, if students wish to look at their model output data in a graphical manner that can be directly compared to this external data, they must first click the simulate button (1) and select record continuously (2); students then move the slider bar (3) of the respective independent variable (in this case the mass of the vehicle) and SageModeler will automatically generate a table (4) of model output (Figure 25C). Making a graph of these data requires opening another graph using the Graph tab (1) and dragging the labels (2) from the model output table to their respective axes (3) in the graph (Figure 25D). Because there are no internal scaffolds built into SageModeler to guide this process (unless students choose to open the help menu, navigate the link to a separate webpage, and scroll through a few paragraphs of text to learn about these features), students are realistically never going to use the graphing features to help them validate model output against external data without being given explicit instructions from a teacher. As such, a more guided scaffolding process built into the program that walks students through these steps would be necessary if students are to engage in this aspect of testing and debugging more independently. Overall, the results from these studies show that testing and debugging of computational models, even in an environment specifically designed to promote testing and debugging, is not intuitive for most students. Curricular scaffolding, technological scaffolding, and synergistic instructional scaffolding coupled with a clear rationale for engaging in specific testing and debugging behaviors are all necessary to support students in testing and debugging.

Figure 25: Validating Model Output Using External Data

Figure 25A: Inputting External Data into SageModeler
When inputting external data into SageModeler, students need to first open the Tables tab (1) and then manually input their data into a spreadsheet (2) or import an existing spreadsheet as a CSV file.
Figure 25B: Making a Graph of External Data in SageModeler
To make a graph of external data, students need to open the Graph tab (1) and drag the labels from the data spreadsheet (2) directly onto the respective axes of the graph (3).

Figure 25C: Generating a Data Table from Model Output
To generate a data table from model output, students must first click on the simulate feature (1) and then press record continuously (2) to allow semi-quantitative numerical data to be generated from model output. Students then manipulate the slider bar for the targeted independent variable (3) while leaving all other variables constant to automatically generate a model output table (4).

Figure 25D: Creating a Graph of Model Output
For students to create a graph of model output, they must open the Graph tab (1) to create a blank graph. They then drag the labels from the SageModeler output spreadsheet (2) directly onto the respective axes of the graph (3), filling in the graph with the model output data.

Implications for Equity

Although equity and inclusion were not the central foci of my research questions, this work does have implications for promoting more equitable teaching practices in science education. SageModeler, as an icon-based computational modeling program, likely has a lower barrier to entry because students do not need to acquire the same level of programming knowledge required by more complex agent-based or text-based programming environments. This likely makes SageModeler more accessible for all students, particularly those without prior programming experience. Additionally, SageModeler has been translated into 13 different languages, including Spanish, Chinese, and Portuguese, making it more accessible for native speakers of those languages. These translations make it possible for students with lower levels of English fluency to more easily figure out how to use the various features of SageModeler, further lowering the barriers to computational modeling for an underserved population. Because 13.5% of the US population (41.8 million people) lives in Spanish-speaking households and another 8.2 million people live in households where the other 12 languages are spoken (Dietrich & Hernandez, 2022), it is important that computational modeling programs like SageModeler be translated into multiple languages to support students growing up in households where English is not the only language spoken.

Just as the design features of SageModeler have the potential to support equity and inclusion in science education by lowering barriers to computational modeling (especially for Spanish-speaking students), other aspects of this research can be used to better assist students from marginalized backgrounds. One of the major outcomes of this research has been to demonstrate the potential of using student discourse as a means of assessing student testing and debugging behaviors in real time. By shifting the focus away from assessing final models and towards documenting the process of testing and debugging these models, I was able to achieve a more holistic understanding of student competency with testing and debugging. This more holistic approach could be adapted by practitioners as a more equitable means of assessing student testing and debugging, as it removes additional barriers and challenges that students from more marginalized backgrounds and students with disabilities face with traditional tests and assessments.
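As one concrete illustration of what such process-based documentation might look like, the hypothetical sketch below aggregates time-stamped, coded observations into a simple per-group profile of testing and debugging activity across a class period. It assumes coded events of the kind described above; the behavior labels echo the thesis, while the function name and the 10-minute bin width are illustrative assumptions rather than part of the study's methods.

```python
# Hypothetical sketch: summarizing coded observations into a process profile.
# Assumes (behavior_name, start_minute) pairs produced by a coder; the
# 10-minute bin width is an arbitrary illustrative choice.
from collections import defaultdict


def process_profile(coded_events: list[tuple[str, float]], bin_minutes: float = 10.0):
    """Group coded behaviors into time bins to show how activity unfolds."""
    profile = defaultdict(list)
    for behavior, minute in coded_events:
        profile[int(minute // bin_minutes)].append(behavior)
    return dict(sorted(profile.items()))


# Example: one group's (hypothetical) coded events across a 50-minute period.
events = [
    ("Sensemaking through Discourse", 4.0),
    ("Analyzing Model Output: Simulations", 18.5),
    ("Using Feedback", 37.0),
]
for bin_index, behaviors in process_profile(events).items():
    # With the default 10-minute bins, bin_index 0 covers minutes 0-10, etc.
    print(f"Minutes {bin_index * 10}-{bin_index * 10 + 10}: {behaviors}")
```

A profile of this kind documents when and how often a group engaged in each behavior during a lesson, which is the sort of process evidence, rather than a final-model score, that the approach described above values.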
Another key finding of this study was the importance of discourse and peer review in supporting students with testing and debugging. One of the positive outcomes of including more opportunities for small group discussions and peer discourse is that they encourage more participation from students who might otherwise be excluded from whole class discourse or traditional lecture-based approaches to schooling (Chi & Wylie, 2014). However, it is important to note that small group environments often reproduce the racial and gender-based social hierarchies of the broader society (Patterson, 2019). Therefore, teachers must make an active effort to establish equitable classroom norms if small group discussions are to remain inclusive of all students. While the professional development did include strategies for encouraging equitable small group environments, more explicit supports for establishing racial and gender equity through small groups would have been beneficial.

Future Directions

There are several opportunities and possibilities for future practitioners, curriculum developers, and educational researchers to build upon this work. As previously mentioned, I recognize that more can be done to make SageModeler a more intuitive learning environment for students to test and debug their models. Additional scaffolding is needed for students to input external data and compare model output to these data independently. Such scaffolding should be designed so that it guides students through the intricate steps involved in inputting external data and comparing model output to these data, but also fades as students become more confident in this practice. I also aim to adapt the findings of this work to share with practitioners on a broader scale through publications in practitioner journals and future projects involving professional learning communities. By sharing the pedagogical strategies pioneered by Mr. H and Mr. M for supporting students with testing and debugging with a broader audience of teachers, I hope to bolster the teaching and learning of testing and debugging on a larger scale.

In addition to efforts to support practicing teachers, I also recognize that my findings can support future curricular developers and designers of computational modeling programs with creating learning environments that better support students with testing and debugging. By encouraging future curriculum developers to implement design strategies that were shown to be successful in these studies while avoiding design strategies that were less beneficial, future efforts to design curricula and learning environments to support students in testing and debugging can avoid unnecessary pitfalls and reach even better learning outcomes than were found in these studies.

Educational researchers also have opportunities to build upon this work to further advance our collective understanding of how to support students with testing and debugging. While these studies represent a strong proof of concept, as a series of case studies involving two teachers at one privileged magnet school, these results are not representative of the standard American high school environment. As such, future studies need to explore the efficacy of implementing the evaporative cooling unit and the associated pedagogical strategies developed by Mr. H and Mr. M on a broader scale.
Given the need to better support students from racially and economically marginalized backgrounds in urban school districts, I believe it is necessary to investigate how this unit and learning environment can be modified to better support these students. Conducting a parallel case study in a large, diverse urban district would help the field identify specific adaptations that can be implemented to better support students from marginalized backgrounds with testing and debugging. Future researchers can also build upon this work by expanding the time scale and designing a year-long curriculum centered on computational modeling and testing and debugging. Given the time commitment needed for students to learn how to work with SageModeler, having only one unit in which students are expected to master the mechanics of SageModeler, key testing and debugging behaviors, and new science content can be overwhelming. A year-long curriculum would allow students to gradually develop an understanding of the core mechanics of SageModeler and the principles behind key testing and debugging behaviors. By adopting a more gradual approach, students would have more opportunities to engage with testing and debugging and would therefore likely develop a stronger mastery of testing and debugging than students who only experienced the single evaporative cooling unit.

In parallel with efforts to better understand how we can support students with testing and debugging, future researchers should investigate how computational modeling environments, particularly those that encourage frequent testing and debugging, benefit student learning. Computational modeling units often require that students spend a substantial amount of class time learning how to use the program to build models. Because these units represent a large time investment, it is important for the field of science education to determine how beneficial computational modeling is for student learning, especially when compared to traditional paper-pencil modeling. On a smaller scale, researchers could have two teachers in the same school district teach two versions of the same unit, with one version incorporating computational modeling and the other utilizing paper-pencil modeling. Students in both classrooms would be given a pre- and post-test to assess their learning of disciplinary core ideas, scientific practices, and crosscutting concepts. If students who took the computational modeling unit scored significantly higher on the post-assessment than their peers in the paper-pencil modeling classroom, it would suggest that computational modeling has a tangible benefit for student learning outcomes. Such a study could subsequently be scaled up to include multiple school districts across multiple states to more definitively determine the efficacy of computational modeling. If a large-scale study provided strong evidence of the learning benefits of computational modeling, it would open more opportunities for policymakers and teachers to incorporate computational modeling into science classrooms, as they could more easily justify the large upfront time investment needed for students to become familiar with the computational modeling program.

Another research pathway I am interested in pursuing based on these results is to further explore the interactions between testing and debugging and systems thinking.
While both ST and CT are key aspects that support students with testing and debugging in "A Framework for Computational Systems Modeling," many of the results from these studies focus on aspects of testing and debugging that align more closely with computational thinking. For example, the testing and debugging behavior of analyzing external data to validate model output is heavily aligned with the CT aspect of generating, organizing, and interpreting data and does not fully explore how students are considering broader system structures in their models. As such, I am interested in further investigating the complex relationship between student ST and student testing and debugging behaviors in the context of computational modeling. My future research might address how student understanding of key aspects of ST and system behavior helps students identify potential areas of their model that need improvement, thus facilitating testing and debugging. Likewise, I would also explore how having students engage in frequent testing and debugging can enhance student understanding of the behavioral impact of key system structures, thus bolstering their ST. Building from these studies, I could then address how teachers and curriculum developers can design computational modeling learning environments that best support students in mastering ST and testing and debugging in a synergistic manner. Such a future study focusing on how to design better learning environments to support students with ST and testing and debugging would reflect the key findings of these studies: testing and debugging (and computational modeling in general) do not come naturally to most students and require both well-designed learning environments and instructional scaffolds for students to be successful with these practices.

ACKNOWLEDGEMENT OF PREVIOUSLY PUBLISHED WORK

I adapted earlier drafts from work that I had previously published to create the first two papers of this thesis. Paper 1 was modified and reformatted from an eight-page conference paper submitted to and accepted by the International Conference of the Learning Sciences (ICLS) in 2022. This original paper is publicly available on the International Society of the Learning Sciences (ISLS) online repository (https://repository.isls.org/handle/1/8516). Prior to submitting this thesis, I received permission to include this work in my thesis from my co-authors and from ISLS, who still retain the copyright for the original manuscript (Figure 26).

Original article citation: Bowers, J., Shin, N., Brennan, L., Eidin, E. E., Stephens, L., & Roderick, S. (2022). Developing the systems thinking and computational thinking identification tool. In Proceedings of the 16th International Conference of the Learning Sciences - ICLS 2022 (pp. 147-154). International Society of the Learning Sciences.

Figure 26: Screenshot of Permission Letter from ISLS

Paper 2 was adapted from a journal article previously published in the Journal of Science Education and Technology. This article is publicly available in the journal's database as an open access article (https://link.springer.com/article/10.1007/s10956-023-10049-w) with the following DOI link: https://doi.org/10.1007/s10956-023-10049-w. Prior to submitting this thesis, I received permission from the coauthors and the Journal of Science Education and Technology to include this work in my thesis (Figure 27).
As the Journal of Science Education and Technology retains copyright on the published article, I ask all future scholars reading this thesis to cite the original publication. Original article citation: Bowers, J., Eidin, E., Stephens, L., & Brennan, L. (2023). Examining Student Testing and Debugging Within a Computational Systems Modeling Context. Journal of Science Education and Technology, 32(4), 607-628. 187 Figure 27: Screenshots of Correspondence with the Journal of Science Education and Technology 188 BIBLIOGRAPHY Abar, S., Theodoropoulos, G. K., Lemarinier, P., & O’Hare, G. M. (2017). Agent Based Modelling and Simulation tools: A review of the state-of-art software. Computer Science Review, 24, 13-33. Abid, A., Farooq, M. S., & Farooq, U. (2015). A strategy for the design of introductory computer programming course in high school. Journal of Elementary Education, 25(1), 145-165. Ahmadzadeh, M., Elliman, D., & Higgins, C. (2005). Novice programmers: An analysis of patterns of debugging among novice computer science students. Inroads, 37(3), 84–88. Aho, A. V. (2012). Computation and computational thinking. The Computer Journal, 55(7), 832–835. Arndt, H. (2006). Enhancing system thinking in education using system dynamics. Simulation, 82(11), 795-806. Arnold, R. D., & Wade, J. P. (2015). A definition of systems thinking: A systems approach. Procedia Computer Science, 44, 669-678. Akcaoglu, M. (2014). Learning problem-solving through making games at the game design and learning summer program. Educational Technology Research and Development, 62(5), 583–600. Anderson, N. D. (2016). A call for computational thinking in undergraduate psychology. Psychology Learning & Teaching, 15(3), 226–234 Arnold, R. D., & Wade, J. P. (2015). A definition of systems thinking: A systems approach. Procedia Computer Science, 44, 669–678. Arnold, R. D., & Wade, J. P. (2017). A complete set of systems thinking skills. Insight, 20(3), 9–17. Assaraf, O. B. Z., & Orion, N. (2005). Development of system thinking skills in the context of Earth system education. Journal of Research in Science Teaching: The Official Journal of the National Association for Research in Science Teaching, 42(5), 518–560. Australian Curriculum, Assessment and Reporting Authority (ACARA). (2017). Australian curriculum: F-10 curriculum: Science Bailer-Jones, D. (1999). Tracing the development of models in the philosophy of science. In L. Magnani, N. J. Nersessian, & P. Thagard (Eds.), Model-based reasoning in scientific discovery. Proceedings of an international conference on model-based reasoning in scientific discovery, held December 17–19, 1998, in Pavia, Italy (pp. 23–40). New York: Kluwer Academic. Baker, R. S., Corbett, A. T., Koedinger, K. R., & Wagner, A. Z. (2004, April). Off-task behavior in the cognitive tutor classroom: When students" game the system". In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 383-390). Bakos, S., & Thibault, M. (2018). Affordances and tensions in teaching both computational thinking and mathematics. Proceedings of the 42nd Conference of the International Group for the Psychology of Mathematics Education. Vol. 2., 107-144. Barab, S., & Squire, K. (2016). Design-based research: Putting a stake in the ground. In Design-based 189 Research (pp. 1-14). Psychology Press. Barlas, Y. (1996). Formal aspects of model validity and validation in system dynamics. System Dynamics Review: The Journal of the System Dynamics Society, 12(3), 183–210. Barlas, Y. (1996). 
Formal aspects of model validity and validation in system dynamics. System Dynamics Review: The Journal of the System Dynamics Society, 12(3), 183–210. Barr, D., Harrison, J., & Conery, L. (2011). Computational thinking: A digital age skill for everyone. Learning & Leading with Technology, 38(6), 20-23. Barr, V., & Stephenson, C. (2011). Bringing computational thinking to K-12: What is involved and what is the role of the computer science education community?. Acm Inroads, 2(1), 48-54. Basham, J. D., & Marino, M. T. (2013). Understanding STEM education and supporting students through universal design for learning. Teaching Exceptional Children, 45(4), 8–15. Basu, S., Biswas, G., & Kinnebrew, J. S. (2017). Learner modeling for adaptive scaffolding in a computational thinking-based science learning environment. User Modeling and User-Adapted Interaction, 27, 5-53. Basu, S., Biswas, G., Sengupta, P., Dickes, A., Kinnebrew, J. S., & Clark, D. (2016). Identifying middle school students’ challenges in computational thinking-based science learning. Research and practice in technology enhanced learning, 11(1), 1-35. Basu, S., Dukeman, A., Kinnebrew, J. S., Biswas, G., & Sengupta, P. (2014). Investigating student generated computational models of science. Boulder, CO: International Society of the Learning Sciences. Ben-Ari, M. (2001). Constructivism in computer science education. Journal of computers in Mathematics and Science Teaching, 20(1), 45-73. Benton, L., Hoyles, C., Kalas, I., & Noss, R. (2017). Bridging primary programming and mathematics: Some findings of design research in England. Digital Experiences in Mathematics Education, 3, 115–138. Berge, Z. L. (1995). The role of the online instructor/facilitator. Educational technology, 35(1), 22-30. Berland, L., & Reiser, B. (2009). Making sense of argumentation and explanation. Science Education, 93, 26–55. Berland, L. K., Schwarz, C. V., Krist, C., Kenyon, L., Lo, A. S., & Reiser, B. J. (2016). Epistemologies in practice: Making scientific practices meaningful for students. Journal of Research in Science Teaching, 53(7), 1082–1112. Berland, M., & Wilensky, U. (2015). Comparing virtual and physical robotics environments for supporting complex systems and computational thinking. Journal of Science Education and Technology, 24(5), 628–647. Bers, M. U. (2010). The tangible K robotics program: Applied computational thinking for young children. Early Childhood Research and Practice, 12(2), n2. 190 Bers, M. U., Flannery, L., Kazakoff, E. R., & Sullivan, A. (2014). Computational thinking and tinkering: Exploration of an early childhood robotics curriculum. Computers & Education, 72, 145-157. Bielik, T., Krell, M., Zangori, L., & Ben Zvi Assaraf, O. (2023) Investigating Complex Phenomena: Bridging between Systems Thinking and Modeling in Science Education. In Frontiers in Education (Vol. 8, p. 1308241). Frontiers. Bielik, T., Stephens, L., Damelin, D., & Krajcik, J. S. (2019). Designing Technology Environments to Support System Modeling Competence. In Towards a Competence-Based View on Models and Modeling in Science Education (pp. 275-290). Springer, Cham. Bierema, A. M. K., Schwarz, C. V., & Stoltzfus, J. R. (2017). Engaging undergraduate biology students in scientific modeling: Analysis of group interactions, sense-making, and justification. CBE—Life Sciences Education, 16(4), 68. Blatchford, P., Bassett, P., & Brown, P. (2011). 
Boersma, K., Waarlo, A. J., & Klaassen, K. (2011). The feasibility of systems thinking in biology education. Journal of Biological Education, 45(4), 190–197.
Booth-Sweeney, L. B., & Sterman, J. D. (2007). Thinking about systems: Student and teacher conceptions of natural and social systems. System Dynamics Review: The Journal of the System Dynamics Society, 23(2–3), 285–311.
Bourgault, S., & E, J. (2023). Exploring the Horizon of Computation for Creativity. XRDS: Crossroads, The ACM Magazine for Students, 29(4), 6–9.
Bowers, J., Damelin, D., Eidin, E., & McIntyre, C. (2022a). Keeping Cool With SageModeler. The Science Teacher, 89(4).
Bowers, J., Eidin, E., Stephens, L., & Brennan, L. (2023). Examining Student Testing and Debugging Within a Computational Systems Modeling Context. Journal of Science Education and Technology, 32(4), 607–628.
Bowers, J., Shin, N., Brennan, L., Eidin, E., Stephens, L., & Roderick, S. (2022b). Developing the Systems Thinking and Computational Thinking Identification Tool. International Society of the Learning Sciences.
Brackmann, C., Barone, D., Casali, A., Boucinha, R., & Muñoz-Hernandez, S. (2016). Computational thinking: Panorama of the Americas. In F. J. García-Peñalvo & A. J. Mendes (Eds.), 2016 International Symposium on Computers in Education (SIIE) (pp. 1–29). Piscataway: IEEE.
Bravo, C., van Joolingen, W. R., & de Jong, T. (2006). Modeling and simulation in inquiry learning: Checking solutions and giving advice. Simulation, 82(11), 769–784.
Brennan, K., & Resnick, M. (2012). New frameworks for studying and assessing the development of computational thinking. In Proceedings of the 2012 Annual Meeting of the American Educational Research Association, Vancouver, Canada (Vol. 1, p. 25).
Brühwiler, C., & Blatchford, P. (2011). Effects of class size and adaptive teaching competency on classroom processes and academic outcome. Learning and Instruction, 21(1), 95–108.
Cabrera, D., Colosi, L., & Lobdell, C. (2008). Systems thinking. Evaluation and Program Planning, 31(3), 299–310.
Campbell, T., & Oh, P. S. (2015). Engaging students in modeling as an epistemic practice of science: An introduction to the special issue of the "Journal of Science Education and Technology." Journal of Science Education and Technology, 24(2), 125–131.
Carver, M. S., & Risinger, S. C. (1987, December). Improving children's debugging skills. In Empirical studies of programmers: Second workshop (pp. 147–171).
Chi, M. T., & Wylie, R. (2014). The ICAP framework: Linking cognitive engagement to active learning outcomes. Educational Psychologist, 49(4), 219–243.
Chmiel, R., & Loui, M. (2004). Debugging: From novice to expert. Inroads, 36(1), 17–21.
Clement, J. (2000). Model based learning as a key research area for science education. International Journal of Science Education, 22(9), 1041–1053.
Collins, A., Brown, J. S., & Newman, S. E. (1989). Cognitive apprenticeship: Teaching the craft of reading, writing, and mathematics. In L. B. Resnick (Ed.), Knowing, learning, and instruction: Essays in honor of Robert Glaser (pp. 453–494). Hillsdale, NJ: Lawrence Erlbaum.
Costanza, R., & Voinov, A. (2001). Modeling ecological and economic systems with STELLA: Part III. Ecological Modelling, 143(1–2), 1–7.
Cronin, M. A., Gonzalez, C., & Sterman, J. D. (2009). Why don't well-educated adults understand accumulation? A challenge to researchers, educators, and citizens. Organizational Behavior and Human Decision Processes, 108(1), 116–130.
Cuseo, J. (2007). The empirical case against large class size: Adverse effects on the teaching, learning, and retention of first-year students. The Journal of Faculty Development, 21(1), 5–21.
Csizmadia, A., Curzon, P., Dorling, M., Humphreys, S., Ng, T., Selby, C., & Woollard, J. (2015). Computational thinking: A guide for teachers. Computing at School. Swindon, UK.
Dabholkar, S., Anton, G., & Wilensky, U. (2018). GenEvo: An emergent systems microworld for model-based scientific inquiry in the context of genetics and evolution. International Society of the Learning Sciences, Inc. [ISLS].
Damelin, D., Krajcik, J. S., McIntyre, C., & Bielik, T. (2017). Students making systems models. Science Scope, 40(5), 78–83.
DiBiase, W., & McDonald, J. R. (2015). Science teacher attitudes toward inquiry-based teaching and learning. The Clearing House: A Journal of Educational Strategies, Issues and Ideas, 88(2), 29–38.
Dickes, A., & Sengupta, P. (2012). Learning Natural Selection in 4th Grade with Multi Agent-Based Computational Models. Research in Science Education.
Dietrich, S., & Hernandez, E. (2022, August). Language Use in the United States: 2019. American Community Survey Reports: US Census Bureau.
Duran, L. B., & Duran, E. (2004). The 5E instructional model: A learning cycle approach for inquiry-based science teaching. Science Education Review, 3(2), 49–58.
Eidin, E., Bielik, T., Touitou, I., Bowers, J., McIntyre, C., Damelin, D., & Krajcik, J. (2023). Thinking in Terms of Change over Time: Opportunities and Challenges of Using System Dynamics Models. Journal of Science Education and Technology, 1–28.
Elliott, C. H., Chakarov, A. G., Bush, J. B., Nixon, J., & Recker, M. (2023). Toward a debugging pedagogy: Helping students learn to get unstuck with physical computing systems. Information and Learning Sciences, 124(1/2), 1–24.
Emara, M., Grover, S., Hutchins, N., Biswas, G., & Snyder, C. (2020). Examining students' debugging and regulation processes during collaborative computational modeling in science. In International Conference of the Learning Sciences 2020 Proceedings (ICLS 2020).
Fan, C., Liu, X., Ling, R., & Si, B. (2018). Application of Proteus in experimental teaching and research of medical electronic circuit. In 2018 3rd International Conference on Modern Management, Education Technology, and Social Science (MMETSS 2018) (pp. 512–515). Atlantis Press.
Farris, A. V., Dickes, A. C., & Sengupta, P. (2019). Learning to interpret measurement and motion in fourth grade computational modeling. Science & Education, 28(8), 927–956.
Fernandes, S., Mesquita, D., Flores, M. A., & Lima, R. M. (2014). Engaging students in learning: Findings from a study of project-led education. European Journal of Engineering Education, 39(1), 55–67.
Fisher, D. M. (2018). Reflections on teaching system dynamics modeling to secondary school students for over 20 years. Systems, 6(2), 12.
Fix, V., Wiedenbeck, S., & Scholtz, J. (1993). Mental representations of programs by novices and experts. In P. Bauersfeld, J. Bennett & G. Lynch (Eds.), Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 74–79). New York: ACM Press.
Ford, A., & Teorey, T. (2002). Practical debugging in C++. Upper Saddle River, NJ: Prentice-Hall.
Forrester, J. W. (1971). Counterintuitive behavior of social systems. Theory and Decision, 2(2), 109–140.
Forrester, J. W. (1994). System dynamics, systems thinking, and soft OR. System Dynamics Review, 10(2–3), 245–256.
Forrester, J. W. (2007). System dynamics—the next fifty years. System Dynamics Review: The Journal of the System Dynamics Society, 23(2–3), 359–370.
Fosnot, C. T. (Ed.). (1996). Constructivism: Theory, perspectives, and practice. Teachers College Press.
Fretz, E. B., Wu, H. K., Zhang, B., Davis, E. A., Krajcik, J. S., & Soloway, E. (2002). An investigation of software scaffolds supporting modeling practices. Research in Science Education, 32(4), 567–589.
Geier, R., Blumenfeld, P. C., Marx, R. W., Krajcik, J. S., Fishman, B., Soloway, E., & Clay-Chambers, J. (2008). Standardized test outcomes for students engaged in inquiry-based science curricula in the context of urban reform. Journal of Research in Science Teaching: The Official Journal of the National Association for Research in Science Teaching, 45(8), 922–939.
Gilbert, J. K., & Justi, R. (2016). Modelling-Based Teaching in Science Education (Vol. 9). Cham, Switzerland: Springer.
Gilmore, D. J. (1991). Models of debugging. Acta Psychologica, 78(1–3), 151–172.
Ginovart, M. (2014). Discovering the power of individual-based modelling in teaching and learning: The study of a predator–prey system. Journal of Science Education and Technology, 23, 496–513.
Gkiolmas, A., Karamanos, K., Chalkidis, A., Skordoulis, C., Papaconstantinou, M., & Stavrou, D. (2013). Using simulations of NetLogo as a tool for introducing Greek high-school students to eco-systemic thinking. Advances in Systems Science and Applications, 13(3), 276–298.
Gleasman, C., & Kim, C. (2020). Pre-service teacher's use of block-based programming and computational thinking to teach elementary mathematics. Digital Experiences in Mathematics Education, 6, 52–90.
Goldstone, R. L., & Janssen, M. A. (2005). Computational models of collective behavior. Trends in Cognitive Sciences, 9(9), 424–430.
Gouvea, J., & Passmore, C. (2017). 'Models of' versus 'models for'. Science & Education, 26(1–2), 49–63.
Grandell, L., Peltomaki, M., Back, R. J., & Salakoski, T. (2006). Why complicate things? Introducing programming in high school using Python. In ACM International Conference Proceeding Series (Vol. 165, pp. 71–80).
Grapin, S. E., Llosa, L., Haas, A., & Lee, O. (2022). Affordances of computational models for English learners in science instruction: Conceptual foundation and initial inquiry. Journal of Science Education and Technology, 31(1), 52–67.
Grifenhagen, J. F., & Barnes, E. M. (2022). Reimagining discourse in the classroom. The Reading Teacher, 75(6), 739–748.
Grawemeyer, B., Mavrikis, M., Holmes, W., Gutiérrez-Santos, S., Wiedmann, M., & Rummel, N. (2017). Affective learning: Improving engagement and enhancing learning with affect-aware feedback. User Modeling and User-Adapted Interaction, 27, 119–158.
Griffin, J. M. (2016). Learning by taking apart: Deconstructing code by reading, tracing, and debugging. In Proceedings of the 17th Annual Conference on Information Technology Education (pp. 148–153).
Grosslight, L., Unger, C., Jay, E., & Smith, C. L. (1991). Understanding models and their use in science: Conceptions of middle and high school students and experts. Journal of Research in Science Teaching, 28(9), 799–822.
Grover, S., & Pea, R. (2018). Computational thinking: A competency whose time has come. In Computer Science Education: Perspectives on Teaching and Learning in School (pp. 19–34). New York, NY: Bloomsbury.
Grover, S., Pea, R., & Cooper, S. (2015). Designing for deeper learning in a blended computer science course for middle school students. Computer Science Education, 25(2), 199–237.
Hadad, R., Thomas, K., Kachovska, M., & Yin, Y. (2020). Practicing formative assessment for computational thinking in making environments. Journal of Science Education and Technology, 29(1), 162–173.
Hamidi, A., Mirijamdotter, A., & Milrad, M. (2023). A Complementary View to Computational Thinking and Its Interplay with Systems Thinking. Education Sciences, 13(2), 201.
Hansen, A. K., Hansen, E. R., Dwyer, H. A., Harlow, D. B., & Franklin, D. (2016). Differentiating for diversity: Using universal design for learning in elementary computer science education. In Proceedings of the 47th ACM Technical Symposium on Computing Science Education (pp. 376–381).
Harrison, A. G., & Treagust, D. F. (2000). A typology of school science models. International Journal of Science Education, 22(9), 1011–1026.
Heintz, F., Mannila, L., & Farnqvist, T. (2014). A review of models for introducing computational thinking, computer science and computing in K–12 education. In 2016 IEEE Frontiers in Education Conference (FIE) (pp. 1–9). Piscataway, NJ: IEEE.
Hmelo-Silver, C. E., & Azevedo, R. (2006). Understanding complex systems: Some core challenges. The Journal of the Learning Sciences, 15(1), 53–61.
Hmelo-Silver, C. E., Duncan, R. G., & Chinn, C. A. (2007). Scaffolding and achievement in problem-based and inquiry learning: A response to Kirschner, Sweller, and Clark (2006). Educational Psychologist, 42(2), 99–107.
Hmelo-Silver, C. E., Jordan, R., Eberbach, C., & Sinha, S. (2017). Systems learning with a conceptual representation: A quasi-experimental study. Instructional Science, 45(1), 53–72. doi.org/10.1007/s11251-016-9392-y
Hofman-Bergholm, M. (2018). Changes in thoughts and actions as requirements for a sustainable future: A review of recent research on the Finnish educational system and sustainable development. Journal of Teacher Education for Sustainability, 20(2), 19–30.
Hogan, K., & Thomas, D. (2001). Cognitive comparisons of students' systems modeling in ecology. Journal of Science Education and Technology, 10(4), 319–345.
Hopper, M., & Stave, K. A. (2008, July). Assessing the effectiveness of systems thinking interventions in the classroom. In 26th International Conference of the System Dynamics Society (pp. 1–26).
Hsu, Y. C., Irie, N. R., & Ching, Y. H. (2019). Computational thinking educational policy initiatives (CTEPI) across the globe. TechTrends, 63, 260–270.
Hsu, Y. S., Lai, T. L., & Hsu, W. H. (2015). A design model of distributed scaffolding for inquiry-based learning. Research in Science Education, 45, 241–273.
Hutchins, N. M., Biswas, G., Maróti, M., Lédeczi, Á., Grover, S., Wolf, R., ... & McElhaney, K. (2020). C2STEM: A system for synergistic learning of physics and computational thinking. Journal of Science Education and Technology, 29, 83–100.
Irgens, G. A., Dabholkar, S., Bain, C., Woods, P., Hall, K., Swanson, H., ... Wilensky, U. (2020). Modeling and measuring high school students' computational thinking practices in science. Journal of Science Education and Technology, 29(1), 137–161.
Jiménez-Aleixandre, M. P., Bugallo Rodríguez, A., & Duschl, R. A. (2000). "Doing the lesson" or "doing science": Argument in high school genetics. Science Education, 84(6), 757–792.
Jonassen, D. H., & Hung, W. (2006). Learning to troubleshoot: A new theory-based design architecture. Educational Psychology Review, 18(1), 77–114.
Justi, R. (2009). Learning how to model in science classroom: Key teacher's role in supporting the development of students' modelling skills. Educación Química, 20(1), 32–40.
Kafai, Y. B. (2005). The classroom as "living laboratory": Design-based research for understanding, comparing, and evaluating learning science through design. Educational Technology, 28–34.
Karacalli, S., & Korur, F. (2014). The effects of project-based learning on students' academic achievement, attitude, and retention of knowledge: The subject of "electricity in our lives." School Science and Mathematics, 114(5), 224–235.
Katz, I. R., & Anderson, J. R. (1987). Debugging: An analysis of bug-location strategies. Human-Computer Interaction, 3(4), 351–399.
Katz, I. R., & Anderson, J. R. (1989). Debugging: An analysis of bug-location strategies. ACM SIGCHI Bulletin, 21(1), 123.
Kazakoff, E., & Bers, M. (2012). Programming in a robotics context in the kindergarten classroom: The impact on sequencing skills. Journal of Educational Multimedia and Hypermedia, 21(4), 371–391.
Ke, L., Sadler, T. D., Zangori, L., & Friedrichsen, P. J. (2020). Students' perceptions of socio-scientific issue-based learning and their appropriation of epistemic tools for systems thinking. International Journal of Science Education, 42(8), 1339–1361.
Kelly, G. J. (2013). Discourse in science classrooms. Handbook of Research on Science Education, 457–484.
Kessler, C., & Anderson, J. (1986). A model of novice debugging in LISP. In E. Soloway & S. Iyengar (Eds.), Empirical studies of programmers (pp. 198–212). Norwood, NJ: Ablex.
Keynan, A., Assaraf, O. B. Z., & Goldman, D. (2014). The repertory grid as a tool for evaluating the development of students' ecological system thinking abilities. Studies in Educational Evaluation, 41, 90–105.
King, A. (1998). Transactive peer tutoring: Distributing cognition and metacognition. Educational Psychology Review, 10(1), 57–74.
Kim, C., Yuan, J., Vasconcelos, L., Shin, M., & Hill, R. B. (2018). Debugging during block-based programming. Instructional Science, 46, 767–787.
KMK [Sekretariat der Ständigen Konferenz der Kultusminister der Länder in der BRD]. (2005a). Bildungsstandards im Fach Biologie für den Mittleren Schulabschluss [Educational standards in biology for middle school graduation]. München/Neuwied, Germany: Wolters Kluwer. https://www.kmk.org/fileadmin/Dateien/veroeffentlichungen_beschluesse/2004/2004_12_16-Bildungsstandards-Biologie.pdf
KMK [Sekretariat der Ständigen Konferenz der Kultusminister der Länder in der BRD]. (2005b). Bildungsstandards im Fach Chemie für den Mittleren Schulabschluss [Educational standards in chemistry for middle school graduation]. Wolters Kluwer. https://www.kmk.org/fileadmin/Dateien/veroeffentlichungen_beschluesse/2004/2004_12_16-Bildungsstandards-Chemie.pdf
KMK [Sekretariat der Ständigen Konferenz der Kultusminister der Länder in der BRD]. (2005c). Bildungsstandards im Fach Physik für den Mittleren Schulabschluss [Educational standards in physics for middle school graduation]. Wolters Kluwer. https://www.kmk.org/fileadmin/Dateien/veroeffentlichungen_beschluesse/2004/2004_12_16-Bildungsstandards-Physik-Mittleren-SA.pdf
KMK (Standing Conference of the Ministers of Education and Cultural Affairs of the Federal States in the Federal Republic of Germany). (2020). Bildungsstandards im Fach Biologie für die Allgemeine Hochschulreife [Educational standards in biology for the general higher education entrance qualification]. Hürth: Wolters Kluwer.
Krahenbuhl, K. S. (2016). Student-centered education and constructivism: Challenges, concerns, and clarity for teachers. The Clearing House: A Journal of Educational Strategies, Issues and Ideas, 89(3), 97–105.
Krajcik, J., & Blumenfeld, P. (2006). Project-based learning. In R. K. Sawyer (Ed.), The Cambridge handbook of the learning sciences (pp. 317–333). Cambridge University Press.
Krajcik, J., Blumenfeld, P., Marx, R., & Soloway, E. (2000). Instructional, curricular, and technological supports for inquiry in science classrooms. In J. Minstrell & E. H. v. Zee (Eds.), Inquiring into inquiry learning and teaching science (pp. 283–315). Washington, DC: American Association for the Advancement of Science.
Krajcik, J., & Shin, N. (2022). Project-based learning. In R. K. Sawyer (Ed.), The Cambridge handbook of the learning sciences (3rd ed.). New York: Cambridge University Press.
Krell, M., Reinisch, B., & Krüger, D. (2015). Analyzing students' understanding of models and modeling referring to the disciplines biology, chemistry, and physics. Research in Science Education, 45(3), 367–393.
Krell, M., & Krüger, D. (2016). Testing models: A key aspect to promote teaching activities related to models and modelling in biology. Journal of Biological Education, 50(2).
Kuhn, D., Black, J., Keselman, A., & Kaplan, D. (2000). The development of cognitive skills to support inquiry. Cognition and Instruction, 18, 495–523.
Kyza, E. A., Constantinou, C. P., & Spanoudis, G. (2011). Sixth graders' co-construction of explanations of a disturbance in an ecosystem: Exploring relationships between grouping, reflective scaffolding, and evidence-based explanations. International Journal of Science Education, 33(18), 2489–2525.
Lederman, N. G. (2013). Nature of science: Past, present, and future. In Handbook of research on science education (pp. 845–894). Routledge.
Ledley, T. S., Rooney-Varga, J., & Niepold, F. (2017). Addressing climate change through education. In Oxford Research Encyclopedia of Environmental Science.
Lee, I., Grover, S., Martin, F., Pillai, S., & Malyn-Smith, J. (2020). Computational thinking from a disciplinary perspective: Integrating computational thinking in K-12 science, technology, engineering, and mathematics education. Journal of Science Education and Technology, 29(1), 1–8.
Lee, I., & Malyn-Smith, J. (2020). Computational thinking integration patterns along the framework defining computational thinking from a disciplinary perspective. Journal of Science Education and Technology, 29(1), 9–18.
Lee, I., Martin, F., Denner, J., Coulter, B., Allan, W., Erickson, J., ... & Werner, L. (2011). Computational thinking for youth in practice. ACM Inroads, 2(1), 32–37.
Lee, S., Kang, E., & Kim, H. B. (2015). Exploring the impact of students' learning approach on collaborative group modeling of blood circulation. Journal of Science Education and Technology, 24(2), 234–255.
Lemke, J. (1990). Talking science: Language, learning, and values. Ablex.
Li, C., Chan, E., Denny, P., Luxton-Reilly, A., & Tempero, E. (2019). Towards a framework for teaching debugging. In Proceedings of the Twenty-First Australasian Computing Education Conference (pp. 79–86).
Li, D. D., & Lim, C. P. (2008). Scaffolding online historical inquiry tasks: A case study of two secondary school classrooms. Computers & Education, 50, 1394–1410.
Li, K., & Schwarz, C. (2020). Using Epistemic Considerations in Teaching: Fostering Students' Meaningful Engagement in Scientific Modeling. 10.1007/978-3-030-30255-9_11
Lin, T. C., Hsu, Y. S., Lin, S. S., Changlai, M. L., Yang, K. Y., & Lai, T. L. (2012). A review of empirical evidence on scaffolding for science education. International Journal of Science and Mathematics Education, 10, 437–455.
Lin, Y. T., Yeh, M. K. C., & Hsieh, H. L. (2021). Teaching computer programming to science majors by modelling. Computer Applications in Engineering Education, 29(1), 130–144.
Louca, L. T., & Zacharia, Z. C. (2012). Modeling-based learning in science education: Cognitive, metacognitive, social, material and epistemological contributions. Educational Review, 64(4), 471–492.
Luxton-Reilly, A. (2009). A systematic review of tools that support peer assessment. Computer Science Education, 19(4), 209–232.
Lye, S. Y., & Koh, J. H. L. (2014). Review on teaching and learning of computational thinking through programming: What is next for K-12? Computers in Human Behavior, 41, 51–61.
Magana, A. J., & Silva Coutinho, G. (2017). Modeling and simulation practices for a computational thinking-enabled engineering workforce. Computer Applications in Engineering Education, 25(1), 62–78.
Mandinach, E. B. (1988). The Cognitive Effects of Simulation-Modeling Software and Systems Thinking on Learning and Achievement.
Martinez-Moyano, I. J., & Richardson, G. P. (2013). Best practices in system dynamics modeling. System Dynamics Review, 29(2), 102–123.
McCauley, R., Fitzgerald, S., Lewandowski, G., Murphy, L., Simon, B., Thomas, L., & Zander, C. (2008). Debugging: A review of the literature from an educational perspective. Computer Science Education, 18(2), 67–92.
McNeill, K. L., & Krajcik, J. (2008). Scientific explanations: Characterizing and evaluating the effects of teachers' instructional practices on student learning. Journal of Research in Science Teaching: The Official Journal of the National Association for Research in Science Teaching, 45(1), 53–78.
McNeill, K. L., & Krajcik, J. (2009). Synergy between teacher practices and curricular scaffolds to support students in using domain-specific and domain-general knowledge in writing arguments to explain phenomena. The Journal of the Learning Sciences, 18(3), 416–460.
Meadows, D. (2008). Thinking in systems: A primer. White River Junction, VT: Chelsea Green Publishing.
Metcalf, S. J., Krajcik, J., & Soloway, E. (2000). Model-It: A design retrospective. Innovations in Science and Mathematics Education, 77–115.
Mehan, H. (1979). Learning lessons: Social organization in the classroom. Harvard University Press.
Michaeli, T., & Romeike, R. (2019, October). Improving debugging skills in the classroom: The effects of teaching a systematic debugging process. In Proceedings of the 14th Workshop in Primary and Secondary Computing Education (pp. 1–7).
Mittelstraß, J. (2005). Anmerkungen zum Modellbegriff [Remarks on the concept of the model]. In Modelle des Denkens: Streitgespräch in der Wissenschaftlichen Sitzung der Versammlung der Berlin-Brandenburgischen Akademie der Wissenschaften. Berlin-Brandenburgische Akademie der Wissenschaften.
Monroe, M. C., Plate, R. R., & Colley, L. (2015). Assessing an introduction to systems thinking. Natural Sciences Education, 44(1), 11–17.
Murphy, L., Lewandowski, G., McCauley, R., Simon, B., Thomas, L., & Zander, C. (2008). Debugging: The good, the bad, and the quirky – a qualitative analysis of novices' strategies. ACM SIGCSE Bulletin, 40(1), 163–167.
Nardelli, E. (2019). Do we really need computational thinking? Communications of the ACM, 62(2), 32–35.
National Research Council (NRC). (2007). Taking science to school: Learning and teaching science in grades K-8. National Academies Press.
National Research Council (NRC). (2012). A framework for K-12 science education: Practices, crosscutting concepts, and core ideas. National Academies Press.
Nersessian, N. J. (2008). Model-based reasoning in scientific practice. In R. A. Duschl and R. E. Grandy (Eds.), Teaching Scientific Inquiry: Recommendations for Research and Implementation (pp. 57–79). Rotterdam, the Netherlands: Sense.
NGSS Lead States. (2013). Next Generation Science Standards: For states, by states. Washington, DC: The National Academies Press.
Nguyen, H., & Santagata, R. (2020). Impact of computer modeling on learning and teaching systems thinking. Journal of Research in Science Teaching.
Ogegbo, A. A., & Ramnarain, U. (2021). A systematic review of computational thinking in science classrooms. Studies in Science Education, 1–28.
Oh, P. S., & Oh, S. J. (2011). What teachers of science need to know about models: An overview. International Journal of Science Education, 33(8), 1109–1130.
Papaevripidou, M., Constantinou, C. P., & Zacharia, Z. C. (2007). Modeling complex marine ecosystems: An investigation of two teaching approaches with fifth graders. Journal of Computer Assisted Learning, 23(2), 145–157.
Papert, S. (1980). Mindstorms: Children, computers, and powerful ideas. Basic Books.
Papert, S., & Harel, I. (1991). Situating constructionism. Constructionism, 36(2), 1–11.
Pass, S. (2004). Parallel paths to constructivism: Jean Piaget and Lev Vygotsky. IAP.
Passmore, C., Gouvea, J. S., & Giere, R. (2014). Models in science and in learning science: Focusing scientific practice on sense-making. In International handbook of research in history, philosophy and science teaching (pp. 1171–1202). Springer.
Passmore, C., Stewart, J., & Cartier, J. (2009). Model-Based Inquiry and School Science: Creating Connections. School Science and Mathematics, 109(7), 394–402.
Patterson, A. D. (2019). Equity in groupwork: The social process of creating justice in a science classroom. Cultural Studies of Science Education, 14, 361–381.
Pierson, A. E., & Brady, C. E. (2020). Expanding opportunities for systems thinking, conceptual learning, and participation through embodied and computational modeling. Systems, 8(4), 48.
Pierson, A. E., Clark, D. B., & Sherard, M. K. (2017). Learning progressions in context: Tensions and insights from a semester-long middle school modeling curriculum. Science Education, 101(6), 1061–1088.
Pierson, A. E., & Clark, D. B. (2018). Engaging students in computational modeling: The role of an external audience in shaping conceptual learning, model quality, and classroom discourse. Science Education, 102(6), 1336–1362.
Price, C. B., & Price-Mohr, R. M. (2018). An evaluation of primary school children coding using a text-based language (Java). Computers in the Schools, 35(4), 284–301.
Psycharis, S., & Kallia, M. (2017). The effects of computer programming on high school students' reasoning skills and mathematical self-efficacy and problem solving. Instructional Science, 45(5), 583–602.
Puntambekar, S., & Hubscher, R. (2005). Tools for scaffolding students in a complex learning environment: What have we gained and what have we missed? Educational Psychologist, 40(1), 1–12.
Puntambekar, S., & Kolodner, J. L. (2003). Distributed scaffolding: Helping students learn science from design. Cogn Inst.
Rice, J. K. (1999). The impact of class size on instructional strategies and the use of time in high school mathematics and science courses. Educational Evaluation and Policy Analysis, 21(2), 215–229.
Reiser, B. J., Berland, L. K., & Kenyon, L. (2012). Engaging students in the scientific practices of explanation and argumentation. The Science Teacher, 79(4), 34–39.
Resnick, M., Maloney, J., Monroy-Hernández, A., Rusk, N., Eastmond, E., Brennan, K., Millner, A., Rosenbaum, E., Silver, J., Silverman, B., & Kafai, Y. (2009). Scratch: Programming for all. Communications of the ACM, 52(11), 60–67.
Reynolds, J., & Moskovitz, C. (2008). Calibrated Peer Review Assignments in Science Courses. Journal of College Science Teaching, 38(2).
Richardson, G. P. (1996). Problems for the future of system dynamics. System Dynamics Review: The Journal of the System Dynamics Society, 12(2), 141–157.
Richmond, B. (1994). Systems thinking/system dynamics: Let's just get on with it. System Dynamics Review, 10(2–3), 135–157.
Riess, W., & Mischo, C. (2010). Promoting systems thinking through biology lessons. International Journal of Science Education, 32(6), 705–725.
Rockoff, J. E. (2004). The impact of individual teachers on student achievement: Evidence from panel data. American Economic Review, 94(2), 247–252.
Rogoff, B. (1990). Apprenticeship in thinking: Cognitive development in social context. Oxford, UK: Oxford University Press.
Scanlon, E., Schreffler, J., James, W., Vasquez, E., & Chini, J. J. (2018). Postsecondary physics curricula and Universal Design for Learning: Planning for diverse learners. Physical Review Physics Education Research, 14(2), 020101.
Schneider, B., Krajcik, J., Lavonen, J., Salmela-Aro, K., Broda, M., Spicer, J., Bruner, J., Moeller, J., Linnansaari, J., Juuti, K., & Viljaranta, J. (2016). Investigating optimal learning moments in U.S. and Finnish science classes. Journal of Research in Science Teaching, 53(3), 400–421.
Schneider, B., Krajcik, J., Lavonen, J., Salmela-Aro, K., Klager, C., Bradford, L., Chen, I.-C., Baker, Q., Touitou, I., & Peek-Brown, D. (2022). Improving science achievement—Is it possible? Evaluating the efficacy of a high school chemistry and physics project-based learning intervention. Educational Researcher, 51(2), 109–121.
Schreiber, L. M., & Valle, B. E. (2013). Social constructivist teaching strategies in the small group classroom. Small Group Research, 44(4), 395–411.
Schwarz, C. V., Meyer, J., & Sharma, A. (2007). Technology, pedagogy, and epistemology: Opportunities and challenges of using computer modeling and simulation tools in elementary science methods. Journal of Science Teacher Education, 18(2), 243–269.
Schwarz, C. V., Passmore, C., & Reiser, B. J. (2017). Helping students make sense of the world using next generation science and engineering practices. NSTA Press.
Schwarz, C. V., Reiser, B. J., Davis, E. A., Kenyon, L., Acher, A., Fortus, D., Shwartz, Y., Hug, B., & Krajcik, J. (2009). Developing a learning progression for scientific modeling: Making scientific modeling accessible and meaningful for learners. Journal of Research in Science Teaching, 46(6), 632–654.
Schwarz, C. V., & White, B. Y. (2005). Metamodeling knowledge: Developing students' understanding of scientific modeling. Cognition and Instruction, 23(2), 165–205.
Selby, C. C., & Woollard, J. (2013, 5–8 March). Computational thinking: The developing definition. Special Interest Group on Computer Science Education, Atlanta, GA. Retrieved December 17, 2021, from https://core.ac.uk/download/pdf/17189251.pdf
Sengupta, P., & Farris, A. V. (2012, June). Learning kinematics in elementary grades using agent-based computational modeling: A visual programming-based approach. In Proceedings of the 11th International Conference on Interaction Design and Children (pp. 78–87).
Sengupta, P., Farris, A. V., & Wright, M. (2012). From agents to continuous change via aesthetics: Learning mechanics with visual agent-based computational modeling. Technology, Knowledge and Learning, 17, 23–42.
Sengupta, P., Kinnebrew, J. S., Basu, S., Biswas, G., & Clark, D. (2013). Integrating computational thinking with K-12 science education using agent-based computation: A theoretical framework. Education and Information Technologies, 18(2), 351–380.
Shen, J., Lei, J., Chang, H. Y., & Namdar, B. (2014). Technology-enhanced, modeling-based instruction (TMBI) in science education. In Handbook of research on educational communications and technology (pp. 529–540). Springer.
Shin, N., Bowers, J., Krajcik, J., & Damelin, D. (2021). Promoting computational thinking through project-based learning. Disciplinary and Interdisciplinary Science Education Research, 3(1), 1–15.
Shin, N., Bowers, J., Roderick, S., McIntyre, C., Stephens, L., Eidin, E., Krajcik, J., & Damelin, D. (2022). A framework for supporting systems thinking and computational thinking through constructing modeling. Instructional Science.
Shute, V. J., Sun, C., & Asbell-Clarke, J. (2017). Demystifying computational thinking. Educational Research Review, 22(1), 142–158.
Sins, P. H., Savelsbergh, E. R., & van Joolingen, W. R. (2005). The difficult process of scientific modelling: An analysis of novices' reasoning during computer-based modelling. International Journal of Science Education, 27(14), 1695–1721.
So, H. J., Jong, M. S. Y., & Liu, C. C. (2020). Computational thinking education in the Asian Pacific region.
Song, J., Kang, S. J., Kwak, Y., Kim, D., Kim, S., Na, J., ... & Joung, Y. J. (2019). Contents and features of 'Korean Science Education Standards (KSES)' for the next generation. Journal of the Korean Association for Science Education, 39(3), 465–478.
Smith, F. P., Holzworth, D. P., & Robertson, M. J. (2005). Linking icon-based models to code-based models: A case study with the agricultural production systems simulator. Agricultural Systems, 83(2), 135–151.
Snow, M., Stieff, M., & Spurgeon, S. (2022). Creating Synergistic Scaffolding Between the Tools of Discourse and Technology. In Proceedings of the 16th International Conference of the Learning Sciences-ICLS 2022 (pp. 1297–1300). International Society of the Learning Sciences.
Snyder, C., Hutchins, N. M., Biswas, G., Narasimham, G., Emara, M., & Yett, B. (2022). Instructor facilitation of STEM+CT discourse: Engaging, prompting, and guiding students' computational modeling in physics. In Proceedings of the 16th International Conference of the Learning Sciences-ICLS 2022 (pp. 631–638). International Society of the Learning Sciences.
Soloway, E., & Spohrer, J. C. (2013). Studying the novice programmer. Psychology Press.
Stave, K. A. (2002). Using system dynamics to improve public participation in environmental decisions. System Dynamics Review: The Journal of the System Dynamics Society, 18(2), 139–167.
Stave, K., & Hopper, M. (2007). What constitutes systems thinking? A proposed taxonomy. In 25th International Conference of the System Dynamics Society.
Stratford, S. J., Krajcik, J., & Soloway, E. (1998). Secondary students' dynamic modeling processes: Analyzing, reasoning about, synthesizing, and testing models of stream ecosystems. Journal of Science Education and Technology, 7, 215–234.
Sterman, J. D. (1994). Learning in and about complex systems. System Dynamics Review, 10(2–3), 291–330.
Sterman, J. D. (2002). All models are wrong: Reflections on becoming a systems scientist. System Dynamics Review: The Journal of the System Dynamics Society, 18(4), 501–531.
Sterman, J. D., & Sweeney, L. B. (2002). Cloudy skies: Assessing public understanding of global warming. System Dynamics Review: The Journal of the System Dynamics Society, 18(2), 207–240.
Sullivan, F. R., & Heffernan, J. (2016). Robotic construction kits as computational manipulatives for learning in the STEM disciplines. Journal of Research on Technology in Education, 48(2), 105–128.
Svoboda, J., & Passmore, C. (2013). The strategies of modeling in biology education. Science & Education, 22(1), 119–142.
Swanson, H., Sherin, B., & Wilensky, U. (2021). Refining student thinking through computational modeling. In Proceedings of the 15th International Conference of the Learning Sciences-ICLS 2021. International Society of the Learning Sciences.
Tabak, I. (2004). Synergy: A complement to emerging patterns of distributed scaffolding. The Journal of the Learning Sciences, 13(3), 305–335.
Tabak, I., & Kyza, E. A. (2018). Research on scaffolding in the learning sciences: A methodological perspective. Taylor and Francis.
Tabak, I., & Reiser, B. J. (1999, April). Steering the course of dialogue in inquiry-based science classrooms. Paper presented at the annual meeting of the American Educational Research Association, Montréal, Québec, Canada.
Tabet, N., Gedawy, H., Alshikhabobakr, H., & Razak, S. (2016, July). From Alice to Python: Introducing text-based programming in middle schools. In Proceedings of the 2016 ACM Conference on Innovation and Technology in Computer Science Education (pp. 124–129).
Tsan, J., Weintrop, D., & Franklin, D. (2022, July). An Analysis of Middle Grade Teachers' Debugging Pedagogical Content Knowledge. In Proceedings of the 27th ACM Conference on Innovation and Technology in Computer Science Education Vol. 1 (pp. 533–539).
Tsivitanidou, O. E., Constantinou, C. P., Labudde, P., Rönnebeck, S., & Ropohl, M. (2018). Reciprocal peer assessment as a learning tool for secondary school students in modeling-based learning. European Journal of Psychology of Education, 33, 51–73.
Türker, P. M., & Pala, F. K. (2020). The effect of algorithm education on students' computer programming self-efficacy perceptions and computational thinking skills. International Journal of Computer Science Education in Schools, 3(3), 19–32. doi:10.21585/ijcses.v3i3.69
Verhoeff, R. P., Knippels, M. C. P., Gilissen, M. G., & Boersma, K. T. (2018, June). The theoretical nature of systems thinking. Perspectives on systems thinking in biology education. In Frontiers in Education (Vol. 3, p. 40). Frontiers Media SA.
Vessey, I. (1985). Expertise in debugging computer programs: A process analysis. International Journal of Man–Machine Studies, 23, 459–494.
Wang, X. C., Choi, Y., Benson, K., Eggleston, C., & Weber, D. (2021a). Teacher's role in fostering preschoolers' computational thinking: An exploratory case study. Early Education and Development, 32(1), 26–48.
Wang, C., Shen, J., & Chao, J. (2021b). Integrating computational thinking in STEM education: A literature review. International Journal of Science and Mathematics Education, 1–24.
Webb, M., Davis, N., Bell, T., Katz, Y. J., Reynolds, N., Chambers, D. P., & Syslo, M. M. (2017). Computer science in K-12 school curricula of the 21st century: Why, what and when? Education and Information Technologies, 22(2), 445–468.
Weintrop, D., Beheshti, E., Horn, M., Orton, K., Jona, K., Trouille, L., & Wilensky, U. (2016). Defining Computational Thinking for Mathematics and Science Classrooms. Journal of Science Education and Technology, 25(1), 127–147.
Wen, M. L., & Tsai, C. C. (2008). Online peer assessment in an in-service science and mathematics teacher education course. Teaching in Higher Education, 13(1), 55–67.
Wertsch, J. V. (1979). From social interaction to higher psychological processes: A clarification and application of Vygotsky's theory. Human Development, 22, 1–22.
Wilkerson, M. H., Shareff, R., Laina, V., & Gravel, B. (2018). Epistemic gameplay and discovery in computational model-based inquiry activities. Instructional Science, 46, 35–60.
Wilkerson-Jerde, M., Wagh, A., & Wilensky, U. (2015). Balancing Curricular and Pedagogical Needs in Computational Construction Kits: Lessons from the DeltaTick Project. Science Education, 99, 465–499.
Wilensky, U., & Reisman, K. (2006). Thinking like a wolf, a sheep, or a firefly: Learning biology through constructing and testing computational theories—An embodied modeling approach. Cognition and Instruction, 24(2), 171–209.
Wilson, J. (1987). A Socratic approach to helping novice programmers debug programs. SIGCSE Bulletin, 19(1), 179–182.
Windschitl, M., Thompson, J., & Braaten, M. (2008). Beyond the scientific method: Model-based inquiry as a new paradigm of preference for school science investigations. Science Education, 92(5), 941–967.
Windschitl, M., Thompson, J., & Braaten, M. (2020). Ambitious science teaching. Harvard Education Press.
Wurdinger, S., Haar, J., Hugg, R., & Bezon, J. (2007). A qualitative study using project-based learning in a mainstream middle school. Improving Schools, 10(2), 150–161.
Wing, J. M. (2006). Computational thinking. Communications of the ACM, 49(3), 33–35.
Wing, J. M. (2017). Computational thinking's influence on research and education for all. Italian Journal of Educational Technology, 25(2), 7–14. doi:10.17471/2499-4324/922
Wood, D., Bruner, J. S., & Ross, G. (1976). The role of tutoring in problem solving. Journal of Child Psychology and Psychiatry, and Allied Disciplines, 17(2), 89–100.
Wu, H. L., & Pedersen, S. (2011). Integrating computer- and teacher-based scaffolds in science inquiry. Computers & Education, 57(4), 2352–2363.
Xia, S., Zhou, X. N., & Liu, J. (2017). Systems thinking in combating infectious diseases. Infectious Diseases of Poverty, 6(05), 57–63.
Xiang, L. (2011). A collective case study of secondary students' model-based inquiry on natural selection through programming in an agent-based modeling environment (Order No. 3474498). Available from ProQuest Dissertations & Theses Global. (897916462).
Yadav, A., Good, J., Voogt, J., & Fisser, P. (2017). Computational thinking as an emerging competence domain. In Competence-based vocational and professional education: Bridging the worlds of work and education (pp. 1051–1067).
Yadav, A., Mayfield, C., Zhou, N., Hambrusch, S., & Korb, J. T. (2014). Computational thinking in elementary and secondary teacher education. ACM Transactions on Computing Education (TOCE), 14(1), 1–16.
Yadav, A., Zhou, N., Mayfield, C., Hambrusch, S., & Korb, J. T. (2011). Introducing Computational Thinking in Education Courses. In Proceedings of the 42nd ACM Technical Symposium on Computer Science Education (SIGCSE '11). ACM, New York, NY, USA, 465–470.
Yoon, S. A. (2008). An evolutionary approach to harnessing complex systems thinking in the science and technology classroom. International Journal of Science Education, 30(1), 1–32.
Yoon, S. A., Anderson, E., Koehler-Yom, J., Klopfer, E., Sheldon, J., Wendel, D., ... & Evans, C. (2015). Design features for computer-supported complex systems learning and teaching in high school science classrooms. International Society of the Learning Sciences, Inc. [ISLS].
Zhang, L., VanLehn, K., Girard, S., Burleson, W., Chavez-Echeagaray, M. E., Gonzalez-Sanchez, J., & Hidalgo-Pontet, Y. (2014). Evaluation of a meta-tutor for constructing models of dynamic systems. Computers & Education, 75, 196–217.
Zhang, N., Biswas, G., McElhaney, K. W., Basu, S., McBride, E., & Chiu, J. L. (2020). Studying the interactions between science, engineering, and computational thinking in a learning-by-modeling environment. Artificial Intelligence in Education: 21st International Conference, AIED 2020, Ifrane, Morocco, July 6–10, 2020.
zu Belzen, A. U., van Driel, J., & Krüger, D. (2019). Introducing a Framework for Modeling Competence. In Towards a Competence-Based View on Models and Modeling in Science Education (pp. 3–19). Springer, Cham.
zu Belzen, A., & Krüger, D. (2010). Modellkompetenz im Biologieunterricht [Modeling competence in biology classes]. Zeitschrift für Didaktik der Naturwissenschaften, 16, 41–57.