SUPPORTING STUDENTS AND TEACHERS WITH TESTING AND DEBUGGING IN THE CONTEXT OF COMPUTATIONAL SYSTEMS MODELING

By Jonathan Robert Bowers

A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Curriculum, Instruction, and Teacher Education – Doctor of Philosophy

2024

ABSTRACT

To make sense of our interconnected and algorithm-driven world, students increasingly need proficiency with computational thinking (CT), systems thinking (ST), and computational modeling. One aspect of computational modeling that can support students with CT, ST, and modeling is testing and debugging. Testing and debugging enables students to analyze and interpret model output to identify aspects that need improvement. Students can subsequently revise their own models or provide meaningful feedback to their peers. Testing and debugging has long been identified as a key learning goal in both science education and computer science. However, current evidence suggests that students have limited opportunities to engage in testing and debugging in K-12 science classrooms. Additionally, both curricular and teacher supports for testing and debugging remain understudied. As such, I set out to investigate how students test and debug computational models within a supportive learning environment and how two teachers supported students with testing and debugging in the context of a high school chemistry unit. Through this research, I developed the ST and CT Identification Tool to categorize student testing and debugging behaviors during computational modeling. Using this tool, I identified that students implemented a variety of patterns of testing and debugging during computational modeling. This suggests that teachers and curricular designers should embrace a diversity of testing and debugging pathways when supporting students with this practice. Likewise, my analysis of pedagogical strategies provides evidence that using synergistic scaffolding and presenting students with clear rationales for engaging with different aspects of testing and debugging encourages students to utilize testing and debugging as a means of improving their computational models.

ACKNOWLEDGEMENTS

I would like to thank the following people for all the support they have given me with respect to this thesis and across my time as a graduate student at Michigan State University. First, I would like to thank my parents, Carlton and Robyn Bowers and Carolyn and Donald Kindell, for always believing in my potential and encouraging me to work hard to achieve my dreams. I am grateful to Lisa Kenyon for introducing me to the field of science education and connecting me with the amazing faculty at Michigan State University. I thank Julie Christensen for being my graduate student mentor during my first year as a graduate student and for serving on my practicum committee. I also would like to acknowledge my colleagues from the Concord Consortium (particularly Daniel Damelin, Cynthia McIntyre, Steve Rodrick, and Lynn Stephens) for their work designing SageModeler and co-developing the evaporative cooling unit used throughout this thesis, their support with creating “A Framework for Computational Systems Modeling,” and their feedback on the many research instruments and manuscripts that comprise this thesis.
I also want to thank my colleagues from CREATE for STEM (Lindsey Brennan, Emil Eidin, Namsoo Shin, Israel Touitou, and Joseph Krajcik) who worked collaboratively on the Multilevel Computational Modeling project. I especially want to thank Namsoo Shin for her leadership and mentorship throughout my time at Michigan State and for her work with developing “A Framework for Computational Systems Modeling.” I am grateful to Emil Eidin and Lindsey Brennan for their contributions towards developing the many research instruments used in this thesis, their hard work with collecting data from “Faraday High School,” and for collaboratively developing the evaporative cooling unit alongside Daniel Damelin and me. I also thank Peng He for contributing his statistical advice to this thesis, and I thank Tingting Li for collecting data for me the day that I had my thesis proposal meeting. I am also thankful to my thesis committee (Amelia Gotwals, Joseph Krajcik, Christina Schwarz, and Gail Richmond) for supporting me throughout this process. Thank you, Amelia Gotwals, for being an amazing instructor for my first three semesters at Michigan State. Thank you, Christina Schwarz and David Stroupe, for sharing your professional wisdom, advice, and encouragement throughout many Science Education lunches. I am also grateful to Gail Richmond for her support with mentoring me throughout the “Native Animals, Native Knowledge” project. I once more want to thank Joseph Krajcik for being an amazing PI and steadfastly supporting me throughout my work on this thesis and as I pursue the next steps in my career. I also want to acknowledge and thank the many members of the science education writing group at Michigan State University, which has been an invaluable source of mentorship and community for me during my time here. Their feedback has greatly influenced the direction of my research and tremendously improved the quality of all three manuscripts found within this thesis. I particularly want to thank Matt Adams for his leadership within the science education writing group and for his support and expertise when we co-taught TE 802 and TE 804. Above all, I want to thank Mr. H and Mr. M of “Faraday High School” for four years of collaboration without which this thesis would have been impossible. Thank you for your willingness to allow us to visit your classroom and work with your students, even during the challenges of a global pandemic. Thank you for all your efforts to take our vision of the evaporative cooling unit and for your willingness to implement its many variations across its long development cycle. Thank you for all the feedback that enabled us to continuously improve the evaporative cooling unit and for your feedback on this thesis. I especially want to thank you for allowing me to tell the narrative of your teaching strategies so that other teachers might gain deeper insights into how to best support students with testing and debugging. And to anyone not listed here who supported me throughout this process, thank you!
TABLE OF CONTENTS
INTRODUCTION …………………………………………………………………………………………1
PAPER 1: DEVELOPING THE SYSTEMS THINKING AND COMPUTATIONAL THINKING IDENTIFICATION TOOL ……………………………………………………………………………….23
PAPER 2: EXAMINING STUDENT TESTING AND DEBUGGING WITHIN A COMPUTATIONAL SYSTEMS MODELING CONTEXT …………………………………………………………………….39
PAPER 3: SYNERGISTIC SCAFFOLDING AND CLEAR RATIONALES: HOW TEACHERS CAN SUPPORT STUDENTS WITH TESTING AND DEBUGGING IN A COMPUTATIONAL MODELING CONTEXT ………………………………………………………………………………………………..76
CONCLUSIONS …………………………………………………………………………………….......170
ACKNOWLEDGMENT OF PREVIOUSLY PUBLISHED WORK …………………………………..186
BIBLIOGRAPHY ……………………………………………………………………………………….189

INTRODUCTION

In an increasingly interconnected world, it is important that students have a deep appreciation of the intricacies of natural systems and a firm grasp on key aspects of systems thinking (ST) and testing and debugging. For example, to articulate how small changes (such as the introduction of an invasive species) can have a massive impact on broader systems, students need to understand feedback mechanisms and other advanced aspects of ST (Hofman-Bergholm, 2018; Keynan et al., 2014; Ledley et al., 2017; Meadows, 2008). Modeling can help students visualize the relationships that exist between different elements in a system, further supporting ST (Hopper & Stave, 2008; Monroe et al., 2015). Likewise, computational modeling software, such as Model-IT and STELLA, helps students create interactive models that can generate a numeric or semi-quantitative model output (Bielik et al., 2019; Mandinach, 1988; Metcalf et al., 2000). Learners can compare this model output to external data and use it to facilitate testing and debugging of their models (Shin et al., 2022; Campbell & Oh, 2015; Fisher, 2018; Sins et al., 2005). Additionally, the process of constructing, testing, and revising these computational models often allows students to develop computational thinking (CT) skills, such as problem decomposition and iterative refinement (Pierson & Brady, 2020; Sengupta et al., 2013; Weintrop et al., 2016). Given the synergies between ST, CT, and modeling, researchers proposed “A Framework for Computational Systems Modeling” to demonstrate the interconnected nature of these three constructs and to help researchers, curriculum developers, and teachers better support student engagement in all aspects of computational systems modeling (Bowers et al., 2022a; Hamidi et al., 2023; Shin et al., 2022; Weintrop et al., 2016). One prominent computational systems modeling practice emerging from this framework that engages students with ST, CT, and modeling is testing and debugging. Testing and debugging processes allow students to analyze a model’s output and structure. If the model does not align with their understanding of the system or with external data, students can make subsequent changes to improve the model (Hadad et al., 2020; Hogan & Thomas, 2001; Sengupta et al., 2013). Through testing and debugging, students can consider how broader structural patterns influence model behavior, thus engaging in ST (Pierson & Brady, 2020; Shin et al., 2022). By systematically analyzing model output to uncover unexpected errors in a model’s structure, students also utilize major aspects of CT during testing and debugging (Lee et al., 2020; Li et al., 2019; Michaeli & Romeike, 2019).
Finally, iteratively refining a model based on new experimental evidence and a growing understanding of the underlying phenomenon is a central tenet of scientific modeling that intersects with and informs understanding of testing and debugging (Grover & Pea, 2018; Metcalf et al., 2000; NRC, 2012; Shin et al., 2022). Although testing and debugging has been recognized by several scholars as a major aspect of computational modeling and an important learning goal for STEM education, students typically have limited opportunities to engage with testing and debugging in most K-12 classrooms (Sins et al., 2005; Swanson et al., 2021; Wilensky & Reisman, 2006). Even in learning environments specifically designed to support students with computational modeling, students often do not spend adequate time analyzing model output to make necessary revisions after constructing their initial models (Grapin et al., 2022; Stratford et al., 1998; Swanson et al., 2021). Likewise, students often use superficial testing and debugging approaches to create models that have functional outcomes that mirror real-world experimental data but demonstrate a lack of internal consistency and thus fail to explain the phenomenon in a meaningful way (Li et al., 2019; Sins et al., 2005; Wilensky & Reisman, 2006). Given the challenges students face with testing and debugging, as well as the many overlapping and competing visions of testing and debugging that exist within the broader STEM education literature, it is important to both clearly define a vision for testing and debugging in the context of computational modeling and show examples of how students use different aspects of testing and debugging as they build and revise a computational model. Additionally, given the lack of clear instructional resources for supporting students with testing and debugging in computational modeling, research needs to investigate how teachers can scaffold students with key aspects of testing and debugging in a computational modeling context. As such, I address the following research questions through the three respective papers in this thesis.
RQ1: How can I categorize student testing and debugging behaviors in the context of computational modeling based on “A Framework for Computational Systems Modeling”?
RQ2: What testing and debugging behaviors do students use as they are revising computational systems models in an evaporative cooling unit?
RQ3: How do teachers scaffold students with testing and debugging in a computational modeling context?

Literature Review

A Framework for Computational Systems Modeling

Systems Thinking

The natural world contains many complex systems with elements that interact with each other in ways that give rise to a multitude of interesting phenomena (Bielik et al., 2023; Hofman-Bergholm, 2018; Meadows, 2008). From the complex web of biochemical reactions within a single cell, to the numerous feedback systems within the human body that help us maintain homeostasis, to the mechanisms by which energy flows and nutrients circulate within ecosystems, natural systems play a critical role in shaping our daily lives. As such, systems thinking (ST), or the process of viewing the natural world as a series of interconnected elements that interact to form complex systems, is an important part of scientific literacy (Arnold & Wade, 2015; Ledley et al., 2017; Meadows, 2008; Stave & Hopper, 2007).
Indeed, many of the most important scientific issues of our contemporary world, including infectious diseases, invasive species, and climate change, can greatly benefit from a systems thinking approach (Hofman-Bergholm, 2018; Keynan et al., 2014; Ledley et al., 2017; Xia et al., 2017). While individual aspects of these systems can sometimes be described in terms of simple “cause and effect” relationships, understanding many of the phenomena associated with these systems requires a holistic systems thinking approach (Forrester, 1994; Hofman-Bergholm, 2018; Ledley et al., 2017; Stave & Hopper, 2007). For example, while a person can describe anthropogenic climate change by stating that “more carbon dioxide in the atmosphere makes the planet warmer,” this simplistic causal explanation ignores the broader complexities of climate change. A more complete, ST-based explanation of climate change would include discussions about how carbon dioxide is transferred from fossil fuels into atmospheric CO2 through industrial processes and how rising temperatures themselves create feedback loops that lead to even higher concentrations of atmospheric greenhouse gases and even higher global temperatures (Hofman-Bergholm, 2018; Ledley et al., 2017). This more complex, ST-based approach to teaching climate change communicates the responsive nature of Earth’s natural systems and the urgency of climate action. Therefore, incorporating ST into science education has the potential to enhance student understanding of key science ideas, including climate change (Ke et al., 2020; Hofman-Bergholm, 2018; Ledley et al., 2017).

Modeling and Computational Modeling

Given the potential that ST has in supporting student understanding of key science phenomena and core science ideas, there have been several efforts to find ways to integrate ST into science classrooms (Boersma et al., 2011; Hmelo-Silver et al., 2017; Yoon, 2008). One promising way to help students develop ST skills is to embed ST into modeling (Arndt, 2006; Forrester, 2007; Sterman, 2002; Svoboda & Passmore, 2013). Modeling is the process of creating a static (paper-pencil or 3-dimensional) or dynamic representation of a phenomenon such that the representation can be used to explain or predict the behavior of that phenomenon (Harrison & Treagust, 2000; Louca & Zacharia, 2012; Mittelstraß, 2005). From this perspective, models are viewed not just as the product of scientific inquiry but as essential tools for supporting scientific reasoning (Bailer-Jones, 1999; Schwarz & White, 2005; zu Belzen et al., 2019). Additionally, models can help one gain insight into previously unknown aspects of a phenomenon and have predictive power (zu Belzen & Krüger, 2010). Models can also act as sensemaking tools by helping learners to synthesize knowledge and by serving as a focal point for asking future questions about a phenomenon (Gouvea & Passmore, 2017; Nersessian, 2008; Schwarz et al., 2009). Models can support systems thinking by allowing students to represent the relationships between different elements in a system and facilitate discussions on how distant elements within a system can impact each other.
Modeling also facilitates more constructivist approaches to science learning, as students can continuously revise their models when they gather new information through investigations, simulations, and analysis of data, or make modifications that enhance their models’ explanatory power (Krell et al., 2015; Passmore et al., 2009; Windschitl et al., 2008). To help students visualize how structural changes to their models impact system behavior, many researchers and educators have turned to computational modeling software (Basu et al., 2016; Nguyen & Santagata, 2020; Shin et al., 2022). Computational modeling uses algorithms or algorithmic thinking to create a model that visually represents the behavior of a system in a quantitative or semi-quantitative manner (Fisher, 2018; Pierson & Clark, 2018; Sengupta et al., 2013; Shin et al., 2021, 2022; Weintrop et al., 2016). Computational models have many affordances that make them valuable tools for science learning. The visual aspect of computational models allows students to explore how various elements of the model interact to generate complex behaviors and to see that changes to model parameters can affect system behavior, thus facilitating ST (Basu et al., 2016; Cronin et al., 2009; Nguyen & Santagata, 2020). The algorithmic nature of computational models also provides opportunities for students to utilize different aspects of CT (Anderson, 2016; Brennan & Resnick, 2012; Irgens et al., 2020; Wang, 2021b). For example, students need to engage in the CT aspect of problem decomposition as they decide how to best represent different aspects of a phenomenon in a format that can be interpreted by the computational modeling software. Computational models are also responsive to new data inputs and can be tested and debugged using a computer (Fisher, 2018; Pierson & Clark, 2018; Sengupta et al., 2013; Weintrop et al., 2016). As such, computational modeling supports students with ST, CT, and testing and debugging. In general, there are two major classes of computational modeling programs: agent-based modeling and icon-based modeling. In agent-based modeling, students use computer programming languages to create or manipulate individual elements or “agents” in a programming canvas (Basu et al., 2016; Sengupta et al., 2013; Wilensky & Reisman, 2006). These agents can be given unique behaviors and be programmed to interact with other agents. In icon-based modeling, the user represents variables as symbols or icons (Costanza & Voinov, 2001; Smith et al., 2005; Xiang, 2011). The user then sets links between these different variables to demonstrate the causal relationships between variables in the system (Damelin et al., 2017; Nguyen & Santagata, 2020). Early examples of computational modeling software used to support students with ST come from the icon-based modeling programs of STELLA and Model-IT (Metcalf et al., 2000; Richmond, 1994; Stratford et al., 1998). Both kinds of software allow students to set distinct relationships between multiple elements in their models and generate a model output. The generation of model output facilitates students in exploring how changes to system structure impact system behavior and encourages them to engage in testing and debugging so that their models better match both their conceptual understanding of the phenomenon and real-world data (Basu et al., 2016; Bravo et al., 2006; Sengupta et al., 2013; Stratford et al., 1998).
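To make the icon-based approach concrete, the sketch below shows, in plain Python, one way a semi-quantitative model could be represented: named variables connected by signed links, with model output generated by propagating input values through those links. This is an illustrative sketch under stated assumptions, not the implementation used by STELLA, Model-IT, or SageModeler; the variable names, the 0–100 scale, and the one-pass update rule are all assumptions made for this example.

```python
# Illustrative sketch only: a toy, icon-based, semi-quantitative model encoded
# as named variables and signed links, with output produced by propagating
# input values downstream. Not any particular software's implementation.

# Each link: (source, target, sign). +1 means "an increase in the source
# increases the target"; -1 means an increase in the source decreases it.
links = [
    ("air temperature", "evaporation rate", +1),
    ("wind speed", "evaporation rate", +1),
    ("evaporation rate", "liquid temperature", -1),
]

def simulate(inputs, links):
    """Propagate semi-quantitative input values (0-100, 50 = 'medium')."""
    values = dict(inputs)
    for source, target, sign in links:            # simple one-pass propagation
        deviation = values.get(source, 50) - 50   # how far the source is from "medium"
        values[target] = values.get(target, 50) + sign * deviation
    return values

# Raising air temperature and wind speed pushes evaporation up and the liquid's
# temperature down -- the kind of output students inspect, compare against their
# expectations, and debug by revising individual links.
print(simulate({"air temperature": 80, "wind speed": 70}, links))
```

Even a toy representation like this makes the affordance described above visible: changing a single link or input value immediately changes the output that students can inspect and question.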
One icon-based computational modeling software program that is particularly promising in its potential to support students with systems thinking and testing and debugging is SageModeler. SageModeler is an open-source, semi-quantitative, icon-based computational modeling software developed by the Concord Consortium (Damelin et al., 2017; Eidin et al., 2023; Nguyen & Santagata, 2020). Several of the features of SageModeler have the potential to support students in various aspects of systems thinking, computational modeling, and testing and debugging. On a basic level, SageModeler allows students to set relationships between elements and define these relationships in semi-quantitative terms (Figure 1A). Students can also set certain elements to be collectors (meaning that these elements can either increase or decrease in value over time) and set flows between these collectors to simulate how two interrelated elements can change over time (Figure 1B). Additionally, SageModeler allows students to generate model output through both simulation features and a specialized graphing tool, facilitating student testing and debugging (Figure 1C). Because these features of SageModeler were designed specifically to support students with computational modeling, the studies found in this document are built around students using this software program. However, many of the principles studied in the context of SageModeler can be applied to other system dynamics programs and other forms of computational modeling.

Figure 1: SageModeler Introduction. The simulation features of SageModeler are activated through the simulate button (1). This allows students to change the relative amount of each input variable (2) and see its impact on model behavior. Using the record button (3), students can record how the system is changing over time and can subsequently generate a graph (4) showing the relationship between any two variables in the system.
Figure 1A: Example of a simple causal relationship in SageModeler
Figure 1B: Example of a simple collector and flow system
Figure 1C: Simulations and graphing features of SageModeler

Computational Thinking

While computational modeling provides a platform for students to visualize a phenomenon as a system of interconnected elements, thereby facilitating ST, it also creates opportunities for students to engage in computational thinking (Basu et al., 2016; Wilensky & Reisman, 2006; Weintrop et al., 2016). Computational thinking (CT) is a form of sensemaking that uses an iterative and quantitative approach to decompose a phenomenon or problem to explore, explain, and predict the behavior of that phenomenon or to find a solution to a problem through the creation and revision of algorithms (Grover & Pea, 2018; Shin et al., 2022; Weintrop et al., 2016; Wing, 2006). Because the CT community has its origins in computer science education, CT literature emphasizes the algorithmic nature of computational models, both in how students construct and revise their models (Brennan & Resnick, 2012; Weintrop et al., 2016). Two aspects of CT deeply intertwined with ST and computational modeling are testing & debugging and iterative refinement (Brennan & Resnick, 2012; Li et al., 2019; Swanson et al., 2021; Wilensky & Reisman, 2006). As students examine and test their model behavior and model output, they often notice aspects of model behavior that do not match their conceptual understanding of the phenomenon or external data (Hadad et al., 2020; Weintrop et al., 2016).
Using debugging strategies, they can identify a specific aspect of their model (be it an individual element or relationship between elements) that needs modification, so their model better fits their conceptual understanding or external data (Aho, 2012; Shin et al., 2022; Türker & Pala, 2020; Weintrop et al., 2016). Likewise, as students’ conceptual understanding evolves over the course of a unit, they will inevitably need to make iterative refinements to their models to match their changing understanding of the phenomenon (Barr & Stephenson, 2011; Basu et al., 2016; Shin et al., 2021). As these aspects of CT encourage students to consider how model structure influences model behavior, they provide opportunities to reinforce aspects of students’ understanding of ST.

A Framework for Computational Systems Modeling

While systems thinking and computational thinking are part of many science education policy documents (ACARA, 2017; KMK, 2005, 2020; Yadav et al., 2017), including the Next Generation Science Standards (NGSS, 2013), and the idea of using computational modeling to support ST and CT in science classrooms has existed for several decades, this approach to science education remains absent from most science classrooms (Boersma et al., 2011; Riess & Mischo, 2010; Verhoeff et al., 2018). Integrating ST and CT with computational modeling (to create “computational systems modeling”) requires that science educators recognize the synergy between modeling, CT, and ST and abandon the “siloing” that has previously defined these three bodies of literature. While many researchers have explored computational modeling and strived to integrate CT into science classrooms, these researchers have focused on agent-based modeling and computer science principles and thereby have not explored or addressed icon-based models (such as SageModeler and Model-IT) that help students visualize relationships between variables (Basu et al., 2016; Sengupta et al., 2013; Wilensky & Reisman, 2006). Likewise, ST modelers (particularly in the System Dynamics community) often focus on having students understand specific types of relationships between elements in a model, while largely avoiding discussions on the broader CT principles at work in the modeling process (Assaraf & Orion, 2005; Cronin et al., 2009; Stave & Hopper, 2007). Finally, much of the traditional modeling community prioritizes diagrammatic models and investigating these models as tools to facilitate student sensemaking (Schwarz et al., 2009) and support students in explaining and predicting the behavior of a real-world phenomenon (zu Belzen & Krüger, 2010). To help synthesize key ideas and contributions from the ST, CT, and modeling literature into a cohesive vision for computational systems modeling, Shin and colleagues (2022) proposed “A Framework for Computational Systems Modeling” (Figure 2). This framework consists of three major components: ST aspects (on the left-hand side of the diagram), CT aspects (on the right-hand side of the diagram), and computational modeling practices (green boxes in the middle of the diagram). When constructing this framework, Shin and colleagues (2022) took inspiration and guidance from the ST, CT, and modeling literature to define the five ST aspects, the five CT aspects, and the five computational modeling practices shown in this framework (Grover & Pea, 2018; NRC, 2012; Richmond, 1994; Schwarz et al., 2017).
While the ST and CT aspects of this framework serve to summarize the authors’ conceptualization of systems thinking and computational thinking respectively, the five computational modeling practices are concrete actions that students perform as they design, construct, test, and revise their computational systems models. Each of the five computational modeling practices is informed by the ST and CT aspects defined in this framework and provides students with the opportunity to develop and demonstrate various aspects of ST and CT (Shin et al., 2022). For example, as students “test, evaluate, and debug model behavior”, they often utilize the ST aspects of “predicting system behavior based on system structure” and “engaging in causal reasoning” (Hadad et al., 2020; Lee et al., 2020) alongside the CT aspects of “testing and debugging” and “making iterative refinements” (Aho, 2012; Barr & Stephenson, 2011; Shin et al., 2022). As such, this framework acknowledges the deep synergy between ST, CT, and modeling that occurs as students construct and revise computational models. This framework also recognizes that computational systems modeling is an iterative process, where students will identify changes that need to be made to their models through the practice of “test, evaluate, and debug model behavior” and subsequently reconsider system boundaries and model structures, thus reengaging in those practices (Shin et al., 2022).

Figure 2: “A Framework for Computational Systems Modeling”

Testing and Debugging

Testing and Debugging: An Overview

One of the main advantages of incorporating computational modeling into K-12 STEM education is that it enables students to engage in testing and debugging. Testing and debugging is a multi-faceted process by which students actively seek to identify flaws in their algorithmic representations of a phenomenon and make subsequent corrections to their representations to more accurately reflect their evolving understanding of the underlying phenomenon (Hadad et al., 2020; Hogan & Thomas, 2001; Sengupta et al., 2013; Shin et al., 2022). After students have constructed an initial algorithmic product (e.g., a text-based computer program, an agent-based computational model, a SageModeler model, etc.), they will often need to test their algorithmic product to see if its output aligns with their expectations (Griffin, 2016; Hadad et al., 2020; Shin et al., 2022; Wilensky & Reisman, 2006). Such testing is often built directly into the computational modeling software or is accomplished by executing the algorithmic code. If the output of the algorithmic product does not match the expected outcome or is found to differ from experimental results, students are then tasked with identifying specific structural flaws within their algorithmic representation (Bravo et al., 2006; Michaeli & Romeike, 2019; Sengupta et al., 2013). This might require students to engage in debugging by systematically going through lines of computer code to find syntactical errors or interrogating their reasoning behind each individual relationship in a computational model. Having knowledgeable peers review their algorithmic products can also help students identify specific flaws in their representations of the phenomenon. Once students have identified flaws in their algorithmic representations and made appropriate revisions, they should once more test their algorithmic products to see if their changes have improved their algorithmic outputs and to identify additional errors.
Through this cycle of testing, debugging, and revising, students will iteratively refine their algorithmic product to better reflect the underlying science phenomenon they are trying to represent (Grover & Pea, 2018; Hutchins et al., 2020; Shin et al., 2022; Windschitl et al., 2008). There are several benefits to encouraging students to test and debug their algorithmic products. By seeking out structural and syntactical flaws in their algorithmic representations, students build a deeper understanding of how to encode ideas in an algorithmic environment, thus enhancing their CT skills (Michaeli & Romeike, 2019; Grover et al., 2015; Shin et al., 2021). For example, identifying that a missing parenthesis in a text-based computer program renders it unable to compile teaches students the importance of proper syntax in computer science. Likewise, testing and debugging often requires that students investigate how various aspects and elements of their algorithmic representations interact with each other to create complex behavioral patterns (Abar et al., 2017; Fretz et al., 2002; Sengupta et al., 2012; Weintrop et al., 2016). Through seeing how interactions between various elements in their algorithmic products influence output behavior, students can develop ST competency (Shin et al., 2022; Weintrop et al., 2016). Finally, by engaging in the iterative refinement aspect of testing and debugging, students have multiple opportunities to revisit their understanding of the underlying scientific phenomenon or the goals of their algorithmic product (Grover & Pea, 2018; Hutchins et al., 2020; Shin et al., 2022). Through this reflective process, particularly when paired with collecting and analyzing data from real-world experiments, students can reconsider previously held assumptions about the phenomenon and make changes to their models based on new knowledge (Basu et al., 2016; Bravo et al., 2006; Grapin et al., 2022; Windschitl et al., 2008). As such, testing and debugging supports students in making sense of the phenomenon they seek to represent, thus benefiting their understanding of disciplinary core ideas. Testing and debugging (along with the closely related construct of iterative refinement) is found across a broad spectrum of STEM education literature (Grover & Pea, 2018; Michaeli & Romeike, 2019; Stratford et al., 1998; Weintrop et al., 2016). In the scientific practice of modeling, students are encouraged to frequently return to their models after learning new science content and/or conducting real-world experiments to make model revisions (Louca & Zacharia, 2012; Metcalf et al., 2000; NRC, 2012; Schwarz et al., 2009). Through this process of iterative refinement, students gradually improve their models to better reflect the real-world science phenomenon and deepen their understanding of underlying disciplinary core ideas (Clement, 2000; Schwarz et al., 2007, 2009; Windschitl et al., 2008). Computer science scholars view testing and debugging as the process of searching for anomalies in a software program, finding specific flaws (i.e., “bugs”) in the algorithmic code, and subsequently replacing these flawed sections of computer code so that the program can run as intended (Griffin, 2016; McCauley et al., 2008; Michaeli & Romeike, 2019). Given the importance of being able to identify and correct flaws in computer programming, computer science educators often view testing and debugging proficiency as an essential indicator of programming skill.
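The computer science view of testing and debugging summarized above can be illustrated with a small, hypothetical example: a short function is checked against a known expected value, and the failing check points the programmer toward a specific flawed line. The function, data values, and tolerance below are invented for illustration only.

```python
# Hypothetical example of the test-and-debug cycle: a short function is tested
# against a known expected value, and the failing test points to a specific
# flawed line ("bug"). The function, data, and tolerance are invented here.

def average_temperature(readings):
    # Intended behavior: return the mean of a list of temperature readings.
    # Bug: divides by a hard-coded 10 instead of the number of readings.
    return sum(readings) / 10

def test_average_temperature():
    readings = [20.0, 22.0, 24.0]
    expected = 22.0                      # the value the output should match
    actual = average_temperature(readings)
    assert abs(actual - expected) < 1e-6, f"expected {expected}, got {actual}"

# Running the test exposes the anomaly (6.6 rather than 22.0). Debugging means
# locating the flawed line, replacing 10 with len(readings), and re-running the
# test to confirm the fix and to check for additional errors.
try:
    test_average_temperature()
except AssertionError as err:
    print("Test failed:", err)
```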
Computational modeling environments offer another instructional context that can support students with testing and debugging. The computational modeling environment requires that users encode information in an algorithmic manner so that the software can generate a functioning model (Fisher, 2018; Pierson & Clark, 2018; Sengupta et al., 2013; Shin et al., 2022). The complexity of the encoding process ranges from using drop-down text menus to set semi-quantitative relationships between individual variables in icon-based modeling programs such as SageModeler to using relatively sophisticated text-based programming to set the behavior of individual agents in NetLogo (Damelin et al., 2019; Goldstone & Janssen, 2005; Wilensky & Reisman, 2006). Despite these varying levels of complexity, the common algorithmic nature of computational modeling software programs leads to an environment where syntactical errors can occur, thus creating opportunities for students to use debugging to locate and correct these flaws in their models. Most computational modeling software programs also have a visual output, allowing students to more easily see how their algorithmic products behave under different initial conditions, thus facilitating model testing (Abar et al., 2017; Campbell & Oh, 2015; Fisher, 2018; Sins et al., 2005). For example, some computational modeling programs allow students to see the behavior of the various agents they have programmed on a set canvas (Basu et al., 2014; Goldstone & Janssen, 2005; Ginovart, 2014; Sengupta & Farris, 2012). Other computational modeling programs let students explore how manipulating the relative amount of input variables impacts the relative amount of intermediate and output variables in their models (Damelin et al., 2017; Metcalf et al., 2000; Nguyen & Santagata, 2020; Richmond, 1994). Such a visual output enables students to see if their model’s behavior contradicts their expectations and determine if further revisions are needed. In computational modeling programs with a quantitative or a semi-quantitative visual output, students can often compare their model output with real-world experimental data, allowing students to validate their models (Campbell & Oh, 2015; Shin et al., 2021; Weintrop et al., 2016). Using external data to validate model behavior is an important aspect of testing and debugging, as it allows students to demonstrably determine if their model behavior accurately reflects the targeted real-world phenomenon. It also helps emphasize the importance of the experimental aspect of science, showing that theoretical models, such as their computational model, need to be supported by external, experimental data (Bravo et al., 2006; Sengupta et al., 2013; Stratford et al., 1998).

Testing and Debugging in “A Framework for Computational Systems Modeling”

Expanding upon prior studies, my colleagues and I developed a comprehensive definition of testing and debugging as part of the broader computational systems modeling framework, “A Framework for Computational Systems Modeling” (Basu et al., 2016; Hadad et al., 2020; Sengupta et al., 2013; Shin et al., 2022; Figure 3). As previously mentioned, “A Framework for Computational Systems Modeling” is deeply rooted in our collective understanding of ST, CT, and computational modeling and was influenced by our experiences with the SageModeler software program (Damelin et al., 2019; Shin et al., 2021, 2022).
“A Framework for Computational Systems Modeling” incorporates the term “testing and debugging” both in the context of CT and within the computational modeling practice of “testing, evaluating, and debugging model behavior”. However, for the purposes of this thesis, I am focusing on testing and debugging as a multifaceted computational modeling practice. While testing and debugging, students will often begin by interrogating the different variables and relationships within their models or by analyzing the visual output of their models (Hadad et al., 2020; Lee et al., 2020; Shin et al., 2022). During this analytical phase, students will often identify aspects of their models that do not correspond with their evolving understanding of the phenomenon or fail to align with experimental data. This, in turn, motivates students to seek specific relationships and variables that can be changed to improve their model. Through iteratively assessing model output and refining model structures, students generally succeed in aligning their models more closely with the behavior of the targeted real-world phenomenon. This inclusive perspective on testing and debugging, inspired by the insights of various scholars, acknowledges how students embody elements of CT and ST as they participate in this practice (Aho, 2012; Brennan & Resnick, 2012; Sengupta et al., 2013; Yadav et al., 2014). Within this framework, aspects of the scientific practice of “using mathematics and computational thinking” along with aspects of the crosscutting concepts of “systems and system models” and “cause and effect” are seamlessly woven into the computational systems modeling practice of “testing, evaluating, and debugging model behavior” and the broader scientific practice of “developing and using models” (NGSS, 2013; Shin et al., 2022). As students examine the visual outputs of their computational models to identify aspects that deviate from their expectations, they are actively engaging in the CT aspect of “testing and debugging” (Barr & Stephenson, 2011; Sengupta et al., 2013; Sullivan & Heffernan, 2016). Similarly, when students compare this model output against external real-world data, they are simultaneously involved in “generating, organizing, and interpreting data” (Aho, 2012; Selby & Woollard, 2013). Furthermore, as students discuss the validity of various relationships within their models, they are exemplifying the ST aspect of “causal reasoning,” which intersects with the crosscutting concept of “cause and effect” (NGSS, 2013; Shin et al., 2022). When these discussions progress towards assessing how structural elements within a model (such as feedback loops) influence broader facets of model behavior, students are effectively “interpreting and predicting system behavior based on system structure”. Finally, students engage in “iterative refinements” when they modify their models to ensure that their model’s behavior more accurately mirrors that of the real-world phenomenon (Hadad et al., 2020; Weintrop et al., 2016).
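As a concrete illustration of comparing model output against external data during this practice, the hedged sketch below checks whether a model-generated temperature series shows the same qualitative trend as a measured series and whether the numeric disagreement stays within a tolerance. The data values, tolerance, and decision messages are hypothetical; they merely stand in for the kind of evaporative cooling data students collect in the unit described later.

```python
# Hedged sketch of comparing model output against external experimental data:
# check whether the two series share the same qualitative trend and whether the
# numeric disagreement stays within a tolerance. All values are hypothetical.

experimental = [24.0, 21.5, 19.8, 18.9, 18.5]   # measured liquid temperature (degrees C)
model_output = [24.0, 22.0, 20.5, 19.5, 19.0]   # temperature values produced by the model

def same_trend(a, b):
    """True if both series rise and fall in the same places."""
    return all((x2 - x1) * (y2 - y1) >= 0
               for (x1, x2), (y1, y2) in zip(zip(a, a[1:]), zip(b, b[1:])))

def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

if not same_trend(experimental, model_output):
    print("Model behavior contradicts the data: revisit the model's relationships.")
elif mean_abs_error(experimental, model_output) > 1.0:      # assumed tolerance
    print("Trend matches but values drift: consider refining relationship strengths.")
else:
    print("Model output is consistent with the experimental data.")
```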
Figure 3: Aspects of Systems Thinking and Computational Thinking exhibited through the computational modeling practice of “Test, Evaluate and Debug Model Behavior”

Challenges with Testing and Debugging

Although testing and debugging has been identified as a key aspect of computational modeling and is recognized as an important learning goal in computer science education and science education, students often find testing and debugging challenging (Barr & Stephenson, 2011; Eidin et al., 2023; Grapin et al., 2022; Li et al., 2019). When building computational models, students are often reluctant to make changes to their initial models, even when presented with new evidence that contradicts their initial ideas about a phenomenon (Grapin et al., 2022; Stratford et al., 1998; Swanson et al., 2021). Students also tend to take an ad hoc, outcome-oriented approach to testing and debugging when tasked with using real-world external data to validate their model output (Li et al., 2019; Sins et al., 2005; Wilensky & Reisman, 2006). In these cases, students will make modifications to their models to try to generate a model output that matches the real-world experimental results without considering the implications of these changes on the explanatory power of their models. This outcome-oriented approach results in models that superficially reflect real-world data but that are unable to properly explain the mechanisms of the targeted phenomenon. For example, based on experimental observations, a student might generate an agent-based geocentric computational model of the rotational paths of the Sun, Earth, and Moon that accurately predicts the occurrences of lunar and solar eclipses. While this model technically fits the experimental data they have been provided, the underlying mechanisms of the model reflect a non-canonical understanding of the targeted disciplinary core ideas. While the literature demonstrates the challenges students face with testing and debugging, relatively few studies discuss how teachers can support students with this practice in the context of science education and computational modeling (Barr & Stephenson, 2011; Grover & Pea, 2018; Michaeli & Romeike, 2019). Most of the existing studies on pedagogical practices for supporting students with testing and debugging center on computer science contexts, particularly with respect to debugging traditional text-based computer programs (Katz & Anderson, 1989; McCauley, 2008; Michaeli & Romeike, 2019; Vessey, 1985). Meanwhile, studies on pedagogical practices for computational modeling tend to offer generic advice on supporting students with iterative refinement and largely avoid suggesting specific scaffolds for testing and debugging (Fretz et al., 2002; Snyder et al., 2022; Wilkerson et al., 2018; Wilkerson-Jerde et al., 2015). Given the established need to better support students with the computational modeling practice of testing and debugging and the dearth of literature on effective pedagogical supports for this practice, I set out to investigate how to best support students with testing and debugging in a computational modeling environment.

Thesis Overview

Building on “A Framework for Computational Systems Modeling” and the broader testing and debugging literature, I explore ways to support students with testing and debugging in a computational modeling context.
It is important to note that the research found in this thesis was part of a larger research partnership between the Concord Consortium (Concord, Massachusetts) and the CREATE for STEM Institute (Michigan State University, East Lansing, Michigan) that sought to investigate how to integrate computational systems modeling into high school science classrooms using curricula built around SageModeler software. This project, influenced by project-based learning and design research principles (Barab & Squire, 2016; Krajcik & Shin, 2022), centers on the implementation of a high school chemistry unit, where students are tasked with building and revising a computational model of evaporative cooling using SageModeler software. Papers 1 and 2 of the thesis represent data collected from the initial (year 1) implementation of this project, while Paper 3 consists of data collected in year 4 of this project. All data for this project were collected at Faraday High School (FHS), a pseudonym for a STEM magnet school in the Midwestern United States, in collaboration with two high school teachers: Mr. H and Mr. M (both pseudonyms). Before investigating how teachers and the broader learning environment could support students with testing and debugging, it was necessary to develop a research instrument for assessing how students test and debug computational models. The development of this research instrument is described in the first paper of this thesis, titled “Developing the Systems Thinking and Computational Thinking Identification Tool”. Based on “A Framework for Computational Systems Modeling”, this research instrument, known as the Systems Thinking and Computational Thinking Identification Tool (or the ID Tool), was initially intended to analyze student behaviors across all five computational modeling practices found in “A Framework for Computational Systems Modeling”. However, the unwieldy nature of the original instrument necessitated a narrowing of the focus towards seven behaviors associated with testing and debugging. In addition to creating a useful qualitative research instrument, this paper helped to define testing and debugging in the context of computational modeling by describing discrete testing and debugging behaviors students engage with as they test, debug, and revise computational models. Using the ID Tool instrument developed and validated in the first manuscript, I subsequently investigated how five student groups tested and debugged their models in the context of a high school chemistry unit centered on computational modeling in Paper 2: “Examining Student Testing and Debugging within a Computational Systems Modeling Context”. Through this paper, I identified several different approaches these students took towards testing and debugging, ranging from systematically using the simulation features built into SageModeler to find structural flaws in their models to gaining regular insights from peers on which aspects of their models needed further support. I also found evidence of certain testing and debugging behaviors (as measured by the ID Tool instrument developed in the first manuscript) being more common than others, suggesting that additional support from teachers and the learning environment would be helpful for engaging students in the less represented testing and debugging behaviors.
While the first two papers of this thesis focus on analyzing how students test and debug computational models, Paper 3 (“Synergistic Scaffolding and Clear Rationales: How Teachers can Support Students with Testing and Debugging in a Computational Modeling Context”) instead explores how teachers can scaffold students with testing and debugging and which scaffolds seem to impact student testing and debugging behaviors. Narrowing the scope of this paper, I chose to focus on how teachers supported students with three key testing and debugging behaviors from the original ID Tool: analyzing model output, using peer feedback, and using external data to validate model output. This decision was based on the relative importance of these aspects of testing and debugging in the broader literature so that any conclusions from this paper could be more targeted and concise. Given the design-based nature of the broader project, the evaporative cooling unit as enacted in Paper 3 (year 4 of the project) was significantly modified based on the results of Papers 1 and 2 to better support students with testing and debugging. Immediately prior to the enactment of the evaporative cooling unit, Mr. H and Mr. M participated in a professional learning community organized by two colleagues and me to discuss how to best support students with testing and debugging. The results of this paper suggest that when Mr. H and Mr. M used synergistic scaffolding, by supporting students with making use of existing curricular and technological scaffolds embedded in the learning environment, students were more likely to engage in the targeted testing and debugging behaviors. Additionally, providing students with clear rationales for using certain testing and debugging behaviors to revise their models seems to have been an effective pedagogical strategy for scaffolding students with testing and debugging.

Author Positionality

As a scholar in the field of science education, I take a strong stance on promoting what I believe to be best science teaching practices and student science learning. Taking an asset-based approach, I believe that all students are capable of learning science and that all students should have the opportunity to experience the joy and wonder of the natural world through instruction centered on science practices. I also have a strong belief that science learning should be contextualized to student lived experiences and be relevant to their everyday lives. I also have deep philosophical commitments to constructivist pedagogies and student-centered learning. To me, constructivism is a theory of learning that posits that all student learning must build upon student prior knowledge and that students learn best when they can actively engage in the process of knowledge construction. In practice, this means that I believe that good science teaching must allow students to engage with a meaningful phenomenon, ask questions and conduct investigations to gain insight into said phenomenon, and ultimately construct a knowledge product that demonstrates their sensemaking and understanding of the key scientific principles underlying the phenomenon. I believe that having students construct and revise models (both paper-pencil and computational models) facilitates student sensemaking and that students must be given multiple opportunities to revise their models as they gain new insights into the phenomenon through hands-on investigations.
Finally, I believe that developing a strong appreciation for systems thinking is a critical component of contemporary science education. Given that many of the most important scientific issues of our time involve systems composed of complex webs of interconnected elements that change over time, it is imperative that every student have a firm grasp on ST concepts before leaving high school. Given these philosophical commitments, I seek to amplify these core constructivist principles and their subsequent corollaries through my work. As a researcher, I recognize that classroom research is not value-neutral and that my presence as a researcher impacts student learning and teacher behavior. Given that planning and enacting the curriculum at the core of my research takes a substantial amount of time and effort on the part of the cooperating teachers (Mr. H and Mr. M), I have sought to build and maintain a strong professional working relationship with both teachers. As I had limited experience in K-12 classrooms prior to beginning this work, I have deeply valued the input and expertise of Mr. H and Mr. M in this project. Over the many PLC meetings we have had, I have endeavored to avoid taking an overtly authoritative role. I have instead sought to maintain a collaborative environment where expertise is mutually shared and respected as both the MSU team and Mr. H and Mr. M work together to improve the quality of the unit. Additionally, I have had Mr. H and Mr. M review my findings, particularly in Paper 3, to ensure that my work accurately portrays their perspectives and resonates with their experiences as research participants. Within Mr. H and Mr. M’s classrooms, I aimed to take on primarily an observer role and sought to minimize my disruption to the learning environment. Earlier in this research, at times I took a more active role in the classroom and regularly supported students in the modeling process. As this research progressed, I came to value a less hands-on approach, as it allowed for a more authentic look at the interactions between teachers and students in constructing and revising computational models. It also minimized the potentially negative impacts that active interference can have on student learning outcomes and emotional well-being. I instead sought to focus more on observing how Mr. H and Mr. M support their respective students in constructing and revising their computational models and emphasized supporting Mr. H and Mr. M through professional learning opportunities that took place outside of the classroom environment. However, technical difficulties emerging from the learning management system led me to take on a more active role in these classrooms than I would have preferred. During the final implementation of this unit, I regularly helped students with troubleshooting and at times provided support with navigating the technical aspects of SageModeler (such as creating collector and flow relationships) so that Mr. H and Mr. M could spend more time focusing on helping students with testing and debugging and ST. Finally, it is important to address how my identity as a native English-speaking White man impacts my interactions with the students and teachers who are participating in this research project. Given that I am doing this work in a school building that is majority White, I acknowledge that my identity largely matches that of many of the students and of the teachers involved in this study.
As such, I must recognize that some of the relational aspects of building this research partnership are easier than they would be if I were a person of color, an immigrant, and/or a non-native English language speaker. I also admit that given that science has long been a male-dominated field and that both teachers I am working with are also White men, my presence can reinforce assumptions about science identity, even though that is not my intention. As such, I feel the need to give space so that female students and students of color can also feel empowered in their science identity through this work. I also must acknowledge that as a White man, I can often be blind to the hidden power dynamics and implicit biases present in the classroom environment and curricular design choices. In concrete terms, this process of supporting female students and students of color took three distinct forms: supporting Mr. H and Mr. M in using equitable discourse moves in our professional learning sessions, providing positive feedback and affirming the emerging science identities of female students and students of color, and prioritizing participation of female students and students of color in data collection practices. Given that one of my colleagues was interested in using this project to investigate how to support more equitable discourse between students in small groups, we made it a priority to address the role that teachers can play in supporting this process during our professional learning meetings with Mr. H and Mr. M. As such, I supported her in having these potentially sensitive conversations around equitable discourse with Mr. H and Mr. M. Once I was in the classroom, I took key opportunities in conversations with female students and students of color to provide these students with supportive feedback that affirmed their science identity. I prioritized (where possible given the challenges we faced with student recruitment) having appropriate representation of female and POC students in student screencasts and student interviews. When analyzing these data, I sought to take an asset-based perspective that displayed their emerging science identities and provided examples of how these students used testing and debugging strategies and systems thinking discourse to make sense of the evaporative cooling phenomenon.

PAPER 1: DEVELOPING THE SYSTEMS THINKING AND COMPUTATIONAL THINKING IDENTIFICATION TOOL

Abstract

We developed the Systems Thinking (ST) and Computational Thinking (CT) Identification Tool (ID Tool) to identify student involvement in ST and CT as they construct and revise computational models. Our ID Tool builds off the ST and CT Through Modeling Framework, emphasizing the synergistic relationship between ST and CT and demonstrating how both can be supported through computational modeling. This paper describes the process of designing and validating the ID Tool with special emphasis on the observable indicators of testing and debugging computational models. We collected 75 hours of students’ interactions with a computational modeling tool and analyzed them using the ID Tool to characterize students’ use of ST and CT when involved in modeling. The results suggest that the ID Tool has the potential to allow researchers and practitioners to identify student involvement in various aspects of ST and CT as they construct and revise computational models.

Introduction

Many of our current societal and ecological challenges involve complex systems composed of interconnected elements.
From global pandemics to climate change, these challenges require systems thinking (ST) to identify how various elements contribute to emergent effects in large-scale systems. ST enables individuals to investigate how a single part of a system can have broader impacts on the whole system (Meadows, 2008). Given the complexity of most systems, computational thinking (CT) is often required to approach these problems. CT is a sensemaking process where one decomposes a problem in a systematic way, translates it into an algorithm that can be interpreted by an information processing agent, and iteratively refines it based on new observations and new data inputs (Grover & Pea, 2018; Wing, 2006). Because both ST and CT are important for addressing problems involving complex systems, it is fruitful to consider their synergies for investigating phenomena (Shin et al., 2022; Weintrop, 2016). ST and CT are also increasingly being emphasized as important elements of science education on a global 23 scale, being incorporated into official policy documents in many countries including the U.S., the U.K., and Taiwan (Csizmadia et al., 2015; NGSS Lead States, 2013; So et al., 2020). These efforts to include ST and CT as key aspects of science education create a need for developing new research tools for characterizing and monitoring student use of these types of thinking (Grover & Pea, 2018). One framework that recognizes the interconnected relationship between ST and CT is the “ST and CT Through Modeling Framework,” which describes how student use of ST and CT can be supported through the construction of computational models (Bowers et al., 2022; Shin et al., 2022). This framework seeks to clarify and expand the conceptualizations of ST and CT as proposed by the NGSS as well as demonstrate the synergy between ST, CT, and modeling (NGSS Lead States, 2013; Shin et al., 2022). Given its focus on the interconnectedness of ST and CT, this framework provides a foundation for developing an instrument for observing student use of ST and CT as they construct and revise computational models. Such a tool may facilitate researchers in recognizing instances of and patterns in students’ use of specific ST and CT aspects as they construct and revise models. In this paper, we first summarize our conceptualization of the three main components of our framework (ST, CT, and modeling) and how these components combine to form five computational modeling practices. We then describe how we developed a research tool based on this framework to explore student use of ST and CT as they constructed and revised computational models using a semi-quantitative computational modeling tool. Finally, we provide examples of how this tool can be used to identify and categorize student use of ST and CT. Theoretical Approach Systems thinking is an approach to exploring a phenomenon as a network of elements that work together to create a system with emergent behavior that is more than the sum of its constituent parts (Arnold & Wade, 2015; Forrester, 1971; Meadows, 2008). We define an “element” as a key part of a system that can be independently described yet interacts with other aspects of the system to impact the overall behavior of that system. Many complex phenomena can be described as a series of interacting elements with feedback relationships and informational delays that often generate counterintuitive 24 behaviors (Booth-Sweeney & Sterman, 2007; Cronin et al., 2009). 
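To make the notion of feedback and informational delay concrete, the brief sketch below simulates a single stock whose corrections are based on outdated information. The scenario, variable names, and parameter values are invented for illustration; they are not drawn from the unit or from any particular modeling tool.

```python
# Minimal stock-and-flow sketch of a delayed negative feedback loop.
# The scenario, names, and parameters are invented for illustration.

def simulate(steps=60, delay=5, gain=0.5, target=100.0, start=50.0):
    """Adjust a stock toward a target, basing each correction on the value
    of the stock `delay` steps ago (an informational delay)."""
    history = [start]
    for _ in range(steps):
        perceived = history[max(0, len(history) - 1 - delay)]  # outdated info
        history.append(history[-1] + gain * (target - perceived))
    return history

# With delay=0 the stock settles smoothly at the target; with a delay the
# corrections keep overshooting and the stock swings back and forth around
# the target -- the kind of counterintuitive behavior that simple linear
# causal reasoning tends to miss.
print([round(x) for x in simulate(delay=0)[:10]])
print([round(x) for x in simulate(delay=5)[:20]])
```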
To fully engage in ST, students need to move beyond simple linear causal reasoning to a system behavior perspective so that they can identify common structural patterns found within and across systems. Our framework identifies five major aspects of ST: (1) defining a system’s structure and boundaries, (2) engaging in causal reasoning, (3) recognizing interconnections and identifying feedback, (4) framing problems or phenomena in terms of behavior over time, and (5) predicting system behavior based on system structure (Shin et al., 2022). Computational thinking has many definitions ranging from being grounded in mathematics and data analysis (NRC, 2012) to being an aspect of sensemaking centered on formulating questions through testing models and simulations (Schwarz et al., 2017; Weintrop et al., 2016) to thinking like a computer scientist (Grover & Pea, 2018; Wing, 2006). Synthesizing these approaches, we define CT as a form of sensemaking that uses an iterative and quantitative approach to decompose a phenomenon or problem to explore, explain, and predict the behavior of that phenomenon or to find a solution to a problem through the creation and iterative revision of algorithms (Shin et al., 2022). Our framework identifies five major aspects of CT: (1) decomposing problems such that they are computationally solvable, (2) creating computational artifacts using algorithmic thinking, (3) generating, organizing, and interpreting data, (4) testing and debugging, and (5) making iterative refinements. In addition to ST and CT, modeling forms the third component of our framework. Modeling is the process of creating a representation of a phenomenon such that the representation can be used to explain or predict the behavior of that phenomenon (Harrison & Treagust, 2000; Schwarz et al., 2009). From this perspective, models are viewed not just as the product of scientific inquiry but as essential tools for supporting scientific reasoning and sensemaking (Schwarz et al., 2009). Additionally, analyzing existing models can help one gain insight into different aspects of a phenomenon and predict its future behavior (zu Belzen & Krüger, 2010). Scientists and students often use models to represent their conceptualization of a phenomenon so that they can synthesize and communicate their ideas to others (Gilbert & Justi, 2016). Within our framework, students utilize both ST and CT approaches as they engage in the process of modeling. 25 While researchers (Berland & Wilensky, 2015; Wing, 2017) claim that CT and ST are intertwined and support each other, we view CT and ST as co-equal, yet distinct in the context of modeling because of their unique ways of approaching problems. CT focuses on designing solutions through computation while ST analyzes the various relationships among elements in a system (Shute et al., 2017). Our framework thus defines CT and ST as separate entities and identifies five computational modeling practices that combine aspects of ST and CT: (M1) characterize problem or phenomenon to model, (M2) define the boundaries of the system, (M3) design and construct model structure, (M4) test, evaluate, and debug model behavior, and (M5) use model to explain and predict behavior of phenomenon or design solution to a problem (Bowers et al., 2022). Students engage in these modeling practices as they construct, test, revise, and use their computational models. 
Students characterize the phenomenon (M1) as they discuss and unpack key elements of the phenomenon under study and as they learn about new elements of the phenomenon. Students define the boundaries of the system (M2) and design/construct model structure (M3) as they discuss which variables to add to their models and set relationships between these variables respectively. Once students have built their initial models, they can analyze the model output and should compare this output to real-world data or their emerging understanding of the phenomenon to identify and modify flaws in their model, thus testing and debugging of model behavior (M4). Finally, students use their models to construct explanations of the phenomenon or predict how the system will behave under different circumstances (M5). Each of these practices are supported by a combination of aspects of ST and CT (Table 1). 26 Table 1: The computational modeling practices and associated ST and CT aspects Computational Modeling Practice Associated ST and CT Aspects M1. Characterize Problem or Phenomenon ST: Define a System CT: Decompose Problems M2. Define System Boundaries ST: Define a System, Frame Phenomena in Terms of Behavior over Time CT: Decompose Problems, Create Algorithmic Artifacts M3. Design and Construct Model Structure ST: Engage in Causal Reasoning, Recognize Interconnections and Feedback, Frame Phenomena in Terms of Behavior over Time CT: Create Algorithmic Artifacts M4. Test, Evaluate, and Debug Model Behavior ST: Define a System, Predicting System Behavior Based on System Structure CT: Generate and Interpret Data, Test and Debug, Make Iterative Refinements M5. Use Model to Explain and Predict Behavior of Phenomenon ST: Predict System Behavior Based on System Structure, Engage in Causal Reasoning CT: Generate and Interpret Data, Test and Debug, Make Iterative Refinements Although the science education community has established ST, CT, and modeling as key learning goals, we know relatively little about how to support students in these practices. We used the ST and CT Through Modeling Framework to develop a research tool that could help researchers investigate student use of ST and CT as they build, test, and revise models. We hypothesize that such a tool could help researchers identify which aspects of ST and CT students use more frequently or find challenging. Therefore, we investigate these research questions: How can one characterize patterns of student use of specific aspects of ST and CT as they construct and revise models? Which aspects seem to be more challenging for learners? To address these questions, we developed the ST and CT Identification Tool (ID Tool) to classify instances of students using aspects of ST and CT as they build, test, and revise models. 27 Methods Study Context and Data Sources The data used to develop and evaluate our ID Tool came from a high school chemistry unit on evaporative cooling designed to meet NGSS learning goals and enacted at a Midwestern U.S. STEM school. We designed this unit around Project-Based Learning (PBL) principles (Krajcik & Shin, 2022) in which students explore the phenomenon of evaporative cooling, use a driving question and a driving question board and conduct investigations to address the driving question. This unit also centered on students building and revising models of phenomena using an open-source semi-quantitative computational modeling tool called SageModeler (Figure 4). 
SageModeler is a modeling tool that allows students to construct semi-quantitative models without using a formal programming language (https://sagemodeler.concord.org). Students can test these models using a simulation function to generate model output and using graphs constructed from the output or imported from real-world data. To collect data on students building and revising their models, we used 15 hours of screencasts from each of five pairs of students, for a total of 75 hours. Screencasts record the students’ actions on their laptop screens and record student audio while they are building and manipulating their computational models.

Instrument Development

Content validity refers to the extent to which all aspects of our framework align with the literature. To establish content validity, we conducted an extensive literature review of CT, ST, and modeling, and deconstructed each practice into smaller aspects and sub-aspects (Shin et al., 2022). We examined specific aspects of ST and CT to define how students should be able to use their knowledge through five modeling practices. During the development process, our research team – including experts in science, learning sciences, learning technology, and science education – defined, reviewed, and revised the modeling process and specified aspects of ST and CT in the context of modeling by discussing disagreements and ambiguities, continuing (or updating) our literature review, and examining teachers’ and students’ data collected from implementation. Our research team expanded on this work, developing a theoretical framework describing how specific aspects of ST and CT are applied through five distinct modeling practices. These processes confirmed the theory-based modeling process outlined in the framework, ensuring that the ST and CT aspects were operationalized to monitor student involvement in these aspects while modeling. Construct validity is the extent to which the indicators of our ID Tool measure our intended constructs. To establish construct validity, our approach focuses on defining indicators (evidence) clearly and comprehensively and describing measurable (observable) behaviors that are present when learners are utilizing the desired ST and CT aspects through modeling. We first decomposed the various aspects of ST and CT associated with each modeling practice into smaller sub-aspects. For example, the computational modeling practice of “test, evaluate, and debug model behavior” (M4) is supported by the ST aspects of “defining a system” and “predicting system behavior based on system structure” along with the CT aspects of “generating, organizing, and interpreting data,” “testing and debugging,” and “making iterative refinements” (Table 1). These ST and CT aspects can in turn be broken down into more specialized sub-aspects. Within the CT aspect of “testing and debugging” we identified three key sub-aspects associated with the modeling practice of “test, evaluate, and debug model behavior”: “detecting issues in an inappropriate solution,” “fixing issues based on the behavior of the artifact,” and “confirming the solution using multiple starting conditions.” We then identified specific learner-generated behaviors or knowledge products that can be associated with one or more of these sub-aspects of ST and CT. These behaviors were operationalized as indicators.
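As an illustration of the decomposition just described, the fragment below encodes one branch of the hierarchy (practice, aspects, sub-aspects) for practice M4 in a form that could feed a qualitative coding workflow. The dictionary layout and key names are our own sketch, not part of the published ID Tool.

```python
# A minimal, illustrative encoding of the decomposition described above for
# practice M4; keys and layout are ours, not part of the published ID Tool.
M4 = {
    "practice": "Test, evaluate, and debug model behavior",
    "st_aspects": {
        "Defining a system": [],
        "Predicting system behavior based on system structure": [],
    },
    "ct_aspects": {
        "Generating, organizing, and interpreting data": [],
        "Testing and debugging": [
            "Detecting issues in an inappropriate solution",
            "Fixing issues based on the behavior of the artifact",
            "Confirming the solution using multiple starting conditions",
        ],
        "Making iterative refinements": [],
    },
}

# Indicators (observable behaviors) would then reference one or more of these
# sub-aspects, giving coders a concrete target to look for in screencasts.
print(len(M4["ct_aspects"]["Testing and debugging"]), "sub-aspects to look for")
```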
In this study, we are specifically focusing on indicators associated with “testing and debugging” as the literature strongly suggests that students often have difficulty fully participating in its associated ST and CT aspects (Grover & Pea, 2018). The six indicators associated with the (M4) modeling practice behavior are listed below, along with their respective ST and CT aspects and sub-aspects. 4A. Analyzing and Sensemaking through Discourse. ST: defining a system (redefining model structure) and predicting system behavior. CT: testing and debugging (detecting faults and fixing faults) 4B. Analyzing Model Output: Simulations. ST: Not applicable (NA). CT: interpreting data (generating data, analyzing data), testing and debugging (detecting faults, confirming solutions), and iterative refinement (verifying solutions) 29 4C. Analyzing Model Output: Graphs. ST: NA. CT: interpreting data (generating data, analyzing data), testing and debugging (detecting faults, confirming solutions), and iterative refinement (verifying solutions) 4D. Analyzing and Using External Data. ST: NA. CT: interpreting data (generating data, analyzing data) and iterative refinement (verifying solutions) 4E. Using Feedback. ST: defining a system (redefining model structure). CT: testing and debugging (fixing issues) and iterative refinement (making modifications and verifying solutions) 4F. Reflecting upon Iterative Refinement. ST: defining a system (redefining model structure). CT: testing and debugging (fixing issues) and iterative refinement (making modifications and verifying solutions). The modeling research team also reviewed the indicators to determine if they plausibly indicated student use of individual aspects of ST and CT. Once these indicators were reviewed, we further refined and developed them into a four-part classification system (ranging in ascending order from Level 1 to Level 4) to explore the sophistication of student use of these aspects. We then conducted an interrater reliability test for these indicators, which demonstrated a 91.7% agreement (Cohen’s Kappa, .87) between two independent coders. Data analysis and findings To code students’ collaborative interactions as they built a computational model with SageModeler, we analyzed screencast data using Atlas.ti to organize the data according to the four levels of the ID Tool and to determine the relative frequency of each of these six indicators. The patterns of each group as well as among groups were analyzed to characterize how students used ST and CT aspects during modeling and which aspects seemed challenging for learners. Below we summarize how student behaviors served as evidence for ST and CT by matching indicators from the ID tool with observations from our study. The computational modeling practice of “test, evaluate, and debug model behavior” occurs as students evaluate their models and consider changes they need to make so that their models more accurately reflect their understanding of the phenomenon. Given the various approaches students can take to evaluating and 30 revising their models, we have identified six observable indicators as evidence that students are involved in different aspects of this modeling practice as listed above as 4A-4F. As students worked on refining their models with their partners, they often discussed specific model relationships and/or broader model behavior. 
For example, when evaluating their early model, two students had this conversation regarding relationships between variables: “Student 1: As the molecular energy increases, that makes the molecular spacing of the substance increase. That’s good. And then the spacing of the air molecules increases, this [molecular spacing] stays the same until it [spacing of the air molecules] gets small as there’s not a lot of space. Student 2: Makes sense. Student 1: We can change this [the relationship between spacing of air molecules and molecular spacing]. Student 2: Yeah, I don’t think that makes sense. Student 1: Is it the other way then? Student 2: Maybe?” This conversation is an example of indicator 4A, students participating in analyzing and sensemaking of model structure through discourse. At Level 1, students verbalize the changes they made to their model or describe the area of their model they believe needed further revisions, but do not provide reasoning for making these changes or why an area of their model needs improvement. If a student verbalizes reasoning but does not participate in a mutual dialogue with their partner, they are considered performing at Level 2. To progress to Level 3 requires that students engage in a back-and-forth dialogue by providing reasoning for making key changes to their model. Evidence for Level 4 would be a student exchange in which they consider how changes to their model would impact the behavior of the model. Because the students in this example provide a brief amount of reasoning as they are trying to justify the relationship between the spacing of air molecules and molecular spacing and both students contribute to the discourse (although Student 2 does so in a minimalist manner), we consider this to be evidence of students participating at Level 3 for Indicator 4A. In addition to listening to student discourse, we observe students using model output features to test their model’s behavior. SageModeler offers two ways of generating and analyzing model output: manipulating variables using the simulation tool and generating graphs. Both actions produce observable 31 indicators (4B and 4C, respectively). The simulation tool allows students to manipulate the relative amount of each input to test its impact on model behavior (Figure 4A). If students adjust the relative amount of one or more input variables, but do not verbalize their interpretation of this process, we consider this evidence of some attempt to interpret data, even if only at Level 1. Once students begin verbalizing their interpretation of the testing process (either by identifying specific flaws in their model or by stating that their model is functioning in accordance with their expectations), we can map their progress to Level 2. Identifying Levels 3 and 4 requires that students participate in a meaningful back and forth dialogue with either their partner, one or more peers, or a teacher. If the conversation focuses on the smaller aspects of model behavior (centering on a single causal chain), we consider this evidence of Level 3 for interpreting data using the simulation tool, while evidence of Level 4 would require a more holistic discussion of the model (e.g., focusing on how multiple causal chains impact each other). Students can also use SageModeler to generate graphs that analyze the relationships between two variables from their model (Figure 4B). 
If students unsuccessfully attempt to make a graph of two variables from the model output, we consider this evidence of Level 1 for indicator 4C. If students successfully make a graph of two variables, but do not discuss their interpretation of this graph, their behavior can be categorized as Level 2. Evidence for Level 3 requires students to participate in a dialogue where they discuss their interpretation of the graph and its implications for those two variables in isolation from the broader context of the model. If students consider the implications this graph has for both these two variables and broader model behavior, this is considered evidence for Level 4 for generating and analyzing data, detecting faults, and confirming and verifying a solution. In Figure 4A, these students used the simulate feature to look at their model output, but neither student verbalized this process in a meaningful way, so we inferred that the students were performing at Level 1 for indicator 4B. The students who made the graph in Figure 4B verbalized their interpretation of the graph, stating, “This graph shows that as IMF increases the potential energy increases but then plateaus. That makes sense to me.” This is evidence of Level 3 performance for indicator 4C. 32 Figure 4: Student use of model output analysis features in SageModeler Figure 4A: Student use of “Simulate” feature in SageModeler Figure 4B: Student use if graphs in SageModeler Just as students can analyze their model output to see if their model behaves according to their understanding of the phenomenon, students can also examine external data sources to verify if their models accurately describe the phenomenon as evidenced by indicator 4D. When students superficially refer to the existence of data or loosely reference dubious data sources, they are at Level 1. At Level 2, students reference external data (from real-world observation or specific information provided by instruction or readings) to inform or justify changes made to their models, but do not actively compare these data to their model output. Once students progress to comparing specific pieces of real-world data to their model output, they can be said to be engaging at Level 3. This is particularly evident if they input quantitative external data into the modeling program and directly compare it to their model output (Figure 33 5). Finally, if students compare and contrast their external data to model output and discuss the validity of the external data, this is evidence of Level 4 performance. In Figure 5, the students input real-world data from an experiment into SageModeler but did not actively compare these data to their model output, indicating a Level 2 performance. Figure 5: Students inputting external data Another important way students can receive feedback on their models is through discussions with peers or a teacher, which can inform further revisions, allowing students to engage in using feedback to inform model revisions (Indicator 4E). If the feedback students receive does not inform any changes to their model or prompt further analytical discourse, they are at Level 1. Note that if the feedback they receive is inappropriate and students do not discuss why this feedback was inappropriate their behavior would still be indicative of a Level 1 performance. Students who use this feedback to make changes to their models but have neither a discussion with their partner before making these changes nor test their models after making these changes are at Level 2. 
Once students use this feedback to either spark an analytical discussion or analyze their model’s behavior after making recommended changes, they are operating at Level 3. If students then address the originator of the feedback or have a conversation with another student group about why they made these changes or what new insights have emerged from their testing and debugging of these changes, their behavior can be categorized as being at Level 4. For instance, one of their peers asked another pair to remove “density” as a variable from their model, arguing that it was not necessary to explain the phenomenon. This pair of students then removed the density 34 variable but did not discuss why they were removing this variable. We classify this as performing at Level 2. Finally, students should be given opportunities to reflect on the changes they have made to their models. Students can participate in reflecting on iterative refinement (4F) through discussion or writing as seen in Figure 6. Student level of expertise is suggested by the depth and richness of the insights they exhibit into their own revision process. When students give surface-level feedback on the quality of their models at a given point in time, without considering the changes they have made or the reasons for making these changes, they are performing at Level 1. To infer Level 2 performance, students list specific changes that they have made to their models, but do not provide any detailed reasons for making them. Evidence for Level 3 performance requires that students reflect upon specific changes to their models and explain their reasoning behind these changes. Finally, students performing at Level 4 emphasize broader changes that have occurred to their models over a longer period (often across multiple revisions) and provide an explanation as to how their model has evolved. In Figure 6, a pair of students list the changes they have recently made to their models and give specific reasons for making these changes (peer feedback and changes in conceptual understanding). As this reflection focuses on more immediate changes and not broader patterns, it is evidence of Level 3 performance. Overall, these results support Research Question 1 (How can one characterize patterns of student use of specific aspects of ST and CT as they construct and revise models?) as they demonstrate how our ID Tool can be used to identify and classify specific instances of students using ST and CT as they are testing and debugging their models. Figure 6: Student’s written reflections on iterative refinement 35 Using our ID Tool, we examined the screencasts of five student groups during an evaporative cooling unit. We then compared the relative amount of time (as determined by the number of 10-minute intervals where students were involved in at least once in a respective indicator) these students spent participating in each of the six sets of behaviors we viewed as indicators of involvement with the modeling practice of testing and debugging (Table 2). Incidents were recorded for each 10-minute block and data from all five groups were aggregated to compare the relative amount of time coded for the presence of each indicator. Time points where students were not exhibiting any indicators were excluded from this data set and students could exhibit multiple indicators within one 10-minute block. 
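The block-based counting scheme described above can be summarized in a short sketch. The per-group counts are those reported in Table 2 below; the denominator (the number of 10-minute blocks containing at least one indicator) is not reported directly, so the value used here is an assumption implied by the published percentages.

```python
# Sketch of the block-level aggregation described above. Per-group counts are
# those reported in Table 2; the total number of blocks with at least one
# indicator is not reported directly, but the published percentages are
# consistent with roughly 126 such blocks, which we assume for illustration.
counts = {  # indicator -> blocks per group (G1..G5) where it appeared at least once
    "4A discourse":      [13, 24, 16, 9, 13],
    "4B simulations":    [4, 15, 12, 5, 12],
    "4C graphs":         [0, 0, 4, 0, 0],
    "4D external data":  [6, 6, 3, 2, 2],
    "4E using feedback": [11, 5, 3, 11, 6],
    "4F reflection":     [3, 12, 11, 5, 10],
}
BLOCKS_WITH_ANY_INDICATOR = 126  # assumed; implied by the reported percentages

for indicator, per_group in counts.items():
    total = sum(per_group)
    pct = 100 * total / BLOCKS_WITH_ANY_INDICATOR
    print(f"{indicator:18s} total={total:3d}  relative={pct:4.1f}%")

# Because one block can contain several indicators, these percentages sum to
# more than 100%, as noted beneath Table 2.
```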
The results suggest that students spent a large portion of their time using discourse-based strategies to analyze their models (as seen in their high use of Analyzing and Sensemaking of Models through Discourse, Indicator 4A, 59.5%) and often utilized the simulation features present within the modeling program. However, these students seemed less likely to use external data sources to drive their revision process (Indicator 4D, 15.1%) and even more hesitant to use graphs to analyze their model output (Indicator 4C, 3.2%). This suggests that additional scaffolds are likely needed to support student participation in these activities. It is important to note that although students were more likely to participate in Analyzing and Sensemaking of Models through Discourse (Indicator 4A), many exhibited performance only at Levels 1 and 2, which might indicate that performing at higher levels for this indicator was more challenging for them. For instance, we observed several instances where the student in charge of the cursor dominated the sensemaking discussion while their partner provided minimal feedback or verbal sensemaking support. Overall, these results address Research Question 2 (Which aspects seem to be more challenging for learners?) by suggesting that aspects associated with Indicator 4C are more challenging for students or are less supported by either the curriculum or their teachers compared to aspects associated with Indicators 4A or 4B.

Table 2: Relative time spent participating in each indicator for all five groups
Indicator | Total Time Spent* | Relative Percentage
4A. Analyzing Models: Discourse | 75 (G1: 13, G2: 24, G3: 16, G4: 9, G5: 13) | 59.5
4B. Analyzing Model Output: Simulations | 48 (G1: 4, G2: 15, G3: 12, G4: 5, G5: 12) | 38.1
4C. Analyzing Model Output: Graphs | 4 (G1: 0, G2: 0, G3: 4, G4: 0, G5: 0) | 3.2
4D. Analyzing External Data | 19 (G1: 6, G2: 6, G3: 3, G4: 2, G5: 2) | 15.1
4E. Using Feedback | 36 (G1: 11, G2: 5, G3: 3, G4: 11, G5: 6) | 28.6
4F. Reflecting upon Iterative Refinement | 41 (G1: 3, G2: 12, G3: 11, G4: 5, G5: 10) | 32.5
Total | 223 (G1: 37, G2: 62, G3: 49, G4: 32, G5: 43) |
Note: * Total number of 10-minute coding blocks where the indicator is present at least once. G: group. GX: total coding blocks for that group where the indicator is present. Because students often participated in multiple indicators within a single 10-minute block, and the “relative percentage” refers to the percentage of coding blocks where the indicator is present, the relative percentages do not sum to 100%.

Conclusions and implications

These results demonstrate how the ID Tool can be used to characterize patterns and challenges of student use of specific aspects of ST and CT as they construct and revise models. While the instrument described in this paper focuses on aspects of ST and CT used during model revision, we have also developed other indicators for ST and CT that need to be validated. A draft of these additional indicators can be found at https://tinyurl.com/2ft6rkza. Building off the ST and CT Through Modeling Framework, our ID Tool seeks to connect abstract ideas of student cognition with concrete indicators that can be observed through screencasts, classroom videos, or direct observation of students in classrooms. As each indicator is grounded in specific aspects and sub-aspects of ST and CT, it can be used to track how students are using ST and CT in various learning activities across disciplines.
Therefore, this tool can be used to develop future research instruments such as teacher and student interview protocols and classroom observation instruments as well as assist with creating ST and CT integrated learning activities. While the ID tool is primarily designed for research use, it can be modified to be used by teachers to help them identify moments where students are using ST and CT. Overall, our ID Tool represents an important step in developing a meaningful instrument for monitoring student use of ST and CT while constructing and revising models in realistic classroom settings. Further validity studies based on students’ data in various learning contexts are needed to iteratively revise this ID Tool as an evidence-based principled tool to observe student use of ST and CT as they construct and revise computational models. We are also in the process of utilizing this tool to further investigate how students utilize ST and CT aspects within a computational modeling context. Given the increased need and benefits to incorporate ST, CT, and modeling into science education, there is a growing demand for tools that can support researchers, curriculum developers, and teachers in classifying instances of student use of these practices. Our research efforts on the ID Tool seek to further research in ST, CT, and computational modeling and promote the integration of these three research fields to support student learning. 38 PAPER 2: EXAMINING STUDENT TESTING AND DEBUGGING WITHIN A COMPUTATIONAL SYSTEMS MODELING CONTEXT Abstract Interpreting and creating computational systems models are important goals of science education. One aspect of computational systems modeling that is supported by modeling, systems thinking, and computational thinking literature is “testing, evaluating, and debugging models.” Through testing and debugging, students can identify aspects of their models that either do not match external data or conflict with their conceptual understandings of a phenomenon. This disconnect encourages students to make model revisions, which in turn deepens their conceptual understanding of a phenomenon. Given that many students find testing and debugging challenging, we set out to investigate the various testing and debugging behaviors and behavioral patterns that students use when building and revising computational systems models in a supportive learning environment. We designed and implemented a six-week unit where students constructed and revised a computational systems model of evaporative cooling using SageModeler software. Our results suggest that despite being in a common classroom, the three groups of students in this study all utilized different testing and debugging behavioral patterns. Group 1 focused on using external peer feedback to identify flaws in their model, Group 2 used verbal and written discourse to critique their model’s structure and suggest structural changes, and Group 3 relied on systemic analysis of model output to drive model revisions. These results suggest that multiple aspects of the learning environment are necessary to enable students to take these different approaches to testing and debugging. Introduction Science education researchers and policymakers increasingly recognize the importance of involving learners in modeling. 
From the Next Generation Science Standards (NGSS) in the United States to South Korea’s new Korean Science Education Standards (KSES) and Germany’s science educational standards (KMK), policymakers have written scientific modeling into their science standards (KMK, 2005a, 2005b, 2005c; National Research Council [NRC], 2012; NGSS Lead States, 2013; Song et al., 2019). While each of these key policy documents has somewhat different viewpoints on using modeling in science 39 classrooms, they, along with many scholars, generally agree that scientific modeling is a process of creating or interpreting a representation of a phenomenon that can be used to explain or predict the behavior of that phenomenon (Harrison & Treagust, 2000; Louca & Zacharia, 2012; Mittelstraß, 2005; Schwarz et al., 2009; Schwarz & White, 2005). There are multiple ways of approaching modeling within science classrooms. Teachers can have students examine and interpret pre-existing models, investigating what these models demonstrate about natural phenomena and their inherent limitations (Krell et al., 2015). Students can also construct models of phenomena as sensemaking tools and to communicate their ideas to others (Bierema et al., 2017; Passmore et al., 2014; Schwarz et al., 2009). Just as there are multiple approaches to using models, students can construct multiple types of models, including mathematical models, diagrammatic models, and computational models (Grosslight et al., 1991; Harrison & Treagust, 2000; Zhang et al., 2014). Computational modeling uses algorithms or algorithmic thinking to create a model that represents the behavior of a system in a quantitative or semi-quantitative manner (Fisher, 2018; Pierson & Clark, 2018; Sengupta et al., 2013; Shin et al., 2021, 2022; Weintrop et al., 2016). Computational models can be valuable tools for science learning; by combining the visual aspects of diagrammatic models with the mathematical capabilities of mathematical models, computational models are responsive to new data inputs and can be tested and debugged (Campbell & Oh, 2015; Fisher, 2018; Pierson & Clark, 2018; Sengupta et al., 2013; Shin et al., 2022; Sins et al., 2005; Weintrop et al., 2016; Wilensky & Reisman, 2006). While computational modeling programs have existed for decades, their use in K-12 classrooms remains limited. This absence can partially be attributed to the siloed nature of the three main bodies of literature underpinning our conceptualization of computational modeling: modeling, systems thinking (ST), and computational thinking (CT) (Shin et al., 2022). These three cognitive processes are all recognized individually as important goals for science learning, and their intrinsic synergy is a growing interest in the field (NRC, 2012; Sengupta et al., 2013; Shin et al., 2022; Shute et al., 2017; Weintrop et al., 2016). 40 Within computational modeling, several overlapping practices allow students to utilize ST and CT as they build computational models (Shin et al., 2022). One computational modeling practice that has strong foundations in ST and CT literature is the practice of testing, evaluating, and debugging model behavior (Aho, 2012; Basu et al., 2016; Grover & Pea, 2018; Shin et al., 2022; Stratford et al., 1998; Weintrop et al., 2016; Wilensky & Reisman, 2006). 
Debugging is a practice unique to computational contexts as it requires that the model be defined in an algorithmic manner such that its output can be calculated by changing the relative amount of each input variable (Emara et al., 2020; Li et al., 2019; McCauley et al., 2008). By manipulating the relative amount of each input variable, students can test their models to see if they behave according to their understanding and expectations of the phenomena and make changes based on these tests (Brennan & Resnick, 2012; Hadad, 2020; Li et al., 2019; Shin et al., 2022; Stratford et al., 1998). Likewise, students can compare their model output to real-world data to further modify and improve their computational models (Campbell & Oh, 2015; Shin et al., 2021; Weintrop et al., 2016; Wilensky & Reisman, 2006). While testing and debugging is an important aspect of computational modeling, students often find it challenging (Li et al., 2019; Sins et al., 2005; Stratford et al., 1998; Swanson et al., 2021; Wilensky & Reisman, 2006). Grapin et al. (2022) and Stratford et al. (1998) suggest that students are reluctant to examine and interpret model output to inform later model revisions. Given that testing and debugging is both an affordance and a challenge within computational modeling, it is important to investigate how students test and debug as they revise computational models. In this paper, we categorize how students test and debug computational models within a constructivist classroom environment. We are interested in the different testing and debugging behavioral patterns students utilize during the model revision process. By categorizing how students test and debug their models within a constructivist learning environment (centered on Project-Based Learning [PBL] principles), we can hypothesize which aspects of the learning environment best support students in this endeavor. Before summarizing our investigative methods, we review the literature underpinning our 41 conceptualization of constructivism, computational modeling, and the modeling practice of testing and debugging. Literature Review Constructivism and Project-Based Learning For the past several decades, efforts at improving science education have centered on enacting constructivist philosophies and pedagogies in science classrooms (Fosnot, 1996; NRC, 2007, 2012). Constructivism argues that people do not absorb new knowledge in a pure form, but instead interpret new information through the lens of prior knowledge, experiences, and social relationships, thereby constructing their own knowledge based on their interactions with the world around them (Fosnot, 1996; Krahenbuhl, 2016; Pass, 2004). Advocates for constructivist approaches in science education push back against transmission-based approaches to teaching and learning, such as the Initiate, Respond, and Evaluate (IRE) model of classroom discourse (Berland & Reiser, 2009; Lemke, 1990; Mehan, 1979). Instead, they endorse classroom environments that allow students to engage in meaningful investigations of real-world phenomena so that they can build a deeper understanding of both science content and scientific practices (Berland et al., 2016; Krajcik & Shin, 2022; NRC, 2012; Windschitl et al., 2020). 
In the United States, this push for a constructivist approach to science education led to the development and adoption of the Next Generation Science Standards (NGSS), which prioritizes having students engage in authentic science practices, including modeling and computational thinking (NRC, 2012; NGSS, 2013). Within the broader umbrella of constructivist approaches to science education, there are several frameworks for designing and implementing constructivist lessons in K-12 classrooms, including Ambitious Science Teaching (Windschitl et al., 2020), the 5E instructional model (Duran & Duran, 2004), and Project Based Learning/PBL (Krajcik & Shin, 2022;). PBL is a student-centered, constructivist approach to teaching and learning science (Krajcik & Blumenfeld, 2006) that emphasizes collaboration, inquiry, authentic problem solving, student autonomy, and teacher facilitation. The PBL approach to curriculum design is built around five key principles: centering lesson planning on learning goals that allow students to show mastery of both science ideas and science practices, building student engagement 42 using intriguing phenomena and driving questions, allowing students to explore the driving question and phenomena using authentic scientific practices, tasking students with creating knowledge products (models, explanations, or arguments) that demonstrate student learning, and scaffolding student learning through the use of appropriate learning technologies (Krajcik & Shin, 2022; Shin et al., 2021). This approach has been shown to enhance students’ understanding of scientific concepts (Geier et al., 2008; Hmelo-Silver et al., 2007; Karacalli & Korur, 2014; Schneider et al., 2022) and positively impact some affective aspects like self-efficacy and motivation for learning (Fernandes et al., 2014; Schneider et al., 2016; Wurdinger et al., 2007). Computational Modeling Computational models are algorithmic representations that allow users to simulate the behavior of a phenomenon under multiple starting conditions (Brennan & Resnick, 2012; Fisher, 2018; Pierson & Clark, 2018; Shin et al., 2021, 2022; Sengupta et al., 2013). Students engage in computational modeling as they construct, test, revise, and evaluate computational models. Computational modeling is rooted in constructionist pedagogies, some of which strongly advocate for computational modeling as a mechanism for science learning (Papert, 1980; Papert & Harel, 1991; Pierson & Clark, 2018; Sengupta et al., 2013). Constructionist pedagogies argue that students learn best when given opportunities to construct and revise knowledge products in ways that promote authentic sensemaking (Kafai, 2005; Papert & Harel, 1991; Pierson & Clark, 2018). As computational models provide an environment where students can build and test different ways of representing a phenomenon, computational modeling facilitates sensemaking and, therefore, connects well with constructionism (Farris et al., 2019; Fisher, 2018; Papert, 1980; Pierson & Clark, 2018; Sengupta et al., 2013). Over the past few decades, the integration of computational modeling in science classrooms has been piloted by many researchers from both systems thinking (ST) and computational thinking (CT) perspectives (Arnold & Wade, 2017; Booth-Sweeney & Sterman, 2007; Brennan & Resnick, 2012; Forrester, 1971; Stratford et al., 1998; Weintrop et al., 2016; Wilensky & Reisman, 2006). 
Systems thinking approaches the exploration of a phenomenon as a series of interconnected elements that work 43 together to create a system with emergent behavior that is more than the sum of its constituent parts (Arnold & Wade, 2015; Cabrera et al., 2008; Forrester, 1971; Hmelo-Silver & Azevedo, 2006; Meadows, 2008; Riess & Mischo, 2010). ST literature encompasses both agent-based modeling and system dynamics modeling. In the context of system dynamics, this literature tends to focus on how students include key structural elements in their computational models and how they represent a system’s behavior over time (Booth-Sweeney & Sterman, 2007; Cronin et al., 2009; Sterman & Sweeney, 2002). Other researchers often focus on how students use CT as they build and revise computational models (Brennan & Resnick, 2012; Swanson et al., 2021; Weintrop et al., 2016; Wilensky & Reisman, 2006). CT is a form of sensemaking that uses an iterative and quantitative approach to decompose a phenomenon or problem to explore, explain, and predict the behavior of that phenomenon or to find a solution to a problem through the creation and revision of algorithms (Grover & Pea, 2018; Psycharis & Kallia, 2017; Schwarz et al., 2017; Shin et al., 2022; Weintrop et al., 2016; Wing, 2006). Because the CT community has its origins in computer science education, CT literature emphasizes the algorithmic nature of computational models, in how students construct and revise their models (Brennan & Resnick, 2012; Weintrop et al., 2016). Additionally, the relationship between computational modeling and computational thinking has been well-established in the fields of mathematics and engineering education (Bakos & Thibault, 2018; Benton et al., 2017; Magana & Coutinho, 2017; Zhang et al., 2020). Zhang et al. (2020) found that engineering students who incorporated the practice of computational thinking within their model construction practices experienced a significant increase in learning outcomes. Similarly, Magana and Coutinho (2017) demonstrated the consensus among engineering experts in academia and industry on the crucial role of preparing future engineers to use computational models in problem solving. Furthermore, in mathematics education, studies have shown improved learning outcomes as students engage in computational thinking through basic programming (Bakos & Thibault, 2018; Benton et al., 2017; Gleasman & Kim, 2020). Researchers in both disciplines have at various times addressed similar research questions and agree on many of the core components of computational modeling, including the crucial nature of testing 44 and debugging (Barlas, 1996; Brennan & Resnick, 2012; Shin et al., 2022; Sins et al., 2005; Stratford et al., 1998; Swanson et al., 2021; Wilensky & Reisman, 2006). Given this overlap between the ST and CT literature within computational modeling, “A Framework for Computational Systems Modeling” describes how ST and CT are expressed within computational systems modeling and support students in building, testing, and revising computational models (Shin et al., 2022). Within this framework, five computational systems modeling practices build on key aspects of both ST and CT (Shin et al., 2022). While each of these modeling practices represent possible avenues for students to develop ST and CT competencies, it is impractical to develop a singular research instrument to assess all aspects of this framework. 
Therefore, to conduct a more cohesive study, we chose to specifically focus on the modeling practice of “test, evaluate, and debug model behavior” as it is a particularly challenging aspect of computational systems modeling for many students (Figure 7) (Grapin et al., 2022; Li et al., 2019). Figure 7: Visual Representation of our Framework for “Test, Evaluate, and Debug Model Behavior” This diagram is a visual representation of the various ST and CT aspects that are included in our understanding of the computational systems modeling practice of “Test, Evaluate, and Debug Model Behavior” based on the work of Shin and colleagues (2022). On the left-hand side are the various ST subaspects that flow into the ST aspects that support this practice while the right-hand side shows the CT aspects and subaspects involved in this practice. 45 Test, Evaluate, and Debug Model Behavior Testing, evaluating, and debugging model behavior describes a broad range of strategies found across modeling, system dynamics, and computational thinking literature (Barlas, 1996; Brennan & Resnick, 2012; Campbell & Oh, 2015; Csizmadia et al., 2015; Gilbert, 2004; Li et al., 2019; Sins et al., 2005). Testing and evaluating hypotheses is a core aspect of scientific inquiry (Gilbert, 2004; Lederman, 2013; NRC, 2012). Through this iterative process, scientists revise their understanding of natural phenomena. Testing and evaluating are also crucial for students constructing scientific models in K-12 settings (Campbell & Oh, 2015; Gilbert, 2004; Louca & Zacharia, 2012; Schwarz et al., 2009). Ideally, students have multiple opportunities to test their models through experiments and revise their models based on their results. As iterative refinement helps students make sense of a phenomenon in a constructionist manner, it is considered a key element of metamodeling knowledge (Schwarz et al., 2009; Krell & Kruger, 2016). In computational modeling, both the systems dynamics and CT communities agree on the importance of testing and debugging (Barlas, 1996; Brennan & Resnick, 2012; Csizmadia et al., 2015; Sins et al., 2005). Several system dynamics studies recognize model evaluation or interpretation (i.e., students’ ability to meaningfully analyze model output data and determine how their model functions based on its structures) and model revision (i.e., changes students make to their models based on their model evaluations) as core components of computational modeling (Barlas, 1996; Hogan & Thomas, 2001; Stave, 2002). Likewise, CT literature also emphasizes the importance of troubleshooting or debugging and iterative refinement (Brennan & Resnick, 2012; Csizmadia et al., 2015; Katz & Anderson, 1987; Li et al., 2019; Swanson et al., 2021; Wilensky & Reisman, 2006). Troubleshooting occurs when a problem is identified in an algorithmic system (Jonassen & Hung, 2006; Li et al., 2019). Once identified, a systematic search for the source of the problem is often conducted through debugging techniques (Aho, 2012; Li et al., 2019; Sullivan & Heffernan, 2016). Iterative refinement involves making gradual changes to an algorithmic system (in this case, a computational model), and often happens in response to new information (Brennan & Resnick, 2012; Ogegbo & Ramnarain, 2021; Shute et al., 2017). 
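To make the idea of a systematic search for the source of a problem concrete, the sketch below checks one causal link at a time against the modeler’s expectation and flags any mismatch. Everything in it (the links, the expected signs, and the seeded fault) is invented for illustration and is not tied to any particular modeling tool.

```python
# Illustrative sketch of systematic troubleshooting in a small causal model:
# test one link at a time and flag any link whose simulated direction of
# effect disagrees with the modeler's expectation. The links, expectations,
# and the seeded fault below are all invented for illustration.

EXPECTED = {
    # (cause, effect): expected sign of the relationship
    ("A", "B"): +1,
    ("B", "C"): -1,
    ("C", "D"): +1,
}

def observed_sign(cause, effect):
    """Stand-in for running the model twice (low vs. high cause value) and
    comparing the effect's output; one link is deliberately 'buggy' here."""
    seeded_fault = {("B", "C"): +1}  # wrong sign
    return seeded_fault.get((cause, effect), EXPECTED[(cause, effect)])

for (cause, effect), expected in EXPECTED.items():
    observed = observed_sign(cause, effect)
    verdict = "ok" if observed == expected else "mismatch -- candidate for revision"
    print(f"{cause} -> {effect}: expected {expected:+d}, observed {observed:+d} ({verdict})")
```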
46 Building on this literature, our view of the modeling practice of testing, evaluating, and debugging involves students first evaluating model structure (Hmelo-Silver et al., 2017) and model output (Hadad et al., 2020), then comparing their model to their conceptual understandings and/or external data (Weintrop et al., 2016), and finally, making informed changes to their model based on these analyses (Aho, 2012; Sengupta, 2013). Within our framework (Figure 1), the synergy between ST and CT in supporting students in this practice is thoroughly fleshed out (Shin et al., 2022). The ST aspects of causal reasoning and predicting system behavior based on system structure often help students evaluate their model structure and make informed decisions about model revisions (Lee & Malyn-Smith, 2020; Shute et al., 2017). The CT aspects of iterative refinement, data analysis, and systematic troubleshooting help students identify flaws in their models so that they can make necessary changes (Aho, 2012; Sengupta et al., 2013; Türker & Pala, 2020; Yadav et al., 2014). Despite being identified as a core aspect of computational modeling across many studies, testing and debugging is challenging (Grapin et al., 2022; Li et al., 2019; Sins et al., 2005; Stratford et al., 1998). Students often hesitate to revise their models based on new evidence, and those who make changes tend to be conservative with their model revisions (Grapin et al., 2022; Swanson et al., 2021; Wilensky & Reisman, 2006). Another study suggests that students often take an ad hoc outcome-oriented stance toward testing and debugging (Sins et al., 2005). In these cases, students seek to modify their models so that they match an external set of data using the minimal number of changes possible, rather than focusing on having their models match their conceptual understanding of the phenomenon (Li et al., 2019; Sins et al., 2005; Wilensky & Reisman, 2006). This often results in models that functionally produce the correct outcome, but often lack internal consistency and explanation power (Sins et al., 2005; Wilensky & Reisman, 2006). Additionally, this outcome-oriented approach greatly reduces the potential of testing and debugging to support student learning by shifting the modeling process away from being a sensemaking tool towards being an ad hoc engineering problem (Hogan & Thomas, 2001; Sins et al., 2005). Given these challenges, finding evidence of students using testing and debugging in sophisticated ways and identifying aspects of a learning environment that can support students in this work becomes critical. 47 Research Questions Although the ST and CT literature argues for the importance of students using the modeling practice of test, evaluate, and debug model behavior, research shows that students often have challenges with this practice (Li et al., 2019; Sins et al., 2005; Stratford et al., 1998; Swanson et al., 2021; Wilensky & Reisman, 2006). Our goal was to identify instances of students testing and debugging their computational models and to examine different behavioral patterns that student groups can use to engage in this practice. In this paper, we define “behavior” as a distinct student action or series of actions occurring within a discrete timeframe and “cognitive behavioral pattern” as a long pattern of behaviors found across multiple episodes that suggest a generalized approach to testing and debugging. 
Additionally, we recognize that the learning environment can either support or hinder students (Assaraf & Orion, 2005) in building proficiency with testing and debugging. We thus set out to answer the following research questions within a design-based research environment centered on a high school chemistry unit on evaporative cooling developed according to PBL principles. RQ1. What different cognitive behavioral patterns do students use to approach testing and debugging within a computational modeling unit on evaporative cooling? RQ2. What testing and debugging behaviors do students seem to use more frequently within the context of a computational modeling unit on evaporative cooling? Methods Study Context and Learning Environment Learning Environment and Participants This study is based on data collected in January-February 2020 from a six-week high school chemistry unit on evaporative cooling. This unit was implemented at a STEM magnet school (which we call Faraday High School or FHS as a pseudonym) in a small Midwestern city. While it is a publicly funded institution, students need to apply to this school from across a broad catchment area consisting of three counties with admission based primarily on student academic test scores. Approximately 21% of the student body is part of a racial or ethnic minority and approximately 54% percent of students are on free 48 or reduced lunches. Two of the authors (Observers A and B) partnered directly with two high school chemistry teachers (Mr. H and Mr. M). Mr. H is a middle-aged White male with approximately 15 years of teaching experience and Mr. M is a young White male with 4 years of teaching experience. For this unit, Mr. H and Mr. M each taught 2-3 sections of 10-25 students for a total of 103 student participants. As a sophomore chemistry class, this was the first high school level chemistry class for these students, with their first year spent learning key physics concepts. Because FHS runs on a block schedule, each section meets for 80 minutes every other day. Curriculum The evaporative cooling unit was developed using PBL principles, which include starting the unit with a driving question grounded in a real-world phenomenon, exploring the driving question and the phenomenon through engaging in science practices, and scaffolding the unit with learning technologies (Krajcik & Shin, 2022). The evaporative cooling process results in liquids getting colder during evaporation as faster moving liquid particles with a high average kinetic energy (KE) tend to be the first particles to overcome the intermolecular forces (IMFs) of attraction. Overcoming these forces is what causes molecules in the liquid phase to enter the gas phase. As these high KE particles leave the liquid phase, the average KE of the remaining liquid molecules decreases, making the substance colder. The KE of the faster moving liquid particles is transferred to the potential energy (PE) of the gas particles. At the beginning of the unit, students were initially tasked with drawing a two-dimensional model of the evaporative cooling phenomenon on whiteboards. Students were then introduced to the SageModeler computational modeling program (Damelin et al., 2017) along with some of the key aspects of computational modeling, such as the need to recontextualize the phenomenon as a set of interacting variables to create a workable computational model in SageModeler. 
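A toy sketch of what “a set of interacting variables” can look like is shown below. The causal chain follows the unit’s target explanation of evaporative cooling, but the variable names, the 0–100 scale, and the propagation rule are our own simplification; this is not how SageModeler itself computes model output.

```python
# A minimal analogue of recontextualizing evaporative cooling as interacting
# variables with signed, semi-quantitative links. This is our own sketch for
# illustration, not SageModeler's algorithm.

links = [
    # (cause, effect, sign): +1 "increases", -1 "decreases"
    ("strength of IMFs", "rate of evaporation", -1),
    ("rate of evaporation", "average KE of remaining liquid", -1),
    ("average KE of remaining liquid", "temperature of liquid", +1),
]

def propagate(inputs, links, passes=3):
    """Push relative values (0-100) through the link list a few times."""
    values = dict(inputs)
    for _ in range(passes):
        for cause, effect, sign in links:
            x = values.get(cause, 50)
            values[effect] = x if sign > 0 else 100 - x
    return values

# In this toy model, a liquid with weak IMFs evaporates readily and ends up
# cooler, mirroring the unit's explanation of evaporative cooling.
print(propagate({"strength of IMFs": 20}, links))
print(propagate({"strength of IMFs": 80}, links))
```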
Students then worked in small groups (two to three students) to construct and revise a computational model of this phenomenon that addressed the cooling effect of evaporation and included IMF to explain why some liquids evaporate faster than others. Students had multiple opportunities to test, debug, and revise their computational models over the six-week unit. 49 These opportunities for students to test and debug their computational models included teacher and peer feedback, written reflections, and specific features embedded in the computational modeling program. The classroom teacher regularly visited each student group to ask them questions about their computational models. These questions provided opportunities for students to identify areas in their models that needed improvement and make changes accordingly. Student groups provided structured feedback to each other. By examining the computational models of other student groups and receiving feedback on their own models, the peer feedback cycle helped students identify aspects of their computational models that needed improvement. The students were also instructed to write down their reflections on the revision process after each revision cycle. These written reflections helped students assess any recent changes they had made to their models and consider what additional revisions might be needed in later modeling sessions. Finally, students were encouraged to use the testing and debugging features embedded in the computational modeling program (defined below) as they worked within their small groups to make changes to their computational models. SageModeler Within the evaporative cooling unit, students build, test, and revise computational models using SageModeler, a free, browser-based, open-source software program. SageModeler allows students to set certain variables as “collectors” (variables that can accumulate an amount over time) and transfer valves or flows between these collector variables. Additionally, SageModeler offers two main testing and debugging features that students can use to evaluate model output and compare their models to real-world data: simulation and graphing features. Using the simulation feature students can generate model output for all variables in their model, enabling them to test how the model output changes under different initial conditions (Figure 8A). Students can assess both the overall behavior of their model and examine how specific structural changes might impact this behavior. The graphing feature of SageModeler facilitates students in testing the relationship between any two variables in their model as one input variable is being manipulated (Figure 8B). Graphs serve two principal functions; they allow students to 1) look at the correlation between two distal variables and 2) compare their model’s output to real-world data. Students 50 can generate a graph between two variables in their model and then compare this model-generated graph to a graph of real-world data (Figure 8C). Figure 8: Testing and Debugging Features of SageModeler Figure 8A: Simulation Feature This figure demonstrates the simulation feature of SageModeler. The simulate function is turned on to allow for the student to generate model output (1). The student then manipulated the input variable “IMF” (by moving its associated slider bar up and down) (2) to determine its impact on downstream variables. 51 Figure 8 (cont’d) Figure 8B: Graphing Feature of SageModeler This figure demonstrates the graphing feature of SageModeler. 
The students begin by using the Record Continuously icon (1), which allows them to record how the different variables change as the input variable (2) is manipulated. Using these recorded data, the students can then create a graph in SageModeler showing the relationship between any two variables (3).
Figure 8C: Data Comparison Using SageModeler
This figure shows how students can input external data into SageModeler and compare it to their model output. Notice that the external data (graph on the right) shows an exponential relationship between potential energy and kinetic energy, which suggests that these students need to revise their model.
Data Collection
The primary source of data for this research is screencasts, which combine video recordings of the activity on a laptop screen with audio recordings from the computer's microphone. Screencasts allow researchers to observe how students construct and revise their computational models and the dialogue between group partners. From these screencasts, we can observe the changes students make to their models, ascertain their reasoning for making these changes through their dialogue, and glean insights into their approach to testing and debugging. For this study, we focus on screencast data from five groups of students, three from Mr. H's class and two from Mr. M's class (Table 3). These screencast groups were recommended to us by Mr. H and Mr. M because they were among their more talkative students and gave consent for the screencast process. While other students were not chosen to be screencasted, they were present in the classroom and gave permission for their classroom discourse (including conversations with screencast groups) to be recorded for this project. Note that all names described in this manuscript are pseudonyms meant to protect student identities. Non-screencast students are given letter-based pseudonyms (e.g., Student A, Student B, etc.) when engaging in conversations with screencasted students.
Table 3: Student Demographics
Student Group | Teacher | Grade Level | Gender Identity
Group 1: Andy and Ben | Mr. H | 10th | Male/Male
Group 2: Leslie and Aubrey | Mr. H | 10th | Female/Female
Group 3: Robert, Mark, and Jerry | Mr. H | 10th | Male/Male/Male
Group 4: Ron and Tom | Mr. M | 10th | Male/Male
Group 5: Rashida and Donna | Mr. M | 10th | Female/Female
Instrument Development
To categorize how students test and debug their models, we use the ST and CT Identification Tool (ID Tool). The ID Tool is based on "A Framework for Computational Systems Modeling" (Bowers et al., 2022b; Shin et al., 2022) and was validated by a team of experts, who reached 92% agreement (Cohen's kappa = .87) among raters. Given that the six indicators of this tool are all contextualized within the computational modeling practice of test, evaluate, and debug model behavior, we used these indicators to investigate student testing and debugging behaviors in the evaporative cooling unit. The six testing and debugging indicators are listed in Table 4. Each indicator contains a four-part classification system (from Level 1 to Level 4 in ascending order) that captures the sophistication of the observed student behavior.
Table 4: Description of Key Indicators from the ST and CT Identification Tool

Indicator A: Sensemaking through Discourse
Description: Students either verbalize their reasoning for making changes to their models or engage in conversations about why specific aspects of their models need to be improved.
Level 1: Verbalize changes to model or identify areas needing revisions, but no reasoning
Level 2: Verbalize reasoning but no mutual dialogue
Level 3: Back and forth dialogue with verbal reasoning
Level 4: Back and forth dialogue with verbal reasoning and impact on other parts of model

Indicator B: Analyzing Model Output: Simulations
Description: Students use embedded model output tools to analyze how their model behaves under different input conditions. In this case, students use the simulation tool in SageModeler to test their models.
Level 1: Adjusting one or more input variables, but no verbal reasoning
Level 2: Adjusting input variables with verbal reasoning but no dialogue
Level 3: Adjusting input variables with verbal reasoning and dialogue, focus on local behavior
Level 4: Adjusting input variables with verbal reasoning and dialogue, holistic model discussion

Indicator C: Analyzing Model Output: Graphs
Description: Students use embedded model output tools to analyze how their model behaves under different input conditions. In this case, students generate and analyze graphs in SageModeler.
Level 1: Unsuccessful attempt to make a graph in SageModeler
Level 2: Successful graph creation, but no interpretation
Level 3: Successful graph creation with discussion of implications for the graphed variables
Level 4: Successful graph creation with discussion of the broader implications for model behavior

Indicator D: Analyzing and Using External Data
Description: Students use external data sources to verify model behavior. At more sophisticated levels, students compare specific external data sources directly to their models and discuss the validity of the external data.
Level 1: Superficial reference to data or referencing inaccurate data
Level 2: Reference external data to inform revisions but no direct comparisons to model output
Level 3: Compare specific external data to model output without discussion of data validity
Level 4: Compare specific external data to model output with discussion of data validity

Indicator E: Using Feedback
Description: Students receive meaningful feedback from others (teachers or peers), discuss the validity of the feedback, and use feedback to inform model revisions. At more sophisticated levels, students test their models after making recommended changes and have a follow-up discussion with others to share their new insights.
Level 1: Students receive feedback but do not discuss it or use it to inform revisions
Level 2: Students make changes to their models based on feedback but do not discuss the validity of the feedback
Level 3: Students receive feedback, discuss its validity, and make or do not make changes to their models based on feedback
Level 4: Students receive feedback, discuss its validity, make or do not make changes to their models based on feedback, and share reflections with another group

Indicator F: Reflecting upon Iterative Refinement
Description: Students reflect through writing or discourse on the changes they have made to their models. At more sophisticated levels, students give a defined rationale for the changes they have made.
Level 1: Ambiguous surface level reflection without reasoning
Level 2: List specific model changes but do not provide detailed reasoning
Level 3: List changes and reflect upon reasoning
Level 4: List changes, reflect upon reasoning (with a defined rationale), and discuss broader changes to models

Data Analysis
Using the ID Tool and Primary Analysis
We used the ID Tool to conduct a primary analysis of the screencast data. Using Atlas.ti software, we annotated the screencast videos to mark instances where students were exhibiting testing and debugging behaviors based on the rubric described by our ID Tool. This annotation method was previously validated by Bowers and colleagues (2022b). To maintain interrater reliability throughout this study, we engaged in periodic member checking in which all scorers independently scored a 30-minute segment of student video to see whether our coding results had drifted from each other. As we annotated these specific instances using the ID Tool, we also developed descriptive memos to record notes on what was occurring in each specific episode. These descriptive memos summarized student actions in a narrative manner to make it easier for us to begin looking at broader behavioral patterns governing student testing and debugging. Because the ID Tool is primarily useful for identifying the extent of student testing and debugging behaviors, the descriptive memos were necessary for determining broader testing and debugging behavioral patterns. The primary analysis using the ID Tool, along with the supplementary memos, allowed us to identify instances where students were testing and debugging their models. In subsequent analysis, the ID Tool coding of the screencast results was used to create a timeline of the testing and debugging behaviors, which, along with the supplementary memos, informed our narrative analysis of testing and debugging behavioral patterns.
Timeline Construction and Analysis
After analyzing the screencast videos, we constructed a spreadsheet-based timeline for each of the five screencast groups that shows which indicators students exhibited within each five-minute interval (Table 5). If students had separate or overlapping episodes between two adjacent time points in which they exhibited, for example, indicators A, B, and E, all three indicators were included within the timeline for that interval.
Along with marking which indicators were present in each time interval, we also noted the highest level of student performance associated with each indicator within that time frame. The constructed timeline served as a tool for recording and organizing patterns of student testing and debugging behaviors and subsequently informed both the later narrative analysis of student testing and debugging cognitive behavioral patterns and a summative quantitative analysis of student testing and debugging behaviors.
Table 5: Testing and Debugging Timeline for Group 1
Episode Time | Codes
13-Jan 10:00 | A(3)
15-Jan 55:00 | A(1)
15-Jan 60:00 | A(1), E(2), F(2)
27-Jan 50:00 | E(1)
27-Jan 55:00 | A(1), E(2)
27-Jan 60:00 | E(2)
27-Jan 65:00 | A(3), D(2), E(2)
29-Jan 5:00 | A(3), D(2), E(3)
29-Jan 10:00 | A(3), D(2), E(3)
29-Jan 15:00 | A(3), D(2), E(2)
29-Jan 25:00 | B(3), E(1)
29-Jan 30:00 | B(3), D(2), E(4), F(2)
31-Jan 5:00 | A(1), F(2)
10-Feb 60:00 | A(3), B(4), E(2)
10-Feb 65:00 | A(2), B(2)
10-Feb 70:00 | B(2)
12-Feb 5:00 | A(3), E(2)
12-Feb 10:00 | A(2), B(3)
12-Feb 15:00 | A(3), B(2), D(2), E(1), F(2)
14-Feb 5:00 | D(2)
14-Feb 10:00 | D(2)
14-Feb 15:00 | D(2)
14-Feb 20:00 | D(2), E(1)
14-Feb 25:00 | D(1), F(3)
14-Feb 35:00 | D(2)
14-Feb 40:00 | D(2)
14-Feb 55:00 | A(2), B(2), E(2)
14-Feb 60:00 | F(3)
Each row lists the class date, the five-minute interval, and the indicators observed during that interval with the highest level exhibited.
Narrative Analysis
Once the initial timeline was constructed, we conducted a narrative analysis for three student groups. While the timeline demonstrated general patterns of student testing and debugging behaviors, a more comprehensive analysis, focusing on key episodes from the screencasts, was needed to describe student testing and debugging cognitive behavioral patterns. Returning to the descriptive memos of each group, we started by looking for specific episodes that clearly demonstrated students exhibiting specific indicators. We also looked for patterns and outliers between episodes within the same student group and between student groups, so that we could articulate the major differences in the testing and debugging behaviors of these groups and write a cohesive narrative for each group. We then compared these narratives to the timeline analysis to check for internal consistency. This allowed us to address RQ1. Although we conducted a quantitative analysis using data from all five groups, we selected three groups for the narrative analysis that represent the breadth of student testing and debugging cognitive behavioral patterns. We did not select Groups 4 and 5 for the narrative analysis because their behavioral patterns overlapped greatly with those of Groups 1 and 2, respectively. We also endeavored to show the diversity of behavioral patterns that can occur within a single class of students, so our narrative analysis deliberately includes only students from Mr. H's class.
Semi-quantitative Analysis
After conducting the narrative analysis, we returned to the timeline (which also served to help structure our narrative analysis) to examine student testing and debugging behaviors from a more quantitative perspective. We constructed a frequency table based on this timeline to show how frequently each indicator was observed across all five student groups and how each group differed in exhibiting the six testing and debugging indicators. By aggregating the timeline data into a single frequency table, we were able to determine which testing and debugging behaviors were most common across these five student groups.
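As a concrete illustration of this aggregation step, the short Python sketch below tallies interval-level indicator codes into counts and percentages in the style of Table 7. It is only a sketch: the actual analysis was conducted in a spreadsheet, and the three timeline entries shown are an excerpt of the Group 1 timeline in Table 5 used here as sample input.

from collections import Counter

# Each entry: (group, interval label, indicators observed in that interval)
timeline = [
    ("Group 1", "13-Jan 10:00", {"A"}),
    ("Group 1", "15-Jan 60:00", {"A", "E", "F"}),
    ("Group 1", "29-Jan 30:00", {"B", "D", "E", "F"}),
]

def frequency_table(rows):
    """Count how many coded intervals per group exhibit each indicator,
    and express each count as a percentage of that group's intervals."""
    indicator_counts = {}
    interval_totals = Counter()
    for group, _interval, indicators in rows:
        interval_totals[group] += 1
        for indicator in indicators:
            indicator_counts.setdefault(group, Counter())[indicator] += 1
    for group, counts in indicator_counts.items():
        for indicator in sorted(counts):
            n = counts[indicator]
            pct = 100.0 * n / interval_totals[group]
            print(f"{group} {indicator}: N={n}, %={pct:.1f}")

frequency_table(timeline)

Because a group can exhibit several indicators within a single interval, the percentages for a group sum to more than 100, which matches the note under Table 7.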
This semi-quantitative comparison of student testing and debugging behaviors primarily served to supplement the qualitative analysis of student testing and debugging behavioral patterns and functions as an additional method for visualizing our findings from our narrative analysis to address RQ2. 58 Results Research Question 1: What different cognitive behavioral patterns do students use to approach testing and debugging within a computational modeling unit on evaporative cooling? Table 6: Student Testing and Debugging Behavior Patterns Group Behavior Pattern Summary Group 1 ● Initially focused on receiving external evaluation and feedback from peers ● Shifted towards internal analysis of model output using simulation feature Group 2 ● Sensemaking discourse drove model evaluation and revision ● Reflected on reasoning behind modeling decisions to identify areas of uncertainty in their models Group 3 ● Utilized simulation and graphing features of SageModeler to systematically assess model output and drive revision This table presents a summary outlining the general testing and debugging behavior patterns each student group used and how these behaviors shifted over this unit. Group 1: Andy and Ben Compared to the other groups, Group 1 relied more on collaboration with the broader classroom community (Indicator E: Using Feedback) as a form of checking the validity of their model and figuring out ways to refine their conceptual understanding (Table 6). For example, on Day 2 when trying to set the boundaries of their system, Ben wrote to Students C and D (both from a non-screencast group) in the online platform, “What is the scale range you will be using to model the system? Will you focus only on what you have been able to observe?” Student C responded, “We will focus on what we have observed in combination with what is going on at a particle level.” The nature of this question is further clarified by Observer A who explained to these students that the idea of a “scale range” is the level at which they are modeling the phenomenon. Group 1 then decided that their model should focus on the particle level of evaporative cooling. 59 While this behavioral pattern of borrowing ideas from other groups was generally beneficial to these students, it also occasionally led them towards considering adding non-canonical variables to their model. The following excerpt is an example of a conversation between Group 1 and Students C and D that took place during a peer revision discussion. Student C: Spacing of the molecules? Isn’t that density? Ben: I mean, it is talking about how far apart they are. Student C: That is density. Student C (to Student E from a second non-screencast group): Didn’t you use density in your model? Student E: Do not use density in your model. He (Mr. H) will get upset. But the spacing of the particles is important. In this conversation, Student C tried to convince Andy and Ben to add density as a variable to their model; they were stopped from doing so by Student E’s appeal to authority (Mr. H). Although Group 1 is heavily influenced by this appeal to authority, because Andy and Ben do not simply accept peer feedback at face value but discuss it with multiple individuals and consider the validity of this feedback, they were coded at Level 3 for Indicator E. 
Later in the unit, their behavioral patterns shifted away from focusing on peer feedback and towards incorporating the use of simulations of their model output (B: Analyzing Model Output: Simulations) as the complexity of their model increased. While they previously opted not to use the simulation features embedded in SageModeler, they began a more deliberate testing and debugging approach. For example, after including a positive relationship between the variable “Spacing of Molecules” and the transfer valve between “Kinetic Energy” and “Potential Energy”, Group 1 decided to simulate their model output (Figure 9). Through this simulation, they recognized that although they conceptually agreed with this specific relationship, they questioned the overall behavior of their model. They were concerned about the decrease in “Potential Energy” that occurs after all of the “Kinetic Energy” has been converted into “Potential Energy.” Yet despite their use of the simulation function to 60 identify this behavioral anomaly within their model, they did not determine which specific relationship was responsible for this behavior and, therefore, were unable to make the necessary changes so that their model matched their conceptual understanding. While this was an example of Ben and Andy systematically testing their model, they had difficulty interpreting their model’s structure in a way where they can identify the source of the behavioral anomaly, suggesting a gap in their computational thinking skills. Overall, Group 1 seemed to rely initially on external feedback to help them identify flaws in their models before shifting toward using the simulation features to interpret their model’s output. Figure 9: Screenshot of Group 1 testing and debugging their dynamic model In this figure, the students from Group 1 used the simulation features to determine how kinetic energy is impacting other variables in this model. Group 2: Leslie and Aubrey While Group 1 tended to utilize discussions with other groups, Group 2 often depended on discussions with each other to make sense of the phenomenon as a system of interconnected elements and to identify where revisions were needed (Indicator A: Sensemaking through Discourse). Early in the unit, as the students were trying to decide which variables to include in their model, they had the following discussion: 61 Aubrey: I don’t know if it is right, but it makes sense. Leslie: Now we need to add another box. Aubrey: The only other variable we have is temperature. But isn’t temperature a constant? Leslie: Yes, it is. So, our model is just two things long. That’s boring. So, molecular energy goes into molecular spacing of substance. Is this all about evaporation? Aubrey: Yeah. From this conversation, we see that these students were considering the boundaries of the system while they were also using causal reasoning by reviewing the relationship between “molecular energy” (which appears to be a student-generated term that is roughly equivalent to kinetic energy) and molecular spacing. Through this discussion, they were also identifying an area of their model that needed revision, proposing a change, and considering the ramifications of this change on their model’s behavior. Thus, this is an example of students verbally testing and debugging their model. 
This testing and debugging is also evident in later discussions where they verbalized their interpretation of their model’s structure as they considered which changes were necessary for creating a more robust model of evaporative cooling (Indicator A: Sensemaking through Discourse). In this example, Leslie and Aubrey were trying to figure out how to revise their models in response to a recent investigation on the role of potential and kinetic energy in evaporation (Day 7). In particular, they were trying to determine how the “spacing of molecules” variable (formerly called “molecular spacing”) fits in their new conceptual understanding of evaporative cooling. Leslie: So, maybe the temperature of water also affects the spacing of molecules and then kinetic energy affects potential energy, which also affects the spacing of molecules. Aubrey: Maybe. Leslie: We’ll try it. But maybe it doesn't. Aubrey: Well, for sure this one [pointing to the “Temperature of Water” variable]. Leslie: Okay, spacing of molecules. As the temperature of water increases, the spacing of molecules increases more and more. So, remember that one model that we did. 62 Aubrey: Yeah, like the hexagon thing where they kept on getting more and more spread apart (referencing a simulation that showed how as the liquid heated up, the kinetic energy increased until it hit the boiling point, after which the potential energy started to increase as the molecules moved farther apart). Not only does this excerpt show how dialogue is manifested in the practice of testing and debugging, but it is also a clear example of how these students used external data to validate their sensemaking (Indicator D: Analyzing and Using External Data). This is subsequently followed up by the use of written reflections as an additional form of sensemaking. At first the students wrote, “as the temperature of the water (average kinetic energy) increases the molecules start gradually moving faster and hitting each other and breaking their force of attraction keeping them together, and become gas.” While this initial written explanation is an accurate justification for this relationship, they disagreed with the second part of this explanation and replaced everything after “hitting each other” with “. . . and move farther apart. As the temperature of water increases they move more quickly than before and move farther apart than they were.” This writing seemed to help these students reflect upon their causal reasoning for this relationship. Group 2 expanded upon this use of written reflection by placing explanations of each relationship directly on the SageModeler canvas (Figure 10). Writing these notes supported their causal reasoning about relationships and served as a means of considering the validity of each relationship they had encoded into SageModeler, thereby acting as an alternative approach to the type of formal testing and debugging that is often conducted at this stage in model development. Their embedded notes also had the potential to support later revision efforts as they could have identified their original rationale behind a particular relationship and considered if new evidence supported or undermined that explanation for that causal relationship (Indicator F: Reflecting upon Iterative Refinement). Collectively, their verbal dialogue and written reflections demonstrate that Group 2 engaged with testing and debugging primarily through discourse (Table 6). 
63 Figure 10: Screenshot of Group 2’s Annotated Model The students in Group 2 wrote their rationales for each relationship on their model as a form of sensemaking during the testing and debugging process. Group 3: Robert, Mark, and Jerry Group 3 utilized the testing and debugging features embedded within SageModeler (Indicator B: Analyzing Model Output: Simulations and Indicator C: Analyzing Model Output: Graphs) as they analyzed their models to determine which changes to make. One interesting example of systematic testing and debugging occurred when the students inserted a “dummy variable” into their model to see the effects of adding a fourth variable on the behavior of their model (Figure 11). After inserting this dummy variable, they used the simulation feature to observe its impact on the behavior of the model as a system (Indicator B: Analyzing Model Output: Simulations). However, they quickly removed the dummy variable, suggesting their dissatisfaction with its effect on model behavior. This use of a dummy variable along with their subsequent discourse is strong evidence that these students were using testing and debugging as they made a deliberate change to their model to see how it would impact model behavior and then removed this after testing this change and determining that it was unsatisfactory. 64 Figure 11: Example of Group 3 Utilizing a “Dummy” Variable to Facilitate Testing and Debugging When Group 3 was trying to decide if any additional variables might be needed in their model, they inserted a “dummy” variable (which they named “random thing”) to see how it would impact model behavior. Another example of Group 3 using the model simulation features to support their testing and debugging occurred as they were trying to decide which relationships to set between the variables of “Kinetic Energy,” “Potential Energy,” and “Density.” It is important to note that other screencast excerpts demonstrate that these students held non-canonical ideas about “density” at this stage in the unit. Most notably, they viewed “density” as an extrinsic characteristic of a substance that decreased as a substance changed from a liquid to a gas. As such, their understanding of “density” is closer to the canonical understanding of “molecular spacing of molecules.” Jerry: Kinetic energy does what? Robert: Okay, so as intermolecular force (IMF) increases, does density increase in the end? Jerry: It would be the other way around. As IMF increases, density decreases. But . . . Robert: ...which means one of these [relationships] has to be increases and the other has to be decreases, or they both have to be decrease. Jerry: The kinetic energy (KE) is opposite of potential energy (PE). Robert: So, this would probably be the one that is decreasing? 65 Jerry: But that doesn’t make sense because of the graph he [Mr. H] showed us. Just put decreasing. Wait. Actually, it would be increasing [KE to PE] and the last one [PE to Density] would be decreasing? The students changed the relationship between “PE” to “Density” to decreasing and then simulated the model and saw that an increase in IMF causes the density to decrease (which in their understanding would mean an increase in the molecular spacing of the substance). In this example, the students first considered the overall behavior of their model of evaporative cooling. The students then analyzed the individual relationships within this system to determine how these relationships would influence system behavior. 
Upon identifying how these relationships would impact the model's output, they considered how these individual relationships reflected their understanding of the real-world phenomenon and ultimately selected a specific relationship to modify. After modifying this relationship, they once again used the simulation features to see the impact of this change upon the model output. This is an example of Indicator A: Sensemaking through Discourse and Indicator B: Analyzing Model Output: Simulations. Group 3 also used the model output generated from SageModeler to make a graph of the relationship between IMF and PE. After making several changes to their model, the students tested to see how these changes impacted the overall behavior of the system. They used the simulate feature to look at how manipulating the input variable (Intermolecular Force) of their model would affect intermediate and distal output variables. Given that they were specifically interested in how the IMF impacted PE, they used the simulation output to generate a graph in SageModeler (Figure 12A). Upon making this graph, they recognized that apart from a few outlier points at the end (likely artifacts from previous simulations), there was a linear relationship between IMF and PE, which was not in line with their understanding of the relationship between these two variables. They later changed the individual relationship between IMF and PE to an exponential one, which made its associated graph into an exponential relationship (Figure 12B). This is an example of students using Indicator C: Analyzing Model Output: Graphs. While other student groups periodically used the simulation features to explore the output of their models, only this group used the model output to successfully make graphs of the relationships between two variables in their model. Overall, Group 3 tended to focus on testing and debugging behavioral patterns that prioritized systematically analyzing their model output to identify areas of their model that needed improvement (Table 6).
Figure 12: Example of Group 3 Exhibiting Evidence of Indicator C: Analyzing Model Output: Graphs
Figure 12A: Group 3 Pre-Revision Model (Day 8) with Graphical Representation of Relationship Between IMF and PE
Figure 12B: Group 3 Revised Model (Day 8) with Graphical Representation of Relationship Between IMF and PE
Research Question 2: What testing and debugging behaviors do students seem to use more frequently within the context of a computational modeling unit on evaporative cooling?
Table 7: Relative Occurrence of Each Testing and Debugging Behavior
Indicator | Group 1 N (%) | Group 2 N (%) | Group 3 N (%) | Group 4 N (%) | Group 5 N (%) | Total N (%)
Sensemaking via Discourse (A) | 15 (53.6) | 39 (83.0) | 24 (66.7) | 12 (50.0) | 22 (68.8) | 112 (67.1)
Model Output: Simulations (B) | 8 (28.6) | 23 (48.9) | 26 (72.2) | 7 (29.2) | 19 (59.4) | 83 (49.7)
Model Output: Graphs (C) | 0 (0.0) | 1 (2.1) | 6 (16.7) | 0 (0.0) | 0 (0.0) | 7 (4.2)
Use External Data (D) | 13 (46.4) | 11 (23.4) | 6 (16.7) | 3 (12.5) | 6 (18.8) | 39 (23.4)
Feedback (E) | 15 (53.6) | 16 (34.0) | 11 (30.6) | 13 (54.2) | 12 (37.5) | 67 (40.1)
Reflecting on Refinement (F) | 6 (21.4) | 13 (27.7) | 12 (33.3) | 8 (33.3) | 12 (37.5) | 51 (30.5)
Total Number of Intervals | 28 | 47 | 36 | 24 | 32 | 167
N is the number of five-minute intervals in which we observed a particular behavioral indicator. % is the percentage of that group's testing and debugging intervals during which the indicator was observed. Note that because student groups often exhibited multiple behaviors within any given five-minute interval, the percentages do not add up to 100. Total represents data from all groups.
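As a worked check of how the N and % columns in Table 7 relate, the percentage for a given group and indicator is simply the indicator count divided by that group's total number of coded intervals:

\[
\%_{\text{group},\,\text{indicator}} = \frac{N_{\text{group},\,\text{indicator}}}{N_{\text{group}}} \times 100,
\qquad \text{e.g., Group 1, Indicator A: } \frac{15}{28} \times 100 \approx 53.6\%.
\]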
Based on our semi-quantitative analysis of student screencasts from all five focus groups, there is evidence that all six testing and debugging indicators were used at least once by a student group within this dataset (Table 7). Although all indicators were present, some indicators were more commonly used 68 than others. Indicators associated with sensemaking discourse (as exemplified by A) were more frequently used while indicators that are linked to comparing models to external data (D) were less frequently used. This implies that while the learning environment supported students in using sensemaking discourse, it did not sufficiently support students in comparing their models to external data (D). Additionally, there is a sharp contrast between the frequency at which students used SageModeler’s simulation (B) and graphing (C) features. Even though both of these SageModeler features allow students to examine their model output, the simulation feature focuses on how the relative amount of each variable changes as the input variables are manipulated. In contrast, the graphing feature allows students to compare the relationship between two individual variables in isolation from other aspects of the model (Figure 2b). Student preference for the simulation feature (B) over the graphing feature (C) suggests that these students found the simulation feature easier to navigate and/or more useful for the learning tasks in this unit. Given that the graphing feature tends to better support direct comparison with external data and the noted low use of external data by students in this unit, the latter explanation has merit. The results from the indicator analysis are largely consistent with the results from the narrative analysis as these data show a preferred set of behaviors for each group (Table 7). Group 1 was more apt to reference external data (D) and use external feedback (E) to drive their revision process while minimizing their use of model output simulations (B) and not using model output graphs (C). This contrasts with Group 3, which strongly prioritized analyzing model output (B) over using external data (D) and external feedback (E). Indeed, Group 3 is one of only two groups to use the graphing feature (C) and the only one to do so successfully. While Group 2 attempted to use the graphing feature (C), they strongly preferred using discourse as their primary means of model analysis (A). Group 5 also prioritized discourse (A) for testing and debugging, but also frequently utilized model simulation features (B). Finally, Group 4 had a strong preference for using feedback (Indicator E) but tended to have limited discussions on the meaningfulness of the feedback, so their behavior was assessed at a lower level using the ID Tool. Overall, these comparative results from both our narrative and semi-quantitative analyses show that despite having a common learning environment, student groups found opportunities to approach testing 69 and debugging in unique ways. This suggests that multiple scaffolds and supports are needed to help all students test, evaluate, and debug their models. Discussion Our results demonstrate that within the evaporative cooling learning environment, there is evidence of student behavior that corresponds to all six testing and debugging indicators in the ID Tool. 
As anticipated, some behaviors corresponding to certain testing and debugging indicators occurred more frequently than others, with students particularly spending more time analyzing their models through discourse (A) compared to other indicators (Table 7). However, it is also important to mention that differences in student behaviors, as noted by both narrative analysis and quantitative analysis, demonstrate that even within a common learning environment, student groups may adopt different approaches to testing and debugging (Tables 6 and 7). Importance of Learning Environment Because each student group used a different set of cognitive behavioral patterns for testing and debugging within a common PBL-aligned learning environment, multiple supports are likely needed to accommodate these different approaches to testing and debugging. Having multiple pathways for students to engage in the learning process allows students to leverage their unique strengths and prior knowledge to further their sensemaking endeavors (Basham & Marino, 2013; Hansen et al., 2016; Scanlon et al., 2018). Because this study suggests that students in the same class can utilize different testing and debugging behaviors and behavioral patterns, it reinforces the need to design multifaceted learning environments, so that all learners can fully participate in computational modeling. Although the learning environment in our evaporative cooling unit provided multiple pathways for students to participate in testing and debugging, two features that seemed to be the most meaningful for supporting students in testing and debugging were the simulation feature embedded in SageModeler and the use of student small groups, which facilitated discourse. Because the simulation feature allowed students to generate model output data in real time, students could test how changes in their model structures impacted model behavior and detect flaws in their models. This allowed students to analyze model output in a way that 70 would not have been possible through traditional paper-pencil modeling. By making it easier to detect areas of their models that needed improvement, the simulation feature further assisted students in revising their models, thereby encouraging testing and debugging (Fan et al., 2018; Lee et al., 2011; Shen et al., 2014). Another feature of the learning environment that supported students in testing and debugging is the use of student groups. In this unit, students worked in small groups of two to three students and were encouraged to collaborate with each other and verbalize their thought processes. By the nature of using collaborative student groups as opposed to having each student build their own models independently, students were implicitly encouraged to share their design choices and modeling behavioral patterns with their partners. These partners could in turn ask each other to provide evidence or reasoning to defend their design choices or provide a counterclaim of their own. For example, one student might state that density impacts the rate of evaporation because higher density particles evaporate slower. Another student could then argue that density does not impact the rate of evaporation because oil is less dense than water but evaporates far slower (if at all). Through such productive argumentative discourse, students can identify flaws in their reasoning and in their model construction, prompting them to revise their models (Campbell & Oh, 2015; Kyza et al., 2011; Lee et al., 2015). 
As such, placing students in pairs or small groups encourages them to have these sensemaking conversations (Indicator A), which in turn facilitate model evaluation and model revision, both of which are key aspects of testing, evaluating, and debugging model behavior. In a similar manner, peer reviews provided further support for the model revision process (King, 1998). Having another group of students analyze their models and provide meaningful feedback often gave students a fresh perspective on their models. Their peers could detect flaws in their models that the student pair might have otherwise ignored. Students then used that feedback to prompt additional sensemaking discourse and inform future model revisions. Thus, by receiving and using feedback (Indicator E), students were able to have an external party evaluate their models and provide key insights on aspects of their models that needed further review, thereby supporting students in the model revision process. 71 Limitations Although this study shows some promising aspects of the design of our learning environment to support students in testing and debugging, there are both limitations with our methodology and aspects of the learning environment. It is important to note that this research took place at a STEM magnet school in a classroom environment that encouraged student discourse and collaboration. Because traditional classroom environments often lack a strong culture of student discourse, our results might not be fully applicable to all classrooms (Grifenhagen & Barnes, 2022; Jimenez-Aleixandre et al., 2000; Kelly, 2013). We also recognize that the limited sample size makes it difficult to draw broader conclusions from our semi-quantitative analysis. While these results do suggest a diversity in student approaches to testing and debugging and that certain testing and debugging behaviors are more common than others within this class, we cannot argue from this analysis alone that these are universal patterns. Given the design-based nature of this study, it was not feasible to isolate specific aspects of the learning environment to determine definitively if either the SageModeler simulation feature or the use of student groups are the most important scaffolds for testing and debugging for these students. While our qualitative analysis does suggest that these factors helped support students in testing and debugging, additional factors, such as teacher instructions and prior student experiences, also might have contributed to our results. Additionally, it is difficult to determine why any specific student groups chose to use a particular set of testing and debugging behaviors. It is possible that more introverted students preferred testing and debugging behavioral patterns that were more focused on analyzing model outputs (Indicator B) compared to extroverted students who might have gravitated towards more social approaches to model validation, such as peer feedback (Indicator E). Another explanation could be that more mathematically inclined students preferred using simulations and graphs (Indicators B and C) to interpret their models compared to using more verbally intensive behavioral patterns. However, both ideas are difficult to assess without targeted interviews and/or additional written assessments, neither of which occurred for this study. 72 Conclusion Testing, evaluating, and debugging models is an important competency. 
Being able to identify the flaws in a model helps students engage in revisions, improving both their representational and conceptual models of a phenomenon (Barlas, 1996; Grapin et al., 2022; Stratford et al., 1998; Sterman, 1994). Frequent testing, debugging, and revision cycles also reinforce the scientific principle of iterative refinement through experimentation, which is essential to the scientific thinking process (Gilbert, 2004; NRC, 2012; Schwarz et al., 2009). Within our framework, we view testing, evaluating, and debugging model behavior as an integral aspect of computational modeling, drawing upon modeling, ST, and CT traditions (Shin et al., 2022). As it facilitates model revision and iterative refinement, this practice benefits student modeling (Schwarz et al., 2009). Likewise, model evaluation, model interpretation, and model revision are all important concepts in system dynamics that overlap with our understanding of testing, evaluating, and debugging model behavior (Barlas, 1996; Martinez-Moyano & Richardson, 2013; Richardson, 1996). Additionally, students often need to consider parts of their model structure from a systems thinking perspective to accurately identify areas of their model that need improvement and to guide the subsequent revision process (Lee & Malyn-Smith, 2020; Shute et al., 2017). Finally, our understanding of testing, evaluating, and debugging model behavior incorporates the CT concepts of debugging, wherein students systematically review their computational models to identify flaws and structural errors, and iterative refinement, the process by which students make changes to their models in response to new information (Aho, 2012; Sengupta et al., 2013; Türker & Pala, 2020; Yadav et al., 2014). This investigation into how students used testing and debugging within the context of an evaporative cooling unit demonstrates both the possibilities and challenges of integrating this practice into secondary science education. Although we have evidence of students testing and debugging their models, the relative absence of using external data to directly validate their models is an area of concern (Table 5). It suggests that more direct curricular support is needed to encourage students to compare their models to external data. This, along with a desire to better support other aspects of testing and debugging (such as the peer review process), has led us towards making several curriculum changes. We have developed a set 73 of model design guidelines to help students identify areas of their models that can be improved during the model revision process. These model design guidelines ask students to consider if their models have appropriately named/relevant variables, define appropriate relationships between variables, have clearly defined boundaries, and work appropriately when simulated. The last section of these guidelines asks students to consider how their models compare to real-world data, further emphasizing the importance of using external data to validate their models. In addition to scaffolding the model revision process, students are encouraged to use these guidelines when giving feedback to their peers. We also plan on being more explicit with students about which specific pieces of experimental data they should use to validate their models during model revisions. For example, we have added a built-in table to the SageModeler canvas where students can input experimental data showing how the temperature of liquids changes over the course of evaporation. 
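To make concrete the kind of model-versus-data check that this built-in data table is meant to prompt, the Python sketch below compares a model-generated temperature series with an experimental one and flags a large discrepancy. Both data series, the error measure, and the threshold are invented placeholders for illustration; they are not the unit's actual experimental results, SageModeler output, or any feature of SageModeler itself.

# Hypothetical comparison of model output to experimental evaporation data.
experimental_temps = [22.0, 20.5, 19.3, 18.4, 17.8]   # measured liquid temperatures (assumed values)
model_temps        = [22.0, 21.5, 21.0, 20.5, 20.0]   # temperatures produced by a student model (assumed)

def mean_absolute_gap(observed, predicted):
    """Average absolute difference between measurements and model output."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

gap = mean_absolute_gap(experimental_temps, model_temps)
print(f"Mean absolute gap: {gap:.2f} degrees")
if gap > 1.0:   # arbitrary illustrative threshold
    print("Model cools too slowly relative to the data; revisit the IMF and KE relationships.")

A check of this kind mirrors the last section of the model design guidelines, which asks students whether their model output matches real-world data rather than merely producing a plausible-looking trend.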
Finally, we are streamlining the unit to allow for more in-depth classroom discourse and more scaffolded model revisions. In this way, we hope to reduce student and teacher fatigue over this unit. In addition to curricular scaffolds, future iterations of this design-based research could investigate the role of teacher scaffolds in supporting students in using external data to validate their models, as instructional supports offer another avenue to bring this aspect of testing and debugging into the classroom. We also might further investigate the role of student groups in supporting collaborative discourse around testing and debugging and find additional ways to leverage peer revisions to best support student model revisions. Finally, future work needs to investigate how different testing and debugging behavioral patterns are linked to model outcomes. Given our small sample size, it was difficult to determine any meaningful correlations between student testing and debugging behavioral patterns and either their post-unit understanding of disciplinary core ideas or the conceptual accuracy of their final models. While we hypothesize that student groups that systematically analyze the model output and frequently compare their models to real-world experimental data will end up with models that more accurately represent a canonical understanding of the system they are modeling, it is also possible that such behaviors lead to model fitting and limit opportunities for students to reflect on their evolving 74 conceptual understanding of the phenomenon (Sins et al., 2005; Wilensky & Reisman, 2006). Therefore, while it is likely that student groups with the most robust models and deepest understanding of underlying disciplinary core ideas will use testing and debugging behavioral patterns that combine dialogic analysis (a la Group 2) with systematic analysis and comparison of model output to experimental data (a la Group 3), future work will be needed to address this hypothesis. This research can guide future research and inform teacher educators and teachers in their efforts to effectively engage students in computational tasks involving testing and debugging. By increasing awareness of the different behavioral patterns exhibited during testing and debugging, such as seeking advice from peers or making inferences based on simulations, teachers can provide a more nuanced facilitation that supports students’ strategies. For instance, teachers can prompt students to utilize tools such as graphs. Teachers can also encourage the use of effective simulation strategies, such as holding all variables constant except for one and comparing results across various scenarios. To foster productive discussion and critical thinking during testing and debugging, teachers can guide students in asking questions such as “How do you know that?” and “What does your model show?” and encourage simulation as a means of exploration and validation. Ultimately, by showing some of the different ways that students can test and debug their models, we hope that this research will encourage teachers to adopt holistic approaches to supporting students with this practice. 
PAPER 3: SYNERGISTIC SCAFFOLDING AND CLEAR RATIONALES: HOW TEACHERS CAN SUPPORT STUDENTS WITH TESTING AND DEBUGGING IN A COMPUTATIONAL MODELING CONTEXT
Abstract
In our technology-driven and algorithm-dominated world, students need familiarity with computational thinking (CT), systems thinking (ST), and computational modeling to make sense of their lived experiences. One aspect of computational modeling that allows students to meaningfully integrate various aspects of CT and ST is testing and debugging. Through testing and debugging, students can analyze model output to identify areas of their models that need revisions, give and receive peer feedback on their models, and use external data to validate model output and revise model structure. Although testing and debugging has been identified as an important aspect of CT and computational modeling, pedagogical strategies for supporting students with testing and debugging in a computational modeling context remain understudied. In this study, I investigated how two teachers supported their students with testing and debugging in the context of a secondary-level chemistry unit involving computational modeling. Both teachers in this study demonstrated key pedagogical strategies for supporting students with testing and debugging. Their use of synergistic scaffolding and their efforts to present students with a clear rationale for engaging with different aspects of testing and debugging empowered their students to use testing and debugging to make meaningful revisions to their computational models. While these teachers' synergistic scaffolding strategies seemed to support these students with testing and debugging, certain advanced testing and debugging behaviors (such as comparing and contrasting model output with experimental results to find errors in model structure) were seldom found in this study. This suggests that additional support, beyond what was observed in this study, is needed for students to perform at higher levels of testing and debugging.
Introduction
Given the increasing prevalence of algorithmic programs, computer software, and complex iterative tasks across all aspects of society, it is increasingly necessary for students to become familiar with computational thinking (CT), regardless of their future career paths (Barr et al., 2011; Bourgault & E., 2023; Wing, 2006). Computational thinking (CT) is a form of sensemaking that uses an iterative and quantitative approach to decompose a phenomenon or problem to explore, explain, and predict the behavior of that phenomenon or to find a solution to a problem through the creation and revision of algorithms (Grover & Pea, 2018; Shin et al., 2022; Weintrop et al., 2016; Wing, 2006). Although CT has been increasingly recognized as a key goal for K-12 education (NRC, 2012), with many nations taking steps to incorporate CT into their educational standards, there remain many conflicting definitions and ideas about how CT should be introduced and supported in classrooms (Brackmann et al., 2016; Hsu et al., 2019; Heintz et al., 2014; Webb et al., 2017). The Next Generation Science Standards (NGSS) emphasizes the synergy between CT and mathematical thinking and focuses on having students use computational tools to analyze and interpret large data sets (NRC, 2012; NGSS, 2013; Shin et al., 2022).
Some scholars focus on having students learn standard programming languages, such as C++, Java, or Python, due to their practical uses outside of the classroom (Abid et al., 2015; Grandell et al., 2006; Price & Price-Mohr, 2018; Tabet et al., 2016). Others argue for a broader definition of CT as “thinking like a computer scientist” and examine specific CT concepts and practices needed to solve algorithmic problems (Grover & Pea, 2018; Nardelli, 2019; Shute et al., 2017). By emphasizing broader CT concepts and practices, it is possible for students to use CT to approach and solve algorithmic problems through constructing, testing, and revising computational models (Hutchins et al., 2020; Sengupta et al., 2013; Shin et al., 2021; Weintrop et al., 2016). Computational modeling describes efforts to create models using algorithms or algorithmic thinking to represent a phenomenon in a quantitative or semiquantitative manner, typically using computational modeling software (Fisher, 2018; Pierson & Clark, 2018; Sengupta et al., 2013; Shin et al., 2022). When students construct computational models of scientific phenomena, they often need to employ various aspects of computational thinking, such as problem decomposition, testing and debugging, and making iterative refinements (Anderson, 2016; Brennan & Resnick. 2012; Irgens et al., 2020; Wang, 2021b). In addition to serving as a platform to assist students in practicing CT, 77 computational models can allow students to visualize phenomena as systems of interacting elements and explore complex relationship patterns such as stock and flow/collector and flow systems and feedback loops (Bowers et al., 2023; Basu et al., 2016; Cronin et al., 2009; Nguyen & Santagata, 2020). As such, computational modeling exists at a critical intersection between CT and systems thinking (ST) (Hamidi et al., 2023; Shin et al., 2022; Weintrop et al., 2016). Systems thinking (ST) describes the cognitive processes necessary to explore how the various aspects of a phenomenon interact with each other to form a more complex system (Arnold & Wade, 2015; Meadows, 2008; Stave & Hopper, 2007). The synergy between computational modeling, CT, and ST was explored by Shin and colleagues (2022) in “A Framework for Computational Modeling” (Figure 13). When constructing this framework, Shin et al. (2022) took inspiration and guidance from the ST, CT, and Modeling literature to define five ST aspects, five CT aspects, and five computational modeling practices shown in this framework. While the ST and CT aspects of this framework serve to summarize the authors’ conceptualization of systems thinking and computational thinking, respectively, the five computational modeling practices are concrete actions that students perform as they design, construct, test, and revise their computational systems models. Each of the five computational modeling practices is informed by the ST and CT aspects defined in this framework and provides students with the opportunity to develop and demonstrate various aspects of ST and CT (Shin et al., 2022). 78 Figure 13: A Framework for Computational Systems Modeling (Shin et al., 2022) One computational modeling practice from this framework that illustrates how computational modeling supports students with CT and ST is “test, evaluate, and debug model behavior”, often shortened to “testing and debugging” (Bowers et al., 2023; Shin et al., 2022). 
When testing and debugging models, students are actively analyzing and interpreting model structures and model behavior, often through examining model output, to find aspects of their models that need to be revised (Barlas, 1996; Hogan & Thomas, 2001; Sengupta et al., 2013; Stave, 2002). As such, students need to understand both how the structural aspects of their models influence its behavior (ST) and how to make changes to their algorithmic system (CT) to fully test and debug their models (Bowers et al., 2023). Testing and debugging is often an iterative process that involves frequent analysis of model output to identify structural components of their models that are not functioning as intended (i.e., “bugs”) as well as model aspects that no longer fit their conceptual understanding of the phenomenon (Jonassen & Hung, 2006; Li et al., 2019; Ogegbo & Ramnaraian, 2021). Students can also utilize peer feedback and compare model output to external data to help facilitate testing and debugging (Bowers et al., 2023; Emara et al, 2020; Weintrop et al., 2016; Yoon et al., 2016). 79 Although testing and debugging are core aspects of computational modeling and CT, many students find it challenging (Eidin et al., 2023; Grapin et al., 2022; Li et al., 2019; Sins et al., 2005). Previous studies suggest that students are often reluctant to interpret model output to inform model revision, preferring an “ad hoc” approach to model revision (Grapin et al., 2022; Stratford et al., 1998; Swanson et al., 2021). Research also suggests that students are likely to ignore key opportunities to compare the model output of their computational models to external data (Bowers et al., 2023; Grapin et al., 2022; Swanson et al., 2021). This reluctance to interpret model output or to compare model output to external data could correlate to students not fully understanding the technical capabilities of the computational modeling environment and thus not knowing all the tools that are available to use. Additionally, students might hesitate to use external data to validate their models if their teachers do not consistently reinforce the principle that computational models should reflect real-world experimental results. Other studies show that when students compare their models to external data, they often adopt an “outcome oriented” stance to model revisions, forcing their models to fit this external data without considering how the structural changes they are making to their model reflect their conceptual understanding of the phenomenon (Li et al., 2019; Sins et al., 2005; Wilensky & Reisman, 2006). This can lead students towards creating final models that can structurally replicate the net behavior of the phenomenon but lack internal consistency and are less useful for explaining how the phenomenon behaves in the real world. For example, a student might create a model that shows that an increase in human industrial activity leads to a net increase in global temperatures. However, this model internally suggests that increasing carbon dioxide levels increases the acidity of the oceans, thereby making sea-ice less stable, increasing the loss of sea ice, leading to higher global temperatures. Although the net conclusion of this model (that increased industrial activity leads to increased global temperatures) is scientifically accurate, there are some key internal relationships that defy the scientific consensus. 
Increasing carbon dioxide levels does increase the acidity of the oceans, but the increasing acidity of the oceans has a negligible impact on the stability of sea-ice and thus invalidates this line of causal reasoning. 80 So, while this explanation arrives at the scientifically accurate conclusion that increasing carbon dioxide emissions increase global temperatures, its lack of internal logic and inconsistency with key scientific principles, suggest a lack of understanding of disciplinary core ideas around the mechanisms of climate change. Given these differential modeling outcomes, it is important to consider the role that teachers can play in supporting students with testing and debugging. Studies on synergistic scaffolding have demonstrated that learning in computerized learning environments is often enhanced by whole class discussions and targeted support from teachers (Baker et al., 2004; Li & Lim, 2008; Wu & Pedersen, 2011). Previous literature has also investigated how teachers can support students in computational modeling and in coding-based debugging tasks (Fretz et al., 2002; McCauley, 2008; Michaeli & Romeike, 2019; Snyder et al., 2022). However, none of these research studies narrowed in specifically on aspects of testing and debugging that are particularly critical to computational modeling, including analyzing a model’s output, analyzing and using external data to validate model output, and using peer feedback to support model revisions. As such, I investigated how teachers can support students in these three major aspects of testing and debugging by addressing the following research questions. Research Questions 1. How does a teacher support students with testing and debugging in a secondary science unit involving computational systems modeling? 2. How do these pedagogical strategies compare to those used by another teacher teaching the same secondary science unit? 3. What pedagogical strategies correlate with student testing and debugging behaviors in this secondary science unit? Literature Review Testing and Debugging across Disciplines Testing and debugging has been identified as a core practice of computational thinking and computational modeling across several STEM related disciplines. The computer science literature often 81 emphasizes debugging as a key skill essential for programming and argues that proficiency with debugging is a major indicator that separates a novice from expert programmers (Griffin, 2016; Murphy et al., 2008; Soloway & Spohrer, 2013). Griffin (2016) views testing and debugging as the process of searching for anomalies in a software program, finding specific parts of the code that are not working appropriately (i.e., “bugs”), and then fixing these aspects of the computer code. Given the inherent complexities of computer programming languages, the scope of testing and debugging can range from finding simple syntax errors (e.g., missing parentheses) to larger structural errors, such as unresolved recursion loops (Ahmadzadeh et al., 2005; Ford & Teorey; McCauley et al., 2008; Michaeli & Romeike, 2019). Due to these complexities, other scholars emphasize first analyzing a computer program as a broader system of interconnected segments of code, then testing to see if the program behaves as planned before going through individual lines of code to identify errors that need to be corrected (Fix et al., 1993; McCauley et al., 2008; Vessey, 1985). 
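To make this range of bugs concrete, the brief sketch below offers my own illustrative example (written in Python and not drawn from the studies cited above) contrasting a syntax error, which prevents a program from running at all, with a logic error, which only surfaces when a program's actual output is compared against its expected output.

```python
# Illustrative sketch (not from the cited studies): two kinds of bugs a novice
# programmer might need to find while testing and debugging.

# 1) A syntax error stops the program before it ever runs. For example, a
#    missing closing parenthesis:
#       print("average:", sum(values) / len(values)    # SyntaxError
#    The interpreter flags this immediately, so it is usually easy to locate.

# 2) A logic error runs "successfully" but produces the wrong behavior, so it
#    can only be found by comparing actual output with expected output.
def average_kinetic_energy(energies):
    """Return the mean of a list of energy values."""
    total = 0.0
    for e in energies:
        total += e
    return total / (len(energies) - 1)   # BUG: should divide by len(energies)

# Testing: run the program on a case whose answer is known and compare.
observed = average_kinetic_energy([2.0, 4.0, 6.0])
expected = 4.0
print(f"observed={observed}, expected={expected}")
if observed != expected:
    print("Output does not match expectations -> inspect the code line by line.")
```

Finding the second kind of error typically requires the broader "test the behavior first, then inspect individual lines" approach described above, because the program itself runs without complaint.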
Testing and debugging is also prominent in studies focusing on block programming (Kim et al., 2018; Emara et al., 2020; Tsan et al., 2022). Block programming describes computer programs that contain preset “blocks” of code that students can use to help construct more complex programs (Lye & Koh, 2014; Resnick et al., 2009; Tsan et al., 2022). While these block programs often allow students to engage in programming tasks without needing to learn more complex programming languages and syntax (Akcaoglu, 2014; Bers et al., 2014; Lye & Koh, 2014), students have a tendency towards surface-level engagement and often arrive at functioning code through ad-hoc tinkering rather than more nuanced coding techniques (Brennan & Resnick, 2012; Grover et al., 2015; Kim et al., 2018). As such, some computer science educators push for teachers to demonstrate more sophisticated and deliberate efforts at testing and debugging when teaching with block programming to help students better understand broader organizational patterns that are useful across programming contexts (Grover et al., 2015; Kim et al., 2018). Block programming is often used in conjunction with physical computing systems, including robots (Bers, 2010; Kazakoff & Bers, 2012; Wang et al., 2021a). In these cases, student testing and debugging must also consider errors that can occur in assembling these physical computing systems in 82 addition to traditional software bugs (Bers et al., 2014; Elliott et al., 2023). Despite these additional points of failure, physical computing systems can help students better identify the behavioral outputs of their code, further facilitating testing and debugging (Bers et al., 2014.; Elliott et al., 2023) Another area of STEM education where students often engage in testing and debugging is through computational modeling (Emara et al., 2020; Hutchins et al., 2020; Weintrop et al., 2016; Yoon et al., 2016). Computational modeling, particularly when it involves text-coded or block-coded agent- based modeling programs, has a fair degree of overlap with code-based programming. However, testing and debugging in computational modeling also has been influenced by the model revision process (Emara et al., 2020; Lin et al., 2021; Papaevripidou et al., 2007; Shin et al., 2022). Scientific models represent natural phenomena that explain or predict the behavior of a system (Harrison & Treagust, 2000; Louca & Zacharia, 2012; Mittelstraß, 2005; Schwarz et. al, 2009). As scientists conduct experiments and collect new data, they will often need to make modifications to their models so that they better explain the behavior of observed phenomena (Louca & Zacharia, 2012; Oh & Oh, 2011). By having students go through the process of constructing a model, testing it through experimentation, and making changes based on experimental data, students are given a first-hand experience with the iterative nature of scientific investigations (Louca & Zacharia, 2012; Metcalf et al., 2000; NRC, 2012; Shin et al. 2021). This process of continuous model revision based on new experimental data (or new information made available to students) is often referred to as iterative refinement and is a major aspect of testing and debugging within computational modeling (Bowers et al., 2023; Grover & Pea, 2018; Hutchins et al., 2020; Shin et al., 2022). As students engage in iterative refinement, they often need to reassess their understanding of the phenomenon and the underlying science ideas they are modeling. 
This reassessment can help students identify gaps in their understanding of the phenomenon and thereby improve their learning of science content (Clement, 2000; Schwarz et al., 2007, 2009; Windschitl et al., 2008). In addition to facilitating iterative refinement, the ability to visualize model output is another key facet of computational modeling software that supports students in testing and debugging (Bowers et al., 2023; Fretz et al., 2002; Sengupta et al., 2012; Weintrop et al., 2016). Computational modeling programs 83 typically produce a visual model output that allows students to see model behavior to facilitate model testing (Sengupta et al., 2012; Shin et al., 2022; Campbell & Oh, 2015; Fisher, 2018). In agent-based modeling programs (e.g. NetLogo), students can program various aspects or agents in their models that have distinct behavioral traits which are visualized when the program runs (Dickes & Sengupta, 2012; Goldstone & Janssen, 2005; Sengupta & Farris, 2012; Wilensky & Reisman, 2006). If the visual output of the program behaves contrary to the student’s expectations, they reexamine their programming choices to change the outcome behavior of two or more agents. In icon-based modeling programs (e.g., Stella, Model-it, and SageModeler), the model output is often visualized through graphs of the relative amounts of different variables present in the model (Damelin et al., 2017; Metcalf et al., 2000; Nguyen & Santagata, 2020; Richmond, 1994). These model output graphs can often be impacted by changing the relative amount of each input variable or making changes to the overall model structure (e.g., changing relationships between variables in the model). Agent-based modeling software can also offer graphical outputs that summarize the behavior of various model agents in more quantifiable terms (e.g., how the population of wolves interacts with the population of sheep in an ecosystem model) (Dabholkar et al., 2018; Gkiolmas et al., 2013; Wilensky & Reisman, 2006). Because these graphical outputs are often quantitative or semi-quantitative, they allow students to compare their models to data collected from real- world experiments (Campbell & Oh, 2015; Shin et al., 2021; Wilensky & Reisman, 2006). These comparisons between model output and real-world data are a critical aspect of testing and debugging as it provides students with the opportunity to have their models validated (Basu et al., 2016; Sengupta et al., 2013; Shin et al., 2021; Stratford et al., 1998). This validation process helps students connect their computational models back to the real-world phenomenon and develop a greater appreciation for the experimental aspect of science. Testing and Debugging in “A Framework for Computational Systems Modeling” Building off these previous efforts to describe how students can engage in testing and debugging, my colleagues and I compiled a description of testing and debugging rooted in our understandings of CT, Computational Modeling, and ST in “A Framework for Computational Systems Modeling” (Shin et al., 84 2022). While the term “testing and debugging” appears in both the CT aspect of “testing and debugging” and in the computational modeling practice of “test, evaluate, and debug model behavior” in this framework, for the purpose of this study, I am focusing on testing and debugging as a computational modeling practice. 
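To give a concrete sense of the semi-quantitative model output described above, the sketch below steps a minimal two-collector model forward in time. It is a generic illustration written in Python for this discussion; it does not reproduce the internal engine of SageModeler, Stella, Model-it, or NetLogo, and the variable names are hypothetical.

```python
# Generic illustration (not the engine of any cited tool): a tiny semi-quantitative
# "collector and flow" model in which a liquid collector drains into a gas collector.
# Students testing such a model would compare the resulting time series (or its graph)
# against their expectations or against external data.

liquid_mass = 100.0           # collector: relative amount of liquid
gas_mass = 0.0                # collector: relative amount of gas
evaporation_fraction = 0.10   # flow: fraction of the remaining liquid that evaporates each step

history = []
for step in range(10):
    flow = evaporation_fraction * liquid_mass   # flow depends on how much liquid remains
    liquid_mass -= flow
    gas_mass += flow
    history.append((step, round(liquid_mass, 1), round(gas_mass, 1)))

# "Model output": a time series that could be graphed and inspected for the expected
# qualitative behavior (liquid decreases and levels off; gas mirrors it).
for step, liquid, gas in history:
    print(f"step {step}: liquid={liquid}, gas={gas}")
```

A student inspecting this kind of output could ask whether the trends match their expectations or external data and, if not, which relationship in the model structure needs to be revised.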
When testing and debugging, students will often begin by analyzing the visual output of their model or discussing the various relationships and variables that are present in their model (Bowers et al., 2023; Hadad et al., 2020; Lee et al., 2020; Shin et al., 2022). Through this process, students will often identify aspects of their model that do not match their evolving understanding of the phenomenon or do not align with experimental data. This will, in turn, prompt students to search for specific relationships and variables that can be changed to improve their model’s behavior. Through an iterative process of critiquing model output and refining model structures, student models will generally come closer to matching the behavior of the real-world phenomenon. This inclusive view of testing and debugging, inspired by several scholars (Aho, 2012; Basu et al., 2016; Lee et al., 2020; Yadav et al., 2014), acknowledges how students manifest aspects of CT and ST as they engage in this practice (Bowers et al., 2022, 2023; Shin et al., 2022; Figure 14). In this framework, aspects of the scientific practice of “using mathematics and computational thinking” and the crosscutting concepts of systems and systems models and cause and effect are embedded into the computational systems modeling practice of “testing, evaluating, and debugging model behavior” and the broader scientific practice of “developing and using models” (Shin et al., 2021, 2022). By running the computational model to examine graphical model output to find aspects of their models that are not behaving as expected, students utilize the CT aspect of “testing and debugging.” Likewise, when students compare this model output to external real-world data, they are also “generating, organizing, and interpreting data”. As students discuss the validity of various relationships in their model, they are exhibiting the ST aspect of “causal reasoning”, which also overlaps with the crosscutting concept of “cause and effect”. When these conversations shift towards dissecting how these structural elements impact broader aspects of model behavior, students are “interpreting and predicting system behavior 85 based on system structure”. Finally, students “make iterative refinements” when they make changes to their models so that their model’s behavior better matches the real-world phenomenon. Figure 14: Aspects of Systems Thinking and Computational Thinking exhibited through the computational modeling practice of “Test, Evaluate and Debug Model Behavior” As students often simultaneously utilize multiple aspects of ST and CT as they are testing and debugging their models, it is not practical to subdivide student testing and debugging behaviors based on these categories. Instead, my colleagues and I sought to describe the different testing and debugging behaviors of students as they created, tested, and revised computational models (Bowers et al., 2022). By developing the ST and CT Identification Tool (ID Tool), we identified six categories of testing and debugging behaviors that are evidence of students utilizing CT and ST: Sensemaking through Discourse, Analyzing Model Output: Simulations, Analyzing Model Output: Graphs, Analyzing and Using External Data, Using Feedback, and Reflecting upon Iterative Refinement (Table 8). 
These six categories were chosen to reflect the diversity of approaches students took to testing and debugging within the context of a high school chemistry unit centered on using icon-based computational modeling (Bowers et al., 2022). For each of these categories, we created a four-level system for describing the complexity of student behaviors. This coding scheme was subsequently validated by external reviewers and an extensive literature review; internal interrater reliability tests further established its usefulness as a tool for demonstrating student testing and debugging behaviors.

Table 8: Description of key indicators from the ST and CT Identification Tool

Indicator A: Sensemaking through Discourse
Description: Students either verbalize their reasoning for making changes to their models or engage in conversations about why specific aspects of their models need to be improved.
Brief Level Descriptions:
Level 1: Verbalize changes to model or identify areas needing revisions, but no reasoning
Level 2: Verbalize reasoning but no mutual dialogue
Level 3: Back and forth dialogue with verbal reasoning
Level 4: Back and forth dialogue with verbal reasoning and impact on other parts of model

Indicator B: Analyzing Model Output: Simulations
Description: Students use embedded model output tools to analyze how their model behaves under different input conditions. In this case, students use the simulation tool in SageModeler to test their models.
Brief Level Descriptions:
Level 1: Adjusting one or more input variables, but no verbal reasoning
Level 2: Adjusting input variables with verbal reasoning but no dialogue
Level 3: Adjusting input variables with verbal reasoning and dialogue, focus on local behavior
Level 4: Adjusting input variables with verbal reasoning and dialogue, holistic model discussion

Indicator C: Analyzing Model Output: Graphs
Description: Students use embedded model output tools to analyze how their model behaves under different input conditions. In this case, students generate and analyze graphs in SageModeler.
Brief Level Descriptions:
Level 1: Unsuccessful attempt to make a graph in SageModeler
Level 2: Successful graph creation, but no interpretation
Level 3: Successful graph creation with discussion of implications for the graphed variables
Level 4: Successful graph creation with discussion of the broader implications for model behavior

Indicator D: Analyzing and Using External Data
Description: Students use external data sources to verify model behavior. At more sophisticated levels, students compare specific external data sources directly to their models and discuss the validity of the external data.
Brief Level Descriptions:
Level 1: Superficial reference to data or referencing inaccurate data
Level 2: Reference external data to inform revisions but no direct comparisons to model output
Level 3: Compare specific external data to model output without discussion of data validity
Level 4: Compare specific external data to model output with discussion of data validity

Indicator E: Using Feedback
Description: Students receive meaningful feedback from others (teachers or peers), discuss the validity of the feedback, and use feedback to inform model revisions. At more sophisticated levels, students test their models after making recommended changes and have a follow-up discussion with others to share their new insights.
Brief Level Descriptions:
Level 1: Students receive feedback but do not discuss it or use it to inform revisions
Level 2: Students make changes to their models based on feedback but do not discuss the validity of the feedback
Level 3: Students receive feedback, discuss its validity, and make or do not make changes to their models based on feedback
Level 4: Students receive feedback, discuss its validity, make or do not make changes to their models based on feedback, and share reflections with another group

Indicator F: Reflecting upon Iterative Refinement
Description: Students reflect through writing or discourse on the changes they have made to their models. At more sophisticated levels, students give a defined rationale for the changes they have made.
Brief Level Descriptions:
Level 1: Ambiguous surface level reflection without reasoning
Level 2: List specific model changes but do not provide detailed reasoning
Level 3: List changes and reflect upon reasoning
Level 4: List changes, reflect upon reasoning (with a defined rationale), and discuss broader changes to models

While each of these testing and debugging behaviors is rooted in CT, ST, and Computational Modeling literature and was present in previous studies, three behavioral categories (analyzing model output, using feedback, and analyzing and using external data) stand out as being particularly relevant to key aspects of testing and debugging (Bowers, 2022, 2023). As previously stated, being able to visualize model output is a key feature of computational modeling tools that can strongly support students in testing and debugging (Bowers et al., 2023; Fretz et al., 2002; Sengupta et al., 2012; Weintrop et al., 2016). One computational modeling tool that allows students to generate visual model output is SageModeler, an icon-based, open-source modeling program that was used to design the ID Tool. In SageModeler, there are two main ways that students can generate model output from their computational models: the simulation feature, which shows how the relative amount of each variable changes when the input variables are manipulated, and the graphing feature, which can demonstrate the relationship between any two variables in the model (Damelin et al., 2017; Bowers et al., 2023; Figure 15). A previous study (Bowers et al., 2023) suggests that students use the graphing features of SageModeler primarily when comparing model output data to external data and use the simulation features throughout the rest of the modeling process to drive testing and debugging. In this study, I will focus on students using the simulation features to analyze model output. The ability to learn from the feedback of others, especially peers, is an important goal of social constructivist approaches to science education (Ben-Ari, 2001; Louca & Zacharia, 2012; Schreiber & Valle, 2013; Tsivitanidou et al., 2018). My earlier work shows that students often use peer feedback to identify aspects of their models that need revisions and gain new insights into model design from analyzing peer models (Bowers et al., 2023). However, teacher and curricular support are necessary for students to get the most benefit out of the peer review process (Louca & Zacharia, 2012; Luxton-Reilly, 2009; Reynolds & Moskovitz, 2008; Wen & Tsai, 2008).
Given these benefits and challenges, I am continuing to focus on students using feedback (Indicator E) to drive testing and debugging in this unit. 89 Figure 15: Simulation and Graphing Features of SageModeler Figure 15A: Using the Simulation Features of SageModeler Figure 15B: Comparing Model Output Data to Experimental Data using Graphing Features 90 Lastly, using external data to verify computational models has long been identified as a key learning goal for computational modeling (Basu et al., 2016; Bravo et al., 2006; Sengupta et al., 2013; Stratford et al., 1998). Previous studies have shown that this is a task that students find challenging (Bowers et al., 2023; Grapin et al., 2022; Sins et al., 2005; Stratford et al., 1998). In some studies, students largely ignore external data as a means of validating model output (Bowers et al., 2023; Grapin et al., 2022; Stratford et al., 1998). In other studies, students who do use external data to drive testing and debugging often focus on forcing their model to fit the output patterns suggested by the external data rather than using the incongruence between their computational model and external data to drive discussions on the conceptual ideas embodied in the model structures they are revising (Li et al., 2019; Sins et al., 2005; Wilensky & Reisman, 2006). In both cases, the students are not fully engaging with a key affordance of computational modeling and therefore miss critical learning opportunities from comparing model output to external data. As such, I argue that it is important to investigate additional mechanisms for supporting students in analyzing and using external data (indicator D) as they are testing and debugging their computational models. Scaffolding and Synergistic Scaffolds Scaffolding is a common framework for supporting students in learning a new task or practice. Building off Vygotsky's ideas about the zone of proximal development, Wood and colleagues (1976) postulated that students need supports or “scaffolds” to enable them to achieve tasks beyond their present abilities (Lin et al., 2012; Rogoff, 1990; Tabak & Kyza, 2018; Wertsch, 1979). As such, scaffolding describes the combination of verbal instructions, written directions, and technological tools that enable students to perform learning tasks that they would not necessarily be able to complete otherwise. Scaffolding enables students to experience a complex task, even if they are unable to complete certain portions of the task independently (Lin et al., 2012; Tabak, 2004; Tabak & Kyza, 2018). For example, many children have initial difficulty maintaining the velocity necessary to keep a bike in an upright position. As such, training wheels are often a necessary scaffold so that the child can focus on mastering pedaling and steering without needing to focus on maintaining balance during this phase of learning. 91 Additionally, having a parent available to talk to the child through the importance of peddling and to catch the child when they are losing balance is another form of scaffolding children in learning how to ride a bicycle. Over time, students gain the ability to complete a task without the support of the scaffolds (Collins et al., 1989; Lin et al., 2012; Tabak & Kyza, 2018). In anticipation of such growth, teachers and curriculum developers can adopt a “fading scaffolds” strategy, where scaffolds are gradually removed as students become more proficient in a task (Collins et al., 1989; Lin et al., 2012; Wu & Pedersen, 2011). 
In the bicycle example, the training wheels are often removed once the child has proficiency with pedaling, steering, and stopping the bicycle; the child can then focus on learning how to balance the bicycle without the training wheels as scaffolds. In science classrooms, these scaffolds can take many forms including: whole class demonstrations on how to perform a task, computer text boxes encouraging students to write down their reasoning during the modeling process, and sentence starter guidelines to support students with making claim, evidence, reasoning statements during classroom argumentation (Basu et al., 2017; Lin et al., 2012; McNeill & Krajcik, 2009; Tabak & Kyza, 2018). Because many tasks and practices, such as testing and debugging, are too complex to be covered in a single lesson or through a single scaffold, scholars advocate for distributed scaffolding (Hsu et al., 2015; Puntambekar & Kolodner, 2003; Tabak, 2004; Tabak & Kyza, 2018). Distributed scaffolding describes efforts to create a set of scaffolds spread across several types of media and/or multiple timepoints and is usually subdivided into three major categories: differentiated scaffolding, redundant scaffolding, and synergistic scaffolding (Puntambekar & Kolodner, 2003; Tabak, 2004). Differentiated scaffolding emphasizes using different tools to support different learning needs around a common practice or task (Krajcik et al., 2000; Tabak, 2004; Tabak & Reiser, 1999). For example, a teacher might provide verbal instruction on the importance of frequently testing the output of computational models and then distribute handouts for how to provide feedback during peer review sessions. While both of these scaffolds help support students with the broader practice of testing and debugging, they focus on different learning goals and do not naturally build on each other. Redundant scaffolding describes efforts to use 92 different tools and techniques or the same tools and techniques across multiple time points to support a common learning goal (Puntambekar & Kolodner, 2003; Tabak, 2004). Redundant scaffolding often either involves repetition of past scaffolds or using different sets of supports for the same practice in a disconnected or disjointed manner. Teachers and curriculum developers engage in synergistic scaffolding when they design different tools and techniques to work in tandem to support a set of learning needs associated with a more complex task or practice (Tabak, 2004; McNeill & Krajcik, 2009). Because the different supports are designed to build off each other to reinforce student learning over the course of the unit, such scaffolding creates synergy that transcends that of redundant scaffolding. In implementing synergistic scaffolds, a teacher might provide a brief whole class demonstration on how to use the simulation features embedded in a computational modeling program. They will later have a brief informational talk on the importance of using this simulation feature to find flaws in model structures and guide model revisions. When the teacher subsequently sets up a whole class discussion on how to evaluate peer models, they will again reference their earlier informational talk and whole class demonstration to reemphasize the importance of analyzing model output through simulations and to build towards the next learning goal of giving quality peer feedback. 
As the teacher provides feedback to individual student groups, they will then reference their earlier informational talks and whole class demonstrations to help remind students of the importance of using the simulation features and to review the mechanics of the simulation features. Synergistic scaffolding allows for teachers to use multiple mediums to reach a larger cross-section of students to support them with using more complex practices, thus bolstering student learning (Tabak, 2004; McNeill & Krajcik, 2009). Given the multi-media nature of learning in computerized settings, there has been an emphasis on creating synergistic scaffolds to support students in these learning environments. Hutchins and colleagues (2020) investigated how a block coded agent-based modeling program served as a synergistic learning environment for supporting students in learning both physics content and CT practices. Other studies focus on how scaffolding supports, such as adaptive mentor agents, can be embedded within a 93 computerized learning environment to support students in performing key practices, including computational modeling (Baker et al., 2004; Fretz et al., 2002; Grawemeyer et al., 2017; Putnambekar & Hubscher, 2005). Wu and Pedersen (2011) recognized that many studies on scaffolding in computerized settings were ignoring the role of teachers in supporting student learning. They argued that solely relying on computer-based scaffolds in these learning environments overlooks how students often ignore text- based supports and how students tend to focus on task completion and thereby dismiss key reflection opportunities embedded into these learning environments (Baker et al., 2004; Li & Lim, 2008; Wu & Pedersen, 2011). Wu and Pedersen (2011) subsequently demonstrated that synergistic scaffolding provided by both the computerized environment and in-person teachers supported student learning better than either set of scaffolds in isolation. Snow and colleagues (2022) also reflected on the role of teachers in supporting student learning in computerized learning environments as they investigated how teachers used discourse to create synergy between computerized simulations and classroom discussions in a chemistry classroom. Given the importance of teachers in supporting student learning in computerized learning environments, I aim to investigate how the pedagogical strategies used by teachers in a computational modeling unit helps scaffold students in testing and debugging computational models. Pedagogical Strategies for Supporting Students with Testing and Debugging Computer science educators recognize that testing and debugging is a specialized practice that does not come naturally to novice programmers but must instead be supported by explicit instruction (Kessler & Anderson, 1986; McCauley, 2008; Michaeli & Romeike, 2019; Murphy et al., 2008). As such, scholars have identified several examples of successful pedagogical strategies for supporting students with testing and debugging in traditional text based programming environments (Michaeli & Romeike, 2019; Wilson, 1987; Chmiel & Loui, 2004). Wilson (1987) observed computer science professors using Socratic questioning to support students in adopting a more systems level view of their code. With questions about the purpose of their code and how various elements of their code interacted with each other, students were able to identify more broadly where the program might be malfunctioning and were subsequently able to debug that part of their code. 
In addition to having teachers ask students 94 reflective questions, the literature also emphasizes the need for students to be given guiding questions and frameworks to guide their own debugging practices. Carver & Risinger (1987) showed that when students were given 30 minutes of explicit debugging instruction, centered on teaching students how to use a flow chart of questions designed to help students locate and repair bugs in their code, students were able to identify and fix bugs in independent programming tasks more effectively. In a similar manner, Michaeli and Romeike (2019) suggest that students can build greater self-efficacy and competency with testing and debugging by using the “Compile, Run, Compare” debugging approach. In “Compile, Run, Compare,” students first ask if the program is compiling and/or running within a timely manner to see if there are compile time and/or runtime errors. Then students are tasked with comparing their program output to their expected output to see if logical errors exist in their code. Through giving students reflective questions and straightforward paradigms, these teachers helped them develop a core framework for testing and debugging their computer programs that they could build upon in future assignments. Beyond flow charts and Socratic questioning to help students master generic debugging strategies, Chmiel & Loui (2004) demonstrated the importance of practice problems and reflective practices for supporting students in building competency with testing and debugging. In this study, students were regularly given practice problems where they had to identify and repair bugs that existed in a teacher created program. As students worked on their own programs, they were also encouraged to keep a journal to reflect on changes they made to their code, where bugs were occurring in their code, and how they would code differently in the future. Lastly these students also regularly gave and received peer feedback on their code, providing more opportunities to identify problems and make improvements to their computer programs. This approach of providing students with multiple opportunities to practice testing and debugging (through practice problems and peer review) and reflect on the debugging process appears to help students develop a deeper understanding of testing and debugging that transcends the “rules based” paradigms laid out in flowcharts or other simplified frameworks. Both modeling and computational modeling literature often focus on broader efforts to support students with revising their models. In their endeavors to develop a learning progression for scientific 95 modeling, Schwarz and colleagues (2009), emphasized the importance of students learning that models should be revised to better explain and communicate key science ideas. Li & Schwarz (2020) identified that a key way that teachers can support students with model revisions is through using generative questions redirecting students to consider the nature and purpose of modeling. For example, when a student’s model was not adequately addressing a key aspect of the phenomenon, the teacher asked the student “How and why do you think those changes happened? . . . How and why do you think the liquid seemed to disappear?” (pg. 187). Such questions prompted the student to revise their model to include a deeper explanation of the phenomenon. 
In a similar manner, Ambitious Science Teaching encourages teachers to use “back pocket” questions such as “how does it do that?” and “does that reflect our experimental results” to encourage students to revise their models to better explain the underlying mechanisms of a phenomenon (Windschitl et al., 2020). In addition to questions that encourage students to revise their models to unpack mechanistic reasoning, Justi recommends that teachers support students with analyzing and interpreting experimental data (2009). This assists students with recognizing if their models reflect the real-world phenomenon or if further revisions are needed. In all these cases, teachers are using generative questions to help students identify aspects of their models that can be improved as well as giving students a clear rationale for model revisions. Compared to computer science and modeling, there are few studies of pedagogical strategies for testing and debugging in computational modeling literature. Pierson and colleagues (2017) examined computational modeling as an extension of scientific modeling and thus centered their vision of testing and debugging firmly within the ideas of model revision developed by Schwarz et al., 2009. As such, they identified that students were more driven to revise their models to increase their explanatory power rather than to match an external data source. Pierson and colleagues (2017) attribute this emphasis on explanatory power to the culture of collaboration and peer review created by the teacher, as students were revising their models to better communicate their ideas with other students in their classroom. In a similar study, Pierson and Clark (2018) found that having students present their computational models to an external audience of younger students was a motivating factor for encouraging model revisions. As such, 96 both studies show that creating an authentic need to use computational models as a communication tool is an effective pedagogical strategy for encouraging students to engage in the revision aspects of testing and debugging. In contrast to Pierson and colleagues (2017), Basu and colleagues (2016), firmly center their vision of computational modeling within a computational thinking framework and are thus more focused on how teachers support students with identifying bugs in their block-based coding program. As such, they encourage teachers to ask their students to break down their code into different subsystems to help narrow down the source of any potential bugs. Such types of questions largely mirror the Socratic questions discussed by Wilson (1987) and the flow charts of Carver & Risinger (1987). Beyond coding- based approaches to testing and debugging, Basu and colleagues (2016) identified that the simulation features embedded in their computational modeling program should be used to help students test their model output and determine if it matches external data or the output of an “expert model”. While the computer science, modeling, and computational modeling studies all provide insight into potential strategies for teachers to use to support students with testing and debugging, none of them fully address the vision of testing and debugging laid out in “A Framework for Computational Modeling”. While text-based programming studies support teachers in encouraging students to compare computer output to their expected outcomes, the sources of text-based programming errors differ from those common in computational modeling. 
Pedagogical strategies described in scientific modeling literature that provide students with a clear rationale for engaging in model revisions are helpful for computational modeling contexts. However, without properly supporting students with analyzing model output or comparing model output to external data, students could easily adopt ad-hoc revision strategies that ignore the affordances of computational modeling. Likewise, few computational modeling studies have gone sufficiently in-depth on testing and debugging to adequately address how teachers should be supporting students with this practice in a computational modeling context. As such, I set out to investigate how two teachers supported their students with the computational modeling practice of testing and debugging in a high school chemistry unit on evaporative cooling.

Methods

Study Context and Learning Environment

Learning Environment and Participants

Two high school chemistry teachers, Mr. H and Mr. M (both pseudonyms), collaborated with me and each other to implement an evaporative cooling unit in their classrooms during November-December 2022. Both Mr. H (a 44-year-old White male with approximately 20 years of teaching experience) and Mr. M (a 32-year-old White male with approximately 10 years of teaching experience) teach 10th grade chemistry at Faraday High School (pseudonym; FHS). FHS is a Midwestern STEM magnet school; while publicly funded, students must apply to this school from a tri-county catchment area, with admissions based on academic test scores, teacher recommendations, and considerations for equity (as the school tries to take a representative population from the many districts in its catchment area). Around 79% of FHS students identify as White and around 54% of students receive free or reduced lunches. FHS runs on a block schedule, meaning each chemistry class meets for 80 minutes every other day. Prior to implementing this unit, Mr. H and Mr. M participated in a professional learning program (PLP) focused on supporting students with creating, testing, debugging, and modifying computational systems models. Both Mr. H and Mr. M fully participated in this PLP and implemented the evaporative cooling unit in their classrooms; full narratives of their implementations serve to address research questions 1 and 2, respectively. However, given that few students from Mr. M's class agreed to participate in the screencast data collection process, I am only able to address the impact of Mr. H's pedagogy on his students' testing and debugging through research question 3. As such, this is primarily a case study of how Mr. H supported his class in testing and debugging their computational models in a unit on evaporative cooling, with one of Mr. M's sections used to compare with Mr. H's pedagogical strategies (Table 9).

Table 9: Demographic Data of Mr. H and Mr. M's Classes
Teacher                   Mr. H       Mr. M
# of Students             29          14
# of Female Students      13 (45%)    8 (57%)
# of Hispanic Students    0 (0%)      0 (0%)
# of Black Students       2 (7%)      1 (7%)
# of Asian Students       2 (7%)      3 (21%)
# of White Students       25 (86%)    10 (72%)

Professional Learning

To prepare for implementing this unit, Mr. H and Mr. M participated in a professional learning program (PLP) geared towards supporting them in using the evaporative cooling unit to engage students in testing and debugging practices. In the two months leading up to implementing this unit, Mr. H and Mr. M participated in weekly 45-minute PLP meetings over Zoom.
Linsey Brennan (a fellow graduate student), Emil Eidin (a post-doctoral researcher), and I collaborated and co-organized these PLP meetings. Early in these meetings, we deliberately unpacked the curriculum and mapped out a timeline to encourage fidelity of implementation. Next, we reviewed an approach for introducing the students to using SageModeler, through an online self-directed learning module publicly available on the SageModeler website (https://sagemodeler.concord.org/app/#file=examples:Getting%20Started). The teachers ended up having students work through this module for a single class period a few days before this unit started. We also discussed how to support students with building their initial models, including the use of an embodied modeling experience where students took on the role of evaporating liquid molecules and co-constructing the initial model backbone together as a class. To help foster student driven discussions and better support students in small group work, we reviewed a series of practitioner focused guidelines for classroom discourse and student collaboration strategies. We encouraged Mr. H and Mr. M to have students adopt a “copilot” strategy where one student would be designated to control the cursor during the process of modeling, while the other student(s) provided ideas and insights to help the first student build or revise the model. The students would periodically switch roles to help ensure that all students got a chance to control the cursor during the process of modeling. Another aspect of supporting students with collaboration was the peer review 99 guidelines. These peer review guidelines aimed to help students identify aspects of their peers’ models that needed improvement and share this feedback with their peers. We also discussed how students could gain deeper insights from the peer review process, which could help them improve their models. We also highly encouraged Mr. H and Mr. M to have frequent whole class model review sessions, where they would place an anonymized student model in front of the whole class and help students provide feedback and critique for this model. Mr. H and Mr. M also had the opportunity to reflect on the importance of having students revise their models using real world data and discussed strategies to support students in using real-world data to validate their models, such as having a whole class demonstration on how to input external data into SageModeler and how to overlay the graphs from the external data with model output data to validate model output. Throughout these meetings, we frequently pointed out how the curriculum and the SageModeler software were designed to support students in the three main aspects of testing and debugging: analyzing model output through simulation features, analyzing external data to validate model output, and using peer feedback to further model revision. By highlighting the scaffolds for testing and debugging already embedded in the curriculum and the SageModeler software, we helped Mr. H and Mr. M identify areas where they could have informational talks, whole class discussions, or conversations with small groups that further supported and scaffolded these learning goals. 
For example, at one professional learning meeting, we discussed how after building their initial model, students would have the opportunity to begin analyzing model output using the simulation feature embedded in SageModeler and suggested that the teachers provide additional support and scaffolding at that time for this aspect of testing and debugging. In this manner, our professional learning program encouraged the teachers to develop pedagogical strategies and scaffolds for testing and debugging that synergized with the supports already built into the curriculum and the SageModeler software, beyond the specific synergistic strategies we discussed regarding the use of the peer review guidelines, the importance of frequent whole class model review sessions, and the classroom discussions and demonstrations necessary to support students in analyzing external data to validate model output. 100 Curriculum The five-week evaporative cooling unit implemented in this study was designed according to project-based learning (PBL) principles. Building off the work of Krajcik & Shin (2022), my vision of PBL includes: exploring a meaningful driving question based on a real-world phenomenon, investigating the driving question and phenomenon through scientific practices, creating knowledge products such as computational systems models, encouraging productive collaboration, and utilizing learning technologies. Evaporative cooling describes the phenomenon of liquids becoming colder as fast moving, high kinetic energy (KE) particles evaporate first. These faster moving particles overcome the intermolecular forces (IMFs) and transition from a liquid to a gas. As they evaporate, the KE of these liquid particles is transferred to the potential energy (PE) of the gas particles. This loss of high KE particles reduces the average KE of the remaining liquid, causing the liquid to become colder and reducing the rate of evaporation. To represent this phenomenon using SageModeler, students need to demonstrate how both the mass and kinetic energy of a liquid transition into the mass and potential energy of a gas via evaporation (Figure 16A). Figure 16: Examples of Evaporative Cooling Models Figure 16A: Example of a Final Form Evaporative Cooling Model 101 Figure 16 (cont’d) Figure 16B: Backbone of Evaporative Cooling Model Students first interacted with this phenomenon by observing how rubbing alcohol, acetone, and water feel as they evaporate from their skin. They were then tasked with creating a diagrammatic model of evaporative cooling that addressed the unit’s driving question: Why do I feel colder when I am wet than when I am dry? Students were next introduced to SageModeler by Mr. H and Mr. M, who helped them construct the initial “backbone” relationship of their models, showing the mass of a liquid transforming into the mass of a gas via evaporation (Figure 16B). Students then worked in small groups (two to three students) to expand on this initial model. As the unit progressed, students were exposed to additional concepts (e.g., IMF, kinetic energy, and potential energy) through hands-on experiments, computerized learning modules (Figure 17), and classroom discussions, encouraging them to test, debug, and revise their models. Students were also given opportunities to receive structured feedback from other groups. Mr. H and Mr. 
M also anonymously presented student models during whole class discussions to allow for the class to collectively provide feedback to this anonymous group and to discuss aspects of these selected models that could benefit all students in their model revision process. Figure 17: Example of a Computerized Learning Module from the Evaporative Cooling Unit 102 SageModeler Throughout the evaporative cooling unit, students constructed, tested, debugged, and revised computational systems models using SageModeler – a free, browser-based, open-source software program. SageModeler is an icon-based modeling program that enables students to create variables and set relationships between these variables using a dropdown menu (Figure 18A). Students also design appropriate variables as “collectors” (variables that can accumulate an amount over time) and make transfer relationships/flows between these collector variables (Figure 18B). This can allow them to model how the mass of a liquid transitions into the mass of a gas during evaporation. SageModeler also has a simulation feature that allows students to manipulate the relative amount of each input variable to see how their model behaves under different initial conditions and how their model behavior changes over time (Figure 18A). Students can also input real world data into SageModeler and make graphs to compare their simulated model generated graphs with real world data (Figure 18C). Figure 18: Introduction to SageModeler Features Figure 18A: Setting Relationships and using the Simulate Feature Figure 18B: Collector and Flow Relationships 103 Figure 18 (cont’d) Figure 18C: Comparing Model Output Data to Experimental Data Data Collection I, along with Linsey Brennan and Tingting Li, collected data for this study in collaboration with Mr. H and Mr. M in November and December of 2022. My primary targets for data collection were teacher videos and student screencasts. Teacher videos were captured using a specialized microphone alongside whole class video using an iPad camera attached to a tripod stand. This allowed me to record how Mr. H and Mr. M supported student testing and debugging through whole class discussions, conversations with small groups/individual students, and behind the scenes efforts to troubleshoot learning technology. As such, the teacher audio is my primary data source for assessing Mr. H and Mr. M’s teacher moves. The student screencasts capture student audio and student screen actions as they constructed and revised their computational systems models using SageModeler software. These student screencasts were used to determine how students were testing and debugging their models during this unit. Student screencasts were collected from five student groups in Mr. H’s class (11 students total); their pseudonyms and demographics are listed in Table 10. Given the smaller class size and fewer student 104 volunteers, screencasts were not collected in Mr. M’s class. Students who are not screencast students (e.g., the students in Mr. M’s class) are given letter-based pseudonyms (i.e. Student A, Student B, etc.). 
Table 10: Screencast Student Pseudonyms and Demographics
Student Group   Student Pseudonyms       Demographics
Group 1         Reese and Eric           South Asian Male, White Male
Group 2         Esme and Lilly           White Female, White Female
Group 3         Carter, Sam, and Fred    White Male, White Male, White Male
Group 4         Tiffany and Anna         South Asian Female, White Female
Group 5         Morty and Isabelle       White Male, White Female

It is also important to note that either I or another colleague was present each day of this unit. While our primary purpose was to set up our data collection equipment, we also assisted Mr. H and Mr. M with troubleshooting the various technology-related problems their students encountered with SageModeler during this unit. As such, we can be said to have been active participants in this classroom environment. However, we did our best to limit our support to aiding with technology-related challenges and to avoid providing any prompting or other scaffolds that would have helped students with learning science content, CS Modeling, or testing and debugging. While we occasionally gave brief, in-class feedback to Mr. H and Mr. M, we largely avoided efforts to influence student learning or teacher pedagogy during classroom enactment, providing most of our suggestions after the lesson was finished on any given day. This was to show respect to Mr. H and Mr. M as professionals and to minimize the impact of our interference on student learning outcomes.

Data Analysis

I uploaded all teacher videos and all screencast videos into Atlas.ti for data analysis. Atlas.ti is a qualitative data analysis program that allows users to highlight specific segments of video, assign qualitative codes to these video segments, and make additional notes to summarize these video segments in addition to the broader qualitative codes. For the purposes of this study, I used separate sets of qualitative codes for the teacher videos and for the screencast videos so I could narrow in on teacher pedagogical moves and student testing and debugging behaviors, respectively.

Teacher Pedagogical Moves

To categorize how teachers scaffolded students in testing and debugging, along with broader teacher pedagogical moves, I developed a three-tier coding system loosely based on Fretz and colleagues' (2002) efforts to classify how teachers were supporting students with computational modeling. Fretz and colleagues created three major categories to describe teacher pedagogical actions: pedagogical activities (what activities teachers have assigned to students at a specific moment in time), scaffold focus (the specific types of information/support teachers provide to help students), and targeted indicator (aspects of computational modeling the instruction is aiming to support). In my qualitative coding, I adapted these three categories into pedagogical method, pedagogical focus, and computational systems modeling content (CS Modeling content). This coding scheme was reviewed by two external experts in computational modeling, whose feedback was incorporated into the final version of this coding scheme. It was also validated by four researchers collaborating on this project, who each coded one set of three 30-minute segments of video to reach a coding consensus and an interrater reliability of 87%. Note that each quote used in this paper will include all corresponding categories in its description to provide further examples of how these categories were used to analyze the data.
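As a point of reference for readers less familiar with reliability statistics such as the 87% reported above, the sketch below shows one common way a simple percent-agreement figure can be computed from two coders' labels for the same video segments. It is a minimal, hypothetical illustration written in Python; the segment labels shown are invented, and it does not reproduce this study's actual consensus-coding procedure or data.

```python
# Minimal sketch of percent agreement between two coders who labeled the same
# video segments. The labels below are hypothetical examples using the
# pedagogical method subcategories; the study's actual procedure is described
# in the text.

coder_1 = ["Informational Talk", "Whole Class Discourse", "Discussions with Small Groups",
           "Computer Demonstrations", "Behind the Scenes", "Informational Talk"]
coder_2 = ["Informational Talk", "Whole Class Discourse", "Computer Demonstrations",
           "Computer Demonstrations", "Behind the Scenes", "Informational Talk"]

# Count the segments where both coders assigned the same code.
agreements = sum(1 for a, b in zip(coder_1, coder_2) if a == b)
percent_agreement = 100 * agreements / len(coder_1)
print(f"Interrater agreement: {percent_agreement:.0f}%")   # 5 of 6 segments -> 83%
```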
The pedagogical method category differs from the pedagogical activities category developed by Fretz and colleagues (2002). Rather than focusing on the actions teachers have assigned to students, my pedagogical method category describes the broader actions and methods Mr. H and Mr. M used to support student learning during any moment of teaching. Within the pedagogical method category are the following seven subcategories: information sharing/informational talk, teacher-centered whole class discourse, teacher discussions with small groups/individual students, computer demonstrations, laboratory demonstrations, video demonstrations, and behind-the-scenes actions/conversations (Table 11). These pedagogical method subcategories were based on my own initial observations about the different ways Mr. H was communicating with students during his teaching, with the initial subcategories being “informational talks”, “whole class discourse”, “small group discussions”, and “demonstrations”. Upon watching the first few classroom videos, I decided to subdivide the demonstration subcategory into three separate subcategories to better capture the diversity of delivery methods used in these classrooms. I also added the “behind the scenes” subcategory after noticing that Mr. H and Mr. M frequently had important generative conversations about their teaching as students worked in their independent groups.

Table 11: Pedagogical Method Subcategories

Informational Talk. Description: The teacher speaks to the whole class and students are not verbally participating in discourse; can last from 1 minute to 12 minutes based on data collected in this study. Example: Mr. M tells students the agenda for that class period while students listen.

Whole Class Discourse. Description: The teacher addresses the whole classroom and either asks students to share their ideas or has students ask questions, which might be answered by the teacher or by other students; includes conversations where the teacher shares information with students but asks frequent questions to foster student participation. Example: As Mr. M discusses evaporation, he asks students to share their experiences with evaporation with the whole class.

Discussions with Small Groups. Description: The teacher visits small groups and talks with them. Example: Mr. H asks Reese and Eric to explain their model and provides feedback to them.

Computer Demonstrations. Description: The teacher demonstrates how to use an aspect of SageModeler or another piece of software in front of the whole class. Example: Mr. H shows students how to use the simulate features of SageModeler.

Laboratory Demonstrations. Description: The teacher presents a key scientific principle or laboratory technique through a visual experiment or demonstration. Example: Mr. H shows that, despite being less dense than water, canola oil does not seem to evaporate after 30 minutes.

Video Demonstrations. Description: The teacher demonstrates a key concept through a video. Example: Mr. M has students watch a video about diagrammatic models.

Behind the Scenes. Description: The teacher talks with the researcher or another teacher, or troubleshoots a technology problem on his own; usually out of view of students. Example: Mr. H shares his experiences with leading a discussion of evaporation with Mr. M while students are revising their models.

The “pedagogical focus” category classifies the various types of information that Mr. H and Mr. M are seeking to communicate with students through their teaching (Table 12).
Several of the subcategories of “pedagogical focus” were directly adapted from the “teacher scaffolds” category of Fretz and colleagues (2002). One of the broader categories of pedagogical focus is classroom housekeeping, which includes efforts to organize student groups, describe present and future tasks, and redirect students back to their assigned work. This was inspired by the “task” scaffold subcategory of Fretz and colleagues, which classified efforts taken by teachers to help redirect students back towards computational modeling tasks. In addition to classroom housekeeping, I added “relationship building” to describe interactions where the teachers were primarily focused on building rapport and community with their students. The “science content” subcategory was adapted from the “conceptual” scaffold subcategory as a means of describing episodes where teachers were explicitly supporting students with learning key principles of science content knowledge. The “technology utility” and “SageModeler utility” subcategories come from the “utility” scaffold, which likewise emphasizes how to use learning software and technology. Given the many challenges students faced when software and hardware malfunctioned, I also added the “technology troubleshooting” category to cover efforts by the teachers to help students with technology when it was not working as intended. Lastly, CS Modeling is the broader pedagogical focus category for the entire concept of computational systems modeling, which is explained in more depth in the next category of codes.

Table 12: Pedagogical Focus Categories

Classroom Housekeeping. Description: Teacher discusses the class agenda, student tasks, and classroom organization; the focus is on managing the classroom as a learning community. Includes task redirection. Example: Mr. H helps a student group that has been absent for a few days figure out what tasks they need to focus on to get caught up to their peers.

Science Content. Description: Teacher talks about key scientific concepts or principles; in this unit, the focus is on evaporative cooling, energy, and IMF. Example: Mr. M discusses different types of Potential Energy and Kinetic Energy that exist in nature.

Relationship Building. Description: The teacher talks with students about topics that pertain to student personal lives and non-academic interests. Example: Mr. H has an opening discussion asking students about their favorite Pixar movies.

Technology Utility. Description: Teacher demonstrates or talks about how to use technology and software; includes the learning management system and data collection tools for the experiments. Does not include any use of SageModeler. Example: Mr. H demonstrates how to log into the learning management system; Mr. M demonstrates how to use the data collection tools for the temperature vs. time experiment.

SageModeler Utility. Description: The teacher demonstrates or discusses how to use key aspects of SageModeler. Example: Mr. H demonstrates how to use the simulate feature.

Technology Troubleshooting. Description: Teacher focuses on fixing problems that have arisen from issues with technology; includes SageModeler, the learning management system, and physical technology. The focus is on when technology is not behaving as intended (as contrasted with technology utility). Example: Mr. H works with individual students to try and recover their models, which have not been saved by the learning management system; Mr. M tells students how to work around a key glitch in the SageModeler software.

CS Modeling. Description: The teacher highlights key concepts and key technology tools associated with supporting students in the process of constructing, testing, debugging, and revising computational systems models. Example: Mr. H provides students with the peer review guidelines to help scaffold the peer review process; Mr. M asks individual groups to explain their models.

CS Modeling content describes the specific aspects of CS modeling that Mr. H and Mr. M are focusing on during their teaching (Table 13). As such, CS Modeling content is technically a subset of the pedagogical focus category. The CS modeling categories are based directly on key indicators from the Systems Thinking and Computational Thinking Identification Tool, along with key ideas from “A Framework for Computational Systems Modeling.” Given that this study predominantly focuses on how Mr. H and Mr. M are supporting their students with testing and debugging, the categories of CS modeling reflect the emphasis on the three aspects of testing and debugging (analyzing model output through simulations, using external data to support model revisions, and using external feedback through peer review) that I am targeting in this study. In addition to those three categories, I also chose to categorize instances where teachers pointed out specific model components or where they discussed broader concepts related to systems thinking. Using these categories to classify instances where these two teachers were instructing students in the three main aspects of testing and debugging served as the foundation for more in-depth narrative analyses.

Table 13: CS Modeling Content Categories

Analyzing Model Output. Description: Teacher supports students with using the simulate features to analyze model output to revise their models. Example: Mr. H gives a 5-minute informational talk about the importance of using the simulate features to speed up the revision process.

External Data. Description: Teacher supports students with using external data (both quantitative and qualitative) to support model revisions. Example: Mr. M shows students how to input external data into SageModeler and how to compare these data with model output.

Peer Review. Description: Teacher assists students in both critiquing peer models and in utilizing peer feedback to drive future model revisions. Example: Mr. H leads a whole class model critique, showing students how to critique peer models; Mr. M organizes peer reviews.

Model Components. Description: The teacher points out specific model components (variables and relationships) that students should revise in their models. Example: In a small group discussion, Mr. H asks students about the “density” variable and why they have density in their model.

Systems Thinking. Description: The teacher assists students in understanding and utilizing systems thinking principles. Example: Mr. M discusses feedback loops during a whole class model critique.

Student Testing and Debugging Behaviors
To classify student testing and debugging behaviors, I used three key indicators from the ST and CT Identification Tool (Table 14). This coding scheme, originally developed based on A Framework for Computational Systems Modeling, identifies six major testing and debugging behaviors that students often use to test and debug computational systems models (Shin et al., 2022; Bowers et al., 2022).
The ST and CT Identification Tool was originally designed for analyzing screencasts of students building and revising SageModeler models and was reviewed by a panel of five external expert reviewers and by four internal reviewers (with a 91.7% agreement among all four raters). Given that this instrument has previously been validated, both externally and internally, using screencast data that closely mirror the student screencast data in this study (Bowers et al., 2022, 2023), I find it appropriate for assessing student testing and debugging behaviors in this study. Although there are a total of six indicators in the original ST and CT Identification Tool, I only used the three indicators associated with the aspects of testing and debugging that were most relevant for this study (i.e., analyzing model output, analyzing and using external data, and using feedback). These three indicators were chosen based on earlier literature suggesting that students either found these aspects of testing and debugging particularly challenging or that additional teacher support is needed for students to fully demonstrate these aspects of testing and debugging (Bowers et al., 2023; Grapin et al., 2022; Li et al., 2019; Louca & Zacharia, 2012; Sins et al., 2005). As I analyzed student screencasts, I also used the level descriptions in this rubric to assign appropriate levels to student testing and debugging behaviors. These three indicators were strongly supported by previous efforts to validate this instrument (Bowers et al., 2022, 2023; Table 4), achieving a higher degree of agreement across the four reviewers than the other three indicators. Removing the other three indicators (Sensemaking through Discourse, Analyzing Model Output: Graphs, and Reflecting upon Iterative Refinement) narrows the scope of my results. However, my previous studies (Bowers et al., 2022, 2023) suggest that these additional aspects of the ST and CT Identification Tool were both more challenging to track and less meaningful indicators of student proficiency with testing and debugging. Additionally, because none of these three indicators was emphasized in the professional learning program or specifically targeted by curricular or software scaffolds, their removal serves to streamline this study so that it focuses on the most relevant and well-supported aspects of testing and debugging present in this unit.

Table 14: Description of Key Indicators from the ST and CT Identification Tool (Bowers et al., 2022)

Indicator B: Analyzing Model Output: Simulations
Description: Students use embedded model output tools to analyze how their model behaves under different input conditions. In this case, students use the simulation tool in SageModeler to test their models.
Brief Level Descriptions:
Level 1: Adjusting one or more input variables, but no verbal reasoning
Level 2: Adjusting input variables with verbal reasoning but no dialogue
Level 3: Adjusting input variables with verbal reasoning and dialogue, focus on local behavior
Level 4: Adjusting input variables with verbal reasoning and dialogue, holistic model discussion

Indicator D: Analyzing and Using External Data
Description: Students use external data sources to verify model behavior. At more sophisticated levels, students compare specific external data sources directly to their models and discuss the validity of the external data.
Brief Level Descriptions:
Level 1: Superficial reference to data or referencing inaccurate data
Level 2: Reference external data to inform revisions but no direct comparisons to model output
Level 3: Compare specific external data to model output without discussion of data validity
Level 4: Compare specific external data to model output with discussion of data validity

Indicator E: Using Feedback
Description: Students receive meaningful feedback from others (teachers or peers), discuss the validity of the feedback, and use feedback to inform model revisions. At more sophisticated levels, students test their models after making recommended changes and have a follow-up discussion with others to share their new insights.
Brief Level Descriptions:
Level 1: Students receive feedback but do not discuss it or use it to inform revisions
Level 2: Students make changes to their models based on feedback but do not discuss the validity of the feedback
Level 3: Students receive feedback, discuss its validity, and make or do not make changes to their models based on feedback
Level 4: Students receive feedback, discuss its validity, make or do not make changes to their models based on feedback, and share reflections with another group

Preliminary Analysis and Summary Table Construction
Once I developed these qualitative coding rubrics, I began coding both the teacher videos and the student screencasts using these qualitative codes and making additional notes to summarize key teaching moments and key examples of student testing and debugging. Once I had finished my initial coding, I constructed a summary table for the teacher videos and for the student screencasts. For the teacher videos, I added up the minutes for each category in the coding scheme for each day. This allowed me to explore how frequently Mr. H and Mr. M used different pedagogical methods, emphasized different pedagogical foci, and focused on specific aspects of CS modeling, and to see how their emphases compared with each other and shifted over the course of the unit. For the student screencasts, I also added up the minutes for each coding category for each student group for each day and then combined data from all five groups into a single unified data set. This enabled me to compare student testing and debugging behaviors with Mr. H’s pedagogical methods and pedagogical foci. While the summary tables alone were unable to provide an adequate answer to the research questions, they helped to inform the narrative analyses.

Narrative Analyses
Building on the summary tables and my preliminary data analysis, I revisited the teacher videos with a focus on teaching moments where Mr. H and Mr. M supported students with CS modeling content. I specifically rewatched every part of the videos where I had coded for “Analyzing Model Output,” “External Data,” and “Peer Review,” taking notes on their pedagogical moves and the specific aspects of these testing and debugging behaviors they chose to focus on in those moments. Based on these additional notes and my initial findings from the summary tables, I wrote a detailed narrative analysis of Mr.
H’s pedagogical strategies for supporting students in these three areas of testing and debugging, thus addressing Research Question 1: How does a teacher support students with testing and debugging in a secondary science unit involving computational systems modeling? Additionally, I conducted a parallel narrative analysis of Mr. M’s pedagogical strategies to address Research Question 2: How do these 113 pedagogical strategies compare to those used by another teacher teaching the same secondary science unit? Once I had finished the teacher focused narrative analyses, I returned to the summary tables to see how student testing and debugging behaviors compared to Mr. H’s efforts to support them with CS modeling content. I also systematically rewatched student screencasts to see what aspects of the learning environment prompted them to test and debug their models and if there were specific aspects of Mr. H’s teaching that were particularly helpful for encouraging students to use more advanced testing and debugging techniques. These determinations were made based on the proximity of student testing and debugging behaviors to specific moments of Mr. H’s teaching as well as student appropriation of key phrases or testing and debugging behaviors previously shared by Mr. H. This further investigation culminated in a narrative analysis that summarizes my findings and addresses Research Question 3: What pedagogical strategies correlate with student testing and debugging behaviors in this secondary science unit? After completing these narrative analyses, I had Mr. H and Mr. M review my findings as a form of member checking to further validate my interpretation of their pedagogical strategies. Both Mr. H and Mr. M largely agreed with my interpretation of these data and offered some additional context, particularly on their rationale for certain pedagogical strategies, that informed the final version of this manuscript. Results Research Question 1: How does a teacher support students with testing and debugging in a secondary science unit involving computational systems modeling? The summary tables illustrate that Mr. H used a diverse array of pedagogical methods to support his students throughout this unit (Table 15). Mr. H commonly supported students through discussions with small groups and individual students (256 minutes, 26.6% of class time). These discussions often corresponded to moments where the small groups were working on their evaporative cooling models, and Mr. H offered individualized support with the modeling process. Mr. H also spent a substantial amount of time providing information to the whole class (153 minutes, 15.9% of class time). This included both brief comments, such as when he shares an issue he has found while working with a small group with the 114 whole class and longer informational talks, where Mr. H explained a specific scientific concept or modeling principle. In addition to these informational talks, Mr. H provided opportunities for interactive whole class discussions (167 minutes, 17.4% of class time). These whole class discussions included discussing opening questions meant to ease students into learning and build classroom community, sharing and answering phenomenon-driven questions from the driving question board, and critiquing anonymous models shared by Mr. H to build student familiarity with reviewing peer models and encourage self-reflection and revisions of their own models. In addition to incorporating these three main pedagogical methods, Mr. 
H also had to balance different pedagogical foci across this unit. The three most common categories Mr. H focused on were: CS Modeling (278 minutes, 29% of class time), Science Content (237 minutes, 24.7% of class time) and Classroom Housekeeping (230 minutes, 23.9% of class time). While the focus on CS Modeling and Science Content are self-explanatory (given the design goals of this unit), the substantial amount of time spent on Classroom Housekeeping seems to originate from the organizational complexity of this unit and the need to support students through key transitions between classroom activities. Students frequently shifted between hands-on investigations, whole class informational talks and discussions, and small group work on their computational systems models and associated learning modules. Through these transitions, Mr. H provided instructional support to keep students moving forward. Additionally, Mr. H needed to organize student peer reviews and other key logistical aspects of this unit. It is also possible that conducting this unit towards the end of the fall semester (right before winter break) could have contributed to more Classroom Housekeeping being necessary. Beyond these three main categories, Mr. H spent a significant amount of class time teaching students how to use SageModeler (SageModeler Utility, 95 minutes, 9.9% of class time) and other classroom technologies (Technology Utility, 59 minutes, 6.1% of class time) such as the learning management system associated with this unit and laboratory technology for recording temperature data. Mr. H also spent a significant amount of class time troubleshooting SageModeler and the learning management system (100 minutes, 10.4% of class time). 115 Such troubleshooting efforts limited Mr. H’s ability to provide more support with student models and with key aspects of testing and debugging. Within his focus on CS modeling, Mr. H provided targeted support for the testing and debugging aspects of analyzing model output (78 minutes, 8.1% of class time), analyzing and using external data, (63 minutes, 6.6% of class time), and using feedback/peer review (99 minutes, 10.3% of class time). Mr. H’s emphasis on these three aspects of testing and debugging varied throughout this unit, largely mirroring the changing emphasis placed on each by the curriculum. For the first three days of this unit, Mr. H provided little direct support for testing and debugging as students were building their initial models. On Nov 17th (Day 4) and Nov 21st (Day 5), Mr. H provided targeted instruction for analyzing model output, emphasizing the need to use the simulation features of SageModeler to drive model revisions. Likewise, Mr. H helped scaffold student peer reviews on Nov 21st (Day 5) and Dec 1st (Day 7), providing both organizational and instructional supports. Mr. H had extended (10-minute) informational talks on using external data to validate their models on Dec 12th (Day 10) and Dec 15th (Day 11), which coincided with students collecting quantitative data on how the temperature of water, rubbing alcohol, and acetone change during the process of evaporation. Additionally, when showing students how to compare external data to model output on Dec 15th (Day 11), Mr. H also reinforced the need to analyze model output using the simulation features, showing a synergistic approach for supporting both practices. On both Dec 12th (Day 10) and Dec 19th (Day 12), Mr. 
H facilitated a whole class review of anonymized student models, once more showcasing key aspects needed for peer reviews. A more detailed narrative analysis of how Mr. H supported each of these practices is found below. In addition to focusing on testing and debugging, Mr. H also dedicated a substantial amount of time to systems thinking (160 minutes, 16.6% of class time) and to targeting specific model components (118 minutes, 12.3% of class time), e.g., pointing out specific variables and relationships students should revise in their models.

Table 15: Summary Table of Mr. H’s Pedagogical Methods
Note that time is in minutes (rounded to the nearest 0.25 minutes). Percentages are based on 960 minutes of class time across all 12 days (80 minutes per day). Each category is listed with its total minutes and percentage of class time.
Info Sharing: 153 (15.9%)
Whole Class: 167 (17.4%)
Small Group: 256 (26.6%)
Comp Demos: 48 (5%)
Lab Demos: 38 (3.9%)
Videos: 7 (0.8%)
BTS: 57 (6%)
Classroom Housekeeping: 230 (23.9%)
Science Content: 237 (24.7%)
Relationship Building: 61 (6.4%)
Tech Utility: 59 (6.1%)
Sage Utility: 95 (9.9%)
Tech Tshoot: 100 (10.4%)
CS Modeling: 278 (29%)
Model Output: 78 (8.1%)
External Data: 63 (6.6%)
Peer Review: 99 (10.3%)
Model Components: 118 (12.3%)
Systems Thinking: 160 (16.6%)

Analyzing Model Output
Mr. H first introduced the students to the simulation features within the context of a whole class critique of an anonymized student model. He began by pulling up a student model that had an undefined relationship between temperature and the number of liquid particles (Figure 19). Mr. H then asked the students what was impacting the model beyond the initial model backbone (the transfer relationship between the number of liquid particles and the number of gas particles), to which they responded, “the temperature.” Mr. H then said,
So, we have temperature affecting this. Or does it? I am going to zoom out a little bit because I need to be able to click this one (the simulate button). (Mr. H clicks the simulate button). Since we should start with a lot of particles, I am going to keep that up there. (Mr. H keeps the slider on the number of liquid particles high). But you said temperature is a factor.
So, I am going to be changing this line right here (slider bar for temperature) which means I am going to be changing the amount of heat or temperature and if we had a temperature factor that would mean that that would change other aspects of this model but look what happens. (Mr. H moves this slider bar up and down and nothing happens). (Relevant Categories: Whole Class Discussion, Computer Demonstration, SageModeler Utility, CS Modeling, Analyzing Model Output, Peer Review, Systems Thinking) Through this initial introduction, Mr. H showed the students how to use the simulate feature of SageModeler to identify an area of their model that needed to be changed (in this case, the students needed to define the relationship between temperature and number of liquid particles so that temperature can have an impact on model behavior). Given that the simulation feature is a built-in software scaffold for supporting students in analyzing model output, this is an example of Mr. H supporting students by highlighting technological scaffolds built into SageModeler. 119 Figure 19: Student Model with undefined temperature relationship After showing the students a few other examples of using the simulate feature to analyze model output, Mr. H made a strong case for why analyzing model output through using the simulation feature can help students improve their models faster. I am saying “make sure you run your simulation,” because sometimes when you run a simulation, you go “uh oh.” Because sometimes this doesn't work the way my mind said it should, the way my evidence in the back (referencing hands on experiments done in the back of the room) said it should, the way that all of the stuff I worked on, and my understanding said it should. So, take the time to change it. Play around with it and change. There is something that I want you guys to do in this unit and it will help you an infinite amount. I want you guys to fail faster. And I know you are looking like me like "did Mr. H say he wants us to fail?" No. I want you to fail faster. And what that means is I want you to throw those ideas down, make those connections, run that simulation, and say "Oh crud that's not working. Alright let's change this around." Because the more iterations and the faster you fail and the quicker you go through this process, the quicker that you will get your model into something you like instead of debating for 5 or 10 minutes where you are going to connect heat. Instead of debating, connect heat, run the simulation, and change it. That is what I mean when I say, "fail faster". (Relevant Categories: Information Sharing, CS Modeling, Analyzing Model Output). 120 The idea of “failing faster” by making model modifications and testing them immediately through the simulation feature demonstrated Mr. H’s interpretation of the importance of analyzing model output as a primary mechanism for facilitating student testing and debugging. By making a strong case for having students frequently analyze their model output through simulations early in the unit, Mr. H aimed to encourage this testing and debugging behavior to help students improve their models throughout the unit. It is also important to note how this “fail faster” talk builds directly off his earlier demonstrations of how to use the simulation features embedded in SageModeler and adds additional context to underscore the importance of these earlier supports. As such this is an example of synergistic scaffolding. 
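To make concrete what students see when they follow this advice and “play with the slider bars,” the short sketch below illustrates the kind of computation a collector/flow model performs when an input such as temperature is adjusted: raising the input shifts mass from the liquid collector to the gas collector more quickly, which is the behavior change students are prompted to check after each revision. This is a minimal Python illustration only, not SageModeler’s actual engine; the variable names, rate constant, and linear relationship are assumptions made for the example.

```python
# Minimal illustration (not SageModeler's engine) of a two-collector model:
# a "liquid particles" collector drains into a "gas particles" collector
# through a flow whose size scales with a temperature input.

def run_simulation(liquid: float, temperature: float, steps: int = 10,
                   rate_constant: float = 0.02):
    """Return (step, liquid, gas) triples showing how the collectors change over time."""
    gas = 0.0
    output = []
    for step in range(steps):
        flow = rate_constant * temperature * liquid   # evaporation flow this step
        liquid -= flow                                # liquid collector loses the flow
        gas += flow                                   # gas collector gains the flow
        output.append((step, round(liquid, 2), round(gas, 2)))
    return output

# Moving the "temperature" slider up shifts mass from liquid to gas faster,
# which a student would notice by comparing the two runs below.
for temp in (2.0, 8.0):
    print(f"temperature slider = {temp}")
    print(run_simulation(liquid=100.0, temperature=temp))
```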
In addition to demonstrating how to use the testing and debugging features in SageModeler and arguing for the benefits of frequently using the simulation features to assess model behavior, Mr. H reinforced this practice throughout the unit in his discussions with small groups. In this example, the students are struggling to figure out where to go next with their model and call on Mr. H for assistance. Anna: So, we don't really know what to change this arrow to or how to change our model. Mr. H: Simulate. Tiffany: Should we mention the IMF (Intermolecular force)? Mr. H: Absolutely, if you think that it belongs there. See in the upper right-hand corner. The way that you make it a better model is that you click that simulate button. We want to fail faster. So, you have an idea of what each of those boxes should look like, right? How the graphs should look over time. Anna: Those are graphs? Mr. H: Yeah, those are graphs that depict what is happening to that variable overtime. So, is that overtime happening the way you think it should? Tiffany: No. Mr. H: So, play with the slider bars and see what happens. (Relevant Codes: Discussions with Small Groups, SageModeler Utility, CS Modeling, Analyzing Model Output, Model Components.) 121 Through this conversation, Mr. H not only showed these students how to use the simulation features to test model output but also reinforced key testing and debugging features by directly referencing his earlier “fail faster” informational talk, once again demonstrating synergy across multiple supports. Once these students have recognized that their model does not behave the way they think it should, Mr. H suggested that they “stop the simulation, change your relationships and see if you can make it work the way you think it should.” In this example, Mr. H provided individualized support on using the simulation features of SageModeler to interpret the model output. His support focused on the technical aspects of SageModeler (turning on the simulation features, recognizing how model output is displayed in the program) and the rationale behind analyzing and interpreting model output as a form of testing and debugging while still allowing students to freely draw their own conclusions from the model output and refraining from telling them to make specific changes to their models. This example further spotlights how Mr. H is building on earlier discussions and demonstrations, specifically his fail faster informational talk, to reinforce student learning. Analyzing External Data for Model Verification Early on in this unit, Mr. H tried to have students use experimental data to validate their models. After the students had finished an activity where they counted the number of drops of each liquid (water, acetone, and rubbing alcohol) that would fit on a penny, Mr. H asked them to input the data into a preset data table in the online learning program and use it to inform their modeling process. “First of all, when you take your data on this lab activity, before you segue into building your model, it is important that you get all three trials in. Because if you don't get all three trials in, you won't be able to get your average and do the thing I am about to show you. 
So once you get your isopropyl, acetone, and water data put that name of the liquid right here and what we are doing right now is figuring out how to put raw data from a lab into here so that with the other thing that we are going to be doing today, you guys are going to be able to start running some simulations and then you are going to see if you are going to get your numbers to match your 122 simulations.” (Relevant Categories: Computer Demonstration, SageModeler Utility, CS Modeling, Analyzing Model Output, External Data). Although Mr. H had aimed to have students use the raw data from this experiment as a means of model evaluation, most student groups were actively constructing their initial models at that time and were, therefore, not yet ready to consider using external data to validate their models. Later in the unit, Mr. H returned to focusing on having students make use of external data in the testing and debugging process. On December 8th (Day 9), the class period before students were expected to collect experimental data on how the temperature of acetone, rubbing alcohol, and water change during the process of evaporation, Mr. H reiterated the importance of using external data to validate student models. Today is about “revision, revision, revision”. The goal by the end of this class is “Can we incorporate potential and kinetic energy into the models so that we can make it behave the way that we know it should.” The way we experienced it with the embodied model. The way that we saw with a lot of other simulations that we saw within the unit. Can we get this model to represent, as best as possible, what is going on so that on Monday, when we jump right in and we take lab data, when we put that lab data in on Thursday next week, we can see how awesome our models are. But those models have to be ready for when we hit Thursday. (Relevant Categories: Information Sharing, CS Modeling, External Data, Model Components). Through this short informational talk, Mr. H reiterated the importance of model revisions and sets the stage for students using external data from their laboratory experiments to drive testing and debugging. He also reminded students of prior classroom experiences that showcased different key aspects of the evaporative cooling phenomenon and should be influencing their models. Because Mr. H uses this talk to prepare students for the demonstrations of how to use SageModeler to compare model output with external data, it is also an example of synergy between scaffolds. After students had collected their experimental data on how the temperature of the three liquids changed during evaporation, Mr. H showed the students how to input the data into SageModeler. 123 We did 15 second intervals, I am not going to ask you for all of the data. I am just going to ask you for a little bit. You should have a zero point, so every 15 seconds . . . You guys said it stopped (the acetone stopped decreasing in temperature) after 90 seconds? We will throw 105 in there. I will do the same for the isopropyl and the water. . . . All you are going to do to get to this data table is on page 5.2, go up here to tables and hit temperature vs. time. When you do that, this table will open up for you. (Relevant Categories: Computer Demonstrations, SageModeler Utility, CS Modeling, External Data). Once Mr. 
H had finished showing the students how to input the external data into SageModeler, he then showed them how to generate a graph from this external data, how to create a similar graphical output from their model output, and how to overlay these two graphs to compare their model output to real-world data (Figure 6C). As such, Mr. H supported students in using existing technological scaffolds that are important for students to be able to analyze external data and use said data to verify model output. His demonstration of these technological scaffolds for using external data to validate model output largely mirrors conversations during the PLC on how to support students with this practice. In addition to showing the students the mechanics of how to use SageModeler to compare model output to external data, Mr. H gave his students the task of using this data analysis to revise their models. Your goal today is to spend about half to two thirds of the remaining time to enter your data into 5.2 and then play. . .. Your job is going to be to overlay your graph (model output graph) with this graph (external data graph) and then play. Try to figure out how to make things match. (Relevant Categories: Information Sharing, Classroom Housekeeping, CS Modeling, External Data). This direction asked students to spend much of the remainder of class time using the external data to drive model revisions and to improve their models so that it matches the experimental results. Across this set of computer demonstrations and informational talks, Mr. H used multiple modalities in a synergistic manner to better support students in analyzing external data to validate model output. 124 Just as Mr. H showed the class how to use SageModeler to compare model output data to external experimental data, he also helped individual student groups with this task. In one instance, the online learning management system accidentally did not save a group’s experimental data, leading Mr. H to troubleshoot. Tiffany: We can’t find our graph. I can’t find the graph we made in last class. Do I have to do all of this again? Mr. H: Well as long as. . .. Oh my. All of your data is not there either. Anna: Well, we have nothing apparently. Mr. H: Alright, so here is what I would like you to do. Did you try to copy over from the previous pages. . .. . If you can’t get it to work, just run your simulations. You know what that graph looked like every single time. So, you know what the graph should look like. So do some model revisions based on what you know it should look like. (Relevant Categories: Small Groups, Tech Troubleshooting, CS Modeling, Analyzing Model Output, External Data). In this instance, Mr. H first tried to help the students recover their data from an earlier page in the program. Once that option led to a dead end, rather than having the students go through the tedious task of reentering their data into SageModeler, Mr. H suggested a more efficient solution. By having students compare their model output to the exponential curve of temperature vs. time they remembered from the previous class period, Mr. H helped the students overcome the hurdle of repeating their past work, allowing them to more easily revise their models based on their experimental data. Although issues with the software forced the student to deviate from the designed technological scaffolds associated with verifying model output using external data, Mr. 
H was able to provide additional support so that the student could still complete the task while still building off the principles of this practice that he shared earlier in the unit. 125 Peer Review On November 21 (Day 5), Mr. H provided direct instructions on how to get the most out of the peer review process when reviewing computational systems models. At the beginning of the lesson, Mr. H introduced his students to a set of general model design guidelines (which were intended to help scaffold the model revision process by being a general checklist that students could use to help them identify aspects of their models that needed to be revised) and peer review guidelines. These peer review guidelines, which were reviewed by Mr. H and Mr. M during the PLC, included three general goals for peer reflections, and several sentence starters meant to help students respectfully provide constructive feedback to their peers and were projected in front of the whole class (Figure 20). Figure 20: Peer Reflection guidelines As Mr. H had these Peer Review Guidelines displayed for the whole class, he began sharing his advice for peer review of student models. 126 “Today we are doing our first peer reflection. And when you do a peer reflection, what you are trying to do is . . . First, leave your ego at the door. No model is perfect. Even with models that I create, Mr. X (the author) and Mr. M always look at them and say, “Why did you do this right here? How could we do this better?” Even my models aren’t perfect. Because there is no perfect model. There are only steps along the way. So, realizing that your model isn’t perfect, be open to feedback from someone else. Be open to “ooh why did you do this?” from someone else. Because these are the three things that are important inside of that peer reflection. You are trying to help your classmates refine your model, trying to make it as best as you can. You are trying to prepare yourself for the whole class discussion. Also, there are so many different ways that we can model the phenomenon that there is no correct way to create your model. Sometimes seeing it from a different perspective and how the phenomenon is being modeled helps us understand it better. When you have your discussion, use these (peer reflection questions) as sentence starter guidelines. Take notes inside of your lab book as you are looking at someone else’s model on things that look cool that you want to incorporate into your model or questions that you have for them, based on these five sentence starters.” (Relevant categories: Information Sharing, CS Modeling, Peer Review). Mr. H’s informational talk mirrors the peer reflection guidelines as they were discussed in the PLC (and were visually presented to the students), but his focus on the importance of humility in this process represents his personal interpretation of these guidelines. This brief informational talk on peer reviews provided Mr. H with an opportunity to emphasize the key goals of the peer review process (as defined by the curriculum), showcase the sentence starters (a built in curricular scaffold) designed to help scaffold student conversations around peer reflections, and communicate the importance of humility to getting the most out of peer reflections all in a manner that synergized with the curricular scaffolds (e.g. the peer review guidelines) present in the unit. By using both visual (the projection of the peer reflection guidelines) and auditory (the informational talk) media, Mr. 
H made use of multiple modalities to scaffold students with using peer feedback in addition to sharing existing curricular scaffolds with students. 127 Mr. H also spent a lot of time helping organize and troubleshoot the peer review process. After a behind the scenes effort to troubleshoot the peer-sharing features embedded into the online learning platform, Mr. H told the students how to navigate this feature. “So, you are going to go to page 3.4. You are going to grab your boxes (model variables) and you are just going to shift your boxes around a little bit (move them slightly on the screen). You are not going to actually change anything about your model. But you are going to force it to save. Then you are going to click the little up arrow to share.” (Relevant categories: Information Sharing, Tech Utility, CS Modeling, Peer Review). This is another example of Mr. H supporting students in using existing technological scaffolding. After providing instruction on how to use the peer sharing features, Mr. H identified student groups who were ready to share their models and organized these peer review sessions. “Esme, I want you to share out with Tiffany and Anna. So, Esme to Tiffany, Tiffany to Esme. . . Eric, I want you to share with Isabelle.” (Relevant categories: Small Group, Classroom Housekeeping, CS Modeling, Peer Review). These organizational and troubleshooting efforts were meant to help ensure that all students could get the most out of the peer feedback process. Mr. H also found opportunities to reinforce the need to reflect on peer feedback as he met with small groups as they were actively revising their models. Mr. H: So, what changes would you want to make to your model based on the comments you got. Morty: They were not very constructive; I hope we would have got some constructive criticism on our model. But yeah, I think it is pretty good. Mr. H: What do you want to change? How do you think you can make it better? Isabelle: I think there is another variable we could add to make this flow easier. Morty: I don’t really know exactly how Kinetic Energy affects this whole thing so if I understood more, it would probably be easier. (Relevant Categories: Small Groups, CS Modeling, Peer Review, Model Components, Systems Thinking). With this conversation, Mr. H aimed to get the students to consider the feedback they received from their peers as a starting point for model revisions. As it appears that this group did not receive the most 128 sophisticated feedback from their peers, Mr. H further asked them to consider other aspects of their model that they can improve upon. Thus Mr. H managed to encourage students to reflect on peer feedback while also pushing them to consider other aspects of their models to revise. In another example, Esme and Lilly were unsure of what to do as they waited for their turn to use the laboratory equipment to test how the temperature changed over time. Mr. H suggested, “While you are biding time, if you want to go to your peer review model and make any modification, because I know you guys were in the middle of trying to turn your chain into a feedback loop.” (Relevant Categories: Small Groups, Classroom Housekeeping, CS Modeling, Peer Review, Systems Thinking). This redirection encouraged the students to return to making changes to their model based on peer feedback. By emphasizing the specific aspect of peer feedback that was most relevant to their model structure (creating a feedback loop), Mr. 
H helped simplify the overall task, which made it more feasible for these students to accomplish before they began collecting data to test their model behavior. Through his conversations with students, Mr. H reinforced key messages about peer feedback from his earlier informational talks and helped students use other scaffolds embedded into SageModeler software and the curriculum. Lastly, Mr. H supported students in the peer review process by having students practice evaluating peer models through whole-class model evaluations. In these evaluations, Mr. H displayed a student model anonymously and walked students through the process of critiquing said model. Mr. H: So here is the first one I want you to look at. Remember we don’t claim models, we look at them, analyze them, and offer feedback blindly to whomever’s model this is so they can make improvements. First of all, what is the very first thing you notice about this model? Does it have the ability to answer the modeling question? What does the final model output go to? Sam: Temperature Mr. H: So, the final model output goes to temperature. So, does this have the ability to answer the driving modeling question? Isabelle: Yes. 129 (Relevant Categories: Whole Class Discussion, CS Modeling, Analyzing Model Output, Peer Review, Systems Thinking). In this initial part of the model evaluation process, Mr. H reaffirmed the goals of model evaluation. He then asked the students to consider the model output and how this relates to its ability to address the driving question of the unit (why do we feel colder when we are wet than when we are dry?). As this whole class evaluation continued, Mr. H asked his students to consider the input variables that are present in this model (by considering the number of manipulation bars/slider bars that are present in the model). Mr. H: How many manipulation bars are we going to have on this? Lilly 6 Mr. H: 6? Where do you see six? Remember what always gets a bar. What always gets a bar? Reese: The collector? Mr. H: Yep collectors, what else? Carter: Anything that doesn’t have anything else feeding into it? Mr. H: Yeah, any that doesn’t have anything else feeding into it. So, l anything at the beginning of a chain will have bars as well. So how many bars are here? Fred: 3. (Relevant Categories: Whole Class Discussion, SageModeler Utility, CS Modeling, Peer Review, Systems Thinking). Through these lines of questioning, Mr. H demonstrated the sorts of questions that students should ask as they analyze the models of their peers as well as when reflecting on their own models. He also used these questions to support students with other key aspects of CS modeling, including analyzing model output and systems thinking, showing an ability to use a singular scaffold to support multiple learning goals. Summary Overall, Mr. H used synergistic pedagogical strategies to support students with the testing and debugging behaviors of analyzing the model output, analyzing external data for verifying model behavior and using peer feedback (Table 16). Across these three aspects, Mr. H provided targeted demonstrations 130 of the associated software tools and scaffolds and modeled the reasoning pathways necessary for students to engage in these aspects of testing and debugging. 
In particular, he showed the whole class how to use the simulation features of SageModeler to analyze model output and went in-depth on how to input external data into SageModeler and how to compare the model output with the external data to verify model behavior. He also used whole class model reviews to showcase the types of questions students should ask during peer review as well as other aspects of testing and debugging and CS Modeling, supporting multiple learning goals with a unified set of scaffolds. Mr. H also gave relevant informational talks on the rationale for and importance of each aspect of testing and debugging. For analyzing model output, Mr. H encouraged students to adopt a strategy of frequently analyzing model output via the simulation features after making changes to their models so that they could more quickly identify flaws in their models or “fail faster” and more rapidly improve their models compared to discussing model structures without analyzing model behavior. Meanwhile, Mr. H explained the peer reflection guidelines in a manner that emphasized that the goal of peer feedback was to help fellow students improve their models and to be exposed to different ways of constructing evaporative cooling models to gain insights into refining one’s own model. These informational talks often built on ideas previously introduced during his demonstrations of the mechanics of each feature of SageModeler (or, in the case of “using external data to verify model output,” presaged the later demonstration), building a cohesive and synergistic narrative for these testing and debugging practices. Many of these talks took place alongside demonstrations of visual scaffolds, with Mr. H making use of multimodality in his teaching. Finally, Mr. H reinforced the mechanics and rationale behind each of the three targeted testing and debugging behaviors in his direct interactions with small groups, building on prior discussions and demonstrations, showcasing synergistic scaffolding over time.

Table 16: Mr. H’s Pedagogical Strategies for Supporting Students with Testing and Debugging

Analyzing Model Output:
• Whole Class Demonstration of Behavior (Showing the simulation features)
• Demonstration of Technological Scaffolds
• Informational Talk on Rationale for Behavior
• Direct Interactions with Small Groups to reinforce Mechanics and Rationale for Behavior

External Data:
• Whole Class Demonstration of Behavior (Inputting Data into SageModeler; Comparing Model Output with External Data)
• Demonstration of Technological Scaffolds
• Informational Talk on Rationale for Behavior
• Direct Interactions with Small Groups to reinforce Mechanics and Rationale for Behavior

Peer Review:
• Whole Class Demonstration of Behavior (Whole Class Model Reviews)
• Demonstration of Technological and Curricular Scaffolds
• Informational Talk on Rationale for Behavior
• Direct Interactions with Small Groups to reinforce Mechanics and Rationale for Behavior

Research Question 2: How do these pedagogical strategies compare to those used by another teacher teaching the same secondary science unit?
Based on my analysis of Mr. M’s pedagogical methods and pedagogical foci, there are differences and similarities in how Mr. M and Mr. H approached teaching the evaporative cooling unit (Table 17). Because Mr. M was absent on Nov 14 (Day 3) and Nov 28 (Day 6), I have removed those two dates from Mr.
H’s data set to have a fair comparison of their pedagogical methods and pedagogical foci. With regard to pedagogical methods, Mr. H spent more class time sharing information through informational talks (16.8%) than Mr. M did (7.7%). Additionally, Mr. M spent more class time having students partake in whole class discussions (25.2%) than Mr. H did (15.2%). With respect to pedagogical foci, Mr. M spent more time than Mr. H discussing science content with students, particularly as they revised their computational models (Mr. M, 35.6%; Mr. H, 23.6%). For all other pedagogical categories used to compare Mr. H and Mr. M, there were no noteworthy differences between the two teachers.

Table 17: Comparison between Mr. H’s and Mr. M’s pedagogical methods and foci
Note that both teachers’ times and percentages are based on 800 minutes of class time, as I eliminated all data points for Mr. H from Days 3 and 6 (when Mr. M was absent) for a fair comparison. Each row lists total minutes and the percentage of class time.
Info Sharing: Mr. H 134 (16.8%); Mr. M 61.5 (7.7%)
Whole Class: Mr. H 121.75 (15.2%); Mr. M 201.25 (25.2%)
Small Group: Mr. H 223 (27.9%); Mr. M 278.25 (34.8%)
Comp Demos: Mr. H 24.5 (3.1%); Mr. M 10 (1.3%)
Lab Demos: Mr. H 30.75 (3.8%); Mr. M 18.75 (2.3%)
Videos: Mr. H 7.25 (0.9%); Mr. M 4.5 (0.6%)
BTS: Mr. H 50.5 (6.3%); Mr. M 50.5 (6.3%)
Classroom Housekeeping: Mr. H 193 (24.1%); Mr. M 166.5 (20.8%)
Science Content: Mr. H 189 (23.6%); Mr. M 284.5 (35.6%)
Relationship Building: Mr. H 54.75 (6.4%); Mr. M 57.75 (7.2%)
Technology Utility: Mr. H 55 (6.9%); Mr. M 38 (4.75%)
SageModeler Utility: Mr. H 75.5 (9.4%); Mr. M 48.5 (6.1%)
Tech Troubleshooting: Mr. H 87.25 (10.9%); Mr. M 52 (6.5%)
CS Modeling: Mr. H 241 (30.1%); Mr. M 222.5 (27.8%)
Analyzing Model Output: Mr. H 70.5 (8.8%); Mr. M 62.5 (7.8%)
External Data: Mr. H 58.5 (7.3%); Mr. M 53.25 (6.7%)
Peer Review: Mr. H 94.25 (11.8%); Mr. M 78.75 (9.8%)

Although Mr. H and Mr. M spent a similar amount of time addressing CS Modeling, there are some key differences in the amount of time they spent supporting students with the three targeted testing and debugging behaviors in this unit (Table 18). Mr. H spent about the same amount of class time assisting students with analyzing model output (8.1%) as did Mr. M (7.8%) throughout the whole unit. However, Mr. H appears to have spent much more time supporting students in analyzing model output earlier in the unit than Mr. M did, as Mr. H pushed students to use the simulation features of SageModeler to help them identify flaws in their models through his “fail faster” approach. Additionally, Mr. H spent more time making connections between model output analysis and analyzing external data than Mr. M on Dec 15 (Day 11) when introducing students to inputting external data into SageModeler, whereas Mr. M spent more time having students use the simulation features to make sense of more complex model structures on Dec 19 (Day 12) as part of the final whole class model review. Mr. H and Mr. M spent roughly equal class time supporting students with analyzing external data (6.6% and 6.7%, respectively). While Mr. H did present the rationale for this behavior a bit earlier than Mr. M, both primarily taught students how to input external data into SageModeler to verify their models on Dec 15 (Day 11), the class period after students collected data from the temperature vs. time experiment. Although Mr. H and Mr. M, in general, followed a common schedule for the unit and spent roughly equal time supporting students with using peer feedback (10.3% and 9.8%, respectively), there are noticeable differences in when they conducted whole class model reviews. Because Mr.
M was absent on Nov 14 (Day 3), he conducted a whole class model review on Nov 17 (Day 4) to help ensure that all students had a common backbone for their models. Likewise, Mr. H conducted a whole class model review on Dec 1 (Day 7), while Mr. M allowed extra time for students to work on revising their models and give feedback to their peers. Finally, Mr. M had a short whole class model review on Dec 8 (Day 9), prior to students conducting the temperature vs. time experiment on Dec 12 (Day 10) whereas Mr. H had his whole class review after students finished collecting data on Dec 12 (Day 10). 134 Table 18: Mr. H vs. Mr. M Pedagogy for Testing and Debugging Behaviors Please note that for this table, Mr. H’s time is calculated out of 960 total minutes whereas Mr. M’s time is calculated out of 800 minutes to account for Mr. M’s two absences. Mr. H Analyzing Model Output 0 Mr. M Analyzing Model Output 0 0 4 10.5 17 3.25 4.5 0 3.75 9 20.25 5.5 78 8.1 0.5 Abs 6.25 12.5 Abs 6 0 6.25 0 13 18 62.5 7.8 Nov 7 Day 1 Nov 10 Day 2 Nov 14 Day 3 Nov 17 Day 4 Nov 21 Day 5 Nov 28 Day 6 Dec 1 Day 7 Dec 5 Day 8 Dec 8 Day 9 Dec 12 Day 10 Dec 15 Day 11 Dec 19 Day 12 Total Percent Mr. H External Data Mr. M External Data Mr. H Peer Review Mr. M Peer Review 0 0 4.5 0 0 0 0 2.5 4.5 17.75 30 3.75 63 6.6 0 0 Abs 2.75 0 Abs 0 0 1.5 8.75 40 0.25 0 0 0 9 0 0 Abs 16.75 13 19.5 4.75 Abs 29.75 7.5 0 3.5 0 10.75 17.75 0 0 1.25 21.25 23 53.25 6.7 99 10.3 78.75 9.8 Analyzing Model Output Mr. M’s overall approach to supporting students with the testing and debugging behavior of analyzing mode output was different from Mr. H’s pedagogical strategies. It is important to note that both Mr. H and Mr. M had their students complete a one lesson introduction to SageModeler one week before the start of this unit on Nov 3 (Day 0), which did include a brief introduction of the simulate features present in SageModeler. Mr. M did not provide students with an informational talk on the rationale for analyzing 135 model output as a means of expediting the model revision process in the same manner as Mr. H’s “fail faster” informational talk during the early part of this unit. The closest Mr. M came to addressing a rationale for analyzing model output comes when he contrasts diagrammatic models with computational systems modeling. “And they (diagrammatic models) are really useful for us being able to see our understanding of the situation, but they are limited as there is no feedback within the model. There is no simulation within the model to show you whether or not your model is accurately representing something. It also doesn’t help you build understanding of how relationships fit together (Relevant Categories: Information Sharing, CS Modeling, Analyzing Model Output, Systems Thinking).” Here Mr. M listed the limitations of diagrammatic modeling from a system thinking perspective (lack of feedback structures, obscuring relationships between different variables in the system) and a testing and debugging lens. In particular, he remarked on how the absence of simulation features is a hinderance of diagrammatic modeling. While this informational talk did not directly endorse using simulation features to analyze model output, it suggested that the students should be using the simulation features present in SageModeler to see if their model is accurately representing the phenomenon. As with his presentation of an indirect rationale for analyzing model output, Mr. 
M provided whole class instruction on the mechanics of the simulation features of SageModeler that allow students to analyze model output, instruction that was embedded in a discussion on aspects of systems thinking. During his first whole class model review on Nov. 17th (Day 4), Mr. M asked students to consider the structural meaning of "sliders" (the sliders on SageModeler variables that allow students to manipulate the relative amount of each initial variable in SageModeler).

Mr. M: This is a dynamic model. In a dynamic model, do we set initial conditions by changing whole variables that way or do we set initial conditions by changing the sliders?
Student A: Changing the sliders.
Mr. M: Yes, because all these little boxes (the model output boxes present in collector variables in SageModeler) are graphs over time (showing how the collector variables are changing over time) so in this situation it stays the same because there are no liquid particles to start off with. And so, we can't have a variable that is stating an initial condition because the variables are changing throughout. (Relevant Categories: Whole class discourse, SageModeler Utility, Analyzing Model Output, Peer Review, Systems Thinking).

In this discussion, Mr. M pointed out that the sliders represent the relative amount of each initial variable and therefore negate the need to have an additional variable as an initial starting condition. As such, Mr. M indirectly communicated that the sliders allow students to manipulate the relative amount of each variable when they are analyzing model output. Mr. M also stated that "these little boxes are graphs over time," showing students how to look at model output in their models without directly addressing the rationale for analyzing model output. As the conversation continued, Mr. M noted that students need to push the simulate button to access the slider bars by saying "I can give it a slider by pushing simulate," but he never directly showed the whole class the mechanics of analyzing model output with the simulate features. This indirect approach to presenting both the rationale and the mechanics behind analyzing model output strongly contrasts with Mr. H's approach of walking students step by step through this process. While this indirect approach does not clearly point out the existing technological scaffolds to the same extent as Mr. H's demonstration, Mr. M was supporting students with systems thinking through these discussions and demonstrations, thus creating some synergy between his scaffolding for analyzing model output and systems thinking. Additionally, the combination of visual and verbal components in these demonstrations shows a multimodal approach to supporting students in this practice, which parallels Mr. H's pedagogical methods.

Although Mr. M adopted an indirect approach towards communicating the rationale and mechanics for analyzing model output via the simulation features of SageModeler, he did support students in this testing and debugging practice through direct conversations with small groups. Early on, Mr. M directly told a small group that they need to "push the simulate button on the top" to analyze model output. In this way, Mr. M reinforced the procedural skills and technological scaffolds needed to generate model output in SageModeler in a more direct manner than in his whole class discussions. In a parallel to his whole class discussion, Mr.
M had a discussion with students about the mechanics of analyzing model output based on the absence of the slider bar from a variable the students want to manipulate. Mr. M: Oh you want to know why it doesn’t have a slider. Well what is true about all of the ones that do have sliders and how they are connected? Student B: Nothing is going into them, only outgoing stuff? Mr. M: Because if there are inputs going in and outputs going out you might not get to set the amounts. (Relevant Categories: Small Groups, SageModeler Utility, CS Modeling, Analyzing Model Output, Systems Thinking). Here Mr. M assisted students with analyzing the output of their model by pointing out the structural flaws that prevent them from manipulating model input. In both instances, Mr. M used conversations with small groups to have more direct conversations about how to generate and interpret model output using SageModeler, better supporting them with analyzing and interpreting model output. This parallels Mr. H’s approach to supporting individual student small groups with Analyzing Model Behavior during the model revision process. It is also important to note that many of these small group conversations strongly parallel Mr. M’s earlier whole class conversation around the slider bars, therefore showing how these small group conversations reinforced earlier ideas addressed through informational talks and whole class discussions. Mr. M also tended to use the analysis of model output to prompt further discussions on specific structural issues in student models, such as the presence of unnecessary starter variables. This suggests that Mr. M aimed to have students use the testing and debugging behavior of analyzing and interpreting model output to start conversations on model structure and system thinking that were not as clear in Mr. H’s “fail faster” approach to this behavior. As such, Mr. M seems to be creating a sense of coherence between his scaffolds for analyzing model output and systems thinking in ways that differ from Mr. H’s approach. 138 External Data Mr. M had a similar approach to Mr. H for supporting students with analyzing external data to validate model output. Given the structure of the unit, both teachers tended to wait until the later part of the unit to begin introducing this aspect of testing and debugging to their students, in line with the occurrence of the major quantitative experiment (temperature vs. time) for this unit. As with Mr. H, Mr. M began laying out the rationale for data collection and its ultimate use for verifying student models on Dec 8 (Day 9), the class before the temperature vs. time experiment. Mr. M: And it is kind of important for us today to make sure we get our models working correctly with the flows of energy and the impact it has on things like temperature because next time we are going to be in lab and we are going to be quantifying the evaporation rates of these different substances and we are going to be overlaying it into our models and we are going to see how well our models actually work. (Relevant Categories: Information sharing, Classroom Housekeeping, CS Modeling, External Data). In this brief informational talk, Mr. M encourages the students to revise their models in preparation for the temperature vs. time experiment. In discussing the need to revise prior to the experiment, Mr. M emphasized the importance of having the best possible model possible prior to using external data for model validation, thereby elevating the value of the external data. Mr. 
M also conveyed that by overlaying the experimental data on top of student models, it will allow students to see how well their models reflect the real-world phenomenon. While not explicit in this informational talk, Mr. M heavily implied that external data is of higher value than student model output and if their model output does not match their external data, more revisions will be needed. In a discussion with a small group on Dec 12 (Day 10), after students had collected experimental data, but before they have input the data into SageModeler, Mr. M asked the students to consider making additional revisions prior to the next class when they will be more explicitly comparing model output to external data. 139 Do you think that there is anything else that you can add that will make it better line up with the experiment? If the answer to that is no, you should be fine for now. If the answer is yes, you should make those changes. (Relevant Categories: Small group, Classroom Housekeeping, CS Modeling, External Data). In this conversation, Mr. M told the students that their models should reflect real-world data and “line up with the experiment.” He then suggested that in preparation for using the quantitative data from the experiment to validate their models (which will happen in the following class session) that the students make changes to their models to better match their preliminary understanding of their experimental results. In doing so, Mr. M further emphasized the importance and rationale for the upcoming task of using external data to validate model output. Mr. M’s approach to introducing the students to the mechanics of inputting external data into SageModeler along with how to compare their model output data to the experimental data largely mirrored Mr. H’s approach, as both were influenced by the PLC. Mr. M began by having the students follow along as he demonstrates where to find the data tables prebuilt into the program for this unit. He then asked a student to share the general trend of their temperature vs. time data for acetone. Mr. M: Can somebody walk me through what the graph for acetone did? Student C: It went down sharply and then bounced back up and then went kind of straight. (Relevant Categories: Whole Class Discussion, Science Content, CS Modeling, External Data) Mr. M used this moment to point out that the “bounce back” happens after the acetone has completely evaporated and the thermometer returns to the temperature of the room. He goes on to state, “If you only have a certain amount of time for the acetone, you only need to go about as far for the other two (liquids),” thus informing the students that they only need to include enough data points to match the time for the low point of acetone (when all the acetone has evaporated) as any additional data would be superfluous. Mr. M followed this up with a multimodal presentation showing students how to input their experimental data into SageModeler using “dummy data.” 140 So I am going to go ahead and make some dummy data. So dummy data is what we use if we want to see how something works but we don’t want to bias our data by messing around with it that can cause us to make inappropriate conclusions off of it. (Relevant Categories: Computer Demonstration, SageModeler Utility CS Modeling, External Data). This use of “dummy data” differed from Mr. 
H’s use of actual student data to demonstrate how to input external data into SageModeler, but it did seem to encourage students not to draw “inappropriate conclusions” off this demonstration and instead focus on their own experimental data when they return to their own models. Mr. M then showed the students how to create graphs of both the external data and model output data and how to make comparisons between them, demonstrating the technological scaffolds necessary to use external data to validate model output data. I can change the background color to transparent and I can actually overlay it and you can see through. Now I can compare the data and ask, “Do these two match?” And if the answer is yes, you can say this is all good. But if not, our model maybe doesn’t work as well as we think it does and we need to make some adjustments to it. (Relevant Categories: Computer Demonstration, SageModeler Utility, CS Modeling, Analyzing Model Output, External Data). Mr. M used this opportunity to reinforce the rationale behind analyzing external data by stating that the aim of this exercise is for student model output to match experimental data. As with Mr. H, Mr. M further advocated that students needed to make changes to their models so that they would match experimental data, thereby giving students a clear rationale for using external data to validate model output as a testing and debugging behavior. This discussion thus reinforced earlier talking points from Mr. M on the importance of this testing and debugging practice and therefore is evidence of synergy between Mr. M’s different supports for this practice. After demonstrating how to use external data to validate model output, Mr. M set out to help individual student groups that needed further assistance with this aspect of testing and debugging. Once a student group had finished inputting their experimental data into SageModeler they asked Mr. M to come over to validate their progress. 141 Student D: Does this look okay? Mr. M: Don’t show me the table, show me the graph. The graph is the important bit. You already recorded your data right? Student D: Yes. Mr. M: Ok. Now make a separate graph. Now drag the variables from the graph on to wherever you want. Now you can spread out the graph. That might be good enough because you can tell which one is which. (Relevant Categories: Small Groups, SageModeler Utility, CS Modeling, External Data). In this example, Mr. M assisted this student group by walking them through the process of making two sets of graphs (one for external data, the other for model output data) to allow for a side-by-side comparison of model output with external data. By helping students with using the technological scaffolds present in SageModeler, Mr. M reinforced his earlier demonstration of these features. Later, Mr. M returned to this same group to help them with making revisions based off their data analysis. Once students recognized that their model output shows the hand temperature staying constant even though their experimental data shows a decrease in temperature due to evaporation, Mr. M suggested that they revise that part of their model. Mr. M: But should your hand temperature stay the same? Student D: No. Mr. M: So, through this data investigation, we have discovered that your model needs a revision in the hand temperature department. Student E: Yep. (Relevant Categories: Small Group, CS Modeling, Analyzing Model Output, External Data, Model Components). In this example, Mr. 
M reinforced the main rationale for analyzing external data (to revise models so that the models match experimental data and thus better represent real-world phenomenon) and encouraged the students to make further revisions to their models accordingly. 142 In another conversation with a student group, Mr. M reiterated that part of the purpose of analyzing external data is for students to compare their models with experimental data. Student F: When you did the simulation what were your collectors named? Mr. M: It should be your model. You shouldn’t be using my model because my model wasn’t good and didn’t really reflect it (the phenomenon). The whole point is to find out if your model matches your data. (Relevant Categories: Small group, CS Modeling, External Data). In this example, the student appeared to be trying to change his model to match that shown by Mr. M during the whole class demonstration. To help the student refocus on using their experimental data to improve their own model, Mr. M stated that the model he showed on the board was flawed and that the student should not replicate it. This enabled the student to refocus on the task and continue to revise their own model rather than recreating a different flawed model. In addition to using small group conversations to assist students with the mechanics and rationale behind analyzing external data to validate model output, Mr. M also took opportunities to help further student understanding of the underlying science content they are attempting to model. When a student asked Mr. M why it was important to include “time zero” when inputting his data into SageModeler, Mr. M gave an explanation that also references some key ideas from the phenomenon itself. Time zero? Time zero tends to be important because you want them to all start at the same point. They should all start at the same temperature. Also, the biggest drop tends to be at the beginning, when the liquid is evaporating fastest, so if it is absent, you are missing an important part of the phenomenon. (Relevant Categories: SageModeler Utility, Science Content, CS Modeling, External Data). Here Mr. M explained that from an experimental standpoint, time zero is a critical moment as it is the point where all three liquids are at room temperature. He also stated that because the liquid is evaporating fastest at the beginning of the experiment and that this drop is “an important part of the phenomenon,” the student needed to include it when inputting data into SageModeler. By suggesting that the student include 143 “time zero” in the data set they are inputting into their model, Mr. M helped them have more accurate external data to validate their model output. Additionally, by discussing the importance of the “big drop” Mr. M highlighted the exponential decrease in temperature that is a key part of the phenomenon of evaporative cooling, potentially furthering that student’s science knowledge. As such, Mr. M used these small group conversations as opportunities to simultaneously address using external data to validate model output and key science content associated with evaporative cooling. Peer Feedback Mr. M’s approach to supporting students by using peer feedback strongly mirrored Mr. H’s approach and paralleled discussions from the PLC. On Nov 17 (Day 4), Mr. M had his first whole class model review session with his students. As with Mr. H, Mr. 
M used these whole-class model review sessions to show the perspective and positionality students should take when reviewing peer models and the questions they should ask each other. At the beginning of this process, Mr. M illustrated the overall disposition students should have towards the model review session and peer feedback in general. When we look at the models, we are going to look at them in an anonymous way. You don’t claim one as yours. And we are going to be talking about what the model is showing, strengths of the model things that can be improved, things that are missing. And we are going to try to do that in a constructive way. And when we do that, remember that it is worth writing down some ideas and if there is one group that did something that you want to incorporate that you write that down too. We aren’t just analyzing one group’s model just for them. It is for all of us together. (Relevant Categories: Information Sharing, CS Modeling, Peer Feedback). Here Mr. M pointed out that the goal of model reviews is not only to suggest improvements to these models, but to identify strengths that can be used to further improve one’s own model. By emphasizing the importance of learning from other models as a key aspect of the peer feedback process, Mr. M built a deeper rationale for using peer feedback and encouraging students to see all models as potential inspirations for further improving their own models. These key points strongly parallel the informational 144 talk given by Mr. H where he encouraged students to use the peer feedback process as an opportunity to gather ideas from other groups and explore new ways to represent key aspects of the phenomenon in their models. In addition to building a rationale for peer reviews, Mr. M showcases the types of questions students should ask during future peer feedback sessions through these whole class model reviews. For example, Mr. M asked the students to identify the strengths of a peer model. Mr. M: Is there anything else that is strong about this model. Student B: It is easy to read. Mr. M: It is easy to read, why is it easy to read? Student B: Because there are only four things we need to think about. Mr. M: Does just adding more stuff into your model necessarily make it better? Student C: No Mr. M. Exactly. We only want to include things that are actually impacting the phenomenon. (Relevant Categories: Whole Class, CS Modeling, Peer Review, Systems Thinking). Here Mr. M demonstrates the use of an open-ended question (What is strong about this model?) that can spark a deeper conversation about several aspects of the models (in this case, their efficient simplicity). He also shows the importance of follow-up questions in the model review process and uses this as an opportunity to discuss a key Systems Thinking principle (more complex representations of phenomena with a higher number of variables is not necessarily a better model). Paralleling Mr. H’s pedagogical strategies, Mr. M shared the peer reflection guidelines. These peer reflection guidelines were included in the evaporative cooling curriculum as a scaffold for reinforcing the main goals of peer review and for sharing some questions students can use during the peer review process. As such these guidelines were reviewed during the PLC and both Mr. H and Mr. M were highly encouraged to present these to their students as a means of scaffolding the peer review process. Mr. M began by projecting the peer review guidelines in front of the whole class. 
He then reviewed the three main goals of peer review as written in the peer review guidelines: helping other students improve 145 their models, preparing for whole class model reviews, and gaining insights from peer models to improve One’s own model. When you pair up, your goal is to help them refine their models so that their models actually match the phenomenon. You should familiarize yourself with how other people have been able to model these things so that you are ready for a whole class discussion. And you should see that are potentially multiple different ways to model this phenomenon. So, what we are doing is that you are not trying to tell them, ‘This is how you build your model.’ You are trying to give them tips for how they can build their model better. (Relevant Categories: Info Sharing, CS Modeling, Peer Feedback). As with Mr. H’s informational talk on the peer review guidelines (and the guidelines themselves), Mr. M also emphasized that peer feedback should not be centered on telling other students how to build their models or trying to make a peer’s model conform to your expectations of an ideal model. Instead, Mr. M, like Mr. H, advocated for students to offer suggestions and help to improve their peer’s models in a manner that preserved the unique strengths of the original model. Mr. M further supported students in giving generative and supportive feedback by sharing key examples of the sort of questions he wants students to ask each other during peer feedback. Instead of saying “You shouldn’t have this variable.” Or “You need this variable, it’s not included” Ask them questions like, “How do you include this variable?”, “How does this variable impact the rest of your system?” “What makes you think that this variable is necessary to include?” Ask them questions like that. Ask them about the shape of the graph. But don’t be like “You should do, this, this and this.” (Relevant Categories: Information Sharing, CS Modeling, Peer Review). In this part of his informational talk, Mr. M provided several strong examples of generative questions that students can ask during peer review and the types of questions/comments they should avoid. Mr. M advised the students to refrain from using judgmental language (i.e., “You shouldn’t have this variable”) and instead encouraged students to use generative questions that allow for further discussion. Although 146 these questions paralleled those found on the peer reflection guidelines, these example questions were independently crafted by Mr. M. These generative questions aimed to help students be less defensive about the peer feedback they received and be more likely to revise their models; therefore, these questions were meant to also benefit the students receiving feedback. Additionally, these generative questions (along with those embedded in the peer reflection guidelines) also were designed to allow students to share their reasoning behind their model design choices and, therefore, can facilitate deeper discourse between students. Through sharing these questions and the peer review guidelines, Mr. M aimed to help the students who gave feedback to ask more meaningful questions and to help the students who received the feedback have more meaningful conversations during the process. As with Mr. H, in addition to whole class instruction, Mr. M supported students by using peer feedback through his interactions with small groups as they were revising their models. Mr. 
M spent a substantial amount of time organizing students into group dyads so that they could give feedback to and receive feedback from other groups. During his routine check-ins with different student groups, he asked them if they were ready to share their models to receive and give feedback to another group.

Mr. M: So, you are ready to share out your models?
Student A: We are sharing out our models?
Mr. M: With another group. Right.
Student F: Sure?
Mr. M: Or not, are you not ready for it?
Student A: I mean I am technically ready. So yes, we are ready. (Relevant Categories: Small Groups, Classroom Housekeeping, CS Modeling, Peer Feedback).

Upon recognizing that these students were ready to receive peer feedback, Mr. M helped arrange another group to meet with them for peer review.

Mr. M: Hey Student G and Student H would you be willing to come look at Student A's model because there is an odd number of groups? (Relevant Categories: Small Groups, Classroom Housekeeping, CS Modeling, Peer Feedback).

As such, Mr. M created an environment where student groups could meet to provide peer feedback as an aspect of testing and debugging. Beyond facilitating peer feedback between student groups, Mr. M also helped students with interpreting the models of their peers, another key goal of peer feedback. In this example, as the students were initially reviewing a peer's model (prior to a more in-depth conversation with this other group), Mr. M asked the students if the model made sense to them.

Mr. M: Does their model make sense?
Student C: Yeah, but there's this part.
Mr. M: Are there problems with it?
Student D: We had a question on the size of liquid droplets. We didn't see how it affects the rate of evaporation.
Mr. M: Go ask them. (Relevant Categories: Small Groups, CS Modeling, Peer Feedback, Model Components).

Here the students recognized an aspect of the other group's model that they questioned. However, the students were a bit uncertain as to whether they should immediately ask the other group to explain their reasoning behind the relationship between the size of liquid droplets and the amount of liquid particles (which in turn undergo evaporation) or wait until the other group was done reviewing their model. To encourage further group discourse, Mr. M went ahead and asked the other group to explain their reasoning about this relationship.

Mr. M: Hey Student B. What's the deal with the droplet size?
Student B: The droplet should set the initial value of the amount of liquid particles.
Mr. M: They are saying that the droplet size determines the initial value. (Relevant Categories: Small Groups, CS Modeling, Peer Feedback, Model Components).

By asking Student B to explain his reasoning behind this relationship, Mr. M modeled the sort of questions this student group should be asking their peers during the peer feedback process. It also had the immediate impact of providing the students with an interpretation of this model structure/component. The students were then able to offer better feedback to their peers and have a deeper conversation around the relationship between the size of liquid droplets and the rate of evaporation.

Summary

There are many differences and similarities in how Mr. H and Mr. M approached teaching the evaporative cooling unit and in how they supported students with the three targeted aspects of testing and debugging. Due to differences in class sizes, Mr. H spent more time on informational talks and less time on whole class and small group discussions compared to Mr. M.
In contrast, Mr. M found more opportunities to have small group discussions and to have more conversations about science content compared to Mr. H. While both teachers spent about the same amount of class time on all three aspects of testing and debugging, Mr. H provided a more explicit rationale for analyzing model output in his "fail faster" informational talk compared to Mr. M. Likewise, Mr. M gave students more time to analyze model output in his final round of whole class model review than Mr. H did. As for analyzing external data to validate model output and using peer feedback, Mr. M's pedagogical strategies largely aligned with those used by Mr. H, with only modest differences in their approaches to supporting students with these two behaviors.

Both Mr. H and Mr. M showed synergy in how they supported students with testing and debugging. Mr. H and Mr. M both used informational talks and whole class discussions to highlight existing curricular and technological scaffolds, often using multimodal presentations to do so. Additionally, both teachers often used small group conversations to reinforce ideas previously covered in a whole class environment. While they both found opportunities to cover multiple learning goals within a unified context (e.g., addressing both analyzing model output and using feedback through the practice of whole class model critiques), Mr. M was more likely to directly incorporate science content related to evaporative cooling and systems thinking in his efforts to support students with testing and debugging practices.

Research Question 3: What pedagogical strategies correlate with student testing and debugging behaviors in this secondary science unit?

Based on student screencasts, evidence shows students utilizing all three target aspects of testing and debugging (i.e., analyzing model output, analyzing and using external data, and using feedback and peer reviews) (Table 19). Once again, it is important to note that because very few of Mr. M's students agreed to be screencast, we are unable to include student data from his class to address this research question. While these student behaviors in general seem to parallel Mr. H's instructional patterns, with students using specific approaches to testing and debugging soon after Mr. H discussed them, this is not a universal pattern amongst the five screencast groups. It is also important to note how the general course of the unit impacted student testing and debugging. For example, students for the most part did not begin building their models in SageModeler until November 14th (Day 3), were focused on completing the learning modules on November 17th (Day 4) and December 5th (Day 8), and spent most of December 12th (Day 10) collecting experimental data outside of SageModeler. While these tasks were important for creating a context for later testing and debugging and the overall goals of the unit, the students were largely unable to test and debug their models on these dates. Given the complexity of these data, I have combined my semi-quantitative analysis with a narrative analysis to further illustrate the correlation between Mr. H's pedagogical strategies and student testing and debugging behaviors for each of the three target aspects of testing and debugging.

Table 19: Summary Table of Student Testing and Debugging Behaviors
Note that data from all five screencast groups have been aggregated in this table.
The percentages for this table are calculated out of 4,800 minutes (960 total minutes of class time multiplied by five groups). For Mr. H, we used the amount of time Mr. H spent supporting students with the respective behavior. Nov 7(1) 0 Nov 10 (2) 0 Nov 14(3) Nov 17(4) Nov 21(5) Nov 28(6) Dec 1(7) Dec 5(8) Dec 8(9) 4 0.5 46.75 13.5 14.25 0 17. 25 Dec 12 (10) 0.5 0 0 4 10.5 17 3.25 4.5 0 3.75 9 0 0 0.25 0 6.5 1.25 0 0 2 0 Dec 15 (11) 12. 75 20. 25 86. 75 Dec 19 (12) 15.5 Total % 125 2.6 5.5 78 8.1 115 2.4 18.2 5 0 0 4.5 0 0 0 0 2.5 4.5 17. 75 30 3.75 63 6.6 0 0 10.5 0 33.25 3 13.75 0 6.5 0 4.25 5.5 76.75 1.6 0 0 0 9 12 4.75 29.75 0 3.5 0 17. 75 21.2 5 99 10.3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.25 0.5 20 9.25 4.5 1.5 2.25 0 0.25 0 0 0 0 0 0 0 0 0 0 12.25 2.25 2.5 12.5 1.5 3.75 2 0.5 3.5 2.5 1.25 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7.5 0.5 5 60.25 1.3 12.7 5 5.25 0 6.25 1.5 31.5 0.7 4.5 0 0 1.25 25.75 0.5 1.25 0 1.5 0 8.75 0.2 0.5 0 0 1 5.5 0.1 1.5 0 0 0 0 0 76. 75 10 0 9.25 91.5 1.9 8 0 18 0.4 0 0 Category Student Analyzing Model Output Mr. H Analyzing Model Output Student Analyzing External Data Mr. H Analyzing External Data Student Feedback and Peer Review Mr. H Feedback and Peer Review Analyzing Model Output L1 Analyzing Model Output L2 Analyzing Model Output L3 Analyzing Model Output L4 Analyzing External Data L1 Analyzing External Data L2 Analyzing External Data L3 Analyzing External Data L4 151 Table 19 (cont’d) Nov 10 (2) 0 0 0 0 Category Nov 7(1) 0 0 0 0 Feedback and Peer Review L1 Feedback and Peer Review L2 Feedback and Peer Review L3 Feedback and Peer Review L4 Nov 14(3) Nov 17(4) Nov 21(5) Nov 28(6) Dec 1(7) Dec 5(8) Dec 8(9) Dec 12(10) 0.25 0 12 0.5 2 0 0.75 0 Dec 15 (11) 2.5 Dec 19 (12) 1.5 Total % 19.5 0.4 10.25 0 6 2.5 9.5 0 5.25 0 0 4 37.5 0.8 0 0 0 0 6.75 0 2.25 0 0.5 0 1.75 0 11.25 0.2 8.5 0 0 0 0 0 0 0 8.5 0.2 Analyzing Model Output Although Mr. H briefly showed students how to use the simulation feature of SageModeler on November 17th (Day 4), the students generally did not use these simulation features to analyze model practice prior to November 21st (Day 5). In total, only 4.5 minutes were spent analyzing model output across all five groups before November 21st (Day 5). One student group (Group 1) stumbled across the simulation features on November 14th (Day 4; Figure 21). While they briefly used the simulation features to make sense of their model, correctly interpreting that their evaporation rate was constant, they did not make subsequent changes based on this analysis. Many of the other students were actively focused on constructing their initial models or learning about IMF through the learning modules and therefore largely ignored the simulation features on November 14th (Day 3) and 17th (Day 4). 152 Figure 21: Student use of Simulate Features On November 21st (Day 5), Mr. H discussed the importance of analyzing model output via the simulation feature in his “fail faster” informational talk. This informational talk and his subsequent conversations with small groups seems to have been a major catalyst for getting students to begin regularly using the simulate feature of SageModeler to analyze model output. On November 21st (Day 5), the five groups collectively spent 46.75 minutes analyzing and interpreting model output through the simulation feature. 
While most subsequent days showed a more modest use of this practice (13 to 16 minutes a day, with December 5th (Day 8) and December 12th (Day 10) being exceptions because they were content-focused and experiment-focused days, respectively), students remained consistent in their use of this testing and debugging strategy for the remainder of the unit.

Analyzing External Data

There were few examples of students using external data to directly drive model revisions early in the unit, with only 10 minutes of this practice across all five groups prior to December 15th (Day 11). This corresponds with the limited direct support for this practice provided by the curriculum and Mr. H at the beginning of the unit, along with its being a more advanced aspect of testing and debugging more commonly used towards the end of a unit. However, the general principle that their models should reflect their real-world experiences (which Mr. H did reference throughout the early part of this unit) is reflected in some student conversations justifying certain parts of their models. During a peer review on November 21 (Day 5), Isabelle used her experiences with the initial phenomenon (comparing how water, rubbing alcohol, and acetone feel as they evaporate off human skin) as external data to justify her model's relationship between evaporation speed and temperature felt on hand.

Isabelle: So I used "about the same" for the evaporation speed and temperature on the hand because, if it evaporates faster like the alcohol and the acetone, the temperature felt really cold. But with the water, it evaporated slower but it didn't feel that much colder. So I just did, I changed it to "about the same" because it evaporates at about the same rate if that made sense. (Using External Data Student Level 2)

Here we can see the students apply the general principle that their models should reflect real-world data to the qualitative data they collected, as Isabelle used her experiences with evaporative cooling to justify this relationship in her model. Once students had collected external data, they spent a substantial amount of class time inputting these data into SageModeler. As the average student spent between 10 and 25 minutes inputting the experimental data from the temperature vs. time lab into SageModeler on December 15th (Day 11), the total class time spent analyzing external data was 86.75 minutes for all five groups on this date. While this does correspond with Mr. H's efforts to get students to compare their models to real-world data, little time was spent using said data to make meaningful comparisons with model output. Because comparing quantitative data with model output corresponds directly to Level 3 behavior for this practice, the evidence shows that students spent a total of 18 minutes across December 15th and 19th (Day 11 and Day 12, respectively) meaningfully using these external data to validate their CS models. After inputting the experimental temperature vs. time data into SageModeler, Morty and Isabelle tried to line up their model output data (in blue) with their experimental data (three lines in orange) (Figure 22).

Isabelle: You might want to try lining it up, but that is not even close.
Morty: It looks close to me.
Isabelle: It's not lined up, Morty.
Morty: (after spending some time trying to line the two graphs up) It still isn't matching any of them. Maybe it matches water... See, it is not horrible.
Isabelle: Yeah, but we can fix this (points to their model).
(Using External Data Student Level 3) Figure 22: Student comparison of experimental and model data By comparing their model output with real world data (as Mr. H had expressly requested of them), Morty and Isabelle figured out that the linear nature of their model’s temperature vs. time graph did not line up with the exponential graph present in their experimental data and therefore were encouraged to make further revisions to their CS model. Peer Review Although Mr. H did not address peer review of SageModeler Models until November 17th (Day 4), with an in-depth discussion on November 21st (Day 5) there is evidence of students receiving and utilizing meaningful feedback from their peers as early as November 14th (Day 3), the first day of building their SageModeler Models. Tiffany and Anna had an opportunity to look at the diagrammatic model of Carter, Sam, and Fred. From this experience, Tiffany and Anna concluded that the water particles in liquid form were moving slower than the water particles that had evaporated (note this is not an accurate representation of the phenomenon). Thus, when they were writing down a justification for the transfer 155 relationship from number of liquid particles to number of gas particles they wrote, “because of another person’s drawing, it showed that the water particles were going slowly versus the evaporation speed, which was significantly faster.” (Using Peer Feedback, Level 2) Likewise, Carter, Sam, and Fred also borrowed from other student models in the initial construction of their own SageModeler model. When making a justification for their transfer relationship from number of liquid particles to number of gas particles Carter said, “I am going to write that down in here and do what he (Morty) said.” James subsequently wrote down, “There is an increase of speed of the particles which changes the liquid particles to gas particles.” Using Peer Feedback, Level 2) Both examples show that students were already cooperating and borrowing ideas from each other without formal direction from Mr. H in this unit. Once Mr. H provided students with more explicit scaffolding in the form of peer review guidelines on November 21st (Day 5) and organized the students into peer review dyads and triads, the amount of time spent in peer review increased substantially to 33.25 minutes. Student screencasts also show that students were explicitly using the peer review prompts to scaffold their analysis of these models, such as when Morty and Isabelle were looking at Reese and Eric’s model (Figure 23). Isabelle: (reading from the peer review prompts) ‘So I was wondering why you included blank. How does that help explain the phenomenon?’ Do we have any questions as to why he included something? Morty: Why does the IMF decrease? Isabelle: Oh, fix this (She moves the sliders, so they are even and then she continues to move the IMF slider) It doesn’t really change it a lot though. Morty: Temperature does though. Isabelle (to Eric): I just was wondering why the IMF of the liquid doesn’t change it much, but it might not have much to do with it. Eric: Well, I have temperature to be exponentially increasing. So, the lower down the temp is, the less impact the IMF will have. 156 (Analyzing Model Output, Student Level 3, Using Peer Feedback, Student Level 3) In this conversation, Isabelle is directly using the peer review guidelines that Mr. H shared with them to help them identify an area of Reese and Eric’s model that they think needs further discussion. 
They then utilize the simulation features of SageModeler to test the model behavior before sharing their conclusion that the IMF “doesn’t really change it (evaporation rate) a lot.” Eric is then able to defend their model by explaining that temperature is having a more dramatic effect than IMF, meaning that it is hard to see the impact of IMF on evaporation when temperature is low. Note that while temperature does impact the rate of evaporation (with a higher initial temperature leading to a higher rate of evaporation), Reese and Eric’s model is missing a critical feedback loop showing how temperature is decreasing as the liquid evaporates. Figure 23: Reese and Eric’s pre-peer review model Later in this model review, Isabelle and Morty recommend that Eric include an evaporation speed variable so that they can model how evaporation is affecting the temperature felt on hand (to answer the driving question more directly). After redesigning their model Eric shares their revisions with Isabelle and Morty (Figure 24). Eric: The faster it evaporates, the colder it feels. Yeah. I just completely copy pasted the things that affected evaporation (temperature and IMF) and just entirely made an evaporation speed on 157 its own. Completely not touching anything because I tried it the other way (having evaporation speed come out of the evaporation rate) and completely messed it up. Isabelle: That is kind of what we did too. Instead of making the valve our evaporation we made evaporation its own box and then connected that back into the valve. (Using Peer Feedback, Student Level 4). This conversation demonstrates how the students were able to use the peer review process to make meaningful changes to their models. It also shows students sharing their ideas with each other in an iterative manner as Eric took Morty and Isabelle’s suggestions, made changes to their model and then discussed those revisions with Morty and Isabelle to complete the revision and feedback cycle. Figure 24: Reese and Eric’s Post-Peer Review Model Although Mr. H did intend for students to engage in peer review towards the end of the unit, students only collectively spent 9.75 minutes of the last three days of class sharing, receiving, or utilizing peer feedback to drive model revisions. The whole class model reviews during these last few class periods did take up a significant amount of class time. While they theoretically could have helped scaffold later 158 peer review discussions, the upcoming winter break set a hard deadline on this unit. As such, Mr. H was unable to add an extra day or two to allow for a final round of peer review and peer feedback. Instead, student efforts during these last days were focused on inputting experimental data from the temperature vs. time experiment and on revising their models to better match these experimental data. Overall, these data demonstrate a correlation between student testing and debugging behaviors and both teacher pedagogical moves and trends in the broader unit. For example, students seldom used the simulation to analyze and interpret model output prior to being formally introduced to these features by Mr. H. After his subsequent “fail faster” informational talk, where he provided a clear rationale of the benefits of using the simulation features, students were far more likely to test their model output using this built in tool. 
As for analyzing external data, students' behaviors tended to mirror the general course of the unit, which was reinforced by informational talks and demonstrations provided by Mr. H. Because neither the unit nor Mr. H emphasized using external data until around the time of the temperature vs. time experiment (Dec 12, Day 10), students largely did not show evidence of this behavior until Dec 15th (Day 11) and Dec 19th (Day 12). Students also showed evidence of peer review early in the unit, prior to a formalized introduction by either the unit or Mr. H. Such early evidence of peer review suggests that Mr. H's general classroom management style encouraged students to share ideas across groups and that students had previously engaged in collaborative projects at Faraday High School. After being provided with explicit scaffolding in the form of the peer review guidelines disseminated by Mr. H, students were both more likely to engage in peer review and more likely to use the peer review guidelines to ask their peers more meaningful questions. As such, it appears that the pedagogical support for peer review provided by Mr. H reinforced and enhanced the quality of student peer review in this unit.

Discussion and Conclusion

Discussion

The findings show several key strategies that teachers used to help support students with testing and debugging and how these strategies compare to those discussed in previous literature (Table 20). Mr. H's "fail faster" talk emphasized the importance of continuously testing model output using the simulation feature embedded in SageModeler. This philosophy of frequent model testing strongly parallels the "Compile, Run, Compare" strategy described by Michaeli & Romeike (2019) for testing and debugging text-based programs. Encouraging students to frequently use the simulation features also aligns well with similar suggestions from Basu and colleagues (2016). As the "Compile, Run, Compare" strategy emphasizes the importance of comparing model output with external data, much of Mr. H's and Mr. M's efforts to encourage students to use external data to validate model output aligns well with this earlier work (Michaeli & Romeike, 2019). Additionally, Mr. H's and Mr. M's informational talks on the importance of using external data to validate model output, along with their conversations with individual small groups, have strong parallels to many of the "back pocket" questions proposed by Windschitl and colleagues (2020). Although neither study provides detailed insights on how to support peer review, both Pierson and colleagues (2017) and Chmiel & Loui (2004) emphasize the importance of peer review in supporting students with model revisions and testing and debugging, respectively. As such, Mr. H's and Mr. M's efforts to encourage and support students with peer review share a common philosophy with these studies. Likewise, the broader idea of the importance of "practice" for developing student proficiency with testing and debugging from Chmiel & Loui (2004) is present both in Mr. H's "fail faster" talk and in the frequent efforts by both teachers to have students practice analyzing and interpreting peer models through whole class model reviews. Lastly, the types of questions Mr. H and Mr. M asked the small groups to support them across all three aspects of testing and debugging largely mirror the strategies envisioned in Wilson's (1987) Socratic questioning and Li & Schwarz's (2020) generative questioning.
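To make the logic of this "compare model output with external data" check concrete, the short sketch below expresses it in plain Python rather than in SageModeler's visual environment: generate output from a simple model, set it against measured temperature vs. time values, and treat a poor match as a signal that the model needs revision. This is only an illustrative sketch under assumed values; the data points, function names, error threshold, and the linear-versus-exponential contrast are invented for illustration and are not taken from the unit, the student data, or SageModeler itself.

```python
# Illustrative sketch only (plain Python, not SageModeler): the kind of
# "does my model output match the external data?" check described above.
# All values, names, and the error threshold below are hypothetical.

import math

# Hypothetical "external data": measured temperature (deg C) each minute
# while a volatile liquid evaporates off a thermometer bulb.
measured = [22.0, 16.5, 13.0, 11.0, 9.8, 9.1]  # roughly exponential drop

def linear_draft(t, start=22.0, slope=-2.5):
    """A first-draft model: temperature falls at a constant rate."""
    return start + slope * t

def exponential_revision(t, room=22.0, drop=13.0, rate=0.6):
    """A revised model: temperature decays toward a floor as the liquid evaporates."""
    return room - drop * (1 - math.exp(-rate * t))

def mean_abs_error(model, data):
    """Average gap between model output and the measured data points."""
    return sum(abs(model(t) - temp) for t, temp in enumerate(data)) / len(data)

for name, model in [("linear draft", linear_draft),
                    ("exponential revision", exponential_revision)]:
    error = mean_abs_error(model, measured)
    verdict = "matches reasonably well" if error < 1.0 else "needs revision"
    print(f"{name}: mean absolute error = {error:.2f} deg C -> {verdict}")
```

Run on these invented numbers, the linear draft is flagged for revision while the exponential revision is not, which parallels the kind of mismatch students noticed when overlaying a straight-line model output on an exponentially decaying experimental curve.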
Table 20: Summary of Pedagogical Scaffolds used to Support Students in Testing and Debugging

Analyzing Model Output
Examples of Pedagogical Supports: Demonstrating the Simulation Feature; Mr. H's "Fail Faster" Talk; Discussions with Small Groups
Comparisons to Previous Literature: "Compile Run Compare" vs. Fail Faster (Michaeli & Romeike, 2019); Frequent use of Simulation Features (Basu et al., 2016); Importance of "practice" (Chmiel & Loui, 2004); Socratic and Generative Questioning (Wilson, 1987; Li & Schwarz, 2020)

Analyzing and Interpreting External Data to Validate Model Output
Examples of Pedagogical Supports: Demonstrating how to input external data into SageModeler; Demonstrating how to compare model output directly with external data in SageModeler; Talks on importance of using external data to validate model output; Discussions with Small Groups
Comparisons to Previous Literature: "Compile Run Compare" (Michaeli & Romeike, 2019); Importance of external data (Windschitl et al., 2020); Socratic and Generative Questioning (Wilson, 1987; Li & Schwarz, 2020)

Using Peer Feedback
Examples of Pedagogical Supports: Reviewing the Peer Reflection Guidelines; Talks on how to give and receive Peer Feedback; Whole Class Model Critiques; Discussions with Small Groups
Comparisons to Previous Literature: Importance of peer review for model revisions (Pierson et al., 2017); Using peer review to support testing and debugging (Chmiel & Loui, 2004); Importance of "practice" (Chmiel & Loui, 2004); Socratic and Generative Questioning (Wilson, 1987; Li & Schwarz, 2020)

Beyond parallels with earlier studies, these pedagogical strategies also align with the ideas of synergistic scaffolding. Across all three targeted aspects of testing and debugging, these teachers explicitly showcased relevant technological and curricular scaffolds embedded in the unit. From showing students SageModeler's simulation feature to demonstrating how to input external data into SageModeler and subsequently using those data to validate model output, these teachers presented these technological and curricular scaffolds to students while also demonstrating how to use them to carry out key aspects of testing and debugging. By introducing students to the planned technological and curricular scaffolds and using these tools to structure demonstrations of key testing and debugging behaviors, these teachers exhibited synergy between their pedagogical scaffolds and the other scaffolds embedded into the learning environment (Puntambekar & Kolodner, 2003; Tabak, 2004; Tabak & Kyza, 2018). Additionally, these teachers provided explicit rationales (as exemplified by Mr. H's "Fail Faster" informational talk) for students to use these aspects of testing and debugging to help them revise their models. Through their whole class model reviews, Mr. H and Mr. M demonstrated to students how to analyze each other's models when giving peer feedback while also addressing other learning goals associated with CS Modeling, including "analyzing model output" and "systems thinking." This use of a common set of pedagogical supports to address several different learning goals provides another example of how these teachers created cohesion within their scaffolding (Tabak, 2004; Tabak & Kyza, 2018). Lastly, both Mr. H and Mr. M reinforced earlier supports and scaffolds through their discussions with individual small groups. As building on earlier supports is a key aspect of synergistic scaffolding, these discussions with individual small groups further demonstrate the synergistic scaffolding embedded in Mr. H and Mr.
M’s pedagogies (Puntambekar & Kolodner, 2003; Tabak, 2004). In addition to providing examples of synergistic pedagogical strategies that can be used to support students with testing and debugging, this study reinforces the importance of using synergistic scaffolding strategies to support students in constructing and revising models, particularly in the context of testing and debugging computational systems models. Synergistic scaffolding strategies involve using multiple, overlapping, and complementary scaffolds to support student learning (Tabak, 2004; Tabak & Kyza, 2018). Within computerized learning environments, such as computational systems modeling, multiple technological scaffolds are often embedded into the learning environment to help students navigate the program and perform key tasks that would otherwise be beyond their abilities (Baker et al., 2004; Basu et al., 2017; Fretz et al., 2002; Grawemeyer et al., 2017; Putnambekar & Hubscher, 2005). While these technological scaffolds are often beneficial to students in developing and revising models, additional teacher support that synergizes with these technological scaffolds is often necessary for students to obtain the greatest benefit from these technological scaffolds (Baker et al., 2004; Li & Lim, 2008; Wu & Pedersen, 2011). In this study, students seldom used key technological scaffolds, such as the simulation 162 feature or the features associated with inputting external data into SageModeler to validate model output prior to being given explicit demonstrations and informational talks from their teacher. Once students were given presentations on how to use these technological scaffolds and provided a clear rationale on why the related testing and debugging practices are important for revising their computational models, they began using these technological scaffolds to support them in testing and debugging their models. As such the synergy between these teacher-led demonstrations and informational talks and the existing technological scaffolds was essential for students to test and debug their models. By demonstrating that students benefit from the teachers’ synergistic scaffolds, this study supports previous findings that technological scaffolding often needs to be supported in a synergistic manner by additional scaffolds provided by a teacher. One particularly resonant finding from this study is the importance of teachers providing students with a clear, explicit rationale for engaging in intellectually rigorous tasks. In this study, providing students with access to key curricular and technological scaffolds (i.e., the peer review guidelines and the simulation feature) was insufficient for motivating students to use these resources to test and debug their computational models. Instead, Mr. H and Mr. M’s informational talks centered on sharing a clear rationale for using existing scaffolds to test and debug their models were instrumental in supporting students with testing and debugging. Once Mr. H and Mr. M presented a clear rationale for a respective testing and debugging behavior, students were more likely to exhibit evidence of said behavior during the model revision process. The students even occasionally directly referenced the informational talks provided by their teachers when explaining their testing and debugging behaviors. 
The need for teachers to provide students not only with the knowledge of how to perform a scientific practice but also with the logical rationale or epistemic aims of that practice has been documented in other studies (Kuhn et al., 2000; McNeill & Krajcik, 2008). McNeill & Krajcik (2008) demonstrated that when teachers share meaningful reasons for students to participate in the practice of scientific explanation, students develop greater proficiency with this practice than peers whose teachers focused primarily on the mechanics of scientific explanation. Likewise, the Epistemologies in Practice framework encourages teachers to support students in moving beyond the mechanics of scientific practices and towards considering the broader epistemic goals and rationales underlying the practices they engage in as they construct a meaningful knowledge product (Berland et al., 2016). While our study resonates with this established literature, it is important to note the key differences that show the novelty of our results. Both McNeill & Krajcik (2008) and Berland and colleagues (2016) centered their work on the practice of argumentation at the middle school level. Because this study focuses on high school students engaged in computational modeling, it demonstrates how providing a clear rationale can support students at a different grade level and with a different scientific practice than in previous studies. As such, these earlier studies and our results underscore the critical role that pedagogical guidance and rationale-setting play in fostering students' meaningful engagement with complex intellectual tasks and scientific practices, including testing and debugging.

Another key insight from this study pertains to how differences in class size between the two teachers impacted their pedagogical practices in this unit. It has long been established that class size has a substantial impact on how teachers approach and implement pedagogical practices and can therefore affect students' academic achievement (Brühwiler & Blatchford, 2011; DiBiase & McDonald, 2015; Rice, 1999; Rockoff, 2004). Larger class sizes make it more challenging for teachers to engage students in whole class discussions: having more students often means either that each individual student has fewer opportunities to contribute to the conversation, limiting the participatory nature of discourse, or that classroom discussions must take additional class time to allow every student a chance to share their ideas (Blatchford et al., 2011; Cuseo, 2007). As such, classes with more students likely have fewer opportunities for whole class discussion than smaller classes. Because Mr. H had roughly double the number of students (29) compared to Mr. M (14) in the classes we observed, Mr. H's pedagogy inevitably diverged from Mr. M's.

Although this study does provide evidence for several ways in which teachers can support students with building competency in testing and debugging, it also shows several areas where additional supports are needed. While the existing curricular and technological scaffolds, combined with synergistic pedagogical supports from Mr. H, helped support students at the lower levels of the three targeted aspects of testing and debugging, students rarely performed at the highest levels for these three aspects.
For example, most students from our screencast focus groups used the simulation features to analyze model behavior at a local level (Indicator B: Analyzing Model Output: Level 3), but students seldom used the simulation features to discuss how changing the relative amounts of various input variables impacted system behavior on a more holistic level (Indicator B: Analyzing Model Output: Level 4). As the existing supports did not allow most students to reach this higher level of model analysis, additional scaffolds could help students achieve this higher-level learning goal. It also seems likely that a greater emphasis on holistic ST throughout previous grade bands is necessary for students to regularly have in-depth discussions about how changing one or more input variables impacts the behavior of the whole system. Likewise, when analyzing and using external data to validate model output, students mostly spent time putting their external data into SageModeler and relatively little time (18 minutes across all five groups) comparing their model output to these external data. This suggests that both additional time and additional support are needed to help more students perform at this higher level. As neither the curriculum, nor SageModeler, nor Mr. H substantially prompted the students to consider the validity of the external data they collected and used to validate model output, it is unsurprising that there is no evidence of students exhibiting Level 4 behavior for this aspect of testing and debugging in this study. Finally, while most of the screencast focus groups showed evidence of using peer feedback to make substantial revisions to their models, the overall setup of the unit and the organization of student groups limited their opportunity to have a second round of discussion in which students could share feedback on the revision process with their peers. Consequently, there is only one clear example of students having the reflective conversations indicative of Level 4 behavior for using feedback. It is therefore likely that both changes to curricular design and additional instructional support would be needed for more students to engage in higher-level behaviors for all three targeted testing and debugging practices.

Limitations

While this study does provide some important insights into the ways teachers can support students with different aspects of testing and debugging in the context of computational modeling, several factors affect its scope. As a case study that focuses on the teaching strategies and scaffolds developed by two teachers who work together in the same school building and participated in the same PLC, the pedagogical strategies investigated in this study do not represent all the different ways that teachers can support students with testing and debugging. Additionally, given the magnet school nature of the school in which this study took place, it is likely that these students had more familiarity with giving and receiving feedback from their peers and with using digital learning tools, such as SageModeler. Therefore, while this study showed that Mr. H's pedagogical strategies appeared effective at supporting students with testing and debugging, it is likely that additional supports would be needed if this curriculum were implemented in a less privileged environment. Not only does the case study nature of this research limit the scope of my findings, but time limitations also impacted the results of this study.
In particular, the hard deadline imposed by the arrival of Winter Break meant that Mr. H and Mr. M were unable to add an additional day to allow for a final round of peer review and model revisions. This truncated ending also reduced the amount of time available for students to make meaningful comparisons between their model output and the external data they inputted into SageModeler, therefore limiting their opportunities to "Analyze and Use External Data to Validate Model Output" at higher levels.

Conclusion

Testing and debugging is the process of identifying anomalies and/or logical inconsistencies in an algorithmic artifact and making changes to correct these problems (Bowers et al., 2023; Griffin, 2016; Shin et al., 2022). Testing and debugging is often associated with computational thinking and computational modeling and cuts across a number of STEM disciplines (Griffin, 2016; Michaeli & Romeike, 2019; Sengupta et al., 2012; Shin et al., 2022; Weintrop et al., 2016). In this study, I examined testing and debugging through the lens of "A Framework for Computational Systems Modeling," which views testing and debugging as a core computational modeling practice that students engage in as they build and revise computational models (Bowers et al., 2023; Shin et al., 2022). In this framework, there are six behavioral categories associated with students using testing and debugging to build and revise computational models: Sensemaking through Discourse, Analyzing Model Output: Simulations, Analyzing Model Output: Graphs, Analyzing and Using External Data, Using Feedback, and Reflecting upon Iterative Refinement (Bowers et al., 2022). For the purposes of this study, I chose to focus primarily on three of these behavioral categories (analyzing model output: simulations, analyzing and using external data, and using feedback) because these three aspects of testing and debugging were clearly defined by the framework, have been established as key testing and debugging learning goals by other authors, and are areas where students often need additional support and scaffolding (Bowers et al., 2023; Fretz et al., 2002; Grapin et al., 2022; Louca & Zacharia, 2012).

Although testing and debugging has been established as a key learning goal across several STEM disciplines, many scholars agree that it is often a difficult task for students, requiring explicit supports from teachers (Grapin et al., 2022; Michaeli & Romeike, 2019; Weintrop et al., 2016; Yadav et al., 2011). While computer science educators have proposed several pedagogical strategies for supporting students with testing and debugging (Katz & Anderson, 1989; McCauley, 2008; Michaeli & Romeike, 2019), these studies are typically embedded in a traditional text-based programming context and are therefore less relevant to the computational modeling context. Meanwhile, computational modeling studies tend to focus on the broader processes involved in computational modeling rather than narrowing in specifically on testing and debugging (Fretz et al., 2002; Snyder et al., 2022; Wilkerson et al., 2018).
Because neither the computer programming nor the computational modeling literature fully addresses how to support students with testing and debugging in a manner that aligns with the vision of testing and debugging laid out in "A Framework for Computational Systems Modeling," this study sought to identify teacher pedagogical strategies that support students in using three targeted aspects of testing and debugging: analyzing model output, analyzing and using external data to validate model output, and using peer feedback.

This study demonstrates some of the different scaffolding strategies that teachers can use to support students in these three targeted aspects of testing and debugging (Table 19). It also provides evidence of the benefits of using synergistic scaffolding to support students with building competency in testing and debugging. The results suggest that the curricular and technological scaffolds designed to support students with testing and debugging were not sufficient on their own, as students only utilized these scaffolds after their teacher provided synergistic instructional scaffolding on how to use them. As such, this study reinforces the need for explicit instructional supports from teachers if students are to get the most out of embedded curricular and technological scaffolds. In addition to showing the importance of synergistic scaffolds, the results of this study highlight that teachers should support students not only with the mechanics of scientific practices (including testing and debugging) but should also provide students with a clear rationale for engaging in these practices and making use of relevant scaffolds. This study also suggests that differences in class size impact pedagogical practices. Both Mr. H's relative affinity for informational talks and the additional time he spent helping students troubleshoot malfunctioning technology largely mirror the anticipated effects of larger class sizes on teacher pedagogy. Lastly, while this study suggests that these pedagogical strategies were helpful for facilitating student testing and debugging, the relative absence of higher-level testing and debugging behaviors implies that additional pedagogical and technological supports, as well as changes to the curriculum, are needed for students to reach their full potential in this unit.

Implications and Future Directions

Based on the results of this study, there are several recommendations for practitioners, curriculum developers, and researchers moving forward. The scaffolding strategies used by Mr. H and Mr. M for supporting students with testing and debugging can be adapted by other teachers to help them support students with testing and debugging in their own classrooms (Table 19). While many teachers will likely find individual supports from Mr. H and/or Mr. M's implementation of this unit beneficial, the overall synergistic nature of their scaffolding can benefit science teachers when approaching a variety of topics beyond testing and debugging. This study demonstrates that technological and curricular scaffolds need to be supported by synergistic instructional scaffolding from a teacher for students to fully utilize those other scaffolds. Additionally, these results further uphold the importance of teachers providing students with a clear rationale for engaging in scientific practices (i.e., testing and debugging) and for using the scaffolds that support these practices.
Curricular developers can add some of these strategies to their teacher guides to help support teachers in using the scaffolds that have emerged from this study. This study also suggests that additional teacher scaffolding and a restructuring of this unit would be useful for supporting higher-level behaviors across all three targeted aspects of testing and debugging. This is especially relevant for scaffolding student conversations around using external data to validate model output, as this aspect of testing and debugging seems to be an area where students seldom perform at higher levels. While this study does suggest that these scaffolding techniques were beneficial for supporting students in testing and debugging, given the small sample size and case study nature of this work, these results are not fully conclusive. As such, a larger study involving multiple teachers across several school districts with diverse student populations would be beneficial for identifying which specific supports used by Mr. H were most helpful for supporting students with testing and debugging and whether these supports also correspond to a deeper understanding of the underlying science content. While the case study nature of this research does limit the scope of this conclusion, these findings still provide key insights into how teachers can support students with testing and debugging in the context of computational modeling. Such insights are useful for teachers, curriculum developers, and researchers aiming to better understand how to support students in this practice.

CONCLUSIONS

Table 21: Summary of Findings

Major Contributions and Findings: Development of the ST and CT ID Tool to measure student testing and debugging behaviors
• Allows researchers to assess student testing and debugging behaviors in-situ
• Can be adapted for practitioners to help them recognize where students need additional supports with testing and debugging
Evidence:
• Content and construct validity established (Paper 1; 28-30)
• Evidence of testing and debugging behaviors (Paper 1; 30-37)
Connections to Previous Literature:
• Definitions of ST (Arnold & Wade, 2015; Sweeney & Sterman, 2007)
• Definitions of CT (Grover & Pea, 2018; Wing, 2006)
• Definitions of Modeling (Schwarz et al., 2009; Zu Belzen & Kruger, 2010)
• Efforts to synthesize ST, CT, and Modeling (Weintrop et al., 2016; Shin et al., 2021; Shin et al., 2022)
• A Framework for Computational Systems Thinking (Shin et al., 2022; Grover & Pea, 2018; Arnold & Wade, 2017)
• Definitions of Testing and Debugging (Hadad et al., 2020; Weintrop et al., 2016; Lee & Malyn-Smith, 2020; Sengupta et al., 2013)
• ST and CT ID Tool Development (Paper 1; Bowers et al., 2022)

Major Contributions and Findings: Identification of discourse as a major indicator of testing and debugging
Evidence:
• Student discourse provides key evidence of student testing and debugging patterns (Paper 2; 59-67)

Major Contributions and Findings: Students develop different strategies for approaching testing and debugging
• External Feedback from Peers: students relying on peer feedback to identify flaws in their models
• Verbal and written discourse to identify flaws in computational models: students engage in discourse to identify and correct flaws in their models
• Analysis of model output through simulation features in SageModeler: students frequently test model output to determine if the model needs revisions
Evidence:
• Students using external feedback from peers (student dialogue; Paper 2; 59-61)
• Students use discourse to identify flaws in models (dialogue and screencasts; Paper 2; 61-64)
• Analysis of model output through the simulation feature (dialogue and screencasts; Paper 2; 64-67)

Major Contributions and Findings: Some testing and debugging behaviors (Analysis through Discourse, Analyzing Model Output: Simulations) are more common than others (Using Graphs and Using External Data to validate models)
• More accessible behaviors (Analysis through Discourse; Model Simulations) appear more frequently
• Students need more support with using the graphing features of SageModeler
• Students need more support with using external data to validate their models
Evidence:
• Table 7, Paper 2, Page 68
• Discussion of semi-quantitative results; Paper 2, 68-70
Connections to Previous Literature:
• Students are often hesitant to interpret model output to inform model revision (Grapin et al., 2022; Stratford et al., 1998; Swanson et al., 2021)
• Students are likely to revise their models without reviewing external data to verify model output (Grapin et al., 2022; Swanson et al., 2021)
• When students do use external data to revise their models, they often adopt an "outcome oriented" approach with little regard for internal logic or consistency (Li et al., 2019; Sins et al., 2005; Wilensky & Reisman, 2006)

Major Contributions and Findings: Strategies for supporting students with Analyzing Model Output
• Mr. H's demonstration of how to input data into SageModeler
• Mr. H's "Fail Faster" informational talk (laying out a clear rationale for analyzing model output)
• Mr. H and Mr. M's demonstrations of using the simulation features to test model output
• Mr. H and Mr. M's small group discussions using questions that built off earlier supports
Evidence:
• Mr. H's strategies for supporting students with Analyzing Model Output (Paper 3, 118-122; Summary Table, Table 16, 132)
• Mr. M's strategies for supporting students with Analyzing Model Output (Paper 3, 135-138)
Connections to Previous Literature:
• Definitions of Testing and Debugging and Analyzing Model Output (Shin et al., 2022; Paper 1; Bowers et al., 2022)
• Similarities in philosophy between "Compile Run Compare" and "Fail Faster" in encouraging students to frequently test model output (Michaeli & Romeike, 2019)
• Importance of encouraging students to frequently use simulation features (Basu et al., 2016)
• Efficacy of "practice" environments in supporting students with testing and debugging (Chmiel & Loui, 2004)
• Use of Socratic and Generative Questioning to support students with testing and debugging and model revisions (Wilson, 1987; Li & Schwarz, 2020)

Major Contributions and Findings: Strategies for supporting students with Analyzing and Using External Data to Validate Model Output
• Mr. H's and Mr. M's informational talks on the importance of data prior to the experiment
• Mr. H's and Mr. M's demonstrations of how to input external data into SageModeler (including Mr. M's use of "dummy data")
• Mr. H's and Mr. M's demonstrations of how to compare model output to external data using SageModeler
• Mr. H's and Mr. M's small group discussions using questions that built off earlier supports
• Mr. H's troubleshooting support for students
Evidence:
• Mr. H's strategies for supporting students with Analyzing and Using External Data to Validate Model Output (Paper 3, 122-125; Summary Table, Table 16, 132)
• Mr. M's strategies for supporting students with Analyzing and Using External Data to Validate Model Output (Paper 3, 139-144)
Connections to Previous Literature:
• Definitions of Testing and Debugging and Analyzing and Using External Data to Validate Model Output (Shin et al., 2022; Paper 1; Bowers et al., 2022)
• Similarities between "Compile Run Compare" and Mr. H and Mr. M's approach to having students use external data to validate model output (Michaeli & Romeike, 2019)
• Importance of external data in supporting students with model revisions (Windschitl et al., 2020)
• Use of Socratic and Generative Questioning to support students with testing and debugging and model revisions (Wilson, 1987; Li & Schwarz, 2020)

Major Contributions and Findings: Strategies for supporting students with Giving and Using Peer Feedback
• Mr. H's and Mr. M's reviews of the peer review guidelines and rationale for engaging in peer review (including Mr. H's talk on humility being necessary to give and receive feedback)
• Mr. H's and Mr. M's whole class reviews of student models demonstrating "how to give peer feedback"
• Mr. H's and Mr. M's small group discussions
Evidence:
• Mr. H's strategies for supporting students with Giving and Using Peer Feedback (Paper 3, 126-130; Summary Table, Table 16, 132)
• Mr. M's strategies for supporting students with Giving and Using Peer Feedback (Paper 3, 144-149)
Connections to Previous Literature:
• Definitions of Giving and Using Peer Feedback (Paper 1; Bowers et al., 2022)
• Importance of peer review for supporting model revisions (Pierson et al., 2017)
• Examples of using peer review to support testing and debugging (Chmiel & Loui, 2004)
• Efficacy of "practice" environments in supporting students with testing and debugging (Chmiel & Loui, 2004)

Major Contributions and Findings: Synergistic scaffolding supports students with developing proficiency with testing and debugging
• Both Mr. H and Mr. M created synergy across their various scaffolds throughout this unit
• Students only began using different testing and debugging behaviors after receiving support in both the mechanics of and the rationale for the behavior
• Students referenced specific supports (quotes from teacher informational talks, quotes from the peer review sheet) during the unit
• Both Mr. H and Mr. M returned to ideas from earlier talks and demonstrations when providing support to small groups
Evidence:
• Analysis of student results (Paper 3, 150-159; Table 19, Pages 151-152)
• Summary of Mr. M's strategies (Table 16, 132)
• Mr. H's conversation with a student group referencing his "fail faster" talk (121)
• Mr. H's peer review sheet informational talk (126-127)
Connections to Previous Literature:
• Definitions of Synergistic Scaffolding (Tabak, 2004; Tabak & Kyza, 2018; McNeill & Krajcik, 2009)
• Benefits of synergistic scaffolding within computerized learning environments (Basu et al., 2017; Fretz et al., 2002; Grawemeyer et al., 2017)
• Importance of teacher support and need for teacher instruction to promote synergy within computerized learning environments (Baker et al., 2004; Li & Lim, 2008; Wu & Pedersen, 2011)

Major Contributions and Findings: Clear, explicit rationales help support students with developing proficiency with testing and debugging
• Mr. H provided a clear rationale for analyzing model output through his "fail faster" talk; students subsequently spent more time analyzing model output
• Mr. H provided a clear, explicit rationale for using the peer review guidelines
• Students displayed higher-level behaviors when giving and receiving peer review
Evidence:
• Analysis of student results (Paper 3, 150-159; Table 19, Pages 151-152)
• Mr. H's "Fail Faster" informational talk (Paper 3, 120)
• Mr. H's peer review informational talk (Paper 3, 126-127)
Connections to Previous Literature:
• Earlier studies showing the importance of a clear rationale for supporting students with scientific practices (Kuhn et al., 2000; McNeill & Krajcik, 2008)
• Providing a clear rationale and a meaningful reason to engage in scientific explanations supports students with scientific explanations better than focusing primarily on the mechanics of creating explanations (McNeill & Krajcik, 2008)
• The Epistemologies in Practice framework encourages teachers to support students in considering the broader epistemic goals and rationales underlying various scientific practices (Berland et al., 2016)

Major Findings

Across these three papers investigating how students test and debug computational models and how teachers and the broader learning environment can support students in various aspects of testing and debugging, several findings and themes emerged. One major outcome of these studies has been solidifying a clearer vision of testing and debugging in the context of computational modeling. While "A Framework for Computational Systems Modeling" (Shin et al., 2022) unpacked how ST and CT can be expressed through students testing, evaluating, and debugging model behavior, and proposed a set of testing and debugging aspects, the theoretical nature of this framework did not fully operationalize what student testing and debugging can look like in real-world classrooms. As such, Papers 1 and 2 of this thesis set out to categorize the practice of testing and debugging into a meaningful set of testing and debugging behaviors based on classroom evidence from students. Building off "A Framework for Computational Systems Modeling" and classroom observations, I identified six major testing and debugging behaviors: sensemaking through discourse, analyzing model output: simulations, analyzing model output: graphs, analyzing and using external data, using feedback, and reflecting upon iterative refinement. Through categorizing these six testing and debugging behaviors, I created a validated research instrument that can be used to assess how students test and debug computational models and how their testing and debugging behaviors evolve over time. Additionally, I described broader behavioral patterns and approaches students took towards testing and debugging computational models. These patterns include a model output approach centered on using the simulation feature of SageModeler, a revision technique emphasizing the use of peer feedback to identify flawed aspects of model structure, and a discourse-based method focusing on unpacking the reasoning behind individual relationships within a model.

Creating a clear, theory-driven, and evidence-based vision for how students can engage in testing and debugging in a computational modeling context represents a major step forward for the field of science education. By unpacking testing and debugging using concrete examples from classroom data, I developed a useful lexicon that researchers and practitioners can use across STEM disciplines to support students in this computational modeling practice. The six testing and debugging behaviors can help guide curriculum developers interested in designing instructional, curricular, and technological supports to better facilitate students in testing and debugging. Teaching practitioners can also use this framework to help create formative assessments for testing and debugging and to structure their teaching to better support students with this practice.
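To make the six behavioral categories concrete for researchers or practitioners who wish to adapt the ST and CT ID Tool, the sketch below illustrates one hypothetical way the categories might be encoded when coding transcript or screencast segments. The category names come from this thesis; the data structures, field names, and example segments are illustrative assumptions rather than part of the published instrument.

```python
# A minimal, hypothetical sketch of encoding the six testing and debugging
# behaviors from the ST and CT ID Tool for coding classroom data.
# The category names come from the thesis; everything else is illustrative.
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class TDBehavior(Enum):
    SENSEMAKING_THROUGH_DISCOURSE = "Sensemaking through Discourse"
    ANALYZING_OUTPUT_SIMULATIONS = "Analyzing Model Output: Simulations"
    ANALYZING_OUTPUT_GRAPHS = "Analyzing Model Output: Graphs"
    ANALYZING_EXTERNAL_DATA = "Analyzing and Using External Data"
    USING_FEEDBACK = "Using Feedback"
    REFLECTING_ON_REFINEMENT = "Reflecting upon Iterative Refinement"


@dataclass
class CodedSegment:
    group_id: str        # e.g., "Group A" (hypothetical label)
    start_minute: float  # timestamp within the class period
    behavior: TDBehavior # which of the six behaviors was observed
    level: int           # ordinal level assigned by the coder (e.g., 1-4)


def tally_behaviors(segments: list[CodedSegment]) -> Counter:
    """Count how often each behavior appears in a set of coded segments."""
    return Counter(seg.behavior for seg in segments)


# Example: two hypothetical coded segments from one small group.
segments = [
    CodedSegment("Group A", 12.5, TDBehavior.ANALYZING_OUTPUT_SIMULATIONS, 3),
    CodedSegment("Group A", 14.0, TDBehavior.USING_FEEDBACK, 2),
]
print(tally_behaviors(segments))
```

A practitioner-facing adaptation could replace the numeric levels with the descriptive level language used in the tool, but the underlying idea of tagging classroom evidence with one of the six behaviors remains the same.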
The "ST and CT ID Tool" also has the potential to serve as a novel research instrument that can be adapted to assess how students are testing and debugging computational models across multiple contexts. Researchers can use the language of the six testing and debugging behaviors to identify where students need additional supports for testing and debugging and to target interventions focused on scaffolding specific testing and debugging behaviors.

In addition to proposing an evidence-based vision for testing and debugging alongside an associated research instrument, these studies provide guidance on how teachers and the broader learning environment can support and scaffold students with testing and debugging. Student results from Papers 2 and 3 demonstrate how many of the technological scaffolding features built into SageModeler facilitate student testing and debugging. In particular, the simulation feature helps students visualize model output and identify how specific model structures impact model behavior at both a local and a systemic level. These same results also showcase the unintuitive nature of some of the technological scaffolds, such as the graphing and data input features of SageModeler, as students rarely used these tools to test and debug their models without first being given extensive instructional support from their teacher. Because students required substantial, explicit instructional support to use the graphing features and to validate their models against external data in SageModeler, more built-in support is likely needed for students to use these features consistently and independently.

Another key finding suggests that having students collaboratively build and revise computational models in small groups, while also providing frequent opportunities to share their models with other small groups, facilitates student testing and debugging. Working with peers encourages students to verbalize the reasoning behind their modeling design choices. If their partners disagree with their design choices, discourse ensues, helping the students determine whether the design element (e.g., a specific relationship between two variables) is supported by reasoning and evidence or is inappropriate for describing the phenomenon. Such sensemaking conversations help facilitate iterative model refinement in ways that are only possible through collaborative model construction. In a similar manner, semi-structured peer review sessions are also a critical aspect of testing and debugging, as they allow students to get additional feedback on their own models and to be exposed to alternative ideas on how to structure different aspects of their models. Although peer review and collaboration are generally considered to be important aspects of constructivist approaches to STEM education and have long been established as pedagogical supports for modeling, relatively little research has investigated how peer review and collaboration can support students in testing and debugging (Ben-Ari, 2001; Louca & Zacharia, 2012; Schreiber & Valle, 2013; Tsivitanidou et al., 2018). As such, these results represent a significant shift towards acknowledging the potential of using peer collaboration to support students in testing and debugging in a computational modeling context.

In addition to technological scaffolds and the benefits of a collaborative learning environment, I also investigated how teacher instructional supports assisted students with testing and debugging.
My findings suggest that when teachers provided direct, synergistic instruction on how to use key technological and curricular scaffolds, students were better equipped to engage in the corresponding testing and debugging behaviors. Indeed, even though students had initial access to relevant technological and curricular scaffolds built into the learning environment, additional synergistic support from their teacher, often in the form of whole class informational talks or conversations with individual small groups, was often necessary for students to use these scaffolds to assist in testing and debugging. When teachers revisited their earlier instructional scaffolds by referencing them in later discussions and informational talks, it helped reinforce the importance of key testing and debugging practices. These results, which show the importance of synergistic scaffolding, reflect earlier studies (McNeill & Krajcik, 2009; Tabak, 2004; Wu & Pedersen, 2011), further emphasizing how in technology-centered learning environments, such as a unit centered on computational modeling, technological scaffolds should be supported by instructor-led synergistic scaffolds.

My results also demonstrate the value of teachers providing students with a clear rationale for engaging in specific aspects of testing and debugging. For example, Mr. H gave an informational talk laying out the importance of using the simulate feature of SageModeler after making changes to model structure in order to quickly examine model output behavior and determine whether the output matched either experimental results or the students' understanding of the phenomenon. After this talk, students were more likely to use the simulate feature to test model output, and key phrases from the talk were referenced in their subsequent conversations with Mr. H. Although the benefits of providing students with clear rationales for engaging with scientific practices have been established by both McNeill & Krajcik (2008) and Berland and colleagues (2016), both studies centered on supporting middle school students with argumentation. Because my work focuses on supporting high school students with computational modeling, it demonstrates how the principles established by these earlier studies can be applied to both a different grade level and a different scientific practice.

Implications

Curricular and Technology Implications

One important implication of this study is the identification of aspects of testing and debugging that remain challenging for students despite teacher supports and the designed learning environment. While the learning environment and instructor scaffolds implemented in the evaporative cooling unit encouraged students to use model simulation and engage in discourse (both within small groups and between small groups), outside of lessons where they were expressly told to do so, students were highly unlikely to input external data into SageModeler and use the graphing features to compare model output to external data to validate their models. Previous studies suggest that using external data to validate model output is a challenge common across computational modeling environments (Grapin et al., 2022; Li et al., 2019; Sins et al., 2005; Wilensky & Reisman, 2006). However, it is also likely that additional revisions to the curriculum and the SageModeler learning environment could better scaffold students in using external data to validate model output.
While Mr. H did make initial efforts to stress that student models need to match real-world data, the absence of early quantitative experiments in the curriculum meant that students had little opportunity to input real-world data into SageModeler prior to the temperature vs. time experiment towards the end of the unit. Additionally, the tight deadlines at the end of the unit limited the opportunities students had to do an in-depth comparison of their model output to external data; in future implementations, I would recommend securing an additional day for students to compare model output more thoroughly to external data.

Another aspect of the learning environment whose redesign could benefit students with testing and debugging is the set of technological scaffolds that allow students to input external data and compare model output directly to these external data. For students to use this feature of SageModeler, they must navigate multiple screens with little to no written prompts to scaffold the process. First, students must open the Tables tab (1) in SageModeler and input the external data manually into a table (2) of their creation (Figure 25A). While it is possible to input external data into SageModeler in the form of a CSV file, there are no internal prompts built into the program to suggest that this is an option. To put these data in a graph, students then need to open the Graph tab (1) and drag the labels from their data table (2) over to their respective axes (3) in the graph (Figure 25B). Next, if students wish to look at their model output data in a graphical manner that can be directly compared to this external data, they must first click the simulate button (1) and select record continuously (2); students then move the slider bar (3) of the respective independent variable (in this case the mass of the vehicle) and SageModeler will automatically generate a table (4) of model output (Figure 25C). Making a graph of these data requires opening another graph using the Graph tab (1) and dragging the labels (2) from the model output table to their respective axes (3) in the graph (Figure 25D). Because there are no internal scaffolds built into SageModeler to guide this process (unless students choose to open the help menu, navigate the link to a separate webpage, and scroll through a few paragraphs of text to learn about these features), students are realistically never going to use the graphing features to help them validate model output against external data without being given explicit instructions from a teacher. As such, a more guided scaffolding process built into the program that walks students through these steps would be necessary if students are to engage in this aspect of testing and debugging more independently. Overall, the results from these studies show that testing and debugging of computational models, even in an environment specifically designed to promote testing and debugging, is not intuitive for most students. Curricular scaffolding, technological scaffolding, and synergistic instructional scaffolding coupled with a clear rationale for engaging in specific testing and debugging behaviors are all necessary to support students in testing and debugging.

Figure 25: Validating Model Output Using External Data

Figure 25A: Inputting External Data into SageModeler
When inputting external data into SageModeler, students need to first open the Tables tab (1) and then manually input their data into a spreadsheet (2) or import an existing spreadsheet as a CSV file.
Figure 25B: Making a Graph of External Data in SageModeler
To make a graph of external data, students need to open the Graph tab (1) and drag the labels from the data spreadsheet (2) directly onto the respective axes of the graph (3).

Figure 25C: Generating a Data Table from Model Output
To generate a data table from model output, students must first click on the simulate feature (1) and then press record continuously (2) to allow semi-quantitative numerical data to be generated from model output. Students then manipulate the slider bar for the targeted independent variable (3) while leaving all other variables constant to automatically generate a model output table (4).

Figure 25D: Creating a Graph of Model Output
For students to create a graph of model output, they must open the Graph tab (1) to create a blank graph. They then drag the labels from the SageModeler output spreadsheet (2) directly onto the respective axes of the graph (3), filling in the graph with the model output data.

Implications for Equity

Although equity and inclusion were not the central foci of my research questions, this work does have implications for promoting more equitable teaching practices in science education. SageModeler, as an icon-based computational modeling program, likely has a lower barrier to entry because students do not need to acquire the same level of programming knowledge required by more complex agent-based or text-based programming environments. This likely makes SageModeler more accessible for all students, particularly those without prior programming experience. Additionally, SageModeler has been translated into 13 different languages, including Spanish, Chinese, and Portuguese, making it more accessible for native speakers of those languages. These translations make it possible for students with lower levels of English fluency to more easily figure out how to use the various features of SageModeler, further lowering the barriers to computational modeling for an underserved population. Because 13.5% of the US population (41.8 million people) lives in Spanish-speaking households and another 8.2 million people live in households where the other 12 languages are spoken (Dietrich & Hernandez, 2022), it is important that computational modeling programs like SageModeler be translated into multiple languages to support students growing up in households where English is not the only language spoken.

Just as the design features of SageModeler have the potential to support equity and inclusion in science education by lowering barriers to computational modeling (especially for Spanish-speaking students), other aspects of this research can be used to better assist students from marginalized backgrounds. One of the major outcomes of this research has been to demonstrate the potential of using student discourse as a means of assessing student testing and debugging behaviors in real time. By shifting the focus away from assessing final models and towards documenting the process of testing and debugging these models, I was able to achieve a more holistic understanding of student competency with testing and debugging. This more holistic approach could be adapted by practitioners as a more equitable means of assessing student testing and debugging, as it removes additional barriers and challenges that students from more marginalized backgrounds and students with disabilities face with traditional tests and assessments.
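As one concrete illustration of what such process-based documentation might look like, the hypothetical sketch below aggregates time-stamped, coded observations into a simple per-group profile of testing and debugging activity across a class period. It assumes coded events of the kind described above; the behavior labels echo the thesis, while the function name and the 10-minute bin width are illustrative assumptions rather than part of the study's methods.

```python
# Hypothetical sketch: summarizing coded observations into a process profile.
# Assumes (behavior_name, start_minute) pairs produced by a coder; the
# 10-minute bin width is an arbitrary illustrative choice.
from collections import defaultdict


def process_profile(coded_events: list[tuple[str, float]], bin_minutes: float = 10.0):
    """Group coded behaviors into time bins to show how activity unfolds."""
    profile = defaultdict(list)
    for behavior, minute in coded_events:
        profile[int(minute // bin_minutes)].append(behavior)
    return dict(sorted(profile.items()))


# Example: one group's (hypothetical) coded events across a 50-minute period.
events = [
    ("Sensemaking through Discourse", 4.0),
    ("Analyzing Model Output: Simulations", 18.5),
    ("Using Feedback", 37.0),
]
for bin_index, behaviors in process_profile(events).items():
    # With the default 10-minute bins, bin_index 0 covers minutes 0-10, etc.
    print(f"Minutes {bin_index * 10}-{bin_index * 10 + 10}: {behaviors}")
```

A profile of this kind documents when and how often a group engaged in each behavior during a lesson, which is the sort of process evidence, rather than a final-model score, that the approach described above values.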
Another key finding of this study was the importance of discourse and peer review in supporting students with testing and debugging. One of the positive outcomes of including more opportunities for small group discussions and peer discourse is that they encourage more participation from students who might otherwise be excluded from whole class discourse or traditional lecture-based approaches to schooling (Chi & Wylie, 2014). However, it is important to note that small group environments often reproduce the racial and gender-based social hierarchies of the broader society (Patterson, 2019). Therefore, teachers must make an active effort to establish equitable classroom norms if small group discussions are to remain inclusive of all students. While the professional development did include strategies for encouraging equitable small group environments, more explicit supports for establishing racial and gender equity through small groups would have been beneficial.

Future Directions

There are several opportunities and possibilities for future practitioners, curriculum developers, and educational researchers to build upon this work. As previously mentioned, I recognize that more can be done to make SageModeler a more intuitive learning environment for students to test and debug their models. Additional scaffolding is needed for students to input external data and compare model output to these data independently. Such scaffolding should be designed so that it guides students through the intricate steps involved in inputting external data and comparing model output to these data, but also fades as students become more confident in this practice. I also aim to adapt the findings of this work to share with practitioners on a broader scale through publications in practitioner journals and future projects involving professional learning communities. By sharing the pedagogical strategies pioneered by Mr. H and Mr. M for supporting students with testing and debugging with a broader audience of teachers, I hope to bolster the teaching and learning of testing and debugging on a larger scale.

In addition to efforts to support practicing teachers, I also recognize that my findings can support future curricular developers and designers of computational modeling programs with creating learning environments that better support students with testing and debugging. By encouraging future curriculum developers to implement design strategies that were shown to be successful in these studies while avoiding design strategies that were less beneficial, future efforts to design curricula and learning environments to support students in testing and debugging can avoid unnecessary pitfalls and reach even better learning outcomes than were found in these studies.

Educational researchers also have opportunities to build upon this work to further advance our collective understanding of how to support students with testing and debugging. While these studies represent a strong proof of concept, as a series of case studies involving two teachers at one privileged magnet school, these results are not representative of the standard American high school environment. As such, future studies need to explore the efficacy of implementing the evaporative cooling unit and the associated pedagogical strategies developed by Mr. H and Mr. M on a broader scale.
Given the need to better support students from racially and economically marginalized backgrounds in urban school districts, I believe it is necessary to investigate how this unit and learning environment can be modified to better support these students. Conducting a parallel case study in a large, diverse urban district would help the field identify specific adaptations that can be implemented to better support students from marginalized backgrounds with testing and debugging. Future researchers can also build upon this work by expanding the time scale and designing a year-long curriculum centered on computational modeling and testing and debugging. Given the time commitment needed for students to learn how to work with SageModeler, having only one unit in which students are expected to master the mechanics of SageModeler, key testing and debugging behaviors, and new science content can be overwhelming. A year-long curriculum would allow students to gradually develop an understanding of the core mechanics of SageModeler and the principles behind key testing and debugging behaviors. By adopting a more gradual approach, students would have more opportunities to engage with testing and debugging and would therefore likely develop a stronger mastery of testing and debugging than students who only experienced the single evaporative cooling unit.

In parallel with efforts to better understand how we can support students with testing and debugging, future researchers should investigate how computational modeling environments, particularly those that encourage frequent testing and debugging, benefit student learning. Computational modeling units often require that students spend a substantial amount of class time learning how to use the program to build models. Because these units represent a large time investment, it is important for the field of science education to determine how beneficial computational modeling is for student learning, especially when compared to traditional paper-pencil modeling. On a smaller scale, researchers could have two teachers in the same school district teach two versions of the same unit, with one version incorporating computational modeling and the other utilizing paper-pencil modeling. Students in both classrooms would be given a pre- and post-test to assess their learning of disciplinary core ideas, scientific practices, and crosscutting concepts. If students who took the computational modeling unit scored significantly higher on the post-assessment than their peers in the paper-pencil modeling classroom, it would suggest that computational modeling has a tangible benefit for student learning outcomes. Such a study could subsequently be scaled up to include multiple school districts across multiple states to more definitively determine the efficacy of computational modeling. If a large-scale study provided strong evidence of the learning benefits of computational modeling, it would open more opportunities for policymakers and teachers to incorporate computational modeling into science classrooms, as they could more easily justify the large upfront time investment needed for students to become familiar with the computational modeling program.

Another research pathway I am interested in pursuing based on these results is to further explore the interactions between testing and debugging and systems thinking.
While both ST and CT are key aspects that support students with testing and debugging in "A Framework for Computational Systems Modeling," many of the results from these studies focus on aspects of testing and debugging that align more closely with computational thinking. For example, the testing and debugging behavior of analyzing external data to validate model output is heavily aligned with the CT aspect of generating, organizing, and interpreting data and does not fully explore how students are considering broader system structures in their models. As such, I am interested in further investigating the complex relationship between student ST and student testing and debugging behaviors in the context of computational modeling. My future research might address how student understanding of key aspects of ST and system behavior helps students identify potential areas of their model that need improvement, thus facilitating testing and debugging. Likewise, I would also explore how having students engage in frequent testing and debugging can enhance student understanding of the behavioral impact of key system structures, thus bolstering their ST. Building from these studies, I could then address how teachers and curriculum developers can design computational modeling learning environments that best support students in mastering ST and testing and debugging in a synergistic manner. Such a future study focusing on how to design better learning environments to support students with ST and testing and debugging would reflect the key findings of these studies: testing and debugging (and computational modeling in general) do not come naturally to most students and require both well-designed learning environments and instructional scaffolds for students to be successful with these practices.

ACKNOWLEDGEMENT OF PREVIOUSLY PUBLISHED WORK

I adapted earlier drafts from work that I had previously published to create the first two papers of this thesis. Paper 1 was modified and reformatted from an eight-page conference paper submitted to and accepted by the International Conference of the Learning Sciences (ICLS) in 2022. This original paper is publicly available on the International Society of the Learning Sciences (ISLS) online repository (https://repository.isls.org/handle/1/8516). Prior to submitting this thesis, I received permission to include this work in my thesis from my co-authors and from ISLS, who still retain the copyright for the original manuscript (Figure 26).

Original article citation: Bowers, J., Shin, N., Brennan, L., Eidin, E. E., Stephens, L., & Roderick, S. (2022). Developing the systems thinking and computational thinking identification tool. In Proceedings of the 16th International Conference of the Learning Sciences - ICLS 2022 (pp. 147-154). International Society of the Learning Sciences.

Figure 26: Screenshot of Permission Letter from ISLS

Paper 2 was adapted from a journal article previously published in the Journal of Science Education and Technology. This article is publicly available in the journal's database as an open access article (https://link.springer.com/article/10.1007/s10956-023-10049-w) with the following DOI link: https://doi.org/10.1007/s10956-023-10049-w. Prior to submitting this thesis, I received permission from the coauthors and the Journal of Science Education and Technology to include this work in my thesis (Figure 27).
As the Journal of Science Education and Technology retains copyright on the published article, I ask all future scholars reading this thesis to cite the original publication. Original article citation: Bowers, J., Eidin, E., Stephens, L., & Brennan, L. (2023). Examining Student Testing and Debugging Within a Computational Systems Modeling Context. Journal of Science Education and Technology, 32(4), 607-628. 187 Figure 27: Screenshots of Correspondence with the Journal of Science Education and Technology 188 BIBLIOGRAPHY Abar, S., Theodoropoulos, G. K., Lemarinier, P., & O’Hare, G. M. (2017). Agent Based Modelling and Simulation tools: A review of the state-of-art software. Computer Science Review, 24, 13-33. Abid, A., Farooq, M. S., & Farooq, U. (2015). A strategy for the design of introductory computer programming course in high school. Journal of Elementary Education, 25(1), 145-165. Ahmadzadeh, M., Elliman, D., & Higgins, C. (2005). Novice programmers: An analysis of patterns of debugging among novice computer science students. Inroads, 37(3), 84–88. Aho, A. V. (2012). Computation and computational thinking. The Computer Journal, 55(7), 832–835. Arndt, H. (2006). Enhancing system thinking in education using system dynamics. Simulation, 82(11), 795-806. Arnold, R. D., & Wade, J. P. (2015). A definition of systems thinking: A systems approach. Procedia Computer Science, 44, 669-678. Akcaoglu, M. (2014). Learning problem-solving through making games at the game design and learning summer program. Educational Technology Research and Development, 62(5), 583–600. Anderson, N. D. (2016). A call for computational thinking in undergraduate psychology. Psychology Learning & Teaching, 15(3), 226–234 Arnold, R. D., & Wade, J. P. (2015). A definition of systems thinking: A systems approach. Procedia Computer Science, 44, 669–678. Arnold, R. D., & Wade, J. P. (2017). A complete set of systems thinking skills. Insight, 20(3), 9–17. Assaraf, O. B. Z., & Orion, N. (2005). Development of system thinking skills in the context of Earth system education. Journal of Research in Science Teaching: The Official Journal of the National Association for Research in Science Teaching, 42(5), 518–560. Australian Curriculum, Assessment and Reporting Authority (ACARA). (2017). Australian curriculum: F-10 curriculum: Science Bailer-Jones, D. (1999). Tracing the development of models in the philosophy of science. In L. Magnani, N. J. Nersessian, & P. Thagard (Eds.), Model-based reasoning in scientific discovery. Proceedings of an international conference on model-based reasoning in scientific discovery, held December 17–19, 1998, in Pavia, Italy (pp. 23–40). New York: Kluwer Academic. Baker, R. S., Corbett, A. T., Koedinger, K. R., & Wagner, A. Z. (2004, April). Off-task behavior in the cognitive tutor classroom: When students" game the system". In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 383-390). Bakos, S., & Thibault, M. (2018). Affordances and tensions in teaching both computational thinking and mathematics. Proceedings of the 42nd Conference of the International Group for the Psychology of Mathematics Education. Vol. 2., 107-144. Barab, S., & Squire, K. (2016). Design-based research: Putting a stake in the ground. In Design-based 189 Research (pp. 1-14). Psychology Press. Barlas, Y. (1996). Formal aspects of model validity and validation in system dynamics. System Dynamics Review: The Journal of the System Dynamics Society, 12(3), 183–210. Barlas, Y. (1996). 
Formal aspects of model validity and validation in system dynamics. System Dynamics Review: The Journal of the System Dynamics Society, 12(3), 183–210. Barr, D., Harrison, J., & Conery, L. (2011). Computational thinking: A digital age skill for everyone. Learning & Leading with Technology, 38(6), 20-23. Barr, V., & Stephenson, C. (2011). Bringing computational thinking to K-12: What is involved and what is the role of the computer science education community?. Acm Inroads, 2(1), 48-54. Basham, J. D., & Marino, M. T. (2013). Understanding STEM education and supporting students through universal design for learning. Teaching Exceptional Children, 45(4), 8–15. Basu, S., Biswas, G., & Kinnebrew, J. S. (2017). Learner modeling for adaptive scaffolding in a computational thinking-based science learning environment. User Modeling and User-Adapted Interaction, 27, 5-53. Basu, S., Biswas, G., Sengupta, P., Dickes, A., Kinnebrew, J. S., & Clark, D. (2016). Identifying middle school students’ challenges in computational thinking-based science learning. Research and practice in technology enhanced learning, 11(1), 1-35. Basu, S., Dukeman, A., Kinnebrew, J. S., Biswas, G., & Sengupta, P. (2014). Investigating student generated computational models of science. Boulder, CO: International Society of the Learning Sciences. Ben-Ari, M. (2001). Constructivism in computer science education. Journal of computers in Mathematics and Science Teaching, 20(1), 45-73. Benton, L., Hoyles, C., Kalas, I., & Noss, R. (2017). Bridging primary programming and mathematics: Some findings of design research in England. Digital Experiences in Mathematics Education, 3, 115–138. Berge, Z. L. (1995). The role of the online instructor/facilitator. Educational technology, 35(1), 22-30. Berland, L., & Reiser, B. (2009). Making sense of argumentation and explanation. Science Education, 93, 26–55. Berland, L. K., Schwarz, C. V., Krist, C., Kenyon, L., Lo, A. S., & Reiser, B. J. (2016). Epistemologies in practice: Making scientific practices meaningful for students. Journal of Research in Science Teaching, 53(7), 1082–1112. Berland, M., & Wilensky, U. (2015). Comparing virtual and physical robotics environments for supporting complex systems and computational thinking. Journal of Science Education and Technology, 24(5), 628–647. Bers, M. U. (2010). The tangible K robotics program: Applied computational thinking for young children. Early Childhood Research and Practice, 12(2), n2. 190 Bers, M. U., Flannery, L., Kazakoff, E. R., & Sullivan, A. (2014). Computational thinking and tinkering: Exploration of an early childhood robotics curriculum. Computers & Education, 72, 145-157. Bielik, T., Krell, M., Zangori, L., & Ben Zvi Assaraf, O. (2023) Investigating Complex Phenomena: Bridging between Systems Thinking and Modeling in Science Education. In Frontiers in Education (Vol. 8, p. 1308241). Frontiers. Bielik, T., Stephens, L., Damelin, D., & Krajcik, J. S. (2019). Designing Technology Environments to Support System Modeling Competence. In Towards a Competence-Based View on Models and Modeling in Science Education (pp. 275-290). Springer, Cham. Bierema, A. M. K., Schwarz, C. V., & Stoltzfus, J. R. (2017). Engaging undergraduate biology students in scientific modeling: Analysis of group interactions, sense-making, and justification. CBE—Life Sciences Education, 16(4), 68. Blatchford, P., Bassett, P., & Brown, P. (2011). 
Boersma, K., Waarlo, A. J., & Klaassen, K. (2011). The feasibility of systems thinking in biology education. Journal of Biological Education, 45(4), 190–197.
Booth-Sweeney, L. B., & Sterman, J. D. (2007). Thinking about systems: Student and teacher conceptions of natural and social systems. System Dynamics Review: The Journal of the System Dynamics Society, 23(2–3), 285–311.
Bourgault, S., & E, J. (2023). Exploring the Horizon of Computation for Creativity. XRDS: Crossroads, The ACM Magazine for Students, 29(4), 6–9.
Bowers, J., Damelin, D., Eidin, E., & McIntyre, C. (2022a). Keeping Cool With SageModeler. The Science Teacher, 89(4).
Bowers, J., Eidin, E., Stephens, L., & Brennan, L. (2023). Examining Student Testing and Debugging Within a Computational Systems Modeling Context. Journal of Science Education and Technology, 32(4), 607–628.
Bowers, J., Shin, N., Brennan, L., Eidin, E., Stephens, L., & Roderick, S. (2022b). Developing the Systems Thinking and Computational Thinking Identification Tool. International Society of the Learning Sciences.
Brackmann, C., Barone, D., Casali, A., Boucinha, R., & Muñoz-Hernandez, S. (2016). Computational thinking: Panorama of the Americas. In F. J. García-Peñalvo & A. J. Mendes (Eds.), 2016 International Symposium on Computers in Education (SIIE) (pp. 1–29). Piscataway: IEEE.
Bravo, C., van Joolingen, W. R., & de Jong, T. (2006). Modeling and simulation in inquiry learning: Checking solutions and giving advice. Simulation, 82(11), 769–784.
Brennan, K., & Resnick, M. (2012). New frameworks for studying and assessing the development of computational thinking. In Proceedings of the 2012 Annual Meeting of the American Educational Research Association, Vancouver, Canada (Vol. 1, p. 25).
Brühwiler, C., & Blatchford, P. (2011). Effects of class size and adaptive teaching competency on classroom processes and academic outcome. Learning and Instruction, 21(1), 95–108.
Cabrera, D., Colosi, L., & Lobdell, C. (2008). Systems thinking. Evaluation and Program Planning, 31(3), 299–310.
Campbell, T., & Oh, P. S. (2015). Engaging students in modeling as an epistemic practice of science: An introduction to the special issue of the "Journal of Science Education and Technology." Journal of Science Education and Technology, 24(2), 125–131.
Carver, M. S., & Risinger, S. C. (1987, December). Improving children's debugging skills. In Empirical studies of programmers: Second workshop (pp. 147–171).
Chi, M. T., & Wylie, R. (2014). The ICAP framework: Linking cognitive engagement to active learning outcomes. Educational Psychologist, 49(4), 219–243.
Chmiel, R., & Loui, M. (2004). Debugging: From novice to expert. Inroads, 36(1), 17–21.
Clement, J. (2000). Model based learning as a key research area for science education. International Journal of Science Education, 22(9), 1041–1053.
Collins, A., Brown, J. S., & Newman, S. E. (1989). Cognitive apprenticeship: Teaching the craft of reading, writing, and mathematics. In L. B. Resnick (Ed.), Knowing, learning, and instruction: Essays in honor of Robert Glaser (pp. 453–494). Hillsdale, NJ: Lawrence Erlbaum.
Costanza, R., & Voinov, A. (2001). Modeling ecological and economic systems with STELLA: Part III. Ecological Modelling, 143(1–2), 1–7.
Cronin, M. A., Gonzalez, C., & Sterman, J. D. (2009). Why don't well-educated adults understand accumulation? A challenge to researchers, educators, and citizens. Organizational Behavior and Human Decision Processes, 108(1), 116–130.
Cuseo, J. (2007). The empirical case against large class size: Adverse effects on the teaching, learning, and retention of first-year students. The Journal of Faculty Development, 21(1), 5–21.
Csizmadia, A., Curzon, P., Dorling, M., Humphreys, S., Ng, T., Selby, C., & Woollard, J. (2015). Computational thinking: A guide for teachers. Computing at School. Swindon, UK.
Dabholkar, S., Anton, G., & Wilensky, U. (2018). GenEvo: An emergent systems microworld for model-based scientific inquiry in the context of genetics and evolution. International Society of the Learning Sciences, Inc. [ISLS].
Damelin, D., Krajcik, J. S., McIntyre, C., & Bielik, T. (2017). Students making systems models. Science Scope, 40(5), 78–83.
DiBiase, W., & McDonald, J. R. (2015). Science teacher attitudes toward inquiry-based teaching and learning. The Clearing House: A Journal of Educational Strategies, Issues and Ideas, 88(2), 29–38.
Dickes, A., & Sengupta, P. (2012). Learning Natural Selection in 4th Grade with Multi Agent-Based Computational Models. Research in Science Education.
Dietrich, S., & Hernandez, E. (2022, August). Language Use in the United States: 2019. American Community Survey Reports: US Census Bureau.
Duran, L. B., & Duran, E. (2004). The 5E instructional model: A learning cycle approach for inquiry-based science teaching. Science Education Review, 3(2), 49–58.
Eidin, E., Bielik, T., Touitou, I., Bowers, J., McIntyre, C., Damelin, D., & Krajcik, J. (2023). Thinking in Terms of Change over Time: Opportunities and Challenges of Using System Dynamics Models. Journal of Science Education and Technology, 1–28.
Elliott, C. H., Chakarov, A. G., Bush, J. B., Nixon, J., & Recker, M. (2023). Toward a debugging pedagogy: Helping students learn to get unstuck with physical computing systems. Information and Learning Sciences, 124(1/2), 1–24.
Emara, M., Grover, S., Hutchins, N., Biswas, G., & Snyder, C. (2020). Examining students' debugging and regulation processes during collaborative computational modeling in science. In International Conference of the Learning Sciences 2020 Proceedings (ICLS 2020).
Fan, C., Liu, X., Ling, R., & Si, B. (2018). Application of Proteus in experimental teaching and research of medical electronic circuit. In 2018 3rd International Conference on Modern Management, Education Technology, and Social Science (MMETSS 2018) (pp. 512–515). Atlantis Press.
Farris, A. V., Dickes, A. C., & Sengupta, P. (2019). Learning to interpret measurement and motion in fourth grade computational modeling. Science & Education, 28(8), 927–956.
Fernandes, S., Mesquita, D., Flores, M. A., & Lima, R. M. (2014). Engaging students in learning: Findings from a study of project-led education. European Journal of Engineering Education, 39(1), 55–67.
Fisher, D. M. (2018). Reflections on teaching system dynamics modeling to secondary school students for over 20 years. Systems, 6(2), 12.
Fix, V., Wiedenbeck, S., & Scholtz, J. (1993). Mental representations of programs by novices and experts. In P. Bauersfeld, J. Bennett & G. Lynch (Eds.), Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 74–79). New York: ACM Press.
Ford, A., & Teorey, T. (2002). Practical debugging in C++. Upper Saddle River, NJ: Prentice-Hall.
Forrester, J. W. (1971). Counterintuitive behavior of social systems. Theory and Decision, 2(2), 109–140.
Forrester, J. W. (1994). System dynamics, systems thinking, and soft OR. System Dynamics Review, 10(2–3), 245–256.
Forrester, J. W. (2007). System dynamics—the next fifty years. System Dynamics Review: The Journal of the System Dynamics Society, 23(2–3), 359–370.
Fosnot, C. T. (Ed.). (1996). Constructivism: Theory, perspectives, and practice. Teachers College Press.
Fretz, E. B., Wu, H. K., Zhang, B., Davis, E. A., Krajcik, J. S., & Soloway, E. (2002). An investigation of software scaffolds supporting modeling practices. Research in Science Education, 32(4), 567–589.
Geier, R., Blumenfeld, P. C., Marx, R. W., Krajcik, J. S., Fishman, B., Soloway, E., & Clay-Chambers, J. (2008). Standardized test outcomes for students engaged in inquiry-based science curricula in the context of urban reform. Journal of Research in Science Teaching: The Official Journal of the National Association for Research in Science Teaching, 45(8), 922–939.
Gilbert, J. K., & Justi, R. (2016). Modelling-Based Teaching in Science Education (Vol. 9). Cham, Switzerland: Springer.
Gilmore, D. J. (1991). Models of debugging. Acta Psychologica, 78(1–3), 151–172.
Ginovart, M. (2014). Discovering the power of individual-based modelling in teaching and learning: The study of a predator–prey system. Journal of Science Education and Technology, 23, 496–513.
Gkiolmas, A., Karamanos, K., Chalkidis, A., Skordoulis, C., Papaconstantinou, M., & Stavrou, D. (2013). Using simulations of NetLogo as a tool for introducing Greek high-school students to eco-systemic thinking. Advances in Systems Science and Applications, 13(3), 276–298.
Gleasman, C., & Kim, C. (2020). Pre-service teacher's use of block-based programming and computational thinking to teach elementary mathematics. Digital Experiences in Mathematics Education, 6, 52–90.
Goldstone, R. L., & Janssen, M. A. (2005). Computational models of collective behavior. Trends in Cognitive Sciences, 9(9), 424–430.
Gouvea, J., & Passmore, C. (2017). 'Models of' versus 'models for'. Science & Education, 26(1–2), 49–63.
Grandell, L., Peltomaki, M., Back, R. J., & Salakoski, T. (2006). Why complicate things? Introducing programming in high school using Python. In ACM International Conference Proceeding Series (Vol. 165, pp. 71–80).
Grapin, S. E., Llosa, L., Haas, A., & Lee, O. (2022). Affordances of computational models for English learners in science instruction: Conceptual foundation and initial inquiry. Journal of Science Education and Technology, 31(1), 52–67.
Grifenhagen, J. F., & Barnes, E. M. (2022). Reimagining discourse in the classroom. The Reading Teacher, 75(6), 739–748.
Grawemeyer, B., Mavrikis, M., Holmes, W., Gutiérrez-Santos, S., Wiedmann, M., & Rummel, N. (2017). Affective learning: Improving engagement and enhancing learning with affect-aware feedback. User Modeling and User-Adapted Interaction, 27, 119–158.
Griffin, J. M. (2016). Learning by taking apart: Deconstructing code by reading, tracing, and debugging. In Proceedings of the 17th Annual Conference on Information Technology Education (pp. 148–153).
Grosslight, L., Unger, C., Jay, E., & Smith, C. L. (1991). Understanding models and their use in science: Conceptions of middle and high school students and experts. Journal of Research in Science Teaching, 28(9), 799–822.
Grover, S., & Pea, R. (2018). Computational thinking: A competency whose time has come. In Computer Science Education: Perspectives on Teaching and Learning in School (pp. 19–34). New York, NY: Bloomsbury.
Grover, S., Pea, R., & Cooper, S. (2015). Designing for deeper learning in a blended computer science course for middle school students. Computer Science Education, 25(2), 199–237.
Hadad, R., Thomas, K., Kachovska, M., & Yin, Y. (2020). Practicing formative assessment for computational thinking in making environments. Journal of Science Education and Technology, 29(1), 162–173.
Hamidi, A., Mirijamdotter, A., & Milrad, M. (2023). A Complementary View to Computational Thinking and Its Interplay with Systems Thinking. Education Sciences, 13(2), 201.
Hansen, A. K., Hansen, E. R., Dwyer, H. A., Harlow, D. B., & Franklin, D. (2016). Differentiating for diversity: Using universal design for learning in elementary computer science education. In Proceedings of the 47th ACM Technical Symposium on Computing Science Education (pp. 376–381).
Harrison, A. G., & Treagust, D. F. (2000). A typology of school science models. International Journal of Science Education, 22(9), 1011–1026.
Heintz, F., Mannila, L., & Farnqvist, T. (2014). A review of models for introducing computational thinking, computer science and computing in K–12 education. In 2016 IEEE Frontiers in Education Conference (FIE) (pp. 1–9). Piscataway, NJ: IEEE.
Hmelo-Silver, C. E., & Azevedo, R. (2006). Understanding complex systems: Some core challenges. The Journal of the Learning Sciences, 15(1), 53–61.
Hmelo-Silver, C. E., Duncan, R. G., & Chinn, C. A. (2007). Scaffolding and achievement in problem-based and inquiry learning: A response to Kirschner, Sweller, and Clark (2006). Educational Psychologist, 42(2), 99–107.
Hmelo-Silver, C. E., Jordan, R., Eberbach, C., & Sinha, S. (2017). Systems learning with a conceptual representation: A quasi-experimental study. Instructional Science, 45(1), 53–72. doi.org/10.1007/s11251-016-9392-y
Hofman-Bergholm, M. (2018). Changes in thoughts and actions as requirements for a sustainable future: A review of recent research on the Finnish educational system and sustainable development. Journal of Teacher Education for Sustainability, 20(2), 19–30.
Hogan, K., & Thomas, D. (2001). Cognitive comparisons of students' systems modeling in ecology. Journal of Science Education and Technology, 10(4), 319–345.
Hopper, M., & Stave, K. A. (2008, July). Assessing the effectiveness of systems thinking interventions in the classroom. In 26th International Conference of the System Dynamics Society (pp. 1–26).
Hsu, Y. C., Irie, N. R., & Ching, Y. H. (2019). Computational thinking educational policy initiatives (CTEPI) across the globe. TechTrends, 63, 260–270.
Hsu, Y. S., Lai, T. L., & Hsu, W. H. (2015). A design model of distributed scaffolding for inquiry-based learning. Research in Science Education, 45, 241–273.
Hutchins, N. M., Biswas, G., Maróti, M., Lédeczi, Á., Grover, S., Wolf, R., ... & McElhaney, K. (2020). C2STEM: A system for synergistic learning of physics and computational thinking. Journal of Science Education and Technology, 29, 83–100.
Irgens, G. A., Dabholkar, S., Bain, C., Woods, P., Hall, K., Swanson, H., ... Wilensky, U. (2020). Modeling and measuring high school students' computational thinking practices in science. Journal of Science Education and Technology, 29(1), 137–161.
Jiménez-Aleixandre, M. P., Bugallo Rodríguez, A., & Duschl, R. A. (2000). "Doing the lesson" or "doing science": Argument in high school genetics. Science Education, 84(6), 757–792.
Jonassen, D. H., & Hung, W. (2006). Learning to troubleshoot: A new theory-based design architecture. Educational Psychology Review, 18(1), 77–114.
Justi, R. (2009). Learning how to model in science classroom: Key teacher's role in supporting the development of students' modelling skills. Educación Química, 20(1), 32–40.
Kafai, Y. B. (2005). The classroom as "living laboratory": Design-based research for understanding, comparing, and evaluating learning science through design. Educational Technology, 28–34.
Karacalli, S., & Korur, F. (2014). The effects of project-based learning on students' academic achievement, attitude, and retention of knowledge: The subject of "electricity in our lives." School Science and Mathematics, 114(5), 224–235.
Katz, I. R., & Anderson, J. R. (1987). Debugging: An analysis of bug-location strategies. Human-Computer Interaction, 3(4), 351–399.
Katz, I. R., & Anderson, J. R. (1989). Debugging: An analysis of bug-location strategies. ACM SIGCHI Bulletin, 21(1), 123.
Kazakoff, E., & Bers, M. (2012). Programming in a robotics context in the kindergarten classroom: The impact on sequencing skills. Journal of Educational Multimedia and Hypermedia, 21(4), 371–391.
Ke, L., Sadler, T. D., Zangori, L., & Friedrichsen, P. J. (2020). Students' perceptions of socio-scientific issue-based learning and their appropriation of epistemic tools for systems thinking. International Journal of Science Education, 42(8), 1339–1361.
Kelly, G. J. (2013). Discourse in science classrooms. Handbook of Research on Science Education, 457–484.
Kessler, C., & Anderson, J. (1986). A model of novice debugging in LISP. In E. Soloway & S. Iyengar (Eds.), Empirical studies of programmers (pp. 198–212). Norwood, NJ: Ablex.
Keynan, A., Assaraf, O. B. Z., & Goldman, D. (2014). The repertory grid as a tool for evaluating the development of students' ecological system thinking abilities. Studies in Educational Evaluation, 41, 90–105.
King, A. (1998). Transactive peer tutoring: Distributing cognition and metacognition. Educational Psychology Review, 10(1), 57–74.
Kim, C., Yuan, J., Vasconcelos, L., Shin, M., & Hill, R. B. (2018). Debugging during block-based programming. Instructional Science, 46, 767–787.
KMK [Sekretariat der Ständigen Konferenz der Kultusminister der Länder in der BRD]. (2005a). Bildungsstandards im Fach Biologie für den Mittleren Schulabschluss [Educational standards in biology for middle school graduation]. München/Neuwied, Germany: Wolters Kluwer. https://www.kmk.org/fileadmin/Dateien/veroeffentlichungen_beschluesse/2004/2004_12_16-Bildungsstandards-Biologie.pdf
KMK [Sekretariat der Ständigen Konferenz der Kultusminister der Länder in der BRD]. (2005b). Bildungsstandards im Fach Chemie für den Mittleren Schulabschluss [Educational standards in chemistry for middle school graduation]. Wolters Kluwer. https://www.kmk.org/fileadmin/Dateien/veroeffentlichungen_beschluesse/2004/2004_12_16-Bildungsstandards-Chemie.pdf
KMK [Sekretariat der Ständigen Konferenz der Kultusminister der Länder in der BRD]. (2005c). Bildungsstandards im Fach Physik für den Mittleren Schulabschluss [Educational standards in physics for middle school graduation]. Wolters Kluwer. https://www.kmk.org/fileadmin/Dateien/veroeffentlichungen_beschluesse/2004/2004_12_16-Bildungsstandards-Physik-Mittleren-SA.pdf
KMK (Standing Conference of the Ministers of Education and Cultural Affairs of the Federal States in the Federal Republic of Germany). (2020). Bildungsstandards im Fach Biologie für die Allgemeine Hochschulreife [Educational standards in biology for the general higher education entrance qualification]. Hürth: Wolters Kluwer.
Krahenbuhl, K. S. (2016). Student-centered education and constructivism: Challenges, concerns, and clarity for teachers. The Clearing House: A Journal of Educational Strategies, Issues and Ideas, 89(3), 97–105.
Krajcik, J., & Blumenfeld, P. (2006). Project-based learning. In R. K. Sawyer (Ed.), The Cambridge handbook of the learning sciences (pp. 317–333). Cambridge University Press.
Krajcik, J., Blumenfeld, P., Marx, R., & Soloway, E. (2000). Instructional, curricular, and technological supports for inquiry in science classrooms. In J. Minstrell & E. H. v. Zee (Eds.), Inquiring into inquiry learning and teaching science (pp. 283–315). Washington, DC: American Association for the Advancement of Science.
Krajcik, J., & Shin, N. (2022). Project-based learning. In R. K. Sawyer (Ed.), The Cambridge handbook of the learning sciences (3rd ed.). New York: Cambridge University Press.
Krell, M., Reinisch, B., & Krüger, D. (2015). Analyzing students' understanding of models and modeling referring to the disciplines biology, chemistry, and physics. Research in Science Education, 45(3), 367–393.
Krell, M., & Krüger, D. (2016). Testing models: A key aspect to promote teaching activities related to models and modelling in biology. Journal of Biological Education, 50(2).
Kuhn, D., Black, J., Keselman, A., & Kaplan, D. (2000). The development of cognitive skills to support inquiry. Cognition and Instruction, 18, 495–523.
Kyza, E. A., Constantinou, C. P., & Spanoudis, G. (2011). Sixth graders' co-construction of explanations of a disturbance in an ecosystem: Exploring relationships between grouping, reflective scaffolding, and evidence-based explanations. International Journal of Science Education, 33(18), 2489–2525.
Lederman, N. G. (2013). Nature of science: Past, present, and future. In Handbook of research on science education (pp. 845–894). Routledge.
Ledley, T. S., Rooney-Varga, J., & Niepold, F. (2017). Addressing climate change through education. In Oxford Research Encyclopedia of Environmental Science.
Lee, I., Grover, S., Martin, F., Pillai, S., & Malyn-Smith, J. (2020). Computational thinking from a disciplinary perspective: Integrating computational thinking in K-12 science, technology, engineering, and mathematics education. Journal of Science Education and Technology, 29(1), 1–8.
Lee, I., & Malyn-Smith, J. (2020). Computational thinking integration patterns along the framework defining computational thinking from a disciplinary perspective. Journal of Science Education and Technology, 29(1), 9–18.
Lee, I., Martin, F., Denner, J., Coulter, B., Allan, W., Erickson, J., ... & Werner, L. (2011). Computational thinking for youth in practice. ACM Inroads, 2(1), 32–37.
Lee, S., Kang, E., & Kim, H. B. (2015). Exploring the impact of students' learning approach on collaborative group modeling of blood circulation. Journal of Science Education and Technology, 24(2), 234–255.
Lemke, J. (1990). Talking science: Language, learning, and values. Ablex.
Li, C., Chan, E., Denny, P., Luxton-Reilly, A., & Tempero, E. (2019). Towards a framework for teaching debugging. In Proceedings of the Twenty-First Australasian Computing Education Conference (pp. 79–86).
Li, D. D., & Lim, C. P. (2008). Scaffolding online historical inquiry tasks: A case study of two secondary school classrooms. Computers & Education, 50, 1394–1410.
Li, K., & Schwarz, C. (2020). Using Epistemic Considerations in Teaching: Fostering Students' Meaningful Engagement in Scientific Modeling. 10.1007/978-3-030-30255-9_11
Lin, T. C., Hsu, Y. S., Lin, S. S., Changlai, M. L., Yang, K. Y., & Lai, T. L. (2012). A review of empirical evidence on scaffolding for science education. International Journal of Science and Mathematics Education, 10, 437–455.
Lin, Y. T., Yeh, M. K. C., & Hsieh, H. L. (2021). Teaching computer programming to science majors by modelling. Computer Applications in Engineering Education, 29(1), 130–144.
Louca, L. T., & Zacharia, Z. C. (2012). Modeling-based learning in science education: Cognitive, metacognitive, social, material and epistemological contributions. Educational Review, 64(4), 471–492.
Luxton-Reilly, A. (2009). A systematic review of tools that support peer assessment. Computer Science Education, 19(4), 209–232.
Lye, S. Y., & Koh, J. H. L. (2014). Review on teaching and learning of computational thinking through programming: What is next for K-12? Computers in Human Behavior, 41, 51–61.
Magana, A. J., & Silva Coutinho, G. (2017). Modeling and simulation practices for a computational thinking-enabled engineering workforce. Computer Applications in Engineering Education, 25(1), 62–78.
Mandinach, E. B. (1988). The Cognitive Effects of Simulation-Modeling Software and Systems Thinking on Learning and Achievement.
Martinez-Moyano, I. J., & Richardson, G. P. (2013). Best practices in system dynamics modeling. System Dynamics Review, 29(2), 102–123.
McCauley, R., Fitzgerald, S., Lewandowski, G., Murphy, L., Simon, B., Thomas, L., & Zander, C. (2008). Debugging: A review of the literature from an educational perspective. Computer Science Education, 18(2), 67–92.
McNeill, K. L., & Krajcik, J. (2008). Scientific explanations: Characterizing and evaluating the effects of teachers' instructional practices on student learning. Journal of Research in Science Teaching: The Official Journal of the National Association for Research in Science Teaching, 45(1), 53–78.
McNeill, K. L., & Krajcik, J. (2009). Synergy between teacher practices and curricular scaffolds to support students in using domain-specific and domain-general knowledge in writing arguments to explain phenomena. The Journal of the Learning Sciences, 18(3), 416–460.
Meadows, D. (2008). Thinking in systems: A primer. White River Junction, VT: Chelsea Green Publishing.
Metcalf, S. J., Krajcik, J., & Soloway, E. (2000). Model-It: A design retrospective. Innovations in Science and Mathematics Education, 77–115.
Mehan, H. (1979). Learning lessons: Social organization in the classroom. Harvard University Press.
Michaeli, T., & Romeike, R. (2019, October). Improving debugging skills in the classroom: The effects of teaching a systematic debugging process. In Proceedings of the 14th Workshop in Primary and Secondary Computing Education (pp. 1–7).
Mittelstraß, J. (2005). Anmerkungen zum Modellbegriff [Remarks on the concept of the model]. In Modelle des Denkens: Streitgespräch in der Wissenschaftlichen Sitzung der Versammlung der Berlin-Brandenburgischen Akademie der Wissenschaften. Berlin-Brandenburgische Akademie der Wissenschaften.
Monroe, M. C., Plate, R. R., & Colley, L. (2015). Assessing an introduction to systems thinking. Natural Sciences Education, 44(1), 11–17.
Murphy, L., Lewandowski, G., McCauley, R., Simon, B., Thomas, L., & Zander, C. (2008). Debugging: The good, the bad, and the quirky – a qualitative analysis of novices' strategies. ACM SIGCSE Bulletin, 40(1), 163–167.
Nardelli, E. (2019). Do we really need computational thinking? Communications of the ACM, 62(2), 32–35.
National Research Council (NRC). (2007). Taking science to school: Learning and teaching science in grades K-8. National Academies Press.
National Research Council (NRC). (2012). A framework for K-12 science education: Practices, crosscutting concepts, and core ideas. National Academies Press.
Nersessian, N. J. (2008). Model-based reasoning in scientific practice. In R. A. Duschl and R. E. Grandy (Eds.), Teaching Scientific Inquiry: Recommendations for Research and Implementation (pp. 57–79). Rotterdam, the Netherlands: Sense.
NGSS Lead States. (2013). Next Generation Science Standards: For states, by states. Washington, DC: The National Academies Press.
Nguyen, H., & Santagata, R. (2020). Impact of computer modeling on learning and teaching systems thinking. Journal of Research in Science Teaching.
Ogegbo, A. A., & Ramnarain, U. (2021). A systematic review of computational thinking in science classrooms. Studies in Science Education, 1–28.
Oh, P. S., & Oh, S. J. (2011). What teachers of science need to know about models: An overview. International Journal of Science Education, 33(8), 1109–1130.
Papaevripidou, M., Constantinou, C. P., & Zacharia, Z. C. (2007). Modeling complex marine ecosystems: An investigation of two teaching approaches with fifth graders. Journal of Computer Assisted Learning, 23(2), 145–157.
Papert, S. (1980). Mindstorms: Children, computers, and powerful ideas. Basic Books.
Papert, S., & Harel, I. (1991). Situating constructionism. Constructionism, 36(2), 1–11.
Pass, S. (2004). Parallel paths to constructivism: Jean Piaget and Lev Vygotsky. IAP.
Passmore, C., Gouvea, J. S., & Giere, R. (2014). Models in science and in learning science: Focusing scientific practice on sense-making. In International handbook of research in history, philosophy and science teaching (pp. 1171–1202). Springer.
Passmore, C., Stewart, J., & Cartier, J. (2009). Model-Based Inquiry and School Science: Creating Connections. School Science and Mathematics, 109(7), 394–402.
Patterson, A. D. (2019). Equity in groupwork: The social process of creating justice in a science classroom. Cultural Studies of Science Education, 14, 361–381.
Pierson, A. E., & Brady, C. E. (2020). Expanding opportunities for systems thinking, conceptual learning, and participation through embodied and computational modeling. Systems, 8(4), 48.
Pierson, A. E., Clark, D. B., & Sherard, M. K. (2017). Learning progressions in context: Tensions and insights from a semester-long middle school modeling curriculum. Science Education, 101(6), 1061–1088.
Pierson, A. E., & Clark, D. B. (2018). Engaging students in computational modeling: The role of an external audience in shaping conceptual learning, model quality, and classroom discourse. Science Education, 102(6), 1336–1362.
Price, C. B., & Price-Mohr, R. M. (2018). An evaluation of primary school children coding using a text-based language (Java). Computers in the Schools, 35(4), 284–301.
Psycharis, S., & Kallia, M. (2017). The effects of computer programming on high school students' reasoning skills and mathematical self-efficacy and problem solving. Instructional Science, 45(5), 583–602.
Puntambekar, S., & Hubscher, R. (2005). Tools for scaffolding students in a complex learning environment: What have we gained and what have we missed? Educational Psychologist, 40(1), 1–12.
Puntambekar, S., & Kolodner, J. L. (2003). Distributed scaffolding: Helping students learn science from design. Cogn Inst.
Rice, J. K. (1999). The impact of class size on instructional strategies and the use of time in high school mathematics and science courses. Educational Evaluation and Policy Analysis, 21(2), 215–229.
Reiser, B. J., Berland, L. K., & Kenyon, L. (2012). Engaging students in the scientific practices of explanation and argumentation. The Science Teacher, 79(4), 34–39.
Resnick, M., Maloney, J., Monroy-Hernández, A., Rusk, N., Eastmond, E., Brennan, K., Millner, A., Rosenbaum, E., Silver, J., Silverman, B., & Kafai, Y. (2009). Scratch: Programming for all. Communications of the ACM, 52(11), 60–67.
Reynolds, J., & Moskovitz, C. (2008). Calibrated Peer Review Assignments in Science Courses. Journal of College Science Teaching, 38(2).
Richardson, G. P. (1996). Problems for the future of system dynamics. System Dynamics Review: The Journal of the System Dynamics Society, 12(2), 141–157.
Richmond, B. (1994). Systems thinking/system dynamics: Let's just get on with it. System Dynamics Review, 10(2–3), 135–157.
Riess, W., & Mischo, C. (2010). Promoting systems thinking through biology lessons. International Journal of Science Education, 32(6), 705–725.
Rockoff, J. E. (2004). The impact of individual teachers on student achievement: Evidence from panel data. American Economic Review, 94(2), 247–252.
Rogoff, B. (1990). Apprenticeship in thinking: Cognitive development in social context. Oxford, UK: Oxford University Press.
Scanlon, E., Schreffler, J., James, W., Vasquez, E., & Chini, J. J. (2018). Postsecondary physics curricula and Universal Design for Learning: Planning for diverse learners. Physical Review Physics Education Research, 14(2), 020101.
Schneider, B., Krajcik, J., Lavonen, J., Salmela-Aro, K., Broda, M., Spicer, J., Bruner, J., Moeller, J., Linnansaari, J., Juuti, K., & Viljaranta, J. (2016). Investigating optimal learning moments in U.S. and Finnish science classes. Journal of Research in Science Teaching, 53(3), 400–421.
Schneider, B., Krajcik, J., Lavonen, J., Salmela-Aro, K., Klager, C., Bradford, L., Chen, I.-C., Baker, Q., Touitou, I., & Peek-Brown, D. (2022). Improving science achievement—Is it possible? Evaluating the efficacy of a high school chemistry and physics project-based learning intervention. Educational Researcher, 51(2), 109–121.
Schreiber, L. M., & Valle, B. E. (2013). Social constructivist teaching strategies in the small group classroom. Small Group Research, 44(4), 395–411.
Schwarz, C. V., Meyer, J., & Sharma, A. (2007). Technology, pedagogy, and epistemology: Opportunities and challenges of using computer modeling and simulation tools in elementary science methods. Journal of Science Teacher Education, 18(2), 243–269.
Schwarz, C. V., Passmore, C., & Reiser, B. J. (2017). Helping students make sense of the world using next generation science and engineering practices. NSTA Press.
Schwarz, C. V., Reiser, B. J., Davis, E. A., Kenyon, L., Acher, A., Fortus, D., Shwartz, Y., Hug, B., & Krajcik, J. (2009). Developing a learning progression for scientific modeling: Making scientific modeling accessible and meaningful for learners. Journal of Research in Science Teaching, 46(6), 632–654.
Schwarz, C. V., & White, B. Y. (2005). Metamodeling knowledge: Developing students' understanding of scientific modeling. Cognition and Instruction, 23(2), 165–205.
Selby, C. C., & Woollard, J. (2013, 5–8 March). Computational thinking: The developing definition. Special Interest Group on Computer Science Education, Atlanta, GA. Retrieved December 17, 2021, from https://core.ac.uk/download/pdf/17189251.pdf
Sengupta, P., & Farris, A. V. (2012, June). Learning kinematics in elementary grades using agent-based computational modeling: A visual programming-based approach. In Proceedings of the 11th International Conference on Interaction Design and Children (pp. 78–87).
Sengupta, P., Farris, A. V., & Wright, M. (2012). From agents to continuous change via aesthetics: Learning mechanics with visual agent-based computational modeling. Technology, Knowledge and Learning, 17, 23–42.
Sengupta, P., Kinnebrew, J. S., Basu, S., Biswas, G., & Clark, D. (2013). Integrating computational thinking with K-12 science education using agent-based computation: A theoretical framework. Education and Information Technologies, 18(2), 351–380.
Shen, J., Lei, J., Chang, H. Y., & Namdar, B. (2014). Technology-enhanced, modeling-based instruction (TMBI) in science education. In Handbook of research on educational communications and technology (pp. 529–540). Springer.
Shin, N., Bowers, J., Krajcik, J., & Damelin, D. (2021). Promoting computational thinking through project-based learning. Disciplinary and Interdisciplinary Science Education Research, 3(1), 1–15.
Shin, N., Bowers, J., Roderick, S., McIntyre, C., Stephens, L., Eidin, E., Krajcik, J., & Damelin, D. (2022). A framework for supporting systems thinking and computational thinking through constructing modeling. Instructional Science.
Shute, V. J., Sun, C., & Asbell-Clarke, J. (2017). Demystifying computational thinking. Educational Research Review, 22(1), 142–158.
Sins, P. H., Savelsbergh, E. R., & van Joolingen, W. R. (2005). The difficult process of scientific modelling: An analysis of novices' reasoning during computer-based modelling. International Journal of Science Education, 27(14), 1695–1721.
So, H. J., Jong, M. S. Y., & Liu, C. C. (2020). Computational thinking education in the Asian Pacific region.
Song, J., Kang, S. J., Kwak, Y., Kim, D., Kim, S., Na, J., ... & Joung, Y. J. (2019). Contents and features of 'Korean Science Education Standards (KSES)' for the next generation. Journal of the Korean Association for Science Education, 39(3), 465–478.
Smith, F. P., Holzworth, D. P., & Robertson, M. J. (2005). Linking icon-based models to code-based models: A case study with the agricultural production systems simulator. Agricultural Systems, 83(2), 135–151.
Snow, M., Stieff, M., & Spurgeon, S. (2022). Creating Synergistic Scaffolding Between the Tools of Discourse and Technology. In Proceedings of the 16th International Conference of the Learning Sciences-ICLS 2022 (pp. 1297–1300). International Society of the Learning Sciences.
Snyder, C., Hutchins, N. M., Biswas, G., Narasimham, G., Emara, M., & Yett, B. (2022). Instructor facilitation of STEM+CT discourse: Engaging, prompting, and guiding students' computational modeling in physics. In Proceedings of the 16th International Conference of the Learning Sciences-ICLS 2022 (pp. 631–638). International Society of the Learning Sciences.
Soloway, E., & Spohrer, J. C. (2013). Studying the novice programmer. Psychology Press.
Stave, K. A. (2002). Using system dynamics to improve public participation in environmental decisions. System Dynamics Review: The Journal of the System Dynamics Society, 18(2), 139–167.
Stave, K., & Hopper, M. (2007). What constitutes systems thinking? A proposed taxonomy. In 25th International Conference of the System Dynamics Society.
Stratford, S. J., Krajcik, J., & Soloway, E. (1998). Secondary students' dynamic modeling processes: Analyzing, reasoning about, synthesizing, and testing models of stream ecosystems. Journal of Science Education and Technology, 7, 215–234.
Sterman, J. D. (1994). Learning in and about complex systems. System Dynamics Review, 10(2–3), 291–330.
Sterman, J. D. (2002). All models are wrong: Reflections on becoming a systems scientist. System Dynamics Review: The Journal of the System Dynamics Society, 18(4), 501–531.
Sterman, J. D., & Sweeney, L. B. (2002). Cloudy skies: Assessing public understanding of global warming. System Dynamics Review: The Journal of the System Dynamics Society, 18(2), 207–240.
Sullivan, F. R., & Heffernan, J. (2016). Robotic construction kits as computational manipulatives for learning in the STEM disciplines. Journal of Research on Technology in Education, 48(2), 105–128.
Svoboda, J., & Passmore, C. (2013). The strategies of modeling in biology education. Science & Education, 22(1), 119–142.
Swanson, H., Sherin, B., & Wilensky, U. (2021). Refining student thinking through computational modeling. In Proceedings of the 15th International Conference of the Learning Sciences-ICLS 2021. International Society of the Learning Sciences.
Tabak, I. (2004). Synergy: A complement to emerging patterns of distributed scaffolding. The Journal of the Learning Sciences, 13(3), 305–335.
Tabak, I., & Kyza, E. A. (2018). Research on scaffolding in the learning sciences: A methodological perspective. Taylor and Francis.
Tabak, I., & Reiser, B. J. (1999, April). Steering the course of dialogue in inquiry-based science classrooms. Paper presented at the annual meeting of the American Educational Research Association, Montréal, Québec, Canada.
Tabet, N., Gedawy, H., Alshikhabobakr, H., & Razak, S. (2016, July). From Alice to Python: Introducing text-based programming in middle schools. In Proceedings of the 2016 ACM Conference on Innovation and Technology in Computer Science Education (pp. 124–129).
Tsan, J., Weintrop, D., & Franklin, D. (2022, July). An Analysis of Middle Grade Teachers' Debugging Pedagogical Content Knowledge. In Proceedings of the 27th ACM Conference on Innovation and Technology in Computer Science Education Vol. 1 (pp. 533–539).
Tsivitanidou, O. E., Constantinou, C. P., Labudde, P., Rönnebeck, S., & Ropohl, M. (2018). Reciprocal peer assessment as a learning tool for secondary school students in modeling-based learning. European Journal of Psychology of Education, 33, 51–73.
Türker, P. M., & Pala, F. K. (2020). The effect of algorithm education on students' computer programming self-efficacy perceptions and computational thinking skills. International Journal of Computer Science Education in Schools, 3(3), 19–32. doi:10.21585/ijcses.v3i3.69
Verhoeff, R. P., Knippels, M. C. P., Gilissen, M. G., & Boersma, K. T. (2018, June). The theoretical nature of systems thinking. Perspectives on systems thinking in biology education. In Frontiers in Education (Vol. 3, p. 40). Frontiers Media SA.
Vessey, I. (1985). Expertise in debugging computer programs: A process analysis. International Journal of Man–Machine Studies, 23, 459–494.
Wang, X. C., Choi, Y., Benson, K., Eggleston, C., & Weber, D. (2021a). Teacher's role in fostering preschoolers' computational thinking: An exploratory case study. Early Education and Development, 32(1), 26–48.
Wang, C., Shen, J., & Chao, J. (2021b). Integrating computational thinking in STEM education: A literature review. International Journal of Science and Mathematics Education, 1–24.
Webb, M., Davis, N., Bell, T., Katz, Y. J., Reynolds, N., Chambers, D. P., & Syslo, M. M. (2017). Computer science in K-12 school curricula of the 21st century: Why, what and when? Education and Information Technologies, 22(2), 445–468.
Weintrop, D., Beheshti, E., Horn, M., Orton, K., Jona, K., Trouille, L., & Wilensky, U. (2016). Defining Computational Thinking for Mathematics and Science Classrooms. Journal of Science Education and Technology, 25(1), 127–147.
Wen, M. L., & Tsai, C. C. (2008). Online peer assessment in an in-service science and mathematics teacher education course. Teaching in Higher Education, 13(1), 55–67.
Wertsch, J. V. (1979). From social interaction to higher psychological processes: A clarification and application of Vygotsky's theory. Human Development, 22, 1–22.
Wilkerson, M. H., Shareff, R., Laina, V., & Gravel, B. (2018). Epistemic gameplay and discovery in computational model-based inquiry activities. Instructional Science, 46, 35–60.
Wilkerson-Jerde, M., Wagh, A., & Wilensky, U. (2015). Balancing Curricular and Pedagogical Needs in Computational Construction Kits: Lessons from the DeltaTick Project. Science Education, 99, 465–499.
Wilensky, U., & Reisman, K. (2006). Thinking like a wolf, a sheep, or a firefly: Learning biology through constructing and testing computational theories—An embodied modeling approach. Cognition and Instruction, 24(2), 171–209.
Wilson, J. (1987). A Socratic approach to helping novice programmers debug programs. SIGCSE Bulletin, 19(1), 179–182.
Windschitl, M., Thompson, J., & Braaten, M. (2008). Beyond the scientific method: Model-based inquiry as a new paradigm of preference for school science investigations. Science Education, 92(5), 941–967.
Windschitl, M., Thompson, J., & Braaten, M. (2020). Ambitious science teaching. Harvard Education Press.
Wurdinger, S., Haar, J., Hugg, R., & Bezon, J. (2007). A qualitative study using project-based learning in a mainstream middle school. Improving Schools, 10(2), 150–161.
Wing, J. M. (2006). Computational thinking. Communications of the ACM, 49(3), 33–35.
Wing, J. M. (2017). Computational thinking's influence on research and education for all. Italian Journal of Educational Technology, 25(2), 7–14. doi:10.17471/2499-4324/922
Wood, D., Bruner, J. S., & Ross, G. (1976). The role of tutoring in problem solving. Journal of Child Psychology and Psychiatry, and Allied Disciplines, 17(2), 89–100.
Wu, H. L., & Pedersen, S. (2011). Integrating computer- and teacher-based scaffolds in science inquiry. Computers & Education, 57(4), 2352–2363.
Xia, S., Zhou, X. N., & Liu, J. (2017). Systems thinking in combating infectious diseases. Infectious Diseases of Poverty, 6(05), 57–63.
Xiang, L. (2011). A collective case study of secondary students' model-based inquiry on natural selection through programming in an agent-based modeling environment (Order No. 3474498). Available from ProQuest Dissertations & Theses Global. (897916462).
Yadav, A., Good, J., Voogt, J., & Fisser, P. (2017). Computational thinking as an emerging competence domain. In Competence-based vocational and professional education: Bridging the worlds of work and education (pp. 1051–1067).
Yadav, A., Mayfield, C., Zhou, N., Hambrusch, S., & Korb, J. T. (2014). Computational thinking in elementary and secondary teacher education. ACM Transactions on Computing Education (TOCE), 14(1), 1–16.
Yadav, A., Zhou, N., Mayfield, C., Hambrusch, S., & Korb, J. T. (2011). Introducing Computational Thinking in Education Courses. In Proceedings of the 42nd ACM Technical Symposium on Computer Science Education (SIGCSE '11). ACM, New York, NY, USA, 465–470.
Yoon, S. A. (2008). An evolutionary approach to harnessing complex systems thinking in the science and technology classroom. International Journal of Science Education, 30(1), 1–32.
Yoon, S. A., Anderson, E., Koehler-Yom, J., Klopfer, E., Sheldon, J., Wendel, D., ... & Evans, C. (2015). Design features for computer-supported complex systems learning and teaching in high school science classrooms. International Society of the Learning Sciences, Inc. [ISLS].
Zhang, L., VanLehn, K., Girard, S., Burleson, W., Chavez-Echeagaray, M. E., Gonzalez-Sanchez, J., & Hidalgo-Pontet, Y. (2014). Evaluation of a meta-tutor for constructing models of dynamic systems. Computers & Education, 75, 196–217.
Zhang, N., Biswas, G., McElhaney, K. W., Basu, S., McBride, E., & Chiu, J. L. (2020). Studying the interactions between science, engineering, and computational thinking in a learning-by-modeling environment. Artificial Intelligence in Education: 21st International Conference, AIED 2020, Ifrane, Morocco, July 6–10, 2020.
zu Belzen, A. U., van Driel, J., & Krüger, D. (2019). Introducing a Framework for Modeling Competence. In Towards a Competence-Based View on Models and Modeling in Science Education (pp. 3–19). Springer, Cham.
zu Belzen, A., & Krüger, D. (2010). Modellkompetenz im Biologieunterricht [Modeling competence in biology classes]. Zeitschrift für Didaktik der Naturwissenschaften, 16, 41–57.